How to Create a Sitemap in Gatsby

A sitemap is a file that lists links to the pages on your site. Search engines such as Google and Bing use this file to crawl the links it provides. Sitemaps are usually written in XML.

Search engines can’t always discover every link on your website on their own. Perhaps your website is very large and it takes them time to find every link. Or perhaps it is the opposite: your website is small and no external sites link to it. In both cases, a sitemap can help your pages appear in search results.

How do you know if you need a sitemap? If you need some arguments to decide, Google has written guidelines you can use. But since you are here, you are probably more interested in creating one.

Prerequisites

You need to have the Gatsby CLI installed. It is also recommended that you set up a project with the gatsby-starter-blog starter. The code examples use this starter, so it will be easier to follow along.

Step 1 — Installing Sitemap Plugin

In Gatsby, you generate a sitemap using the official plugin for creating sitemaps.

Run the following command in your terminal from your project’s directory.

npm install gatsby-plugin-sitemap

Add the plugin to the gatsby-config.js file in the root folder of your project. You must also specify siteUrl in siteMetadata for the plugin to work.

// gatsby-config.js

module.exports = {
  siteMetadata: {
    siteUrl: `https://www.example.com`,
  },
  plugins: ['gatsby-plugin-sitemap']
}

This plugin only creates a sitemap for the production version of your site. You need to build your project to test if the plugin works.

Build your project first.

gatsby build

Now, serve the production build.

gatsby serve

You should see a message in your terminal with a URL to your website.

Unless your website has thousands of links, the plugin should generate two files: a sitemap index and a single sitemap.

On the site being served, open /sitemap/sitemap-index.xml. You should see a sitemap index file.

<sitemapindex>
  <sitemap>
    <loc>https://www.example.com/sitemap/sitemap-0.xml</loc>
  </sitemap>
</sitemapindex>

A sitemap index file contains links to other sitemaps, and those sitemaps contain the actual links to your site’s pages. The index lets a large website be split into many smaller sitemaps. Don’t worry about this, the plugin takes care of it.
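If you want to see which sitemaps an index points to without reading the XML by hand, a few lines of JavaScript are enough. The following is a minimal sketch, not part of the plugin: extractLocs is a hypothetical helper name, and the XML string is a copy of the sample index shown above.

```javascript
// Sample sitemap index, copied from the example above.
const sitemapIndexXml = `
<sitemapindex>
  <sitemap>
    <loc>https://www.example.com/sitemap/sitemap-0.xml</loc>
  </sitemap>
</sitemapindex>`;

// Hypothetical helper: collect the text of every <loc> element.
// A regular expression is enough for this flat structure;
// use a real XML parser for anything more complex.
function extractLocs(xml) {
  const matches = xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g);
  return Array.from(matches, (match) => match[1]);
}

const childSitemaps = extractLocs(sitemapIndexXml);
console.log(childSitemaps);
```

The same helper also works on a sitemap such as sitemap-0.xml, because page links are stored in loc elements there as well.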

To see the actual sitemap, open /sitemap/sitemap-0.xml.

<urlset>
  <url>
    <loc>https://www.example.com/hello-world/</loc>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
  </url>
  . . .
</urlset>

The Sitemap protocol recommends placing your sitemap in the root directory of your site. To do that, set the plugin’s output option to '/' in gatsby-config.js.

plugins: [
  {
    resolve: 'gatsby-plugin-sitemap',
    options: {
      output: '/'
    }
  }
]

Rebuild your site and serve it again. You should now find your sitemap at /sitemap-0.xml.
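To make the effect of the output option concrete, here is a small sketch that computes where the files should be served for a given siteUrl and output value. It is an illustration only, not the plugin’s own code: sitemapUrls is a hypothetical helper, and it assumes the file names sitemap-index.xml and sitemap-0.xml seen earlier.

```javascript
// Hypothetical helper (not part of gatsby-plugin-sitemap):
// builds the public URLs of the sitemap files for a given output directory.
function sitemapUrls(siteUrl, output) {
  // Normalise the directory so it starts and ends with a single slash.
  const dir = `/${output}/`.replace(/\/+/g, '/');
  const toUrl = (fileName) => new URL(dir + fileName, siteUrl).href;

  return {
    index: toUrl('sitemap-index.xml'),
    first: toUrl('sitemap-0.xml'),
  };
}

// Location used in the first part of this step.
const inSubfolder = sitemapUrls('https://www.example.com', '/sitemap');
// Location after setting output to '/'.
const inRoot = sitemapUrls('https://www.example.com', '/');

console.log(inSubfolder);
console.log(inRoot);
```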

Step 2 — Adding Advanced Configuration

As you might have noticed, the plugin sets changefreq to daily and priority to 0.7 by default. If you want to change these values, you need some more configuration. You can also show when a page was last modified by adding the lastmod property.

The first thing you need to do is define a GraphQL query inside gatsby-config.js. Add it to the gatsby-plugin-sitemap options in your plugins array. The query must fetch your site’s URL, which you previously specified in siteUrl. It should also fetch all the page data you want to use in your sitemap. Since you want the sitemap to show links, grab the path of every page on your site.

You can also get the date property from MarkdownRemark nodes, which you can use to set the lastmod property in your sitemap.

// gatsby-config.js

{
  resolve: 'gatsby-plugin-sitemap',
  options: {
    output: '/',
    query: `
    {
      site {
        siteMetadata {
          siteUrl
        }
      }
      allSitePage {
        nodes {
          path
        }
      }
      allMarkdownRemark {
        nodes {
          frontmatter {
            date
          }
          fields {
            slug
          }
        }
      }
    }`
  }
}

The query also fetches slug from the MarkdownRemark nodes, which blog posts rely on. When you create pages from blog posts, you use their slug to build the page path. This means the slug of a blog post matches the path of its corresponding SitePage node. It will make sense in a bit, so keep following along.
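It may help to picture the data this query returns. The object below is a hand-written sketch of the result shape; the paths and the date are made-up sample values, not output from a real build. Note how the post’s slug is identical to the path of one of the pages.

```javascript
// Hypothetical query result; all values are sample data for illustration.
const queryResult = {
  site: { siteMetadata: { siteUrl: 'https://www.example.com' } },
  allSitePage: {
    nodes: [{ path: '/' }, { path: '/hello-world/' }],
  },
  allMarkdownRemark: {
    nodes: [
      {
        frontmatter: { date: '2015-05-01T22:12:03.284Z' },
        fields: { slug: '/hello-world/' },
      },
    ],
  },
};

// Find the posts whose slug matches the path of a page.
const pagePaths = new Set(
  queryResult.allSitePage.nodes.map((page) => page.path)
);
const postsWithPage = queryResult.allMarkdownRemark.nodes.filter((post) =>
  pagePaths.has(post.fields.slug)
);

console.log(postsWithPage.map((post) => post.fields.slug));
```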

The next part is preparing the objects that will go into the sitemap. Add a resolvePages function right under the query. It receives the query result as its only argument. To shorten your code, extract the properties you need with JavaScript destructuring: allSitePage nodes into an allPages variable and allMarkdownRemark nodes into an allPosts variable.

// gatsby-config.js

options: {
  output: '/',
  query: `. . .`, // the GraphQL query shown above
  resolvePages: ({
    allSitePage: { nodes: allPages },
    allMarkdownRemark: { nodes: allPosts },
  }) => {
    // Map each post's slug (for example /post-slug/) to its date.
    const pathToDateMap = {};

    allPosts.forEach(post => {
      pathToDateMap[post.fields.slug] = { date: post.frontmatter.date };
    });

    // Attach the date to every page whose path matches a post slug.
    const pages = allPages.map(page => {
      return { ...page, ...pathToDateMap[page.path] };
    });

    return pages;
  }
}

The function first builds a pathToDateMap object that maps the slug of each post, for example /post-slug/, to its publication date.

To connect the publication dates to their pages, it uses the map array method. This creates a new array with one object per page, holding the page’s path and its date, if one exists in pathToDateMap. The function finishes by returning this array of pages, which are the ones that go into the sitemap.
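You can try this logic outside of Gatsby to see what it produces. The sketch below runs the same resolvePages function on a hand-written query result; the sample paths and date are assumptions made for illustration.

```javascript
// Same logic as the resolvePages option, pulled out so it can run on its own.
const resolvePages = ({
  allSitePage: { nodes: allPages },
  allMarkdownRemark: { nodes: allPosts },
}) => {
  const pathToDateMap = {};

  allPosts.forEach((post) => {
    pathToDateMap[post.fields.slug] = { date: post.frontmatter.date };
  });

  // Spreading undefined adds nothing, so pages without a post keep only their path.
  return allPages.map((page) => ({ ...page, ...pathToDateMap[page.path] }));
};

// Hypothetical sample data: two pages, one of them created from a blog post.
const resolvedPages = resolvePages({
  allSitePage: { nodes: [{ path: '/' }, { path: '/hello-world/' }] },
  allMarkdownRemark: {
    nodes: [
      {
        frontmatter: { date: '2015-05-01T22:12:03.284Z' },
        fields: { slug: '/hello-world/' },
      },
    ],
  },
});

console.log(resolvedPages);
// [ { path: '/' },
//   { path: '/hello-world/', date: '2015-05-01T22:12:03.284Z' } ]
```

Only the page created from a blog post receives a date property. The home page has no matching slug, so it stays unchanged.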

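Finally, add a serialize function below resolvePages. It turns each page into a sitemap entry.

// gatsby-config.js

// . . .
options: {
  // . . .
  serialize: ({ path, date }) => {
    // Defaults for pages without a date.
    const entry = {
      url: path,
      changefreq: 'daily',
      priority: 0.5,
    };

    // Pages with a date come from blog posts: raise priority and add lastmod.
    if (date) {
      entry.priority = 0.7;
      entry.lastmod = date;
    }

    return entry;
  }
}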

You have complete control over the values you want to see in each entry of your sitemap. The object you return from the serialize function represents one entry in the final sitemap.
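To check the entries before building, you can run serialize on a couple of sample pages. This is a standalone sketch: the two page objects are assumed sample data shaped like the output of resolvePages.

```javascript
// Same logic as the serialize option, pulled out so it can run on its own.
const serialize = ({ path, date }) => {
  const entry = {
    url: path,
    changefreq: 'daily',
    priority: 0.5,
  };

  if (date) {
    entry.priority = 0.7;
    entry.lastmod = date;
  }

  return entry;
};

// Hypothetical pages: one without a date and one created from a blog post.
const sitemapEntries = [
  { path: '/' },
  { path: '/hello-world/', date: '2015-05-01T22:12:03.284Z' },
].map((page) => serialize(page));

console.log(sitemapEntries);
// [ { url: '/', changefreq: 'daily', priority: 0.5 },
//   { url: '/hello-world/', changefreq: 'daily', priority: 0.7,
//     lastmod: '2015-05-01T22:12:03.284Z' } ]
```

Pages without a date get the lower priority and no lastmod, while blog posts get priority 0.7 and their date as lastmod.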

Whew, that was a lot of work. Congratulations!

Checking Your Site on Search Engines

To see if your site is being crawled by search engines, you can use a special query. Open Google, Bing or DuckDuckGo and type site:your-website.com, substituting your own domain for your-website.com. The results should show the pages of your website that the search engine has found.

Don’t see any results? Don’t worry. You can manually submit your sitemap to help search engines find your site sooner. Google and Bing both provide guides for this, which are a good place to start. The rest is up to you, and I hope you find success.