Website Sitemaps: A Beginner’s Guide

What is a Sitemap?

A website sitemap is a navigational tool that provides a structured list of the web pages and content within a website. It can also provide information about specific types of content on your pages, including video, image, and news content. It serves as a map or directory, making it easier for both search engines and visitors to understand the layout and organization of the site.

For example, below is the page XML sitemap for Hike, which lists all of the pages. The blog posts are in a separate XML sitemap, making it more organized than having everything in a single one.

Why Are Website Sitemaps Important?

There are many reasons why sitemaps are important for a website, both for improving SEO and for user experience.

Organize Large Websites

Website sitemaps play a pivotal role in organizing large websites (>500 pages), which often contain numerous pages and complex structures. They provide a clear and systematic overview of the site's content, making it easier for search engine crawlers to discover and navigate through the vast number of pages. This makes it more likely that those pages are indexed and shown in the SERPs.

Serves as a Project Management Tool

Sitemaps provide a high-level view of your site content and structure, so they can serve as a valuable project management tool during many phases of the website lifecycle, from the website development stage to the ongoing content strategy and creation process. They help web developers, designers, and content creators plan the structure and hierarchy of a website, ensuring that every page is accounted for and integrated seamlessly. By using sitemaps as a blueprint, teams can collaborate more effectively and avoid issues that might negatively impact SEO, such as broken links or missing pages.

Enhances Search Engine Crawling & Visibility

Website sitemaps enhance search engine crawling and visibility, which is a critical part of SEO. Search engine bots use XML sitemaps and internal links to efficiently discover and index web pages. This improved crawlability helps a website's content get recognized and ranked in search results sooner, ultimately increasing its visibility and potential to attract organic traffic.

Allows for Spotting Indexing Issues

Sitemaps also serve as a diagnostic tool to identify potential indexing issues. By regularly checking XML sitemaps, webmasters can detect and rectify problems like missing pages, duplicate content, or errors in the site structure. Fixing these issues promptly can prevent a negative impact on a website's search engine rankings and user experience.

Provides an Alternative Means for Navigation

In addition to traditional website navigation menus, HTML sitemaps offer an alternative means for users to explore a website's content. This is particularly helpful for visitors who may not be familiar with the site's layout or those seeking specific information. By offering an additional method of navigation, HTML website sitemaps can improve the overall user experience and reduce bounce rates, which can indirectly benefit SEO by keeping users engaged and on the site longer.

Do I Need a Sitemap?

Generally, it’s best practice for websites to have sitemaps. Google specifically recommends having a sitemap if the website is large (>500 pages), is brand new and has only a few external links pointing to it, has a lot of rich media content (video, images), or is shown in Google News.

However, even if the website has fewer than 500 pages, is well linked internally, and doesn’t have many media files, it’s still recommended to have a sitemap to ensure maximum crawlability and visibility within search engines.

Types of Website Sitemaps

There are two main types of sitemaps, each with its own benefits: XML sitemaps and HTML sitemaps. Let’s explore each in more detail.

XML Sitemaps

XML stands for Extensible Markup Language and is a file format that provides structured information that is easy for both humans and machines to understand.

Below is the XML sitemap for Apple.com. It is formatted as an XML file and contains <url> and <loc> tags that highlight the pages on the website that web crawlers should discover.

XML sitemaps contain the following information:

  • A list of URLs that should be indexed
  • A “lastmod” tag that highlights when each URL was last updated
  • An "hreflang" attribute that informs search engines of alternative language versions
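To make the structure concrete, here is a minimal sketch in Python that builds a small XML sitemap with <url>, <loc>, and <lastmod> tags. The function name build_sitemap, the example URLs, and the dates are assumptions made for illustration; the namespace is the one defined by the sitemaps.org protocol. Alternative language links (hreflang) are left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a YYYY-MM-DD string, or None to omit the tag.
    build_sitemap is a hypothetical helper, not part of any library.
    """
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Example pages (made up for illustration).
xml_text = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/about", None),
])
print(xml_text)
```

Each page becomes one <url> entry, and the optional <lastmod> tag is only written when a date is known.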

There can be several types of XML sitemaps:

Normal Web Pages

This type of XML sitemap contains a list of the standard web pages on a website, including information about their last modification dates, change frequency, and page priority, helping search engines efficiently crawl and index these pages for search results.

Video Sitemaps

Video sitemaps contain information about video content on a website, such as video titles, descriptions, durations, and thumbnail URLs. They help search engines understand and index video content, making it more discoverable in search results. One important note: Video Schema has largely replaced the need for video sitemaps, so they are used less often.

Below you'll see instances of videos within Apple.com's Newsroom XML sitemap. Each video is nested within a specific page and has a thumbnail, title, description, link to the content, duration, and publication date. The entry even specifies whether the video is family friendly.

News Sitemaps

News sitemaps provide data about news articles and publications on a website, including publication dates, titles, and article keywords. They are used to inform search engines about the latest news content, ensuring timely and accurate inclusion in news search results.

Image Sitemaps

Image sitemaps contain details about images on a website, such as image URLs, captions, and licensing information. These sitemaps help search engines index and display images in image search results, improving a site's visibility for image-based queries.

Sitemap Index File

A Sitemap Index File is a special XML file that contains a list of multiple individual XML sitemaps. It's used when a website has a large number of pages or multiple content types. Listing every sitemap in one place lets search engines find and process all of them, making it easier to crawl and index the entire website's content.
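As a rough illustration, the sketch below builds a sitemap index in Python. The function name build_sitemap_index and the two sitemap URLs are assumptions for the example; the <sitemapindex>, <sitemap>, and <loc> tags and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that points at each individual sitemap.

    build_sitemap_index is a hypothetical helper for this example.
    """
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        child = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(child, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="unicode")


# Example: one sitemap for pages and one for blog posts (made-up URLs).
index_xml = build_sitemap_index([
    "https://www.example.com/page-sitemap.xml",
    "https://www.example.com/post-sitemap.xml",
])
print(index_xml)
```

Splitting pages and blog posts into separate sitemaps, as in this example, mirrors the organization described earlier in the article.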

HTML Sitemaps

An HTML sitemap is a standard website page with links to various pages organized in a hierarchical way. If a website has an HTML sitemap, you’ll often see a link to it in the footer. The purpose of an HTML sitemap is to provide a supplementary resource to help users explore the website. However, it should not replace good site navigation. Although it’s considered optional, it’s helpful to users if the website is quite large and has a complex structure.

Currys.co.uk has an HTML sitemap that lists all of their main pages. Although scrolling through will take a while, it allows users to find the area of the site they want to visit if they can't find it via their navigation:

How To Find a Sitemap

Locating a website sitemap is quite straightforward if you know where to look. There are several quick ways to find one, if it exists on the site.

Manually

The quickest way to check is in your web browser: type in your website domain and add /sitemap.xml or /sitemap_index.xml onto it. For example, www.example.com/sitemap.xml. If either of those files exists, it will load; otherwise you will land on a 404 page. If it’s not found, it doesn’t mean that a sitemap doesn’t exist, because the file name could be different. That’s where the next few methods will come in handy.
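The same manual check can be scripted. The sketch below, using only the Python standard library, builds the two common sitemap URLs for a domain and provides a helper that tests whether a URL responds successfully. The names candidate_sitemap_urls and url_exists are hypothetical, and the helper assumes the server answers HEAD requests, which not every server does.

```python
import urllib.request
from urllib.parse import urlunsplit

# The two common locations mentioned above.
COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def candidate_sitemap_urls(domain, scheme="https"):
    """Return the usual sitemap URLs to try for a domain."""
    return [
        urlunsplit((scheme, domain, path, "", ""))
        for path in COMMON_SITEMAP_PATHS
    ]


def url_exists(url, timeout=10):
    """Return True if the URL answers a HEAD request with status 200.

    Assumption: the server supports HEAD. Errors such as a 404 or a
    network failure are treated as "not found".
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


for url in candidate_sitemap_urls("www.example.com"):
    print(url)
```

Calling url_exists on each candidate tells you which file, if any, is present. A False result for both only means the sitemap is not at a default location.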

Search Operators

In Google, Bing, or Yahoo, certain search operators can help filter results to highlight whether certain pages or files, like the sitemap, exist. Below are a few search operators that can be edited and copy/pasted into the search bar to reveal whether a sitemap exists:

  • site:domain.com filetype:xml
  • site:domain.com inurl:sitemap
  • site:domain.com intitle:sitemap

Google Search Console

Another method, if the website is already set up with Google Search Console, is to log into that account and check whether a sitemap has already been submitted. If it has, its location is shown there.

Robots.txt

Finally, one can also check the robots.txt file, located at the root of a website (e.g. www.example.com/robots.txt), to see if the sitemap is listed there. Listing it is best practice; however, many sites haven’t added their XML sitemap or sitemap index file here yet.
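A minimal sketch of that check in Python, assuming the robots.txt contents have already been downloaded as text. The function name sitemaps_from_robots and the sample file are made up for illustration; the "Sitemap:" line format itself is the standard way sitemaps are declared in robots.txt.

```python
def sitemaps_from_robots(robots_txt):
    """Return every URL declared on a 'Sitemap:' line of a robots.txt file."""
    found = []
    for raw_line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colon survives.
        key, separator, value = line.partition(":")
        if separator and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


# Hypothetical robots.txt contents.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap_index.xml
"""

print(sitemaps_from_robots(SAMPLE_ROBOTS))
```

An empty list means no sitemap is declared in the file, not that the site has none.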

For example, in Tesco.com's robots.txt file, their XML sitemap index file is listed so web crawlers and search engines can easily find it:

How To Create a Sitemap

If your website is using a managed CMS such as Wix, Squarespace, WordPress.com, or similar, then sitemaps should already be included by default by the platform.

If you use a self-hosted WordPress installation, then a plugin such as Yoast SEO, Rank Math, or Google XML Sitemaps will automatically generate an XML sitemap for your website.

If your website is custom built or uses another CMS that doesn’t have a plugin that provides XML sitemap functionality, then a website such as XML-Sitemaps.com can generate one for you.

Checking Sitemaps for Errors

If sitemaps are automatically generated by the CMS or plugin, then there’s no need to worry about errors as that system should handle them.

However, if sitemaps are manually generated, then some common issues that could occur are as follows:

  • Incorrect pages found in a sitemap
  • Sitemap has formatting errors
  • Sitemap files are too large (more than 50 MB uncompressed or more than 50,000 URLs)

It’s important to follow sitemap best practices to prevent these sitemap errors from occurring.
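For the size limit in particular, one common approach is to split a long URL list into several sitemap files of at most 50,000 URLs each and list them in a sitemap index file. A minimal Python sketch of the splitting step follows; the function name split_into_sitemaps and the generated example URLs are assumptions for illustration, and the 50 MB file-size limit is not checked here.

```python
# URL-count limit per sitemap file, as stated in the list above.
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `limit` URLs."""
    return [urls[start:start + limit] for start in range(0, len(urls), limit)]


# Made-up URL list that is too large for a single sitemap file.
all_urls = [f"https://www.example.com/product/{n}" for n in range(120_001)]
chunks = split_into_sitemaps(all_urls)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own sitemap file.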

How To Submit a Sitemap to Google

Just because a sitemap has been created doesn’t necessarily mean search engines like Google know where to find it. To be certain, it’s best to submit the website sitemap to Google so it knows where to look.

To do this, log into your Google Search Console account and go to the Sitemaps section to submit the URL of your XML sitemap or sitemap index file.

Telling Google where to look can speed up the discovery of your sitemap, which can improve crawling and indexing performance.

Use Sitemaps to Spot Indexing Issues

Sitemaps can be very useful when troubleshooting indexing issues. By having access to an XML sitemap, you can quickly determine the total page count of the website, which tells you how many pages you want indexed, and then compare it to how many pages are actually indexed, using Google Search Console.

A discrepancy between the two points to a potential issue that is keeping certain pages out of search results, and it needs to be identified and fixed.
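The comparison can be sketched in Python as a set difference between the URLs in the sitemap and the URLs known to be indexed. This assumes the indexed URLs have already been exported from Google Search Console into a plain list; the function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the sitemaps.org namespace, used in the search path.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_in_sitemap(sitemap_xml):
    """Return the set of <loc> URLs listed in an XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def missing_from_index(sitemap_xml, indexed_urls):
    """Return sitemap URLs that do not appear in the indexed list."""
    return sorted(urls_in_sitemap(sitemap_xml) - set(indexed_urls))


# Made-up sitemap and indexed-page list for illustration.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/about</loc></url>"
    "</urlset>"
)
indexed = ["https://www.example.com/"]

print(missing_from_index(SAMPLE_SITEMAP, indexed))
```

The URLs this returns are the ones to investigate first.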

Types of Pages to Exclude on Sitemaps

A common question is whether all pages and files on a website should be included in a sitemap. Below is a list of pages that should not be included in a sitemap:

  • Paginated Pages
  • Archived Pages
  • Redirected or 404 Pages
  • Comment Pages
  • No-Indexed Pages
  • Search Results Pages
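The exclusion rules above can be expressed as a simple filter. The Python sketch below assumes each page has already been crawled and labelled; the Page class, its fields, and the kind labels are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record for a crawled page."""
    url: str
    status: int = 200        # HTTP status code returned for the URL
    noindex: bool = False    # True if the page carries a noindex directive
    kind: str = "standard"   # hypothetical label assigned during the crawl


# Page kinds from the list above that should stay out of the sitemap.
EXCLUDED_KINDS = {"paginated", "archive", "comment", "search"}


def belongs_in_sitemap(page):
    """Keep only live, indexable pages of a kind worth listing."""
    return (
        page.status == 200          # drops redirected and 404 pages
        and not page.noindex        # drops no-indexed pages
        and page.kind not in EXCLUDED_KINDS
    )


# Made-up crawl results for illustration.
pages = [
    Page("https://www.example.com/"),
    Page("https://www.example.com/blog/page/2", kind="paginated"),
    Page("https://www.example.com/old-page", status=301),
    Page("https://www.example.com/thank-you", noindex=True),
    Page("https://www.example.com/search?q=boots", kind="search"),
]
kept = [page.url for page in pages if belongs_in_sitemap(page)]
print(kept)
```

How pages get their status, noindex flag, and kind depends on the crawler or CMS in use, so that part is left out of the sketch.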

Hike SEO

Hike SEO automatically crawls a website for pages, which are then available to add to the Keyword Sitemap tool and match up with target keywords. This allows you to easily keep track of which pages are being optimized around which keywords. Hike includes many other beginner-friendly tools for small businesses to improve their SEO, one action at a time.

Try Hike today, and start improving your SEO performance in easy-to-follow action steps.