Why XML sitemaps are important after all
Many SEO professionals downplay the humble sitemap — here’s why they’d be wrong
Ask anyone who works in the SEO business about XML sitemaps and a large portion of them will sneer.
They’ll tell you it’s an outdated way of telling Google to crawl all of your website’s pages, and that Google does this automatically, without needing a sitemap.xml file. Google’s own documentation appears to back them up:
“If your site’s pages are properly linked, our web crawlers can usually discover most of your site.
“Using a sitemap doesn’t guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling.
“However, in most cases, your site will benefit from having a sitemap, and you’ll never be penalized for having one.”
Yet the same web page also gives some compelling reasons to include one. Let’s examine them in detail.
As Google states a few lines further down the same page, it doesn’t crawl your entire website in one go; rather, it learns which pages you link to internally and crawls them at a later date. This means that if your website is new and nobody has linked to it from their own sites (e.g. blogs, news articles, Wikipedia pages, etc.), you might benefit from adding a sitemap to encourage Google to crawl your pages sooner rather than later.
So while including a sitemap isn’t a ranking signal per se, it helps your website’s SEO get off to a faster start.
What Google doesn’t mention
Oddly, Google neglects to mention the true power of the sitemap: the ability to prioritize your own pages relative to one another. Let’s look at the anatomy of a basic XML sitemap to see how that works.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <priority>0.5</priority>
  </url>
</urlset>
That might look like a mess, but that’s XML for you. As you can see, each page is wrapped in <url> tags, and within that block you can specify a <priority> value between 0 and 1 — a percentage expressed as a decimal (so 0.25 equates to 25%, a low-priority web page). Using this functionality, you can tell Google to focus on particular, juicier pages while neglecting (though not entirely) other pages that might be sparse in terms of content.
Beyond the sitemap
You might have guessed it already, but you can also use robots.txt to achieve a similar effect, only this time the pages you disallow will not be crawled by Google at all. Use this functionality for URLs that are reserved for server-side tasks, like web APIs or administrator login pages if you use a content management system.
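To illustrate, here is a minimal robots.txt sketch along those lines — the specific paths (/wp-admin/, /api/) and the sitemap URL are hypothetical placeholders, not prescriptions:

```
# Applies to all crawlers
User-agent: *
# Hypothetical server-side paths we don't want crawled
Disallow: /wp-admin/
Disallow: /api/
# Optionally point crawlers at your sitemap
Sitemap: https://www.example.com/sitemap.xml
```

Note that a Disallow rule prevents crawling, not necessarily indexing — a disallowed URL can still appear in results if other sites link to it.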
If you’re using a content management system like WordPress, it’s very easy to automate the generation of your sitemaps; if you’re working with a largely static or bespoke website, it’s still well worth investing the time to add one.
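Even without a CMS, automating sitemap generation takes only a few lines. Here is a minimal sketch using Python’s standard library — the page list, URLs, and priorities are hypothetical placeholders you would replace with your own:

```python
# Minimal sketch of automated sitemap generation for a static site.
# The pages and priorities below are placeholder examples.
import xml.etree.ElementTree as ET


def build_sitemap(pages):
    """Build a sitemap XML string from (url, priority) pairs."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "priority").text = f"{priority:.2f}"
    # xml_declaration requires Python 3.8+
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)


pages = [
    ("https://www.example.com/", 1.0),       # high-priority homepage
    ("https://www.example.com/about", 0.5),  # lower-priority page
]
print(build_sitemap(pages))
```

Run the script as part of your build or deploy step and write the output to sitemap.xml at your site’s root.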