What are XML Sitemaps and how to create and submit them

Image courtesy of INPIVIC Family at Flickr.com

Have your pages ever taken a while to get indexed by search engines? This usually happens because search engines have trouble finding the pages of your site: the content relies on JavaScript, Java or Flash; the website is new and its internal pages are not yet well linked; or the site has many pages that are not correctly linked to each other. In these cases, an XML sitemap is a valuable resource that speeds up discovery by listing your pages in a format that search engines can understand.

Ultimately, the goal is to get search engines to find your content as soon as possible so that it can start ranking and gaining authority. Let’s take a look at some of the aspects that you should keep in mind.

You may already have seen HTML sitemaps. They are static pages, usually reached through a link in the footer, that give visitors an idea of the general structure of the website. Does it sound familiar? Well, an XML sitemap serves the same purpose, but it is written in a format meant for search engines rather than for people.

In 2005, Google recognized that traditional sitemaps were helpful for locating the pages of a website that were available for indexing, but that they could be improved from the search engine’s perspective, so it launched the Google Sitemaps protocol. About a year later, Yahoo! and Microsoft (whose search engine is now Bing) joined this initiative.

In its most basic form, an XML sitemap is a file written in Extensible Markup Language (XML) that lists the pages of the site along with other data. This information helps search engines identify the pages that can be crawled and their exact location.

This is why, when certain pages are not being indexed, an XML sitemap makes it more likely that the search engine will find and visit them, although there is no guarantee. Imagine that your site has pages X, Y, and Z. Google finds pages X and Y by following your links during regular crawling. Then you create a sitemap that lists pages Y and Z. Now there is a possibility, but not a promise, that Google will crawl page Z. Page X won’t be excluded just because it’s missing from your sitemap, and listing a page that Google didn’t know about does not guarantee that it will be indexed.

Sitemaps help search engines find not only ordinary pages, but also videos, news, images and content for mobile devices. The sitemaps protocol supports:

  • Video sitemaps: You can increase the chances of your videos being discovered by search engines if you include them on a sitemap. Video sitemaps allow you to include the URL where the video is, as well as its title, description and location of the thumbnail image.
  • Image sitemaps: As with video sitemaps, you can improve the indexing of your images by adding them to an XML sitemap.
  • News sitemaps: They allow you to control the content that you send over to Google News.
  • Mobile sitemaps: They indicate the paths where your website’s content for mobile devices is located.

Image courtesy of James Box at Flickr.com

The simplest sitemap that you can create is a text file with one URL per line. The problem with this format is that you cannot include additional information, such as the date of the last modification, the change frequency or the priority. With an XML sitemap, all of this is possible.

The required tags are the following:

  • <urlset> is the root element that wraps the whole list. It is the opening tag, and the file ends with </urlset>.
  • <url> must be included once for each URL that you are going to list.
  • <loc> defines the URL of the page. It must specify the protocol (HTTP or HTTPS), and it must be shorter than 2,048 characters.
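As a rough illustration of the three required tags, here is a minimal Python sketch that builds a sitemap with the standard library. The function name `build_minimal_sitemap` and the example.com addresses are made up for this example; the `xmlns` value is the namespace that the sitemaps.org protocol expects on `<urlset>`.

```python
import xml.etree.ElementTree as ET

# Namespace that the sitemaps.org protocol expects on the <urlset> element.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_minimal_sitemap(urls):
    """Return sitemap XML that uses only <urlset>, <url> and <loc>."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for address in urls:
        # <loc> must include the protocol and stay under 2,048 characters.
        if not address.startswith(("http://", "https://")):
            raise ValueError(f"URL must start with http:// or https://: {address}")
        if len(address) >= 2048:
            raise ValueError("URL must be shorter than 2,048 characters")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
    # ElementTree escapes characters such as & when it serializes the text.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical addresses, used only to show the output shape.
    print(build_minimal_sitemap([
        "https://www.example.com/",
        "https://www.example.com/blog?page=1&sort=new",
    ]))
```

Saving the returned string as a file such as sitemap.xml would give you something you can upload; the checks in the loop simply mirror the two rules for <loc> listed above.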

You can also include the following optional tags:

  • <lastmod> is the date on which the page was last modified. The format that should be used is YYYY-MM-DD.
  • <changefreq> is the approximate frequency with which the page is modified. The values that it can have are: always, hourly, daily, weekly, monthly, yearly, never.
  • <priority> defines the priority of a URL relative to the other URLs of the same website. Search engines can use it as a hint when choosing between your URLs, favoring the ones with the highest priority. Its value can range from 0.0 to 1.0 and the default priority is 0.5.
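To see how the optional tags sit next to <loc>, here is a small Python sketch. The helper `add_url`, the example.com addresses and the sample values are assumptions made for illustration; only the tag names, the allowed <changefreq> values and the 0.0 to 1.0 range come from the descriptions above.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ALLOWED_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def add_url(urlset, loc, lastmod=None, changefreq=None, priority=None):
    """Append one <url> entry; optional tags are written only when given."""
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # date.isoformat() produces the YYYY-MM-DD format.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        if changefreq not in ALLOWED_CHANGEFREQ:
            raise ValueError(f"Unknown changefreq value: {changefreq}")
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


# Hypothetical pages, used only to show how the tags combine.
urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
add_url(urlset, "https://www.example.com/",
        lastmod=date(2024, 1, 15), changefreq="weekly", priority=0.8)
add_url(urlset, "https://www.example.com/about")  # required tag only

sitemap_xml = ET.tostring(urlset, encoding="unicode")
print(sitemap_xml)
```

The second entry shows that a URL without optional tags is still valid; search engines then fall back to their own judgment and to the default priority.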

At first sight it might look complicated, but don’t worry, because you don’t need to write all of this manually. There are a handful of tools that can do this for you.

Let’s take a look at some of the sitemap generators that are common and easy to use:

XML-sitemaps. A simple online tool. You enter your domain and it automatically crawls all the pages, creating sitemaps of up to 500 pages. If your website is larger, you can upgrade to a paid version that is installed on your server and runs on PHP. It has no page limit and it can generate video, image, news and mobile sitemaps.

Google XML sitemaps. This is a must-have plugin for WordPress. It updates the sitemap automatically every time you publish a new post or page, and it’s highly customizable.

You already know a couple of tools, but if they aren’t enough to cover your needs, there are others: some are programs that you run on your computer, some require installation on your server and some work from a website.

How to submit an XML sitemap to a search engine

Once you have the sitemap file, the next step is to upload it to the root directory of your website. You can name it whatever you want. Then, you need to let search engines know that it exists so that they can go and examine it. There are three ways to do this:

First method: Webmaster tools

You can submit the sitemap through the search engines’ webmaster tools. These are the steps that you must follow:

1. Create a Webmaster tools account in Google or Bing.

2. Add the URL of your website.

3. Verify the website.

4. Submit your sitemap:

  • In Google, click “Crawl” and then “Sitemaps”
  • In Bing, click “Configure my site” and then “Sitemaps”

Second method: robots.txt

You can add a line like the following at the end of the robots.txt file on your website:

Sitemap: http://www.yoursite.com/sitemap.xml

It’s as easy as that.
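If your robots.txt is maintained by a script rather than by hand, the same step could be sketched in Python as follows. The function name `with_sitemap_line` is hypothetical, and the sketch works on the file’s text, so reading and writing the actual file is left to you.

```python
def with_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content that ends with a Sitemap line for sitemap_url.

    The content is returned unchanged if the same Sitemap line already exists.
    """
    for line in robots_txt.splitlines():
        key, separator, value = line.partition(":")
        # The directive name is matched case-insensitively; the URL exactly.
        if separator and key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt
    body = robots_txt
    if body and not body.endswith("\n"):
        body += "\n"
    return body + f"Sitemap: {sitemap_url}\n"


if __name__ == "__main__":
    current = "User-agent: *\nDisallow:"
    print(with_sitemap_line(current, "http://www.yoursite.com/sitemap.xml"))
```

Checking for an existing line first means that the script can run on every deployment without piling up duplicate entries.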

Third method: Ping

Pinging is like telling the search engine “my sitemap is over here”, and doing it is as simple as loading an address in your web browser. It’s the least reliable method of the three, but it has been working so far.

With that being said, here are the addresses that you should load, where SITEMAP_URL is the URL of your sitemap.
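Whatever the search engine, the sitemaps.org protocol describes a ping address of the form <searchengine_URL>/ping?sitemap=SITEMAP_URL, with the sitemap URL percent-encoded. The Python sketch below builds such an address. The function name `build_ping_url` and the host searchengine.example are placeholders rather than real endpoints, so check each search engine’s own documentation for its actual address and for whether it still accepts pings.

```python
from urllib.parse import quote


def build_ping_url(search_engine_base, sitemap_url):
    """Build a ping address in the generic form described by sitemaps.org."""
    # Everything after "sitemap=" is percent-encoded, including ":" and "/".
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_base.rstrip('/')}/ping?sitemap={encoded}"


if __name__ == "__main__":
    # searchengine.example is a placeholder host, not a real ping endpoint.
    print(build_ping_url("https://searchengine.example",
                         "http://www.yoursite.com/sitemap.xml"))
```

Loading the resulting address in a browser, or requesting it from a script, is all that a ping consists of.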

Did you manage to generate your sitemap and submit it to the search engine? Hopefully this information will be useful for improving your website’s SEO.

