The Rel Canonical tag: usage and pitfalls
Rel canonical is a tag in the <head> of your HTML that tells search engines which version of a piece of content is the original and which ones are duplicates. It was created to avoid duplicate content issues and to improve your SEO.
Some pieces of content can appear in several places on your website and thus be seen as duplicate content. Since duplicate content is penalized, rel canonical gives search engines a way to know which version is the original, to credit that primary version, to link the copies to the right URL and thus to display the right version in search results.
Canonicalization is essential to a well-optimized website and offers a better user experience: users do not have to work out which version of a page is the best or the most likely to be the original.
When should you use rel canonical?
There are multiple situations where duplicate content is legitimate and should not be penalized:
- Multiple URLs: e-commerce websites that offer filter options such as price, size, color or category have many URLs with duplicate content
- HTTP, HTTPS, WWW: a search engine can see http://www.mywebsite.com, http://mywebsite.com and https://www.mywebsite.com as different websites and will index them as such
- Mobile URL: mobile URLs like m.mywebsite.com are seen as duplicate content
- Country URL: content remains the same even if you are using specific country URLs. However, if the language is different, you may want search engines to offer separate results
- Session ID URLs, breadcrumb links, printer-friendly versions, permalinks: these are generated automatically
Strictly speaking, these examples are not true duplicate content; they are system-generated URLs. Different URLs serve the same content, so rel canonical should be used to tell search engines which one is the original and which URL should be crawled, indexed and returned on SERPs.
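To make the idea concrete, here is a minimal Python sketch of how several system-generated variants can be mapped to one preferred URL. The parameter names in IGNORED_PARAMS, the choice of https and the www.mywebsite.com host are assumptions for illustration only; your own site decides which URL is the main one.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names: replace them with the ones your site generates.
IGNORED_PARAMS = {"sessionid", "color", "size", "price", "print"}


def canonical_url(url, host="www.mywebsite.com"):
    """Map a URL variant to the single URL chosen as canonical."""
    parts = urlsplit(url)
    # Keep only the query parameters that actually change the content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Force one scheme and one host (an assumption), and drop the fragment.
    return urlunsplit(("https", host, parts.path or "/", urlencode(kept), ""))
```

With these assumptions, http://mywebsite.com/shoes?color=red&sessionid=abc and https://www.mywebsite.com/shoes both resolve to the same URL, which is the value you would place in the rel canonical tag.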
What should you be careful of?
First of all, you need to choose which URL is the main one. Then insert the following tag at the top of the <head> section of every page that duplicates it, with the href pointing to your preferred URL:
<link rel="canonical" href="http://www.yourdomain.com/your-main-url/" />
Many CMSs have integrated this tag and offer ways to set it up. If yours does not, the alternatives include selective use of 301 redirects (if your website serves the same content on http, https and www at the same time), URL parameter settings in Google Webmaster Tools, and HTTP headers set with PHP or .htaccess.
There are a few rules to respect if you want your rel canonical to work well:
- Verify that the rel canonical target exists; otherwise it will point to a 404 error
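If your CMS does not handle this for you, both forms of the declaration are short enough to generate yourself. The sketch below is written in Python rather than PHP, and the two function names are hypothetical helpers, not part of any CMS or library. It builds the <link> element for the <head> and the equivalent HTTP Link header, which is useful for files such as PDFs that have no <head>.

```python
from html import escape


def canonical_link_tag(url):
    """Return the <link> element to place inside <head>."""
    # Escape the URL so that characters such as & are valid in an attribute.
    return '<link rel="canonical" href="{}" />'.format(escape(url, quote=True))


def canonical_http_header(url):
    """Return a (name, value) pair for the HTTP Link header."""
    return ("Link", '<{}>; rel="canonical"'.format(url))
```

How the header pair is attached to the response depends on your server or framework, so that part is left out here.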
- Check that the rel canonical target does not have a noindex robots meta tag
- Insert the rel canonical link in either the <head> of the page or the HTTP header and not in the <body>
- Include no more than one rel canonical per page. When more than one is specified, all rel canonicals will be ignored
- A large part of the duplicate page’s content should also be on the canonical version
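Some of these rules can be checked automatically. The following is a rough sketch using only Python's standard html.parser module; the class name, the audit function and its messages are illustrative assumptions. It covers the placement, the "no more than one" rule and the noindex check (run it on the HTML of the canonical target for that last one). It does not fetch URLs, so the 404 check and the content comparison are out of its scope.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Collect canonical links and robots directives from one HTML page."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_canonicals = []   # hrefs found inside <head>
        self.stray_canonicals = []  # hrefs found outside <head>
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            target = self.head_canonicals if self.in_head else self.stray_canonicals
            target.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit(html):
    """Return a list of problems found in the page, empty if none."""
    parser = CanonicalAudit()
    parser.feed(html)
    problems = []
    if not parser.head_canonicals:
        problems.append("no canonical in <head>")
    if len(parser.head_canonicals) + len(parser.stray_canonicals) > 1:
        problems.append("more than one canonical")
    if parser.stray_canonicals:
        problems.append("canonical outside <head>")
    if parser.noindex:
        problems.append("page has noindex")
    return problems
```

A page with a single canonical link in its <head> and no noindex directive returns an empty list.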
If you need more insights, Matt Cutts from Google has explained the canonical link element in detail in a video.
To check your canonicals, you can use OnCrawl. It gives a clear view of how your canonicals perform: whether they are set and whether they match.
By clicking on a specific segment, you can then access full URL details.