Quickly identify thin & duplicate content for Google Panda
In the age of Google’s Panda algorithm, fixing issues with thin and duplicate content is more important than ever. However, this can be daunting on a big site with millions of pages and potentially thousands of duplicate pages or pages low on unique content. Just sifting through the site to identify this content can be a big task before you even start addressing it.
To address some of these challenges, you can now use my free duplicate & thin content checker tool.
The tool is pretty simple to use:
1. Gather a list of URLs to check
You can enter a plain text list, or specify an XML Sitemap file to pull the URLs from. If you’re getting URLs from a site: query on Google, URLProfiler has a free SERP scraping tool to help you.
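If you would rather build the plain text list yourself, here is a minimal sketch of pulling URLs out of a standard XML Sitemap with Python's standard library. It is not the tool's own code; the function name `urls_from_sitemap` is made up for illustration, and it assumes the file uses the usual sitemaps.org namespace and is a plain URL sitemap rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML Sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in a sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        # Skip empty <loc> elements and trim stray whitespace.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.com/a</loc></url>"
    "<url><loc>http://example.com/b</loc></url>"
    "</urlset>"
)
print("\n".join(urls_from_sitemap(sample)))
```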
You can also pull a list from the HTML suggestions area of Google’s Webmaster Tools to quickly check the content Google has flagged as duplicate.
On its own this export is not much use, as it gives you a ‘ | ’ separated list of URLs. However, you can just copy & paste this column into the tool and it will split the URLs out for you.
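If you want to do the same split outside the tool, something like the following would work. This is only a sketch of the idea, not how the tool does it; `split_url_cells` is a hypothetical name, and it assumes none of the URLs contain a literal `|` character (a properly encoded URL would use `%7C` instead).

```python
def split_url_cells(cells):
    """Flatten ' | ' separated cells into one URL list, dropping repeats.

    Hypothetical helper: `cells` is a list of strings, one per spreadsheet cell.
    """
    seen = set()
    urls = []
    for cell in cells:
        for part in cell.split("|"):
            url = part.strip()
            # Keep the first occurrence of each URL, in original order.
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


column = [
    "http://example.com/a | http://example.com/b",
    "http://example.com/b | http://example.com/c",
]
for url in split_url_cells(column):
    print(url)
```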
2. Tell the tool where content lives
If you’re checking articles, you can specify a CSS3 selector to point the tool to your article content. Otherwise, it will look at the content within the <body> tag.
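To give a feel for the fallback case, here is a rough sketch of extracting the visible text inside `<body>` using only Python's standard library. It is an illustration under my own assumptions, not the tool's implementation: the class and function names are hypothetical, script and style contents are skipped, and CSS selector support is left out because the standard library has no selector engine (a third-party HTML library would be needed for that).

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect text that appears inside <body>, ignoring <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


def body_text(html):
    """Return the body text with runs of whitespace collapsed to single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


page = "<html><head><title>T</title></head><body><h1>Hello</h1><p>world</p></body></html>"
text = body_text(page)
print(text, "|", len(text.split()), "words")
```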
3. Run the report
When the report runs, as with most of my SEO tools, the browser feedback is real-time. When you stop the report, or it finishes, you’ll see a screen similar to the following:
You can then download the CSV and open it in Excel, where you can apply Conditional Formatting > Highlight Cells Rules > Duplicate Values to the Text Hash column to identify duplicate content.
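If you prefer a script to Excel, the same duplicate check can be sketched in a few lines of Python. The Text Hash column name comes from the report; the `URL` column name and the `duplicate_groups` function are assumptions for illustration, so adjust them to match the headers in your own export.

```python
import csv
import io
from collections import defaultdict


def duplicate_groups(csv_text, url_col="URL", hash_col="Text Hash"):
    """Group URLs by text hash and keep only hashes shared by 2+ URLs.

    Column names are assumptions; change them to match your CSV headers.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[hash_col]].append(row[url_col])
    return {h: urls for h, urls in groups.items() if len(urls) > 1}


# Made-up sample rows standing in for the downloaded report.
report = (
    "URL,Word Count,Text Hash\n"
    "http://example.com/a,120,abc\n"
    "http://example.com/b,450,def\n"
    "http://example.com/c,120,abc\n"
)
for text_hash, urls in duplicate_groups(report).items():
    print(text_hash, urls)
```

For a real export, read the file with `open(path, newline="", encoding="utf-8").read()` and pass the result in.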
Thanks to the word count, you can also identify thin content, and using Excel’s Conditional Formatting > Color Scales rule you can quickly spot any outliers.
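The thin content check can be scripted in the same way. The sketch below lists pages under a word count threshold, lowest first. The `URL` and `Word Count` column names, the `thin_pages` function, and the default cut-off of 200 words are all assumptions for illustration; there is no official threshold, so pick one that makes sense for your site.

```python
import csv
import io


def thin_pages(csv_text, min_words=200, url_col="URL", count_col="Word Count"):
    """Return (url, word_count) pairs below min_words, lowest count first.

    The threshold and column names are illustrative assumptions.
    """
    thin = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = int(row[count_col])
        if count < min_words:
            thin.append((row[url_col], count))
    return sorted(thin, key=lambda pair: pair[1])


# Made-up sample rows standing in for the downloaded report.
report = (
    "URL,Word Count,Text Hash\n"
    "http://example.com/a,120,abc\n"
    "http://example.com/b,450,def\n"
    "http://example.com/c,35,ghi\n"
)
for url, count in thin_pages(report):
    print(count, url)
```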