Google Analytics’ spam referrer problem
If you’ve used Google Analytics on any new project, you’ll be all too familiar with the issue of spam referrers. You check to see who is sending you traffic, and are greeted with a site that looks like it has just pushed a healthy amount of traffic your way. You visit it, only to discover that it hasn’t actually linked to you at all, and that you are the muppet they are trying to sell something to or, worse still, infect with malware.
Here’s just one that showed up for me on a new project.

I’ve had a bash recently at seeing how to prevent this sort of thing. There are a variety of ways in which spammers abuse Google Analytics tracking code.
1) Crawlers trawl sites looking for Analytics IDs and add yours to a database that they iterate through, then use it to send fake hits with a spoofed HTTP referrer.
2) Ghost referrers don’t even waste time crawling for the Analytics ID. They simply generate one at random, in the format:
UA-{7 DIGIT NUMBER}-{1 DIGIT NUMBER}
They then send an analytics request against that ID, hoping to strike it lucky every once in a while and leave a link back to their site in your reports.
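To put a rough number on how guessable that format is, here is a small Python sketch. It assumes the ID format exactly as written above (seven digits, then one digit); the pattern and the names are illustrative only, and real IDs may well vary in length.

```python
import re

# Pattern for the ID format described above: UA-{7 digits}-{1 digit}.
# This mirrors the article's description; it is an assumption, not an official spec.
UA_ID_PATTERN = re.compile(r"UA-\d{7}-\d")


def looks_like_ua_id(candidate: str) -> bool:
    """Return True if the whole string matches the assumed UA ID format."""
    return UA_ID_PATTERN.fullmatch(candidate) is not None


# Size of the search space under that assumption:
# 10^7 possible seven-digit numbers times 10 possible one-digit suffixes.
KEYSPACE = 10**7 * 10

if __name__ == "__main__":
    print(looks_like_ua_id("UA-1234567-1"))
    print(KEYSPACE)
```

Under that assumption there are only a hundred million possible IDs, which is a small space for a script to sweep, and that is why ghost spammers don’t need to crawl anything.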
The first scenario is pretty easy to combat: simply don’t put your Analytics ID front and center on your site. Bury it in a JavaScript file elsewhere so it can’t be scraped directly out of your HTML. Most spammers will be looking for the boilerplate code that Google provides during the installation process, or using a regular expression to pick the ID out. Once you’ve moved it, they aren’t going to waste time looking in every file you create to find it. Alternatively, you could encrypt (or at least obfuscate) the ID in your source and decode it just before passing it to Google. You can also block referrers at source in an .htaccess file, or with server-side code that matches against a blacklist.
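To see why moving the ID helps, here is a rough Python sketch of the kind of regular expression a scraper might run over your HTML. The two HTML snippets and the file name /js/site.js are made up for illustration, and the inline snippet is a simplified stand-in for Google’s boilerplate rather than the exact code.

```python
import re
from typing import List

# A loose pattern for IDs of the form UA-<digits>-<digits>.
UA_ID_IN_PAGE = re.compile(r"UA-\d+-\d+")

# Simplified stand-in for a page with the tracking snippet pasted inline.
INLINE_PAGE = """
<html><head>
<script>
  ga('create', 'UA-1234567-1', 'auto');
  ga('send', 'pageview');
</script>
</head><body>Hello</body></html>
"""

# The same page with the snippet moved to a (hypothetical) external file.
EXTERNAL_PAGE = """
<html><head>
<script src="/js/site.js"></script>
</head><body>Hello</body></html>
"""


def scrape_ids(html: str) -> List[str]:
    """Return every UA-style ID found directly in the HTML source."""
    return UA_ID_IN_PAGE.findall(html)


if __name__ == "__main__":
    print(scrape_ids(INLINE_PAGE))    # the ID is visible in the markup
    print(scrape_ids(EXTERNAL_PAGE))  # nothing to find without fetching the script
```

This only raises the bar. A scraper that also downloads linked scripts would still find the ID, so treat it as a way of dodging the lazy majority rather than a real defence.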
This blacklist is a pretty good resource, regularly updated by the community:
referrer-spam-blacklist: community-contributed list of referrer spammers (github.com/piwik/referrer-spam-blacklist)
Combined with a bit of pseudocode, you could kill that referrer with fire or redirect it back to itself.
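// Assumes allow_url_fopen is enabled; in practice, cache the list locally.
$handle = fopen("https://raw.githubusercontent.com/piwik/referrer-spam-blacklist/master/spammers.txt", "r");
// Compare hosts: the list holds bare domains, the header holds a full URL.
$referrer = strtolower((string) parse_url($_SERVER['HTTP_REFERER'] ?? '', PHP_URL_HOST));
while ($handle && $referrer !== '' && ($spammer = fgets($handle)) !== false) { // read line by line
    if (trim($spammer) === $referrer) { // trim() drops the trailing newline
        http_response_code(403);
        die();
    }
}
if ($handle) {
    fclose($handle);
}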
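The straight equality check above is easy to sidestep with a subdomain. As a rough sketch of a slightly more forgiving check, here is the same idea in Python, testing the referrer’s host and each of its parent domains against the list. The function names are my own and the sample domains are placeholders, not entries from the real blacklist.

```python
from urllib.parse import urlsplit


def load_blacklist(lines) -> set:
    """Build a lookup set from an iterable of lines, one domain per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_spam_referrer(referrer: str, blacklist: set) -> bool:
    """Return True if the referrer's host, or a parent domain of it, is listed."""
    host = (urlsplit(referrer).hostname or "").lower()
    if not host:
        return False
    labels = host.split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com", ...
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


if __name__ == "__main__":
    # Placeholder domains for illustration only.
    blacklist = load_blacklist(["spam.example\n", "bad-seo.example\n"])
    print(is_spam_referrer("http://www.spam.example/offer", blacklist))
    print(is_spam_referrer("https://friendly.example/", blacklist))
```

Loading the list into a set once and reusing it also avoids fetching the file from GitHub on every request.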
The second is a bit trickier to combat. With each unique ID provided by Google being relatively easy for spammers to figure out (see the format above), they can iterate to their heart’s content without going anywhere near your site, so nothing you block on your own server ever sees them.
The best solution is to combine the two techniques: block known bad bots either via .htaccess or your own home-brewed solution, then trust nothing but the URLs you know have come from users browsing your site.
The guys at OptimizeSmart have nailed it:
1) Add a random hash to your URLs
2) Create a filter which only includes pages with your random hash
3) Create a search and replace filter, which rips out your random hash to make your URLs easier to consume.
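The three steps above can be sketched roughly in Python to show what the two filters do to your data. This is not Google Analytics configuration: the token value, the idea of carrying it as a leading path segment, and the function names are all assumptions made for illustration, since the real setup lives in your view’s filter settings.

```python
import re
from typing import List

# Hypothetical random token added to every URL your own pages report.
SECRET_TOKEN = "k3x9q7"

# Step 2: include filter, matches only paths that start with the token.
INCLUDE_PATTERN = re.compile(r"^/" + re.escape(SECRET_TOKEN) + r"(/|$)")


def tag_path(path: str) -> str:
    """Step 1: prefix a page path with the token before reporting it."""
    return "/" + SECRET_TOKEN + path


def clean_report(reported_paths: List[str]) -> List[str]:
    """Apply the include filter, then strip the token (steps 2 and 3)."""
    kept = [p for p in reported_paths if INCLUDE_PATTERN.search(p)]
    # Step 3: search and replace, removing the token so paths read normally.
    return [INCLUDE_PATTERN.sub("/", p, count=1) for p in kept]


if __name__ == "__main__":
    hits = [
        tag_path("/pricing"),   # real visitor, reported by your own pages
        "/",                    # ghost hit that never saw the token
        "/free-seo-traffic",    # ghost hit with a made-up path
    ]
    print(clean_report(hits))
```

Ghost hits never load your pages, so they never learn the token and fall out at the include step. Anyone who does scrape the token from your site can still get through, which is why this works best alongside the blacklist.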
Referrer spam and bad bot traffic continue to grow, with bot traffic last year exceeding human traffic on the web.
ANY analytics program is only ever as good as its data integrity and, out of the box, this is a problem that every Analytics installation suffers from. It’s frustrating that we’re at the point where we have to jump through these hoops just to get it to do its job.
This problem is something Google should really solve, by changing the UA ID to something a bit harder to guess, or at least trying to determine human visitors before recording a hit on their servers. That said, considering GA is free to use, I can’t see them putting any significant resources into the problem.