I solved referrer spam & it only cost my soul (or a custom dimension)
For me, referrer spam in Google Analytics isn’t just a small nuisance. On a site with not that much traffic, these abhorrent idiots polluting my tracking data are unnerving, to say the least. Having to update my filters nearly every day, only to have my data polluted nonetheless, could nearly be a reason to switch tracking tools.
Why Google, even after months, is not able to deliver a solution also seems a little odd to me. Some days ago I stumbled upon a solution nearly by accident. Working with multiple clients at different traffic scales, I see that for most of them the scale of this spam does not seem to be a problem at all. Nonetheless, it does pollute their data. And I am a sucker for clean data.
For quite some time now I have been thinking about a solution for said problem. Then, in another context, a client of the agency I work for started tracking the GA cookie ID into a custom dimension.
I thought about replicating this for my own implementation just to test something. That’s when I discovered that the hits coming from these spammers did not have a cookie ID. I had suspected as much already: these polluting hits are sent via the Measurement Protocol.
And no wonder, as they (ab)use the Measurement Protocol’s inability to make sure only valid requests are registered into a GA property. Without any validation for tracking hits in place, it is far too easy for these spammers to send masses of false data into other people’s properties.
So if you do not need to use the Measurement Protocol, there is a very simple solution. And instead of costing your soul, it only requires the use of an additional custom dimension. Especially if you are using Google Tag Manager, the following steps can be implemented in minutes.
If you do need the Measurement Protocol, a very similar solution will work, but it requires a little more effort, especially on the implementation side for tracking with the Measurement Protocol. More on that at the end of this post.
Solution for Referrer Spam using Google Tag Manager
#1: Create a new custom dimension
First of all, you have to create a new custom dimension inside your Google Analytics property. Give it a scope that makes sense to you (I decided on using User as the scope).
Remember the index number, as you will need it later. And remember that (at least in the free version of Google Analytics) you only have 20 dimensions. So you see, killing referrer spam comes at a price. Not quite your soul, but nonetheless, using a custom dimension for something that Google should have already figured out is not trivial.
#2: Create View for testing and implement filter
First of all, create a View in your Google Analytics property for testing purposes. This is of utmost importance, as you do not wanna taint your raw data view by making a mistake in your filter and destroying your data’s integrity.
So if you do not already have one, create a view for testing filters and implement the following filter:
- A custom filter
- Use Include-Ruleset
- Filter Field: The name of the custom dimension you created in the last step (in my case: Cookie GA)
- Filter Pattern: ^GA[0-9]\..*
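Before saving the filter, it can help to sanity-check the pattern. The following is only a rough sketch: it runs the pattern through JavaScript's regex engine, which is not the engine Google Analytics uses for filters, and the sample values are made up for illustration (one shaped like a typical GA cookie value, two shaped like hits that carry no cookie).

```javascript
// Filter pattern from the list above, written with a plain ASCII hyphen.
const gaCookiePattern = /^GA[0-9]\..*/;

// Hypothetical sample values, not taken from real tracking data.
const samples = [
  'GA1.2.1234567890.1234567890', // shaped like a GA cookie value
  '',                            // no value at all
  '(not set)',                   // placeholder-style value
];

for (const value of samples) {
  // Log each sample together with whether the pattern accepts it.
  console.log(JSON.stringify(value), gaCookiePattern.test(value));
}
```

Only the first sample should pass, which is the behaviour the include filter relies on.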
Then save this filter and go on with the next step.
#3: Create GTM Custom Variable
Next you have to create a custom variable in your Google Tag Manager Container. Give it a name that follows your naming convention (I use cookie as a prefix for all types of cookie-variables).
Then add this variable to your basic Universal Analytics tag and use the right index number. If you have additional GA tags in your GTM container, I recommend adding the variable there as well, especially if you decided on hit-level scope.
Now you should test everything using the preview mode and the analytics debugging tools of your choice. If after some days all looks good, implement the filter in your normal filtered view (and remember: always keep a raw data view in place without any filter).
As Stephen pointed out in the comments the GTM variable should be a ‘First Party Cookie’-variable.
So basically I’m reading the GA cookie, writing it into a custom dimension, and only accepting hits in the filtered view that have a real cookie (via the custom dimension).
#4: Solved referrer spam
Congrats, you just solved the problem of referrer spam for yourself. At least if you use Google Tag Manager. If not, the following solution might help you achieve the same result.
First read the value of the Google Analytics cookie (_ga) in your page's JavaScript. Then set it as a custom dimension within your GA tracking code (see the official documentation) with
ga('set', 'dimension5', 'custom data');
Do these two steps first thing, before sending the tracking information to the Google Analytics server. If you do not wanna track the Google Analytics cookie for whatever reason, see the next chapter for a more privacy-friendly idea of what to use instead of the GA cookie.
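The two steps (reading the cookie, then setting it as a custom dimension) could look roughly like the sketch below. It assumes the standard analytics.js snippet has already defined the global ga function and that the cookie uses its default name _ga. The readCookie helper is my own illustration, not part of any Google library, and dimension5 is just the example index from above.

```javascript
// Hypothetical helper: extract a cookie value by name from a cookie string
// such as document.cookie. Returns null if the cookie is not present.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(';')) {
    const [key, ...rest] = part.trim().split('=');
    if (key === name) {
      // Re-join in case the value itself contains '=' characters.
      return decodeURIComponent(rest.join('='));
    }
  }
  return null;
}

// Browser only: read the _ga cookie and set it on the tracker before any
// hit is sent. Replace 'dimension5' with your own dimension index.
if (typeof document !== 'undefined' && typeof ga === 'function') {
  const gaCookie = readCookie(document.cookie, '_ga');
  if (gaCookie) {
    ga('set', 'dimension5', gaCookie);
  }
}
```

One thing to watch for: on a visitor's very first page view, the cookie may not exist yet when this code runs, so it is worth checking in your test view whether first hits still come through.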
Recommended video explanation:
Over at YouTube there is a great video from GTM Training that shows the above solution as a video tutorial, with additional information in it as well. So show them some love and jump over there to have a look.
Solution for Measurement Protocol usage
You might think to yourself that this seems like a nice solution, but since you need the Measurement Protocol for sending data into Google Analytics, it would not help you in any way. Am I right? Well, do not fear, I have a solution ready for you as well. In my humble opinion, this is the solution that Google should already have implemented, by the way. Basically, you also need a custom dimension to track some kind of app secret or token that you use for fighting referrer spam.
Just follow steps 1 & 2 from the chapter above, but change the filter to something that matches your personal secret. Ideally you use a static value, which you can implement in the Measurement Protocol more easily.
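As a rough sketch of what such a filter pattern could look like for a static token: the helper below escapes regex metacharacters and anchors the pattern so that only the exact token matches. The function names are made up for illustration, and the token is the example value used in this post.

```javascript
// Hypothetical helper: escape regex metacharacters so the token is matched literally.
function escapeForRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build an anchored pattern that matches the token and nothing else.
function tokenFilterPattern(token) {
  return `^${escapeForRegex(token)}$`;
}

const tokenPattern = tokenFilterPattern('uft_23455b65');
console.log(tokenPattern); // the string to paste into the filter pattern field
```

Anchoring with ^ and $ is a design choice here: it keeps a value that merely contains the token from slipping through.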
You would have to use the cd<dimensionIndex> parameter, for example cd1=uft_23455b65.
You can find the description in the official documentation. Then filter for that in your test view, analogous to the description above.
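To make the cd parameter a bit more concrete, here is a minimal sketch that builds the payload of a Measurement Protocol pageview hit carrying the token. The buildHitPayload function and all values (tracking ID, client ID, page path) are placeholders of my own. The parameter names v, tid, cid, t and dp are the usual Measurement Protocol ones. The sketch only builds the string and does not send anything; in a real setup you would POST it to https://www.google-analytics.com/collect.

```javascript
// Hypothetical helper: build the body of a Measurement Protocol pageview hit
// that carries the anti-spam token in a custom dimension.
function buildHitPayload({ trackingId, clientId, page, tokenIndex, token }) {
  const params = new URLSearchParams({
    v: '1',          // protocol version
    tid: trackingId, // property ID
    cid: clientId,   // anonymous client ID
    t: 'pageview',   // hit type
    dp: page,        // document path
  });
  // cd<dimensionIndex> carries the token, e.g. cd1=uft_23455b65
  params.append(`cd${tokenIndex}`, token);
  return params.toString();
}

// Placeholder values for illustration only.
const payload = buildHitPayload({
  trackingId: 'UA-XXXXX-Y',
  clientId: '555',
  page: '/home',
  tokenIndex: 1,
  token: 'uft_23455b65',
});
console.log(payload);
```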
So, as said, you basically create your own token and track it with every hit, in the frontend as well as in the Measurement Protocol implementation. Then you use a custom dimension to send it to the tracking server and use it to filter only valid tracking hits into your working view of Google Analytics.
If after some days all looks good, implement the filter in your filtered working view (and remember: always keep a raw data view in place without any filter).
Why Google did not implement some kind of additional dimension working as a customizable token, I have no idea, but I am sure they are already working on a solution, as Stéphane Hamel pointed out in the Google Analytics community over on Google+. The guys over there are surely most dedicated to solving the problem, as this clearly taints the product experience right now, and I strongly believe the great people over at Google do not wanna tolerate this for any moment longer than necessary.
Keep looking for the official solution nonetheless. And in the meantime, I hope I could be of a little help to you.