Removing Ghosts, Creepy Crawlers and Evil Spiders from Your Data

All forms of spam referrals are actually very scary. Google has not come up with a solution for this problem yet (although they are reportedly working on it).

If you are not filtering out spam referrals from your Google Analytics data now, your traffic information could be skewed by 50% or more. If you are making business decisions from your data (and you should be), your data could be leading you to false conclusions.

Never click on spam referral links! That’s what they want you to do.

If you’re not sure if it’s a spam referrer, do a quick Google search to find out if it’s a trustworthy link first.

Google Analytics (GA) referral spam is now much worse than it was just a few months ago. I wrote an answer to this topic on Quora back in March, and had to update my answer due to additional nefarious activities by spammers. It’s gotten so bad, we just had a client with over 51% of their total traffic — and a whopping 87% of their referral traffic — reported last week coming from spam referrers that use ghosting and misbehaving bots and spiders!

You can no longer use the standard Google Analytics filters effectively

If you’re relying on traffic metrics, it’s absolutely essential to filter out this spam, and do it correctly. Simple filters do not work as ghost spammers do not actually hit your website. In addition, if you’re adding a secondary dimension or applying an Advanced Segment, GA will go back to that unfiltered data to rerun your report, then filter the results for your specific view. This is not what you want.

Note that some sites instruct you to use GA’s Referral Exclusion List. This does not work! There are several problems with this. Instead, I’ll show you how to set it up correctly.

Preparation

The first thing you need to do is create a new View within the Property you wish to exclude the referral spam from. It is best practice to leave an All Website Data View to collect all raw data. Go to your GA Admin dashboard and in the third column, create a new View. You can name it anything you wish. We like to create a ‘Marketing View — websitename.com’ which contains all our other filters. Some people like to create test views. This is also a good practice to test your filters against raw data as you will see below.

- - -

Here are the three steps to correctly filter your Google Analytics referrer spam traffic in your newly created view:

1. Create a Valid Hostnames Filter

We used to set up individual filters for each spam referrer. This is no longer efficient, as new spammers are popping up on a daily basis now. In addition, ghost referrers don’t actually hit your website. They are using software to post fake visits based on your tracking ID (done randomly). So typical filters will not remove these ghosts.

Because ghost referrers don’t actually visit your site, they are easy to identify: they report a fake hostname value. For this reason, we need a filter that lets IN the good traffic, and blocks everything else out. Remember, only good traffic will be visiting your server’s hostname(s). It doesn’t matter who the referrer is.

Follow these steps very carefully:

  • First, you must identify all sites that may be using your tracking ID, which may include Ecommerce engines such as PayPal for example.

To identify them, run a multi-year report showing just hostnames (Audience > Technology > Network, with Hostname selected as the primary dimension) and note the valid ones listed in the first column. It’s a long list, but you need to go through it and make sure you find those sites that actually use your tracking code (btw, Google.com is not one of them). You probably have a good idea of which ones you linked up with.

Note that this can get a little tricky if you have forgotten any, so go through the list carefully. For example, if you’re using YouTube, you need to leave that one if you’re tracking hits to your YouTube channel. At some point, you added your tracking code in your account settings to enable it to track on your Google Analytics account.

Also be aware that subdomains may need to be added as well if you’re using the same tracking code for your subdomains (otherwise, set up a separate filter for each tracking code or property that you have). In the steps below, I’ll show you a trick to automatically add subdomains for any host.

Also, if you have tracking from (not set) hostnames, you will need to look into these in more detail. If you are using event-based tracking code within GTM or hard-coded using analytics.js, you need to identify these and make sure you’re not blocking a site using event-based call logging.

  • Next, open your GA account to the administration screen and create a new filter (Account > Filters > + New Filter)
  • Name the filter Hostnames
  • Select the ‘Include’ radio button
  • Select Hostname from the Filter Field
  • In the Filter Pattern field, enter the hostnames as a regular expression. Don’t panic. It’s easy.

The easiest way is to enter each domain name starting with ‘.*’ which is an ‘anything’ wildcard that will recognize all subdomains for a domain (for example, blog.mysite.com, and mysite.com).

Next, separate each domain with the pipe symbol ‘|’.

As an example, you might enter .*mysite.com|.*paypal.com

It’s important not to begin or end with the pipe symbol nor include any spaces.

  • Add the filter to the new View that you created (ex: Marketing View).
  • Save your filter, then test it to verify your new filter is working properly!

The best way to test this is to compare the referrer metrics in your All Website Data view with those in your new view to make certain you’re not blocking a site that you need to track. Watch it for the next week or two.
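As a rough illustration of how the include pattern from this step behaves, the Python sketch below joins a list of domains with the pipe symbol and checks a few hostnames against the result. The domain list, the sample hostnames, and the helper names are all hypothetical, and Python's `re` module only approximates how GA evaluates filter patterns. Unlike the simple example above, the sketch escapes the dots in each domain so that `.` matches only a literal dot.

```python
import re

# Hypothetical list of domains that legitimately use your tracking ID.
VALID_DOMAINS = ["mysite.com", "paypal.com"]


def build_include_pattern(domains):
    """Join domains into one pattern: '.*' prefix for subdomains, '|' between entries."""
    return "|".join(".*" + re.escape(d) for d in domains)


def is_valid_hostname(hostname, pattern):
    """Return True if the hostname would pass this sketch of the include filter."""
    return re.search(pattern, hostname) is not None


pattern = build_include_pattern(VALID_DOMAINS)
print(pattern)

# Hypothetical hostnames, including a made-up ghost hostname and "(not set)".
sample_hosts = ["mysite.com", "blog.mysite.com", "www.paypal.com",
                "fake-ghost-host.xyz", "(not set)"]
for host in sample_hosts:
    print(host, "->", "keep" if is_valid_hostname(host, pattern) else "drop")
```

In this sketch, `re.search` matches anywhere in the string, so a hostname such as `notmysite.com` would also be kept. That is worth bearing in mind when you review your own hostname list.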

Keep in mind that the filter only affects data collected after it has been enabled. If you need to go back and see historical data minus spam referrals, and you’re comfortable with advanced segmentation in GA, you can apply the same hostname expression in an advanced segment to filter ghost referrals. If you are not sure about your hostnames, or need help with advanced segmentation for historical data, consult a professional.

Special Note: It is critical that you update this filter every time you add a new service. These services will ask for your GA Tracking ID or prompt you with a request to link to your GA Account. Once you provide it, get that domain into your Hostnames Filter pronto.

2. Create an Exclude Hostnames Filter

Great! Now we have our good traffic, and we’ve expunged our ghosts. Now, it’s time to get rid of the creepy crawlies.

The second filter is much like the first, except we’ll be excluding bad referrers instead of including our good hostnames. In this case, we’ll be focusing on sites pretending to link to you as legitimate referral sources. Again, do not click on these links! Most people try to filter the domain, but this doesn’t work, as we need to be able to filter on a full referral string.

The best way to identify these bogus referrers is to look for a referral campaign Source tag with a matching domain. (See my article on Campaign Tagging for more on this subject.)

  • Next, open your GA account to the administration screen and create a new filter (Account > Filters > + New Filter)
  • Name the filter Spam Referrals
  • Select the ‘Exclude’ radio button
  • Select Campaign Source from the Filter Field
  • In the Filter Pattern field, enter the referrer domains that need to be filtered as a regular expression, just as we did above.

Here is an example of several referrer spammers that we include in our filters:

.*free-share-buttons.com|.*free-social-buttons.com

  • Add the filter to the new View that you created (ex: Marketing View).
  • Save your filter, then test it to verify your new filter is working properly!

Again, compare the referral metrics from the All Website Data view (against results in your new view) to make certain you’re not blocking a site that you need to track. Watch it for the next week or two.
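The comparison described above can be sketched in a few lines of Python. This is only an illustration: the two dictionaries stand in for referral reports copied by hand from each view, and the source names and session counts are made up.

```python
# Hypothetical sessions-by-source numbers copied from each view's referral report.
raw_view = {
    "partner-site.com": 85,
    "news-site.com": 40,
    "free-share-buttons.com": 640,
}
filtered_view = {
    "partner-site.com": 85,
    "news-site.com": 40,
}


def removed_sources(raw, filtered):
    """Return each source whose sessions dropped in the filtered view, with the difference."""
    removed = {}
    for source, sessions in raw.items():
        diff = sessions - filtered.get(source, 0)
        if diff > 0:
            removed[source] = diff
    return removed


for source, sessions in removed_sources(raw_view, filtered_view).items():
    print(f"{source}: {sessions} sessions filtered out")
```

If a source you recognize as legitimate shows up in the output, that is a sign the filter pattern is too broad and should be revisited.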

3. Create a Filter for Bots

OK, two down, one to go. We’re almost done, and this is the easy one.

Fortunately, Google keeps a list of Web bots (the good ones) which you can exclude from your reports. This is a start, but it doesn’t completely remove the problem.

Even though these are good bots, we suggest keeping them out of your reports because they do not represent the traffic (humans) you wish to monitor. This is done on the view level as follows:

  • Open the GA admin screen and scroll over to the third-column view.
  • Select the View you wish to exclude bots from (ex: Marketing View)
  • Select the View Settings, then click on the Bot filtering checkbox that says ‘Exclude all hits from known bots and spiders’
  • Save your selection

To go beyond this, you will have to exclude other bots by modifying your .htaccess file. This is beyond the scope of this article. However, what we’ve covered in these three filters will greatly improve your data by eliminating the majority of the spam referrers we are now seeing.

Until Google comes up with a better solution, we will all need to stay on top of this problem.

Do you need all three filters? YES!

If you run a report to show you the effects of the filters, you will see different traffic being filtered by each of the three methods described above. It does indeed require all three to eliminate all the methods of spamming your data.
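To make the point concrete, here is a minimal Python sketch of why each filter catches something different. Everything in it is an assumption for illustration: the `Hit` record, the sample hits, and the `known_bot` flag (GA decides that internally) are hypothetical, and the two patterns simply mirror the examples used earlier in this article.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    hostname: str    # hostname reported with the hit
    source: str      # campaign source / referrer
    known_bot: bool  # hypothetical flag: hit comes from a known bot or spider


# Patterns mirroring the include and exclude examples from steps 1 and 2.
VALID_HOSTS = re.compile(r".*mysite\.com|.*paypal\.com")
SPAM_SOURCES = re.compile(r".*free-share-buttons\.com|.*free-social-buttons\.com")

hits = [
    Hit("mysite.com", "partner-site.com", False),        # real visitor
    Hit("(not set)", "ghost-spam.example", False),       # ghost: fake hostname
    Hit("mysite.com", "free-share-buttons.com", False),  # spam referrer on a real hostname
    Hit("blog.mysite.com", "(direct)", True),            # known bot
]


def classify(hit):
    """Return the name of the first filter that removes the hit, or 'kept'."""
    if not VALID_HOSTS.search(hit.hostname):
        return "valid hostnames filter"
    if SPAM_SOURCES.search(hit.source):
        return "spam referrals filter"
    if hit.known_bot:
        return "bot filtering"
    return "kept"


for hit in hits:
    print(hit.hostname, "|", hit.source, "->", classify(hit))
```

Each of the three unwanted hits is removed by a different check, which is the behavior the three-filter setup relies on.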

EDIT: We’ll set up your Google Analytics referral spam filters for FREE!

Due to the huge number of questions on the subject, for a limited time, we’re offering to filter your referral spam at absolutely no cost. No caveats, strings or hidden surprises. Yes, that’s right. One of our certified Google Analytics experts will set up your filters for free! You can find all the details here. Thanks, and happy filtering.

Show your support

Clapping shows how much you appreciated Gravital Digital’s story.