How to block SEMalt in nginx and get your own back (a bit)

SEMalt is well known as a major referral spammer that clogs up the analytics of many small and medium-sized websites with fake traffic. Annoyingly, Google Analytics seems to be pretty bad at filtering it out.

There are quite a few posts from miffed webmasters detailing SEMalt’s activities and how to block them, and even a couple of WordPress plugins, but here’s a quick way of blocking them (and others) in nginx and getting a tiny bit of revenge.

Identify who you want to block

Sadly, blocking by IP address or range is ineffective against botnets such as SEMalt, as their bots are highly distributed, so you need to identify them by referrer instead. In Google Analytics, set up a new segment that looks for new visitors, visitors with a suspiciously low session duration, or visitors with a 100% bounce rate:

Create a GA Segment with session duration = 0 and user type = new visitor

You’ll then get a list of referring websites that are possible candidates for blocking. Not all of them will necessarily be referral spam, so look at the individual referring URLs before deciding whether to block them:

Google Analytics referral traffic; suspected referral spam

The main culprits I’ve found, any of which you can include in the pipe-separated list in the if statements below, are:

  • semalt.com
  • buttons-for-website.com
  • make-money-online.7makemoneyonline.com
  • darodar.com (under various different forum.topic1234568 sub-domains)
  • hulfingtonpost.com
  • ilovevitaly.co
  • priceg.com
  • blackhatworth.com
  • forum20.smailik.org
  • o-o-6-o-o.com
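
If the pipe-separated pattern gets unwieldy, one alternative is to keep the list in a map block. This is only a sketch under a few assumptions: $bad_referer is an illustrative variable name, the map block has to sit in the http context, and the patterns simply look for the domain anywhere in the Referer header. Trim the list to the referrers you have actually verified.

```nginx
# http context (for example near the top of nginx.conf).
# $bad_referer is a made-up variable name used for illustration.
map $http_referer $bad_referer {
    default                        0;
    ~*semalt\.com                  1;
    ~*buttons-for-website\.com     1;
    ~*7makemoneyonline\.com        1;
    ~*darodar\.com                 1;
    ~*hulfingtonpost\.com          1;
    ~*ilovevitaly\.co              1;
    ~*priceg\.com                  1;
    ~*blackhatworth\.com           1;
    ~*smailik\.org                 1;
    ~*o-o-6-o-o\.com               1;
}

server {
    # ... your existing listen / server_name / location config ...

    # An empty string or "0" counts as false, anything else as true.
    if ($bad_referer) {
        return 403;
    }
}
```

Adding or removing a referrer is then a one-line change in the map rather than an edit to a long regular expression.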

Send them away!

Since SEMalt specifically don’t obey robots.txt or respect your analytics, why not redirect their bots back to their own site and mess with their GA implementation (as also suggested here):

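# Place inside the relevant server block
if ($http_referer ~* (semalt\.com|buttons-for-website\.com)) {
    # 301 = permanent redirect; the slash before "?" keeps the target URL well-formed
    return 301 http://semalt.com/?utm_source=google&utm_medium=organic&utm_term=stop+spamming+us;
}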

Or perhaps you might use a URL shortening service such as shadyurl.com to redirect them back through a more random URL and confuse their analytics even more.

To be fair, this traffic may not all originate from the maker of SEMalt, so is it right to redirect all of it back to them? Well, given their unapologetic attitude on social media, we shouldn’t feel too guilty. But you could always set up an individual if statement for each referrer.
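
A minimal sketch of the per-referrer version, assuming each spam domain answers on plain http at its root (not something verified here for every domain in the list):

```nginx
# Send each spam referrer back to its own domain
# instead of pooling everything on semalt.com.
if ($http_referer ~* semalt\.com) {
    return 301 http://semalt.com/;
}

if ($http_referer ~* buttons-for-website\.com) {
    return 301 http://buttons-for-website.com/;
}
```

Each block is independent, so you can mix redirects for some referrers with plain error codes for others.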

Test, test, test!

Now since you’re messing with an nginx config file it’s best to test that you are actually blocking who you want to block, and not creating any false matches — the last thing we want to do is send SEMalt any real traffic!

So when you’ve set up your redirects/errors, you can use my Server Status tool to test the server responses from different referral strings.

What else can you do?

A blanket 403 Forbidden error is pretty easy to set up in nginx, as follows:

if ($http_referer ~* (semalt\.com|buttons-for-website\.com|7makemoneyonline\.com)) {
    return 403;
}

That 403 seems a little boring though, so you could issue a 410, 418 or 420 error, or something a bit more creative, and maybe they’ll eventually stop coming.
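
Swapping the status code is a one-line change. The sketch below uses 410 (Gone); nginx also has its own non-standard 444 code, which closes the connection without sending any response at all. Whether either of these actually discourages the bots is guesswork.

```nginx
if ($http_referer ~* (semalt\.com|buttons-for-website\.com|7makemoneyonline\.com)) {
    return 410;    # "Gone"
    # return 444;  # nginx-specific alternative: drop the connection with no response
}
```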

Beyond that, I don’t know what software these people are using, but from experience the following might throw a bit of a spanner in the works for their bots:

  1. Serve the bots a massive file download and clog up their bandwidth
  2. Serve the bots a page that always times out
  3. Redirect the bots to a massively malformed HTML file to try and break their crawler

Any other ideas will be gladly added!
