Fraud Detection and the Ad Community
We ran into an interesting situation last week, and in the interest of collaboration, I will share how we detected a botnet and the methods we used to find it.
Recently, the Authenticated Digital data science team was auditing ad exchange data across all of our accounts. We found something peculiar: one of our internal metrics, called “completion rate” (which shows that we have completed an audit on an ad impression), was vastly lower in Japan (<5%) than normal (>90%). While some variance in completion rate across geographies is normal, a drop of more than 85 percentage points for a single country is not.
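A check like this can be sketched in a few lines of Python. This is a minimal illustration, not our production audit: the field names (`country`, `audit_completed`), the 0.90 baseline, and the 0.50 drop threshold are assumptions made for the example, and the sample data is made up.

```python
from collections import defaultdict

def completion_rate_by_country(impressions):
    """Return {country: completed / total} for an iterable of impression dicts."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for imp in impressions:
        totals[imp["country"]] += 1
        if imp["audit_completed"]:
            completed[imp["country"]] += 1
    return {country: completed[country] / totals[country] for country in totals}

def flag_low_completion(rates, baseline=0.90, max_drop=0.50):
    """Flag countries whose rate is more than max_drop below the baseline.

    Both thresholds are hypothetical; tune them to your own normal variance.
    """
    return sorted(c for c, rate in rates.items() if baseline - rate > max_drop)

# Toy data: one country near the normal rate, one far below it.
sample = (
    [{"country": "US", "audit_completed": True}] * 93
    + [{"country": "US", "audit_completed": False}] * 7
    + [{"country": "JP", "audit_completed": True}] * 4
    + [{"country": "JP", "audit_completed": False}] * 96
)
rates = completion_rate_by_country(sample)
flagged = flag_low_completion(rates)
print(rates, flagged)
```

A fixed baseline is the simplest possible choice; comparing each country against its own history would be more robust to the ordinary geographic variance mentioned above.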
We looked into the IP address ranges of the traffic. Three IP address ranges were delivering traffic with a highly irregular 0% completion rate:
An IP database lookup identified Trend Micro as the owner of these IP address ranges. From what I can tell, Trend Micro is a security company and cloud service provider in Japan. Bots are frequently launched from cloud service providers.
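Breaking completion rate down by IP range can be sketched with Python's standard `ipaddress` module. The CIDR blocks below are reserved documentation ranges used as placeholders, not the ranges we actually observed, and the field names (`ip`, `audit_completed`) are assumptions for the example.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT the real ranges.
SUSPECT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]

def range_for(ip):
    """Return the suspect range containing ip as a string, or None."""
    addr = ipaddress.ip_address(ip)
    for net in SUSPECT_RANGES:
        if addr in net:
            return str(net)
    return None

def completion_by_range(impressions):
    """Return {range_or_'other': completion rate} for impression dicts."""
    stats = {}
    for imp in impressions:
        key = range_for(imp["ip"]) or "other"
        total, done = stats.get(key, (0, 0))
        stats[key] = (total + 1, done + int(imp["audit_completed"]))
    return {key: done / total for key, (total, done) in stats.items()}

# Toy data: traffic inside the placeholder ranges never completes an audit.
ip_sample = [
    {"ip": "192.0.2.10", "audit_completed": False},
    {"ip": "192.0.2.77", "audit_completed": False},
    {"ip": "203.0.113.5", "audit_completed": False},
    {"ip": "10.0.0.1", "audit_completed": True},
    {"ip": "10.0.0.2", "audit_completed": True},
]
range_rates = completion_by_range(ip_sample)
print(range_rates)
```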
We analyzed this traffic across the hundreds of fields that we collect for each ad impression. Here are some notable findings:
- The user agent was exactly the same for all traffic (IE 8 on Windows XP).
- A common HTTP header (Accept-Language) was missing from these requests, whereas requests from other IP ranges with the same user agent string mostly included it.
- The traffic was coming in 24 hours per day, with no normal nighttime inactivity.
- There was no user engagement on any of the impressions.
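The four findings above can be summarized as simple signals computed over a batch of impressions. The sketch below is illustrative only: the record layout (`user_agent`, `headers`, `hour`, `engaged`) is a hypothetical schema, and the user agent string is just an example of what IE 8 on Windows XP typically sends.

```python
def footprint_signals(impressions):
    """Summarize bot-like signals for a non-empty list of impression dicts."""
    if not impressions:
        raise ValueError("need at least one impression")
    n = len(impressions)
    user_agents = {imp["user_agent"] for imp in impressions}
    # HTTP header names are case-insensitive, so compare in lowercase.
    missing_lang = sum(
        1 for imp in impressions
        if "accept-language" not in {name.lower() for name in imp["headers"]}
    )
    active_hours = {imp["hour"] for imp in impressions}  # hour of day, 0-23
    engaged = sum(1 for imp in impressions if imp["engaged"])
    return {
        "single_user_agent": len(user_agents) == 1,
        "missing_accept_language_share": missing_lang / n,
        "active_hours": len(active_hours),
        "engagement_rate": engaged / n,
    }

# Toy data: two impressions in every hour of the day, all identical.
EXAMPLE_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)"
bot_like = [
    {
        "user_agent": EXAMPLE_UA,
        "headers": {"Host": "example.com", "User-Agent": EXAMPLE_UA},
        "hour": hour,
        "engaged": False,
    }
    for hour in range(24)
    for _ in range(2)
]
signals = footprint_signals(bot_like)
print(signals)
```

None of these signals is conclusive on its own; it is the combination of all four on the same IP ranges that makes the footprint stand out.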
The footprint of these impressions seems to point to bot traffic. However, we decided to give the traffic the benefit of the doubt. We brainstormed what could be delivering this traffic. In general, here is what we came up with:
- Corporate Firewall. This traffic could be coming from a firewall in front of a large corporation. That could explain the identical user agent, because an enterprise could have rolled out the exact same copy of IE 8 on Windows XP to every connected PC that browses the internet. Feasible, but barely. However, the ad traffic arrives steadily 24 hours per day, which is not a likely scenario for an office network.
- Bad Proxy. Perhaps some sort of proxy was installed in front of users, and for some reason changed the user agent string when connecting to websites. Again — feasible, but barely. However, we collect some data that is only available on certain browsers like Chrome or Firefox, and none of this data was collected. All traffic had the same footprint.
- Simple Botnet. A botnet whose operator did nothing sneaky to hide it. All traffic came from the same three IP address ranges, with the exact same user agent string. This is the only likely scenario we could think of.
Traffic started on these IP address ranges on January 23 and has been running continuously since then, for over three months. The traffic volume coming from these ranges is significant, 0.6% of our users’ traffic, and it did not seem to concentrate on any particular advertiser, ad network, SSP, or exchange. If this traffic is not concentrated on our network specifically, then this single botnet consumed more than $15M in exchange inventory.
I would love to hear feedback from the quality teams from other ad companies:
- How was this simple bot able to pass through all of the anti-fraud filters of each publisher, SSP, exchange, and ad network for 3 months?
- What aspects of this bot made it look like ‘real’ traffic?
- How is this not on the IAB Spiders and Bots List?
Why isn’t there more collaboration when it comes to ad fraud? We owe it to the advertisers that ultimately pay for our services to do better.
Authenticated Digital is creating a transparent open ad exchange environment that builds trust between buyers and sellers.