Which countries have the most bot traffic? (2019)
hCaptcha.com served many billions of requests in 2019, and some interesting trends emerged from all that data.
Where the data comes from
While we are very much focused on user privacy (see our Privacy Pass initiatives, among other recent work), we are much less interested in letting bad actors stay in the shadows.
Our automated reporting mechanisms flag suspicious activity, and this feeds back into mitigation systems to slow down or stop abuse.
A wide variety of pre-emptive strategies then allow us to segment the behavior of our service for different populations. For example, a likely bot or clickfarm may get different questions, more of them, or tougher evaluations.
At any given time, around 0.1%–0.3% of the total internet is nominated for special treatment. That flagged traffic is the data source for the analysis below.
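To make the idea of segmenting service behavior concrete, here is a minimal sketch of how a risk score might map to a challenge policy. This is not our production logic: the `risk_score` input, the `ChallengePolicy` fields, and the 0.5 and 0.9 cutoffs are all hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChallengePolicy:
    """How hard a challenge to serve. All fields are illustrative."""
    num_questions: int
    difficulty: str        # "standard" or "hard"
    strict_scoring: bool   # whether answers are evaluated more strictly


def choose_policy(risk_score: float) -> ChallengePolicy:
    """Map a hypothetical risk score in [0, 1] to a challenge policy.

    The 0.5 and 0.9 cutoffs are made-up values, not real thresholds.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= 0.9:
        # Likely bot or clickfarm: more questions, harder ones, stricter scoring.
        return ChallengePolicy(num_questions=3, difficulty="hard", strict_scoring=True)
    if risk_score >= 0.5:
        # Suspicious: more questions and stricter scoring.
        return ChallengePolicy(num_questions=2, difficulty="standard", strict_scoring=True)
    # Everyone else gets the default experience.
    return ChallengePolicy(num_questions=1, difficulty="standard", strict_scoring=False)
```

The point of the sketch is only that the response is graduated: suspicious traffic is slowed down with more or harder work rather than simply blocked.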
Why this matters
In 2019 our competitors reached the limits of their less dynamic approaches. reCAPTCHA v2 and v3 can now be solved at high accuracy with neural networks and no longer offer much protection to publishers, though their users continue to produce valuable annotation and ad targeting data for Google. Common text captchas can be solved the same way.
Where do all these bad actors operate?
Below is a log-normalized map of global hotspots where this kind of traffic originates, validated against over 1 billion requests sampled in H2 2019.
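For readers unfamiliar with log normalization, the following is a rough sketch of one common way to do it, not necessarily the exact transform behind our map. The function name `log_normalize` and the region keys are hypothetical. Raw counts are passed through `log1p` and divided by the largest log value, so a handful of very high-volume regions do not wash out the rest of the color scale.

```python
import math
from typing import Dict, Mapping


def log_normalize(counts: Mapping[str, int]) -> Dict[str, float]:
    """Scale raw per-region counts into [0, 1] on a log scale.

    log1p(x) = log(1 + x), so a count of zero maps to 0.0 instead of -inf.
    """
    if not counts:
        return {}
    if any(c < 0 for c in counts.values()):
        raise ValueError("counts must be non-negative")
    max_log = math.log1p(max(counts.values()))
    if max_log == 0:
        # Every region has a count of zero.
        return {region: 0.0 for region in counts}
    return {region: math.log1p(c) / max_log for region, c in counts.items()}
```

On a linear scale, a region with 9 flagged requests would be nearly invisible next to one with 99. On this log scale it lands at the midpoint of the range.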
Looking at only the top 10 bad actor locales, we see the following:
As the graph shows, China, the US, Vietnam, Brazil, Russia, and Indonesia host the most bad actors in absolute terms.
However, this is not so surprising given the populations of these countries. A more interesting graph is perhaps the same data normalized by population:
Here we see that Ukraine, Thailand, Vietnam, Turkey, and Russia generate far more bad actor traffic relative to their population than large countries like the US, China, and India.
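The population normalization above amounts to a simple per-capita rate. Here is a minimal sketch, assuming flagged request counts and population figures are available as dictionaries keyed by region. The function names and the choice of "per million residents" as the unit are our own illustrative assumptions, and the usage example below uses invented placeholder regions rather than real data.

```python
from typing import Dict, List, Mapping


def flagged_per_million(
    flagged: Mapping[str, int],
    population: Mapping[str, int],
) -> Dict[str, float]:
    """Return flagged requests per million residents for each region."""
    rates: Dict[str, float] = {}
    for region, count in flagged.items():
        residents = population.get(region)
        if not residents:
            # Skip regions with missing or zero population figures.
            continue
        rates[region] = count * 1_000_000 / residents
    return rates


def rank_regions(rates: Mapping[str, float]) -> List[str]:
    """Order regions from highest to lowest rate."""
    return sorted(rates, key=lambda region: rates[region], reverse=True)
```

With placeholder inputs such as `flagged = {"A": 5000, "B": 1000}` and `population = {"A": 100_000_000, "B": 2_000_000}`, region A leads in absolute terms, but `rank_regions(flagged_per_million(flagged, population))` puts B first. The same kind of reordering is what separates the two graphs.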
The internet remains one of the last shared spaces for anonymous free expression, and while this stays true, malicious behavior will always be a part of our online experience.
However, any response must be granular. As figure 3 shows, some countries exercise relatively less control over bad actors within their borders, but make up a relatively small percentage of this overall traffic.
Similarly, old approaches to blocking bad actors are increasingly ineffective.
By understanding the properties and limitations of neural networks, we can continue to create captchas that are difficult for bots and easy for people. The most widely used solutions at the moment completely fail this challenge.
We hope you enjoyed this security review, and thanks to all of our users for your trust in us. We look forward to continuing to defend the borders of the internet on your behalf!