ReCAPTCHA is Still Vulnerable: Perhaps More Than Ever Before

Published in Arkose Labs, Jan 6, 2015

At the start of December, a rather large update to the traditional reCAPTCHA technology was announced, dubbed the “No CAPTCHA reCAPTCHA” experience. For many, it came as a pleasant surprise: no more squiggly letters and hard-to-read numbers and images. What had been a frustrating experience for millions of internet users the world over looked set to get a big injection of convenience.

But when the mechanics behind the “new” technology were broken down via reverse engineering, many developers asserted that this newly developed convenience is merely the addition of a “whitelist”. To put it simply: a user’s past behavior and previous CAPTCHA solves are recorded in their cookies, which are then detected by future reCAPTCHA challenges. Those who are seen as genuine users get the “No CAPTCHA experience”, while those who aren’t are reverted to the usual distorted-text reCAPTCHA.

The existing mechanics (and thus the flaws) behind the reCAPTCHA system are still there. The hope was that this cookie “whitelist” would make reCAPTCHA easier for users without simultaneously making it easier for bots. However, this looks to have backfired because of two main issues.

Easier for humans, easier for bots

The way reCAPTCHA uses its new whitelist system has actually made it easier to exploit for no security gain, according to Egor Homakov, a consultant at www.sakurity.com. In a blog post from December 4th, he eloquently sums up his findings (namely the whitelist and its consequences), but we wanted to break them down further for readers who may not have the experience necessary to fully grasp his conclusions.

His first main concern is that relying on cookies for extra convenience doesn’t add any security at all. If the sole goal was to make things easier for humans without strengthening the existing security, then technically it was a success. Egor stresses this point because the “No CAPTCHA reCAPTCHA Experience” doesn’t make it harder for bots, just easier for humans.

This is a problem, Egor says, because of the way the whitelist is implemented: it leaves room for exploitation because “the legacy flow is still available and old OCR bots can keep recognizing” the old CAPTCHA.
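The point about the legacy flow can be made concrete with a toy model. This is only a sketch of the behaviour described above, not reCAPTCHA's actual decision logic: the function names and the single boolean standing in for "has a trusted cookie" are assumptions made for illustration.

```python
def legacy_flow(has_trusted_cookie):
    """Before the update: every visitor gets the distorted-text challenge."""
    return "distorted_text"


def no_captcha_flow(has_trusted_cookie):
    """After the update, as described above: trusted cookies skip the challenge."""
    return "checkbox_only" if has_trusted_cookie else "distorted_text"


if __name__ == "__main__":
    # Hypothetical visitors: True means a trusted cookie is present.
    visitors = {"returning human": True, "OCR bot with no cookies": False}
    for name, has_cookie in visitors.items():
        before = legacy_flow(has_cookie)
        after = no_captcha_flow(has_cookie)
        print(f"{name}: {before} -> {after}")
```

In this model the returning human moves from the distorted text to the checkbox, while the bot faces exactly the same distorted-text challenge as before, so an OCR solver that worked before the update keeps working.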

For those making alternative CAPTCHAs, this was an interesting point of difference raised by Egor. For example, FunCaptcha takes the opposite approach to reCAPTCHA’s. Instead of becoming easier after repeated completions, FunCaptcha becomes harder after repeated mistakes. This is for two reasons:

1) To make a CAPTCHA that is inherently fast and easy for humans even easier would compromise its security against bots for no real gain.

2) A major vulnerability for visual CAPTCHAs with a small number of discrete answers is a brute-force attack by a bot, which performs automated guessing over and over until it breaks through. By tracking the history of the IP and making the CAPTCHA’s string of challenges longer after each failed attempt, a brute-force attack quickly becomes impractical.
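The arithmetic behind the second point can be sketched as follows. This is a minimal illustration, not FunCaptcha's actual implementation: the number of answers per challenge, the base length, the cap, and the rule of one extra challenge per failure are all assumed values chosen only to show the effect.

```python
from collections import defaultdict
from fractions import Fraction

ANSWERS_PER_CHALLENGE = 8  # assumed: each visual challenge has 8 discrete answers
BASE_CHALLENGES = 1        # assumed: a first-time visitor gets a single challenge
MAX_CHALLENGES = 10        # assumed cap so the string never grows without bound

# Hypothetical in-memory failure history, keyed by IP address.
failures_by_ip = defaultdict(int)


def challenges_required(ip):
    """Length of the challenge string for this IP: one more per past failure."""
    return min(BASE_CHALLENGES + failures_by_ip[ip], MAX_CHALLENGES)


def guess_success_probability(ip):
    """Chance that uniform random guessing clears every challenge in the string."""
    return Fraction(1, ANSWERS_PER_CHALLENGE) ** challenges_required(ip)


def record_failure(ip):
    """Remember a failed attempt so the next string is longer."""
    failures_by_ip[ip] += 1


if __name__ == "__main__":
    ip = "203.0.113.7"  # documentation-range address used as a stand-in
    for attempt in range(4):
        print(attempt, challenges_required(ip), guess_success_probability(ip))
        record_failure(ip)
```

Under these assumptions a guessing bot starts with a 1 in 8 chance, and after three failures it needs four correct guesses in a row, a 1 in 4096 chance. A human who makes a single mistake only faces one extra challenge.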

Many developers are puzzled by these changes. As Egor’s findings explain, in trying to make the reCAPTCHA process more convenient, the latest changes have arguably compromised its security.

Removing Challenge/Response has removed the challenge — for bots

Egor goes on to explain that by introducing the cookie whitelist as a replacement for the traditional “challenge/response” method, the service has become even more vulnerable to a malicious attack known as “clickjacking”. If a user has accumulated a valid cookie whitelist, they get the “free pass”, and a token known as “g-recaptcha-response” is generated for them. How is this abused?

To reword Egor’s assertion: the person wanting to spam a certain website needs to obtain, via an unsuspecting user, a valid “g-recaptcha-response” that matches the required credentials of the targeted website. This is done by creating a fake variant of the target website’s reCAPTCHA, having an unsuspecting user complete this fake variant, and then using the generated “g-recaptcha-response” to give bots access to the target website through the now breakable reCAPTCHA. This is possible because the “g-recaptcha-response” token is made available before submission to the CAPTCHA.
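The relay described above can be illustrated with a toy model. Nothing below is Google's API: `ToyCaptchaService`, `solve`, and `verify` are hypothetical names, and the site key standing in for the "required credentials" is an assumption. The only property modelled is that a token is tied to the target site rather than to the page or the person that produced it.

```python
import secrets


class ToyCaptchaService:
    """Hypothetical stand-in for a CAPTCHA provider (not reCAPTCHA's real API)."""

    def __init__(self):
        self._issued = {}  # token -> site key the token was issued for

    def solve(self, site_key):
        """A human passes the widget; the service hands back a response token."""
        token = secrets.token_hex(16)
        self._issued[token] = site_key
        return token

    def verify(self, site_key, token):
        """A site's backend checks a token. Each token is accepted at most once."""
        return self._issued.pop(token, None) == site_key


if __name__ == "__main__":
    service = ToyCaptchaService()
    target_site_key = "target-site-key"  # assumed to be readable from the target's page

    # 1. The attacker's page shows a widget for the target's key,
    #    and an unsuspecting visitor completes it there.
    relayed_token = service.solve(target_site_key)

    # 2. The attacker's bot submits that token to the target site,
    #    whose backend verifies it and cannot tell who solved it.
    print(service.verify(target_site_key, relayed_token))
```

In this model the target's backend accepts the relayed token, because nothing in the token records where or by whom the challenge was completed. Each unsuspecting visitor yields one token, so the attacker needs a steady supply of clicks rather than a CAPTCHA solver.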

So, what does all this mean?

The conclusion that can be drawn from Egor’s findings is that while the convenience of reCAPTCHA has increased for some users, so has its vulnerability. He proposes that the cookie whitelist has not only opened the service to exploitation in and of itself, it has also opened a gateway into the existing technology by replacing challenge/response with the “g-recaptcha-response” token.

CAPTCHA innovation has started to occur around the globe, so there are certainly more options now. For developers of secure alternative types of CAPTCHA, the goal is to provide a method that, at its core, is already so quickly solvable that there is room for the challenge to become lengthier in response to brute-force attacks, while still staying reasonable for humans accidentally caught in the net. Forcing it to become trivially solvable after building a whitelist of “human” behavior would be both pointless and potentially damaging, which is the position that Egor believes reCAPTCHA now finds itself in.

Written by Matthew Ford