Bypassing reCAPTCHA v2

Charlie Maddex
2 min read · Sep 17, 2020


Web scraping, the automated extraction of large amounts of data from websites for analysis or database enrichment, is more popular than ever. As it has become more common, the measures used to stop scraping bots have evolved. Google’s reCAPTCHA v2 is one example.

First, let’s analyse the example reCAPTCHA demo that Google provides.

reCAPTCHA Code Snippet

The main element we’ll be focusing on is the <div> with the “g-recaptcha” class. This element carries an attribute named data-sitekey, which is used to generate a response token. The token is sent as “g-recaptcha-response” in the request and verifies that the captcha was solved. Here’s a sample request in which I solved the captcha on the demo and submitted the form:

Our goal is to obtain a valid reCAPTCHA response token automatically. This can be done with a captcha-solving service such as 2Captcha, which we’ll use in this example. We provide their API with the sitekey and the page URL, and they return a valid token. The page URL is needed because the domain a token was generated on is checked when the token is verified. 2Captcha handles all of this work for us, and it’s inexpensive: they currently charge $1.99 per 1000 reCAPTCHA tokens. Their API is documented here.
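Before calling the service, the sitekey has to be read from the page. Here is a minimal sketch of that step using only the standard library. The names SitekeyParser and extract_sitekey, the sample markup, and the EXAMPLE_SITEKEY value are illustrative assumptions, not taken from Google’s demo.

```python
from html.parser import HTMLParser


class SitekeyParser(HTMLParser):
    """Collects the data-sitekey of the first element with class g-recaptcha."""

    def __init__(self):
        super().__init__()
        self.sitekey = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if self.sitekey is None and "g-recaptcha" in classes:
            self.sitekey = attributes.get("data-sitekey")


def extract_sitekey(html):
    """Return the sitekey found in the given HTML, or None if there is none."""
    parser = SitekeyParser()
    parser.feed(html)
    return parser.sitekey


# Hypothetical markup with a placeholder sitekey, for illustration only.
SAMPLE_HTML = (
    '<form action="?" method="POST">'
    '<div class="g-recaptcha" data-sitekey="EXAMPLE_SITEKEY"></div>'
    '<input type="submit" value="Submit">'
    "</form>"
)

sitekey = extract_sitekey(SAMPLE_HTML)
```

On a real page, the HTML would come from whatever HTTP client the scraper already uses. If the widget is injected by JavaScript, the sitekey may not be present in the raw HTML at all.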

A Simple Python Implementation

Calling solve_captcha with the page URL and the data-sitekey value should return a valid token. Because we have to wait for the token, this implementation blocks the calling thread; if that’s a problem, run it in a separate thread. Below is an example to get a valid token for Google’s demo:
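As a minimal sketch, and not necessarily the exact script described above, solve_captcha could look like the following. It assumes 2Captcha’s in.php and res.php endpoints with the userrecaptcha method and json=1 responses, as we understand them from their public documentation, so check the parameter names against the current docs. The fetch, poll_interval, max_polls and sleep parameters are our own additions, there so the function can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL of the 2Captcha HTTP API; verify against their documentation.
API_BASE = "https://2captcha.com"


def http_get(url, params):
    """Send a GET request and decode the JSON body of the response."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def solve_captcha(api_key, sitekey, page_url, fetch=http_get,
                  poll_interval=5.0, max_polls=24, sleep=time.sleep):
    """Submit a reCAPTCHA v2 task and poll until a token is returned."""
    submitted = fetch(f"{API_BASE}/in.php", {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": page_url,
        "json": 1,
    })
    if submitted.get("status") != 1:
        raise RuntimeError(f"Task was rejected: {submitted.get('request')}")
    task_id = submitted["request"]

    for _ in range(max_polls):
        sleep(poll_interval)  # This wait is what blocks the calling thread.
        result = fetch(f"{API_BASE}/res.php", {
            "key": api_key,
            "action": "get",
            "id": task_id,
            "json": 1,
        })
        if result.get("status") == 1:
            return result["request"]  # The g-recaptcha-response token.
        # "CAPCHA_NOT_READY" (spelled this way by the API) means keep waiting.
        if result.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(f"Solving failed: {result.get('request')}")

    raise TimeoutError("No token was returned within the polling budget")
```

A call would look like `solve_captcha(API_KEY, sitekey, page_url)`, where API_KEY is your own 2Captcha key. To keep the main thread free, the same call can be handed to `concurrent.futures.ThreadPoolExecutor.submit` and the token read from the returned future.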

Submit the correct response token with your request to a captcha-protected page and you’ll get past the captcha instantly.
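As a rough sketch of that last step, the token can be added to the form data under the “g-recaptcha-response” field before the request is sent. The function name build_submission, the example URL and the other form fields are hypothetical; a real form will have its own action URL and field names.

```python
import urllib.parse
import urllib.request


def build_submission(form_url, form_fields, token):
    """Build a POST request carrying the form fields plus the captcha token."""
    payload = dict(form_fields)
    # The field name matches the one seen in the browser's own request.
    payload["g-recaptcha-response"] = token
    body = urllib.parse.urlencode(payload).encode("utf-8")
    return urllib.request.Request(
        form_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical target and fields, for illustration only.
request = build_submission(
    "https://example.com/submit",
    {"name": "Jane"},
    "TOKEN_FROM_SOLVER",
)
```

Passing the request to `urllib.request.urlopen` would send it. Tokens expire after a short time, so it is safest to submit soon after the token is returned.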
