How to bypass reCaptcha V2 with Selenium?

Saman Esmaeil

Published in Analytics Vidhya

5 min read · Feb 29, 2020

(Update: 23 May 2020) Because of version changes in the tools used in this tutorial, some methods stopped working, so I’ve updated this solution with new working code.

The code is on GitHub, so you can read it and see how it works.

In this tutorial, I will describe how to bypass reCAPTCHA v2 (with challenge and interaction scoring). It was tested against the Sync.me website, which looks up people’s names by phone number.

We will use Python (unittest), Selenium, and Buster Captcha Solver.

First of all, it helps to know that Google reCAPTCHA v3 scores each request based on the user’s interactions with the website. The score ranges from 0.0 to 1.0: 0.0 means the request very likely comes from a bot, and 1.0 means it is very likely a genuine human interaction.

In this tutorial, we will set up a Selenium bot that handles v2 challenges and scoring, and then looks up people’s names by their phone numbers on Sync.me.

Python’s unittest is a framework for running tests, and Selenium is a standard tool for writing tests for web applications. Here is a simple skeleton for running a test. Its tearDown method closes the bot automatically once it has finished. This is our default skeleton:

The main benefit of unittest is its assertion methods, which let you follow your bot’s progress step by step.

Now let’s use Selenium. First, import what we need:

My preferred driver is geckodriver (for Firefox), so you need geckodriver installed on your system. Every web driver needs its profile and capabilities set up, so we create a setUp method, something like this:

The setUp() and tearDown() methods allow you to define instructions that will be executed before and after each test method with unittest.
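To illustrate that lifecycle, here is a minimal sketch that runs without a browser. It assumes a hypothetical `FakeDriver` class as a stand-in for the real Selenium driver. In the actual bot, setUp would create the Firefox driver and tearDown would call `driver.quit()`.

```python
import unittest


class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (no browser needed)."""

    def __init__(self):
        self.closed = False
        self.visited = []

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


class BotTest(unittest.TestCase):
    # Keeps every driver created, so we can check afterwards that tearDown ran.
    created = []

    def setUp(self):
        # Runs before EACH test method: start a fresh driver.
        self.driver = FakeDriver()
        BotTest.created.append(self.driver)

    def tearDown(self):
        # Runs after EACH test method, even if it failed: close the driver.
        self.driver.quit()

    def test_run(self):
        self.driver.get("https://example.com")
        # Assertions let you check the bot's progress step by step.
        self.assertEqual(self.driver.visited, ["https://example.com"])
        self.assertFalse(self.driver.closed)
```

Run it with `python -m unittest`, or add the usual `unittest.main()` call at the bottom of the file.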

Now let’s look at the other methods, starting with setUpOptions:

Headless browsing means running the browser without its graphical user interface.

As I said, this is an easy captcha solver. You don’t need to know anything about captcha-solving algorithms, because a third-party tool does the work for us: Buster!

Buster is a captcha solver that uses audio recognition, and you can install it as a web browser extension. Take a look at:

Now, let’s set up a Firefox (Gecko) profile that uses Buster, so the browser loads the extension when it starts.

Next, we need to enable Marionette.

But what is Marionette? Here is what Mozilla says about it:

Marionette is an automation driver for Mozilla’s Gecko engine. It can remotely control either the UI or the internal JavaScript of a Gecko platform, such as Firefox. It can control both the chrome (i.e. menus and functions) or the content (the webpage loaded inside the browsing context), giving a high level of control and ability to replicate user actions. In addition to performing actions on the browser, Marionette can also read the properties and attributes of the DOM.

Marionette is the new driver that is shipped/included with Firefox. This driver has it’s own protocol which is not directly compatible with the Selenium/WebDriver protocol.

The Gecko driver (previously named wires) is an application server implementing the Selenium/WebDriver protocol. It translates the Selenium commands and forwards them to the Marionette driver.

https://stackoverflow.com/questions/43272919/difference-between-webdriver-firefox-marionette-webdriver-gecko-driver

For more, check out its documentation: https://firefox-source-docs.mozilla.org/testing/marionette/Intro.html

If you want to use a proxy, you need to enable the corresponding capability for the driver:

We are ready; let’s move on to the main function:

test_run is the main test method, and it runs when unittest.main() is called in our code. driver.get loads https://sync.me.

Now, let’s find the phone number input:

We can use an XPath or a CSS selector to locate the phone number input. Below are Selenium’s locator methods:

find_element_by_id

find_element_by_name

find_element_by_xpath

find_element_by_link_text

find_element_by_partial_link_text

find_element_by_tag_name

find_element_by_class_name

find_element_by_css_selector

find_elements_by_id

find_elements_by_name

find_elements_by_xpath

find_elements_by_link_text

find_elements_by_partial_link_text

find_elements_by_tag_name

find_elements_by_class_name

find_elements_by_css_selector
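These `find_element_by_*` and `find_elements_by_*` helpers were deprecated in Selenium 4 and removed in Selenium 4.3. On current versions you call `driver.find_element(By.XPATH, ...)` or `driver.find_elements(...)` with a strategy from `selenium.webdriver.common.by.By` instead. As a small migration aid, the sketch below rewrites a legacy method name into its Selenium 4 form as source text. `modern_call` is a hypothetical helper for illustration, not part of Selenium, and the values passed to it are made-up examples.

```python
import re

# Legacy method suffix -> name of the matching constant on selenium's By class.
BY_CONSTANTS = {
    "id": "ID",
    "name": "NAME",
    "xpath": "XPATH",
    "link_text": "LINK_TEXT",
    "partial_link_text": "PARTIAL_LINK_TEXT",
    "tag_name": "TAG_NAME",
    "class_name": "CLASS_NAME",
    "css_selector": "CSS_SELECTOR",
}

# Matches e.g. "find_element_by_xpath" or "find_elements_by_css_selector".
_LEGACY = re.compile(r"^(find_elements?)_by_([a-z_]+)$")


def modern_call(legacy_method: str, value: str) -> str:
    """Return the Selenium 4 equivalent of a legacy locator call as source text."""
    match = _LEGACY.match(legacy_method)
    if match is None or match.group(2) not in BY_CONSTANTS:
        raise ValueError(f"not a legacy locator method: {legacy_method}")
    method, strategy = match.groups()
    return f"{method}(By.{BY_CONSTANTS[strategy]}, {value!r})"
```

For example, `modern_call("find_element_by_id", "phone")` returns `find_element(By.ID, 'phone')`.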

I’m using XPath to locate the input and send the number to it:

Wait a random number of seconds to make the interaction look more natural, then wait (with a 20-second timeout) for the submit button to become available, and click it:
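The "wait up to 20 seconds" step is an explicit wait: keep checking a condition until it holds or the timeout expires. The sketch below shows that polling idea in plain Python. `wait_until` is a hypothetical helper that only mirrors the behavior of Selenium's `WebDriverWait(driver, 20).until(...)`; it is not Selenium's implementation.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 20.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except ignored:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With Selenium itself, the equivalent is `WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, ...)))`, which also ignores `NoSuchElementException` while polling by default.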

After submitting, we face a reCAPTCHA on the results page:

Before clicking the checkbox, let’s simulate human-like mouse movement to make our interaction with the page look more genuine. To do this, we generate a path with a B-spline function and move the mouse along it.

A B-spline (basis spline) is a spline function that has minimal support with respect to a given degree, smoothness, and domain partition.

Because the captcha is loaded in a separate frame, we need to tell the driver to switch to it. Then, finding recaptcha-anchor gives us a starting location for our human-like mouse movement:

Next, click the checkbox, wait, and perform the human-like mouse movement again.

At this step, you will see the challenge that needs to be solved. Because we are using Buster, you can also see its button for solving the audio captcha:

Yes, that’s it. If we locate the anchor, we can find its ID.

But to get a better score, we first need to click the audio button:

Then switch to the new frame:

In the new frame, just click solver-button like this:

Then switch back to the main frame:

That’s it: your bot has bypassed the captcha.

For more info, see:
