How I broke a CAPTCHA with Puppeteer and Google Vision

(https://unsplash.com/@rocknrollmonke)

Introduction

Implementing a CAPTCHA is an effective and widespread way to prevent robots or users from misusing forms.

However, I will show you that some CAPTCHAs can be defeated with tools like Puppeteer and Google Vision.

I recently switched mobile carriers and noticed that the new one uses a homemade CAPTCHA. 🤖

The principle is simple: each time the page is reloaded, a list of words is displayed with 6 images to its right. Click on the image corresponding to the word. 🖼️

Seeing how this CAPTCHA is solved, I wondered whether it would be possible to analyze the images automatically, deduce the terms describing them, and thus solve the CAPTCHA.

First of all, it was necessary to find tools to analyze images efficiently.

  1. Google reverse image search
  2. Clarifai
  3. Google Vision

I immediately abandoned the idea of using Google Images: the results were far too imprecise or even wrong.

(Okay, a lion is a big cat 🐈, but still...)

I therefore turned to Clarifai, which lets you upload your images and see the proposed results.

The results are reliable overall, but unfortunately not yet accurate enough on some images.

As you may have guessed, I ended up using Google Vision to perform the image analysis. Before we talk about this tool in more detail, let’s talk about Puppeteer.

Puppeteer is a Node.js library that provides an API for controlling an instance of the Chrome browser.

const puppeteer = require('puppeteer');

(async () => {
  // Launch a Chrome instance (headless by default) and open a new tab.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load the page and save a screenshot of it.
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

The code above allows you to go to https://example.com and take a screenshot of this page. 📸

All of this runs in headless mode: you will not see the Chrome instance open.
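Beyond screenshots, the same page object can read data out of the DOM, which is what a CAPTCHA solver needs. Here is a minimal sketch under stated assumptions: the selectors `.captcha-words li` and `.captcha-images img` are hypothetical placeholders (the real page will use different ones), and `readCaptcha` is a name made up for this example.

```javascript
// Sketch: read the words and image URLs of a CAPTCHA from a Puppeteer page.
// ASSUMPTION: both selectors are hypothetical; inspect the real page to find the right ones.
const WORD_SELECTOR = '.captcha-words li';
const IMAGE_SELECTOR = '.captcha-images img';

async function readCaptcha(page) {
  // $$eval runs the callback inside the browser on every element matching the selector.
  const words = await page.$$eval(WORD_SELECTOR, (nodes) =>
    nodes.map((node) => node.textContent.trim().toLowerCase())
  );
  const imageUrls = await page.$$eval(IMAGE_SELECTOR, (nodes) =>
    nodes.map((node) => node.src)
  );
  return { words, imageUrls };
}
```

With a real page, this would be called right after `page.goto(...)`, for example `const { words, imageUrls } = await readCaptcha(page);`.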


Process

  1. Go to the page containing a CAPTCHA.
  2. Retrieve the 6 words and download the 6 images locally.
  3. Analyze each image with Google Vision to retrieve terms describing them.
  4. Match words and images.
  5. Click on the images according to these matches.
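Steps 3 and 4 can be sketched roughly as follows. This is an illustration under assumptions, not necessarily the logic of the final script: `client` is expected to be an `ImageAnnotatorClient` from the `@google-cloud/vision` package, created elsewhere with valid credentials, and the helper names `describeImage` and `matchWordsToImages` are made up for the example. The matching itself is a deliberately naive substring test.

```javascript
// Sketch of steps 3 and 4: label each image, then pair every word with an image.
// ASSUMPTION: `client` is an ImageAnnotatorClient from @google-cloud/vision,
// created elsewhere. Helper names are hypothetical.

// Step 3: ask Google Vision for labels describing one local image file.
async function describeImage(client, imagePath) {
  const [result] = await client.labelDetection(imagePath);
  return (result.labelAnnotations || []).map((label) =>
    label.description.toLowerCase()
  );
}

// Step 4: for each word, find the first image whose labels mention it.
// Returns one image index per word (-1 when no image matches).
function matchWordsToImages(words, labelsPerImage) {
  return words.map((word) => {
    const needle = word.toLowerCase();
    return labelsPerImage.findIndex((labels) =>
      labels.some((label) => label.includes(needle))
    );
  });
}
```

A substring test is easy to read but can produce false positives ("cat" also matches "cattle"), so a real script would probably need a stricter comparison or a list of synonyms. The resulting indexes can then drive step 5, for instance with Puppeteer's `page.click` on the matching image.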

PoC

I developed a script that implements the steps listed above. You can find a demonstration video and the source code on GitHub. 😉