Connecting Puppeteer to Existing Chrome Window w/ reCAPTCHA

Jared Potter
4 min read · Jan 25, 2019


<update>

Puppeteer 1.2.0 won’t work. Use 1.4.0 instead.

npm install puppeteer@1.4.0

npm reports a ‘high’ severity audit notice on install, but if you only run it on a local machine it should be safe enough.

Also, the instructions for Windows have changed. I’ve updated the story below.

</update>

By itself Puppeteer (https://github.com/GoogleChrome/puppeteer) is a great tool created by Google to assist with automated UI testing. However, I’ve found it to be more useful in creating powerful and dynamic web-crawling/scraping scripts.

To be clear, this article isn’t recommending this approach as a way to write tests. It focuses on web-crawling/scraping. If you’re writing UI tests and need to get around a captcha, there are standard methods, such as a separate testing environment with the captcha disabled.

Background

The other common UI automation suite is Selenium WebDriver (I’ve always preferred the Chrome web-driver with Python, but other browsers and languages are supported). One “downside” of the Selenium Chrome web-driver is that it always opens a fresh, independent, and isolated instance of Chrome without access to any existing user state, and without an easy way to set cookies or headers. This fresh instance can be either a good or a bad thing depending on the circumstance.

An example of a “bad” circumstance is when you have a script that needs to run behind a login page with a reCAPTCHA. By design reCAPTCHA blocks automated systems. The challenge becomes: how do we get around reCAPTCHA?

Typical login page with reCAPTCHA

The answer is Puppeteer and its ability to connect to an existing Chrome window that you’ve already manually logged into.

Here’s how to do it.

Prerequisites

  1. An existing Node.js JavaScript project with Puppeteer (base project: https://github.com/JaredPotter/puppeteer-base-project)
  2. Google Chrome

Getting Started

  1. Start Chrome with remote debugging enabled.

Mac

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check --user-data-dir=$(mktemp -d -t 'chrome-remote_data_dir')

For Mac, once run you’ll see a printout like this:

DevTools listening on ws://localhost:9222/devtools/browser/41a0b5f0-6747-446a-91b6-5ba30c87e951

Windows

  1. Right click on your Google Chrome shortcut icon => Properties
  2. In the Target field, append --remote-debugging-port=9222 to the very end (after the closing quote)

It should look something like this:

"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222

  3. Click “Apply” and re-launch Chrome

On Windows, next open http://127.0.0.1:9222/json/version in a browser.

Existing Chrome Window

  2. Copy the webSocketDebuggerUrl value (the ws:// URL).

Note: this URL changes each time you launch this separate instance of Chrome in this way.
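Since the endpoint changes on every launch, copying it by hand gets old quickly. Here is a minimal sketch of reading it from the /json/version page instead, assuming Chrome is listening on port 9222 as in the steps above. It only uses Node's built-in http module. The helper names parseWsEndpoint and fetchWsEndpoint are made up for this example and are not part of Puppeteer.

```javascript
const http = require('http');

// Pull the WebSocket endpoint out of the /json/version response body.
// (parseWsEndpoint is a hypothetical helper name, not a Puppeteer API.)
function parseWsEndpoint(body) {
    const info = JSON.parse(body);
    if (!info.webSocketDebuggerUrl) {
        throw new Error('webSocketDebuggerUrl missing; was Chrome started with --remote-debugging-port?');
    }
    return info.webSocketDebuggerUrl;
}

// Ask the running Chrome for its current endpoint.
// Assumes the debugging port from the steps above (9222).
function fetchWsEndpoint(port = 9222) {
    return new Promise((resolve, reject) => {
        http.get(`http://127.0.0.1:${port}/json/version`, (res) => {
            let body = '';
            res.setEncoding('utf8');
            res.on('data', (chunk) => { body += chunk; });
            res.on('end', () => {
                try {
                    resolve(parseWsEndpoint(body));
                } catch (err) {
                    reject(err);
                }
            });
        }).on('error', reject);
    });
}

// Usage, inside an async function:
// const wsChromeEndpointurl = await fetchWsEndpoint();
```

If Chrome isn't running with remote debugging enabled, the request fails and the promise rejects, which is usually the first hint that the shortcut or command wasn't set up correctly.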

Once this instance of Chrome is open you can manually log into whatever system you need access to, including solving reCAPTCHAs. This will set the necessary user state — cookies, tokens, etc.

User State Set

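Next, inside your index.js file, instead of launching a new instance of Chrome you’ll connect to this existing instance.

// Paste the ws:// URL you copied above; it changes on every Chrome launch
const wsChromeEndpointurl = 'ws://localhost:9222/devtools/browser/41a0b5f0-6747-446a-91b6-5ba30c87e951';
// Attach to the running Chrome instead of calling puppeteer.launch()
// (await must run inside an async function)
const browser = await puppeteer.connect({
    browserWSEndpoint: wsChromeEndpointurl,
});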
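To make the flow concrete, here is a rough sketch of what a script might do after connecting: find the tab you logged into by hand, run some work against it, and then detach. It relies on browser.pages(), browser.newPage(), and browser.disconnect() from the Puppeteer API. The function names, the urlPrefix filter, and the example.com URL are hypothetical, and Puppeteer is passed in as a parameter purely so the sketch can be tried with a stub.

```javascript
// Pick the tab that was logged into by hand. `urlPrefix` is a hypothetical
// filter; falls back to a new tab, which shares the same cookies.
async function findLoggedInPage(browser, urlPrefix) {
    const pages = await browser.pages();
    const match = pages.find((p) => p.url().startsWith(urlPrefix));
    return match || browser.newPage();
}

// Connect, run `work(page)`, then detach without closing Chrome.
async function runAgainstExistingChrome(puppeteerLib, wsEndpoint, urlPrefix, work) {
    const browser = await puppeteerLib.connect({ browserWSEndpoint: wsEndpoint });
    try {
        const page = await findLoggedInPage(browser, urlPrefix);
        return await work(page);
    } finally {
        // disconnect() drops the DevTools connection but leaves Chrome running
        await browser.disconnect();
    }
}

// Usage, inside an async function (URL is a placeholder):
// const puppeteer = require('puppeteer');
// const title = await runAgainstExistingChrome(
//     puppeteer, wsChromeEndpointurl, 'https://example.com',
//     async (page) => page.title()
// );
```

Detaching in a finally block means the manually logged-in Chrome window survives even when the scraping code throws.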

A couple of final tips:

  • Avoid calling browser.close(); at the end of your script unless that’s what you specifically intend to do. Otherwise you’ll have to re-open Chrome with the commands above.
  • Be aware that Puppeteer runs in a Node.js environment and thus has access to read/write files to the file system (fs is my favorite). Handy for exporting scraped data.
  • When navigating to a new page in Puppeteer that happens to be a SPA, the following syntax helps ensure the page has fully loaded before you start interacting with its elements:

await page.goto(pageUrl, {
    waitUntil: 'networkidle0'
});
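The last two tips can be combined into a small sketch: load a page with networkidle0, collect some text with page.$$eval, and write the result to disk with fs. The function names, the 'h1, h2' selector, and the output file name are placeholders chosen for illustration; swap in whatever the target site actually needs.

```javascript
const fs = require('fs');

// Load the page, wait for the network to go idle, then collect heading text.
// The 'h1, h2' selector is only an example.
async function scrapeHeadings(page, pageUrl) {
    await page.goto(pageUrl, { waitUntil: 'networkidle0' });
    return page.$$eval('h1, h2', (nodes) => nodes.map((n) => n.textContent.trim()));
}

// Write scraped rows to disk as pretty-printed JSON and return the path.
function saveScrapedData(filePath, rows) {
    fs.writeFileSync(filePath, JSON.stringify(rows, null, 2), 'utf8');
    return filePath;
}

// Usage, inside an async function with a connected `page`:
// const headings = await scrapeHeadings(page, pageUrl);
// saveScrapedData('headings.json', headings);
```

JSON keeps the example short; for tabular data, writing CSV through the same fs call works just as well.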

GitHub

https://github.com/JaredPotter/puppeteer-base-project

Reference:
https://developer.mozilla.org/en-US/docs/Tools/Remote_Debugging/Chrome_Desktop

