Hybrid Web Automation

Isaac de la Peña
Aug 30 · 7 min read

When the captcha pops, the algo stops

In our prior article Web Automation we covered our main options when it comes to programmatically interacting with web-based content, depending on the level of complexity required for the task at hand: from raw transactions, structured content, sessions and browsers to traffic control via proxies. If you are new to this field I recommend you start there, as it provides a step-by-step introduction and plenty of examples in Python.

In fact, all you need is there… if it weren’t for the fact that many content owners hate web automation (often because they want to implement discriminatory pricing tactics and force you to subscribe to their expensive premium API for these functions) and try to prevent it by any means possible. The most popular of these are captcha screens, which present you with a challenge that requires human cognitive capabilities to solve before you can proceed with your browsing.

Sample Instagram Captcha screen

A Better Mousetrap

Of course you can deploy countermeasures in your code, both traditional and AI-based, in order to avoid triggering such captchas: throttle the velocity of your actions, space out your requests unevenly, add sporadic mouse movements and clicks… But no matter how carefully the rodent tiptoes through the site, he is bound to eventually fall into one trap or another, because in this cat-and-mouse game the feline always holds the upper hand (or paw): the precise triggers are unknown, they may change over time, and some sites even show trigger-less captchas at unexpected times “just in case”, similar to the random TSA screenings that happen at airports.
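As a small illustration of the throttling and uneven spacing mentioned above, here is a minimal sketch. The name human_pause and its default bounds are our own assumptions for the example, not known-safe values for any particular site.

```python
import random
import time


def human_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random, uneven interval and return the delay used.

    The default bounds are illustrative guesses only; the real captcha
    triggers of any given site are unknown.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling human_pause() between page loads spaces them out irregularly; the sleep parameter is only there so the function can be exercised without actually waiting.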

And while the first generation of captchas (e.g. “type the digits that you see in this image”) could be defeated by implementing context-specific AI algorithms, the captchas widely used nowadays have a level of sophistication (e.g. “find specific objects in this series of images, which change over time”) that makes any attempt at automation too time-consuming and impractical.

Dropbox’s “turn the animal until it is standing” sophisticated Captcha

So it seems that we are condemned to this motto: when the captcha pops, the algo stops. Our process breaks, our tasks don’t complete and frustration ensues. But there’s an alternative: if the captchas require human cognition… then let’s bring a human into the equation! That’s what we call Hybrid Web Automation: a system that executes independently for the most part but, when faced with an unexpected situation (such as a captcha screen), requests the assistance of a human counterpart and waits patiently until all is clear to resume normal operations, instead of crashing on the spot.

Example: Automating Instagram

To make our explanation as practical as possible, we are going to apply hybrid automation in Python to the particular use case of downloading all the pictures of an Instagram profile of our choosing.

It is important to remember, though, that “web automation” goes beyond the mere collection of content, and also includes the possibility of interacting with web pages by filling in forms, providing data and activating services. That is, bi-directional autonomous interaction. But for our purposes web scraping is the simplest case to portray, an MVP of sorts.

Essentially, what we need to do is create a wrapper around our methods using the Proxy Design Pattern, such that we never call them directly but always via the proxy. Thus, instead of invoking a method such as do_something straight on the browser, we write our functionality so that do_something delegates to the proxy.

Which reads as: when we request do_something it calls the proxy, which calls the inner _do_something method, which in turn executes the required functionality in the browser. Should the task fail at any point, the process rolls back to the proxy, where it stops (and calls the human using, for instance, visible and audible signals) until the eventuality is handled.
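A minimal sketch of that call chain could look as follows. Only do_something and _do_something are named in the text; FakeBrowser, proxy and call_human are hypothetical stand-ins introduced for illustration, and the browser is a stub rather than a real Selenium driver.

```python
class FakeBrowser:
    """Hypothetical stand-in for a real web driver."""

    def __init__(self, failures=0):
        self.failures = failures  # how many calls fail before one succeeds

    def get(self, url):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("captcha screen at " + url)
        return "content of " + url


def _do_something(browser, url):
    """The inner method: talks to the browser directly."""
    return browser.get(url)


def proxy(task, browser, url, call_human):
    """Run the task; on failure stop, call the human, then try again."""
    while True:
        try:
            return task(browser, url)
        except Exception as error:
            # Execution blocks here until call_human returns, i.e. until
            # the human reports that all is clear.
            call_human(error)


def do_something(browser, url, call_human):
    """The public entry point: never calls the browser directly."""
    return proxy(_do_something, browser, url, call_human)
```

In a real session call_human would typically beep and wait on a prompt; here it is passed in as a parameter so that the waiting strategy stays separate from the task itself.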

Humans and Machines working together!

Sample Code

The complete working code has been published in the same Git repository used in the prior Web Automation article, using Selenium Chrome as our programmatic web driver:

https://github.com/isaacdlp/scraphacks

First we have to log in to Instagram. Our code supports both the input of credentials and the retrieval of stored cookies to reuse sessions (login frequency is often one of the trigger criteria for captchas, and hence we cover our backs by minimizing the need for re-authentication).
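The cookie side of this can be sketched as below. This is an illustration under assumptions, not the repository's code: the helper names and the JSON file format are ours, and driver is assumed to be a Selenium WebDriver, of which only get, get_cookies and add_cookie are used.

```python
import json
import os


def save_cookies(driver, path):
    """Store the current session cookies as JSON."""
    with open(path, "w") as handle:
        json.dump(driver.get_cookies(), handle)


def load_cookies(driver, path, home_url):
    """Restore cookies from a previous session.

    Returns True if at least one cookie was loaded, False otherwise.
    """
    if not os.path.exists(path):
        return False
    # Selenium only accepts cookies for the domain currently loaded,
    # so visit the site before adding them.
    driver.get(home_url)
    with open(path) as handle:
        cookies = json.load(handle)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return bool(cookies)
```

A login routine could call load_cookies first and fall back to typing credentials only when it returns False, then call save_cookies once the session is established.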

Then, our implementation of the do_something function (called _instagram) is rather simple.

It basically follows this routine:

  • Navigates to the target profile.
  • Traverses all media items sequentially, from last to first (because Instagram, like many other social sites, is built with a Progressive Feed pattern in mind: the older content loads once you scroll down the page).
  • Grabs the unique URL of each media item (thus decoupling the gathering of items from the actual download, again a strategy for captcha prevention).
  • Returns the list of media items as a “Media” property.
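
The routine above can be approximated with the following sketch. It is not the author's _instagram function: collect_media_urls, pause and max_rounds are hypothetical names, driver is assumed to be a Selenium WebDriver, and treating every link on the page as a media item is a placeholder for the site-specific selector a real scraper needs.

```python
def collect_media_urls(driver, profile_url, pause, max_rounds=50):
    """Scroll a progressive feed and gather unique item links.

    Selecting every <a> element is a placeholder assumption; the real
    page requires a more specific selector, which is not shown here.
    """
    driver.get(profile_url)
    seen = []
    for _ in range(max_rounds):
        before = len(seen)
        for link in driver.find_elements("css selector", "a"):
            href = link.get_attribute("href")
            if href and href not in seen:
                seen.append(href)
        if len(seen) == before:
            break  # nothing new appeared after the last scroll
        # Ask the page to load the next (older) batch of items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        pause()  # give the feed time to load, e.g. a randomized sleep
    return {"Media": seen}
```

Only the URLs are collected at this stage; downloading is left to a separate step, in line with the decoupling described in the list.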

Please note that, despite its simplicity, the example has been extended to handle both images and videos out of the box.

Running our Hybrid Automation (background) to scrape my own Instagram profile (foreground)

Reusable Hybrid Wrapper

What is most interesting in the example above is that both the login and the scraping methods use the same proxy function!

Reusability is the whole point: the proxy handles errors gracefully no matter what the original task at hand was, loops a sound a configurable number of times (to get the attention of the human without being annoying in case they are busy with other matters) and presents a command-line prompt with the options of retrying the last URL, switching to a new URL, moving on to the next, or exiting in case the error cannot be addressed.
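
To make that behaviour concrete, here is a minimal sketch of such a proxy. It is an approximation under assumptions rather than the repository's implementation: the name hybrid, its parameters and the one-letter prompt options are ours, and the sound is abstracted into an alert callable.

```python
def hybrid(task, url, alert, ask=input, rings=3):
    """Run task(url); on failure ring the human and follow their choice.

    Returns the task result, or None if the human skips the item.
    Raises SystemExit if the human chooses to exit.
    """
    while True:
        try:
            return task(url)
        except Exception as error:
            print("Task failed:", error)
            for _ in range(rings):
                alert()  # e.g. play a sound; kept abstract here
            choice = ask("[r]etry, [u] <new url>, [n]ext or [e]xit? ").strip()
            if choice.startswith("u "):
                url = choice[2:].strip()  # switch to a new url
            elif choice == "n":
                return None  # give up on this item and move on
            elif choice == "e":
                # Nothing is closed here, so any browser stays open
                # for inspection.
                raise SystemExit("stopped by the human operator")
            # Anything else (including "r") retries the current url.
```

Because the task is passed in as a plain callable, the same function can wrap a login step and a scraping step alike.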

Even in this last situation (forced exit), hybrid automation helps enormously, as we are still presented with an interactive session in the current browser where we can understand and debug the issue, instead of a crashed session, a closed browser and an ugly exception printout.

Help me help you, human :)

Furthermore, we can encapsulate our proxy into a Python object to keep the code and its actions portable across different web automation projects. That is precisely what we have done in the scrapper folder of our demo:

https://github.com/isaacdlp/scraphacks/tree/master/scrapper

Now the requirements to complete our particular Instagram example are simplified a great deal: first call the object we just created, and then download the specific media items as we see fit. The corresponding code can be found in the socialscrap.py file:

https://github.com/isaacdlp/scraphacks/blob/master/socialscrap.py
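
For orientation, the download step alone could be sketched as follows. This is not the content of socialscrap.py: download_media and its opener parameter are hypothetical, and the sketch assumes the media URLs can be fetched without the browser session, which may not hold for every site.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def download_media(urls, folder, opener=urllib.request.urlopen):
    """Save each url into `folder`, naming files after the url path."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Fall back to a numbered name when the url has no file part.
        name = os.path.basename(urlparse(url).path) or "item_%d" % index
        target = os.path.join(folder, name)
        with opener(url) as response, open(target, "wb") as handle:
            shutil.copyfileobj(response, handle)
        saved.append(target)
    return saved
```

Feeding it the “Media” list gathered earlier keeps the download separate from the browsing, so a failure in one does not force a repeat of the other.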

Beyond that, please check the __init__.py file for more details on the wrapper implementation. It includes other advanced functions such as generalized cookie handling, page scrolling, and screenshot captures of websites. Feel free to make the code your own, adapt it to your specific needs and extend it to support other use cases.

You are most welcome! 😃

Algonaut

Musings on technology, philosophy and economics

Isaac de la Peña

Written by

Partner @Conexo_vc & formerly @Inveready. Founder at Algonaut, Agora EAF & Playrific. MIT technologist. Finance, algorithmic trading, AI, big data, mobile, web.
