How To Solve CAPTCHA While Web Scraping?

Octoparse
Published in DataSeries
7 min read · Aug 9, 2021

CAPTCHAs are one of the most popular anti-scraping techniques used by website owners. reCAPTCHA v3 is Google's CAPTCHA service for detecting bot traffic on websites; NuCaptcha and hCaptcha are other advanced CAPTCHA solutions. CAPTCHAs are irritating not just for users but also for web scrapers, and solving them is one of the top challenges scrapers face. Read on for different ways to solve CAPTCHAs while you scrape your target website's content.

What is a CAPTCHA? And what is reCAPTCHA?

A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is an automatically generated text-, image-, or audio-based test. Solving CAPTCHAs requires three skills that humans are much better at than computers:

  • Invariant recognition (recognizing the same letter or object across different shapes and renderings),
  • Segmentation (separating letters that overlap or touch), and
  • Parsing context (understanding the image, text, or audio as a whole)
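
To see why segmentation in particular trips up a program, here is a toy sketch in Python (not from the original article). It assumes a tiny binary image written as rows of `#` (ink) and `.` (background), and a hypothetical helper `segment_columns` that cuts the image wherever a column is completely blank. That naive rule works when letters are spaced apart and collapses as soon as they touch, which is the kind of distortion CAPTCHAs rely on.

```python
def segment_columns(bitmap):
    """Split a binary image into (start, end) column ranges.

    `bitmap` is a list of equal-length strings where '#' marks ink.
    Characters are assumed to be separated by fully blank columns.
    """
    width = len(bitmap[0])
    # A column "has ink" if any row contains '#' at that position.
    has_ink = [any(row[x] == "#" for row in bitmap) for x in range(width)]

    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new character begins
        elif not ink and start is not None:
            segments.append((start, x))    # blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character runs to the right edge
    return segments


# Two glyphs with a clear gap between them.
separated = [
    "#.#..###",
    "###...#.",
    "#.#...#.",
]

# The same two glyphs pushed together until they touch.
overlapping = [
    "#.#.###",
    "####.#.",
    "#.#..#.",
]

print(segment_columns(separated))    # two ranges, one per glyph
print(segment_columns(overlapping))  # a single range covering both glyphs
```

A human still reads two letters in the second image; the blank-column rule sees only one blob.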

reCAPTCHA is the most popular CAPTCHA service. It comes from Google and can be easily integrated into a website.
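
Because these widgets are embedded through a script tag and a placeholder element, a scraper can often tell that a page is protected before it tries to extract anything. The sketch below is a minimal illustration using only Python's standard library. It assumes the default markup of each provider (a `g-recaptcha` or `h-captcha` class on the widget element, and a script loaded from the provider's domain); sites that customize their integration may not match, and `CaptchaDetector` and `detect_captchas` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Assumed markers, based on each provider's default embed markup.
WIDGET_CLASSES = {"g-recaptcha": "reCAPTCHA", "h-captcha": "hCaptcha"}
SCRIPT_HINTS = {"google.com/recaptcha/": "reCAPTCHA", "hcaptcha.com/": "hCaptcha"}


class CaptchaDetector(HTMLParser):
    """Collects the names of CAPTCHA providers referenced in a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Check every CSS class on the element against known widget classes.
        for cls in (attrs.get("class") or "").split():
            if cls in WIDGET_CLASSES:
                self.found.add(WIDGET_CLASSES[cls])
        # Check script sources against known provider URLs.
        if tag == "script":
            src = attrs.get("src") or ""
            for hint, name in SCRIPT_HINTS.items():
                if hint in src:
                    self.found.add(name)


def detect_captchas(html):
    """Return a sorted list of CAPTCHA providers detected in `html`."""
    detector = CaptchaDetector()
    detector.feed(html)
    return sorted(detector.found)


page = (
    '<html><head>'
    '<script src="https://www.google.com/recaptcha/api.js" async defer></script>'
    '</head><body><form>'
    '<div class="g-recaptcha" data-sitekey="SITE_KEY"></div>'
    '</form></body></html>'
)
print(detect_captchas(page))
```

A check like this only inspects static HTML. A widget injected later by JavaScript would not show up unless the page is rendered first.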

What are some popular types of CAPTCHAs?

1. Normal Captcha
