How to Bypass CAPTCHA With Python Requests

4 min read · Nov 4, 2024

In this guide, I’ll go over how you can work with Python Requests to get past CAPTCHAs, while also touching on some important ethical considerations and common challenges.

Understanding CAPTCHA and Its Types

Before exploring ways to bypass CAPTCHA, let’s understand the different types:

  1. Image CAPTCHAs: Users select images based on a prompt.
  2. Text-Based CAPTCHAs: Require users to enter distorted text.
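  3. reCAPTCHA v2 and v3: Google’s CAPTCHA. v2 shows an "I'm not a robot" checkbox, sometimes followed by image selections; v3 runs in the background and assigns each visitor a risk score instead of showing a challenge.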
  4. hCAPTCHA: Similar to reCAPTCHA but designed with privacy in mind.
  5. Invisible CAPTCHAs: CAPTCHAs that operate in the background, invisible to users but monitoring their interactions.

Method 1: Using Anti-CAPTCHA Services

Before we review some of the basic CAPTCHA solving services, I would like to recommend Bright Data’s CAPTCHA Solving tool as the superior option. The price is a bit higher, but you are getting the most complete package.

Anti-CAPTCHA services, such as 2Captcha or Anti-Captcha.com, solve CAPTCHAs on your behalf by using human workers or AI. These services provide APIs, allowing developers to pass CAPTCHA images or URLs and receive solutions in return.

  1. Installing the requests library: pip install requests
  2. Setting Up Anti-CAPTCHA: Sign up for an API key from a CAPTCHA-solving service.

Example Code:

import requests

def submit_captcha(api_key, image_b64):
    """Submit a Base64-encoded CAPTCHA image to 2Captcha and return the task ID."""
    url = 'https://2captcha.com/in.php'
    data = {
        'key': api_key,
        'method': 'base64',
        'body': image_b64,  # Base64-encoded CAPTCHA image, not a URL
        'json': 1
    }
    response = requests.post(url, data=data, timeout=30).json()
    if response['status'] == 1:
        # in.php returns a task ID; the solved text is fetched later from res.php
        return response['request']
    return None
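With 2Captcha's classic HTTP API, the value that in.php hands back is a task ID, and the solved text is collected by polling res.php with that ID. The sketch below shows one way to do that, plus a helper for producing the Base64 body. The names encode_image and poll_solution, the injectable http_get and sleep parameters, and the 5-second interval are my own assumptions for illustration; check the provider's current documentation for limits and error codes.

```python
import base64
import time


def encode_image(path):
    """Read a CAPTCHA image from disk and return it as a Base64 string."""
    with open(path, "rb") as fh:
        return base64.b64encode(fh.read()).decode("ascii")


def poll_solution(api_key, captcha_id, http_get=None, interval=5,
                  max_attempts=24, sleep=time.sleep):
    """Poll res.php until the task is solved, fails, or attempts run out."""
    if http_get is None:
        import requests  # imported lazily so a stub can be passed in instead
        http_get = requests.get

    params = {"key": api_key, "action": "get", "id": captcha_id, "json": 1}
    for _ in range(max_attempts):
        result = http_get("https://2captcha.com/res.php",
                          params=params, timeout=30).json()
        if result.get("status") == 1:
            return result["request"]  # the solved CAPTCHA text
        if result.get("request") != "CAPCHA_NOT_READY":
            # Anything other than "not ready" is treated as a hard error
            raise RuntimeError(f"Solver returned an error: {result.get('request')}")
        sleep(interval)
    return None  # gave up waiting
```

A typical flow would be: encode the image, submit it to in.php, then pass the returned ID to poll_solution(api_key, captcha_id) and use the resulting text in your form submission.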

Usage: Once the service returns the solved text, submit it with your request, usually as the form field the target page expects for the CAPTCHA answer.

Method 2: Integrating Selenium for reCAPTCHA and hCAPTCHA

For complex CAPTCHAs like reCAPTCHA and hCAPTCHA, which are challenging to bypass using requests alone, Selenium can help simulate human interaction.

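  1. Install Selenium and WebDriver: pip install selenium (Selenium 4.6 and later fetches a matching browser driver automatically)
  2. Automating CAPTCHA Clicks:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://example.com/recaptcha-page")
wait = WebDriverWait(driver, 10)
# The checkbox lives inside an iframe, so switch into it first
# (the selector is an assumption; adjust it for the target page)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[src*='recaptcha']")))
# Click on the CAPTCHA checkbox
wait.until(EC.element_to_be_clickable((By.ID, 'recaptcha-anchor'))).click()
driver.switch_to.default_content()  # return to the main page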

Limitations: Clicking the checkbox often triggers an image challenge, so complex CAPTCHAs may still require anti-CAPTCHA services, even with Selenium. Check out our article on the best CAPTCHA solving tools.

Method 3: Using Machine Learning for CAPTCHA Solving (Advanced)

Machine learning models can recognize patterns in text-based and image CAPTCHAs. However, training a model requires a large dataset of labeled CAPTCHA images.

1. Tools and Libraries:

  • TensorFlow or PyTorch for model training.
  • OpenCV for image preprocessing.

2. Dataset Preparation:

  • Collect a dataset of labeled CAPTCHA images.
  • Alternatively, use existing CAPTCHA datasets from public repositories or online providers.

3. Model Training: Train your model on labeled CAPTCHA images to recognize characters and images within the CAPTCHA.
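Whichever framework you train with, a text CAPTCHA model usually needs its labels converted to numeric targets and its per-position scores converted back to text. The following is a minimal, framework-free sketch of that step. The character set, the fixed length of 5, and the function names are assumptions chosen for illustration, not properties of any particular CAPTCHA.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # assumed character set
CAPTCHA_LENGTH = 5                                  # assumed fixed answer length


def encode_label(text):
    """Turn a CAPTCHA answer into one one-hot vector per character position."""
    if len(text) != CAPTCHA_LENGTH:
        raise ValueError(f"expected {CAPTCHA_LENGTH} characters, got {len(text)}")
    vectors = []
    for ch in text:
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(ch)] = 1  # raises ValueError for unknown characters
        vectors.append(row)
    return vectors


def decode_prediction(scores):
    """Pick the highest-scoring character at each position."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in scores
    )
```

In a real pipeline the lists would become TensorFlow or PyTorch tensors, and decode_prediction would be applied to the model's output scores rather than to the encoded labels.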

Method 4: Cookie-Based Bypass for reCAPTCHA v3

reCAPTCHA v3 scores user behavior in the background instead of showing a challenge, and the website decides what to do with a low score. By reusing cookies from an authenticated browser session, you may avoid being challenged or blocked altogether.

1. Fetch Cookies with Selenium:

cookies = driver.get_cookies()

2. Passing Cookies to requests:

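session = requests.Session()
for cookie in cookies:
    # Keep the cookie's domain so it is only sent to the site that set it
    session.cookies.set(cookie['name'], cookie['value'], domain=cookie.get('domain', ''))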

3. Requesting with Stored Cookies: Send subsequent requests through this session so they carry the browser’s cookies. This method works best for reCAPTCHA v3, which relies on browser behavior data.
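The three steps can be wrapped in one small helper. This is a sketch under a few assumptions: copy_browser_state is a hypothetical name, the session argument is expected to behave like requests.Session, and copying the browser's User-Agent is my addition, on the reasoning that cookies sent with a different User-Agent than the one that earned them may look inconsistent to the site.

```python
def copy_browser_state(session, selenium_cookies, user_agent=None):
    """Copy Selenium cookies (and optionally the User-Agent) onto a session."""
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    if user_agent:
        # Reuse the browser's User-Agent so headers match the cookies' origin
        session.headers["User-Agent"] = user_agent
    return session


# Typical use (needs requests installed and a live Selenium driver):
#   session = copy_browser_state(
#       requests.Session(),
#       driver.get_cookies(),
#       driver.execute_script("return navigator.userAgent"),
#   )
#   response = session.get("https://example.com/protected-page", timeout=30)
```

Cookies expire, so a long-running scraper would likely need to refresh them from the browser from time to time.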

Method 5: Solving Invisible CAPTCHAs Using Human Interaction Patterns

  1. Simulate Human Actions with Selenium: Using realistic mouse movements and key presses in Selenium to mimic human interaction patterns may reduce the occurrence of CAPTCHAs.
  2. Human-Like Delay Simulation: Introducing delays in page interactions can make bot detection systems less suspicious.
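Both ideas can be sketched with a pair of small helpers: one that pauses for a random interval, and one that types into a Selenium element a character at a time. The function names and the delay ranges are assumptions picked for illustration; the only Selenium call used is the element's send_keys method.

```python
import random
import time


def human_pause(low=0.4, high=1.6, sleep=time.sleep):
    """Sleep for a random duration between low and high seconds; return it."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def type_like_human(element, text, low=0.05, high=0.25, sleep=time.sleep):
    """Send text one character at a time, pausing briefly after each key."""
    for ch in text:
        element.send_keys(ch)
        human_pause(low, high, sleep=sleep)
```

For example, type_like_human(driver.find_element(By.NAME, "q"), "hello") followed by human_pause() before clicking a button spaces the interaction out the way a person might. The field name "q" here is only a placeholder.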

Additional Tips for Effective CAPTCHA Bypass

To further reduce detection risk:

  • Use IP Rotation: Combine with proxy rotation services to avoid IP-based blocking.
  • Introduce Delays: Randomize delays between requests to appear less bot-like.
  • Session Consistency: Maintain cookies and headers across requests for a consistent browsing session.
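As a rough illustration of IP rotation, the generator below cycles through a pool of proxies and yields mappings in the shape that requests accepts for its proxies argument. The proxy addresses are placeholders and proxy_rotation is a hypothetical helper name; a commercial rotation service would normally supply its own endpoints.

```python
import itertools

PROXY_POOL = [  # placeholder addresses, replace with real proxies
    "http://proxy-a.example.com:8000",
    "http://proxy-b.example.com:8000",
]


def proxy_rotation(pool):
    """Yield requests-style proxy mappings, cycling through the pool forever."""
    for address in itertools.cycle(pool):
        yield {"http": address, "https": address}


rotation = proxy_rotation(PROXY_POOL)
# Usage with requests:
#   response = session.get(url, proxies=next(rotation), timeout=30)
```

Switching IPs in the middle of a cookie-backed session can itself look inconsistent, so it may be safer to pick one proxy per session and rotate only when a new session starts.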

Conclusion

Bypassing CAPTCHA in web scraping requires a combination of strategies, from using anti-CAPTCHA services and automation tools like Selenium to rotating IPs and simulating human-like behavior. These methods allow bots to navigate around CAPTCHAs effectively, but it’s essential to consider the ethical implications, as CAPTCHAs are intended to protect data and user security. Balancing these approaches with respect for site policies helps achieve scraping goals responsibly.

Got any questions? Let me know in the comments!

Data Journal

Written by Data Journal

Exploring the secrets of web data through scraping, collection, and proxies. Dive into the art of online data collection for growth and insight.
