Python Guide to Scraping Google Search Results

George Andrew
4 min read · Feb 6, 2024


Google, the dominant search engine, is a goldmine of valuable data. However, extracting Google search results automatically and on a large scale can be challenging. This guide delves into these challenges, ways to overcome them, and techniques for efficiently scraping Google search results.

Understanding Google’s SERP

The term “SERP” (Search Engine Results Page) often comes up in discussions about scraping Google. It refers to the page displayed after a search query is entered. Unlike the simple link lists of the past, modern SERPs are rich with various elements to enhance the user experience. Key elements include featured snippets, paid ads, video carousels, ‘People also ask’ sections, local packs, and related searches.

Legality of Scraping Google Results

The legality of extracting data from Google SERPs is a hot topic. Generally, scraping publicly available internet data, including Google SERP information, is legal. However, legal requirements vary by jurisdiction and use case, so it’s advisable to seek legal counsel for your specific situation.

Challenges in Scraping Google Search Results

Google employs several defenses against unauthorized data harvesting, and because it cannot reliably distinguish harmful bots from harmless scrapers, both tend to get blocked. Common obstacles include the following (the sketch after this list shows how a naive request runs into them):

  1. CAPTCHAs: Google uses CAPTCHAs to distinguish bots from humans; failing them can lead to IP blocking.
  2. IP Blocks: Scraping activity that Google flags as suspicious can trigger IP bans.
  3. Unstructured Data: Scraped HTML must be parsed and organized before it can be analyzed effectively.
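
To see why these defenses matter, here is a minimal sketch, not part of the original tutorial, of what a naive request to Google looks like. In practice, Google often answers such automated traffic with HTTP 429 or a CAPTCHA interstitial instead of the results page:

import requests

# A naive request straight to Google Search, with no API in between.
response = requests.get(
    'https://www.google.com/search',
    params={'q': 'newton', 'hl': 'en'},
    headers={'User-Agent': 'Mozilla/5.0'},
)

# Automated traffic is frequently met with HTTP 429 or a CAPTCHA page
# rather than the actual results.
print(response.status_code)
print('captcha' in response.text.lower())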

Oxylabs’ Google Search API is designed to navigate these challenges, providing structured Google search data without the hassle of scraper maintenance.

1. Essential Python Libraries for Scraping Google Search Results

To follow this tutorial on extracting data from Google searches using Python, ensure you have the following prerequisites:

  • Access credentials for Oxylabs’ SERP API
  • Python installation
  • The Requests library

Begin by registering for Oxylabs’ Google Search Results API to obtain your username and password, which will be crucial throughout this tutorial. Next, install Python (version 3.8 or later) from the official Python website. Finally, you’ll need the Requests library, renowned for its ability to handle HTTP requests effortlessly. Install it using the following command:

For macOS/Linux:

python3 -m pip install requests

For Windows:

python -m pip install requests

2. Constructing the Payload and Executing a POST Request

Create a new Python script and add the following code:

import requests
from pprint import pprint

# 'source' selects the scraper type; 'url' is the Google search results
# page to fetch (an example search for 'newton').
payload = {
    'source': 'google',
    'url': 'https://www.google.com/search?hl=en&q=newton',
}

# Send a synchronous request to Oxylabs' realtime endpoint.
response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),  # Replace with your API credentials
    json=payload,
)

pprint(response.json())

This script will result in a response similar to:

{
    "results": [
        {
            "content": "<!doctype html><html>...</html>",
            "created_at": "YYYY-MM-DD HH:MM:SS",
            "updated_at": "YYYY-MM-DD HH:MM:SS",
            "page": 1,
            "url": "https://www.google.com/search?hl=en&q=newton",
            "job_id": "1234567890123456789",
            "status_code": 200
        }
    ]
}

Notice how the URL in the payload represents a Google search results page for ‘newton’.
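
For multi-word queries, it’s safest to percent-encode the query string when building such a URL. Below is a small hypothetical helper (the function name and defaults are illustrative, not part of the tutorial) using Python’s standard urllib:

from urllib.parse import urlencode

# Hypothetical helper: builds a Google search URL for any query string,
# handling percent-encoding of spaces and special characters.
def build_search_url(query: str, lang: str = 'en') -> str:
    return 'https://www.google.com/search?' + urlencode({'hl': lang, 'q': query})

print(build_search_url('isaac newton'))
# https://www.google.com/search?hl=en&q=isaac+newton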

3. Customizing Query Parameters

The payload dictionary can be tailored to suit your scraping needs. For instance:

payload = {
    'source': 'google',
    'url': 'https://www.google.com/search?hl=en&q=newton'
}

Here, ‘source’ selects the scraper type (in this case, google), and the URL specifies the Google search page to fetch. Various other source values are available, such as ‘google_ads’ and ‘google_hotels’, as detailed in the Oxylabs documentation.
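
As a hypothetical illustration, a payload targeting one of those alternative sources might look like this; the exact parameters each source accepts should be verified against the Oxylabs documentation:

# Hypothetical payload for the 'google_ads' source; consult the Oxylabs
# documentation for the parameters this source actually supports.
payload = {
    'source': 'google_ads',
    'query': 'running shoes',
}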

When configuring the payload, remember that the ‘google_search’ source does not accept the ‘url’ parameter. Instead, you pass a ‘query’ along with other parameters, which lets you target different data without constructing search URLs yourself.

To refine your results further, you can add parameters like ‘domain’, ‘geo_location’, and ‘locale’ to your payload. For example:

payload = {
    'source': 'google_search',
    'query': 'newton',
    'domain': 'de',
    'geo_location': 'Germany',
    'locale': 'en-us'
}

This configuration fetches American English results from google.de, as seen in Germany. You can also control the volume of results using ‘start_page’, ‘pages’, and ‘limit’ parameters. For instance, to fetch results from pages 11 and 12, with 20 results per page:

payload = {
    'start_page': 11,
    'pages': 2,
    'limit': 20,
    ...  # Additional parameters
}

4. Final Python Script for Scraping Google Search Data

Combining all elements, here’s a complete script example:

import requests
from pprint import pprint

payload = {
    'source': 'google_search',
    'query': 'shoes',
    'domain': 'de',
    'geo_location': 'Germany',
    'locale': 'en-us',
    'parse': True,  # Return structured JSON instead of raw HTML
    'start_page': 1,
    'pages': 5,
    'limit': 10,
}

response = requests.request(
    'POST',
    'https://realtime.oxylabs.io/v1/queries',
    auth=('USERNAME', 'PASSWORD'),  # Replace with your API credentials
    json=payload,
)

# Abort early if the API reports a problem.
if response.status_code != 200:
    print("Error - ", response.json())
    exit(-1)

pprint(response.json())
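
Because the payload sets ‘parse’: True, each page’s content field contains structured JSON rather than raw HTML. The exact schema is described in the Oxylabs documentation; the sketch below assumes organic listings appear under content[‘results’][‘organic’] with pos, title, and url fields, so verify this against the docs before relying on it:

# Assumes the parsed schema keeps organic listings under
# content['results']['organic']; verify against the Oxylabs docs.
for page in response.json()['results']:
    organic = page['content'].get('results', {}).get('organic', [])
    for item in organic:
        print(item.get('pos'), item.get('title'), item.get('url'))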

5. Exporting Data to CSV

Oxylabs’ Google Scraper API can convert HTML pages into JSON, eliminating the need for BeautifulSoup or similar libraries. For example:

payload = {
    'source': 'google_search',
    'query': 'adidas',
    'parse': True,
}

The results are returned in JSON format, which can be effectively normalized using the Pandas library:

import pandas as pd

# Flatten the nested JSON response into a table and write it to CSV.
data = response.json()
df = pd.json_normalize(data['results'])
df.to_csv('export.csv', index=False)
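
If you only need the organic listings as flat rows, you can build the DataFrame from that nested list instead, under the same schema assumption noted earlier:

import pandas as pd

# Collect organic listings across all returned pages; assumes the
# parsed schema described above, so verify it against the Oxylabs docs.
data = response.json()
organic = [
    item
    for page in data['results']
    for item in page['content'].get('results', {}).get('organic', [])
]
pd.DataFrame(organic).to_csv('organic_results.csv', index=False)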

6. Error and Exception Handling

When encountering issues like network problems or invalid parameters, use try-except blocks:

try:
    response = requests.request(
        'POST',
        'https://realtime.oxylabs.io/v1/queries',
        auth=('USERNAME', 'PASSWORD'),
        json=payload,
    )
except requests.exceptions.RequestException as e:
    print("Error:", e)
else:
    # Only inspect the response if the request itself succeeded.
    if response.status_code != 200:
        print("Error - ", response.json())

Conclusion

This guide aims to equip you with the knowledge to scrape Google search results using Python effectively. For further assistance or inquiries, Oxylabs’ support team is available via email or live chat.


George Andrew

Passionate about the intersection of technology and marketing.