Scraping Google’s Search Engine With Python — A Step-by-Step Tutorial

Provision your free Custom Search Engine on Google Cloud Platform

Wouter van Heeswijk, PhD
CodeX



In principle, accessing a website via Python is not that hard. You import the requests module, define the URL you want to access, and simply send an HTTP GET request. For google.com, that might look something like:

import requests

session = requests.Session()
url = 'https://www.google.com/search?q=Will+it+rain+today+in+Amsterdam'
result = session.get(url)
output = result.text

You could try, but you will quickly find that the ‘Accept cookies’ button is in the way and you won’t get any meaningful results. In fact, Google — as well as many other websites — deliberately sets up roadblocks to prevent automated abuse of its search engine. Some workarounds circulate on the web, like clicking the cookie button with Selenium or manually importing your cookie settings. However, there is a reason to stay away from such practices, and that is the responsible use principle. Web scraping can easily overload a server with large numbers of automated requests — although Google’s servers can probably take the hit, with 3.5 billion searches daily — and it circumvents the ads that bring in money. Thus, we should always check the API documentation and ensure the traffic load…
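The responsible route is Google’s official Custom Search JSON API, which is what the Custom Search Engine provisioned on Google Cloud Platform exposes. As a minimal sketch — assuming you have already obtained an API key from the Google Cloud console and an engine ID from the Programmable Search Engine control panel (the `API_KEY` and `ENGINE_ID` placeholders below are hypothetical values you must replace) — a query could be prepared like this:

```python
import requests

# Hypothetical placeholders -- substitute your own credentials from the
# Google Cloud console (API key) and the Programmable Search Engine
# control panel (search engine ID, also called 'cx').
API_KEY = "YOUR_API_KEY"
ENGINE_ID = "YOUR_ENGINE_ID"

def build_search_request(query: str) -> requests.PreparedRequest:
    """Prepare a GET request against the Custom Search JSON API."""
    params = {"key": API_KEY, "cx": ENGINE_ID, "q": query}
    return requests.Request(
        "GET", "https://www.googleapis.com/customsearch/v1", params=params
    ).prepare()

request = build_search_request("Will it rain today in Amsterdam")

# To actually execute the search (requires valid credentials):
# response = requests.Session().send(request)
# items = response.json().get("items", [])
```

Unlike scraping the HTML results page, this endpoint returns structured JSON and comes with a documented quota, so you stay within the traffic limits Google sanctions.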


Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.