Build Simple Web Scraper Using Python & Selenium

Joonas Venäläinen
3 min read · Jul 25, 2019


Photo by Chris Ried on Unsplash

In this article I’m going to show you how to build a simple web scraper that takes a search term from the user, sends it to Google, and finds results from Stack Overflow. For example, if I search for “How to define array in Java”, it opens the first result and checks whether there is an accepted answer. If there is, it gets the <code> block from that answer and prints it to the user. If not, it fetches all the <code> blocks from the topic and prints them out.

First, let’s define all the needed imports:

Now we can define the base settings:
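A sketch of these settings, assuming Selenium 4 with Chrome. The prompt text, the 10-second timeout and the helper names `chrome_arguments`, `make_driver` and `main` are illustrative choices, and the Selenium imports sit inside the functions so the snippet stands on its own:

```python
def chrome_arguments(headless=False):
    """Command-line switches for Chrome; "--headless" hides the window."""
    return ["--headless"] if headless else []


def make_driver(headless=False):
    """Start a Chrome session, optionally without a visible window."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(headless):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


def main():
    from selenium.webdriver.support.ui import WebDriverWait

    while True:  # keep asking for searches until the user interrupts
        search_word = input("What do you want to search for? ")
        driver = make_driver()
        wait = WebDriverWait(driver, 10)  # wait at most 10 seconds for elements
        driver.get("https://www.google.com")
        # ... the search and scraping steps from the following sections ...
        driver.quit()
```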

We write the program inside a while loop because we don’t want it to exit after a search finishes. First we ask the user what to search for. After that we define our driver; in this case I’m using Chrome as my browser. Once you have it up and running, you can switch the browser to headless mode, so that it doesn’t pop up but does everything in the background. Then we define wait, which sets how long Selenium waits for an element to become visible. Last, we use the driver.get() method to navigate to Google.

Now we can start locating the elements on the Google page. Open the Elements panel in Chrome with Ctrl+Shift+C and click on the search field. Now we can see the element information:

We can use the name attribute to locate the element in Selenium:
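A sketch of the lookup, assuming the search field still carries `name="q"` (Google's markup can change). `find_search_field` is a hypothetical helper name, and `driver` is the Chrome session started earlier:

```python
def find_search_field(driver):
    """Return Google's search input, located by its name attribute."""
    from selenium.webdriver.common.by import By

    # Assumption: the search box is exposed as name="q".
    return driver.find_element(By.NAME, "q")
```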

Now we have to pass the search word from the user input to this element:
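A sketch of this step. `build_query` and `submit_search` are hypothetical helper names, and `search_field` is the element located in the previous step:

```python
def build_query(search_word):
    """Append Google's site: operator so only Stack Overflow pages match."""
    return f"{search_word} site:stackoverflow.com"


def submit_search(search_field, search_word):
    """Type the query into the search field and press Enter."""
    from selenium.webdriver.common.keys import Keys

    search_field.send_keys(build_query(search_word))
    search_field.send_keys(Keys.ENTER)
```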

Here we use the send_keys() method to pass the search_word variable to the input field. At the end of the query we also specify that we only want results from stackoverflow.com. Last, we press the Enter key to search.

For the results we use the wait.until() method. Without a wait, the code would run immediately and fail because the results have not finished loading:
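A sketch of the wait, assuming results can be matched as links that point at Stack Overflow questions. The XPath below is an assumption made for illustration, and Google's markup changes often. `wait` is the `WebDriverWait` object defined in the base settings:

```python
# Assumed locator: any link whose href points at a Stack Overflow question.
RESULT_LINKS_XPATH = "//a[contains(@href, 'stackoverflow.com/questions')]"


def wait_for_results(wait):
    """Block until at least one matching link is present, then return them all."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    # wait.until() polls the condition and raises TimeoutException on timeout.
    return wait.until(
        EC.presence_of_all_elements_located((By.XPATH, RESULT_LINKS_XPATH))
    )
```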

I’m using XPath to locate the elements. You can test your locators in the browser by opening the Google results page, opening the Developer Console (Ctrl+Shift+I) and typing:

This finds all the elements on the page that match the locator and prints them in the console. For this example I’m just going to open the first result, using the click() method:
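A sketch of the click, where `results` is the list of elements returned by the wait and `open_first_result` is a hypothetical helper name:

```python
def open_first_result(results):
    """Click the first search result in the list."""
    if not results:
        raise ValueError("no search results to open")
    results[0].click()
```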

Now we need to define a method that checks whether an element is present. It is used to assert whether there is an accepted answer in the opened topic:
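A sketch of such a helper, using the `elements_is_present` name that appears in the text. It relies on `find_elements`, which returns an empty list rather than raising when nothing matches; calling `find_element` and catching `NoSuchElementException` would be an equally valid approach:

```python
def elements_is_present(driver, by, value):
    """Return True if at least one element matches the locator."""
    # find_elements returns an empty list (no exception) when nothing matches.
    return len(driver.find_elements(by, value)) > 0
```

It could be called as `elements_is_present(driver, By.CSS_SELECTOR, ".accepted-answer")`, where the selector is an assumption about Stack Overflow's markup.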

Now we can write the logic to check whether there is an accepted answer:
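A sketch of that branch. The `.accepted-answer` selector is an assumption about Stack Overflow's markup, `collect_code_blocks` is a hypothetical name, and the presence check is inlined so the snippet stands alone:

```python
ACCEPTED_ANSWER = ".accepted-answer"  # assumed class on the accepted answer


def collect_code_blocks(driver):
    """Return the text of the accepted answer's <code> blocks, or of every
    <code> block on the page when no answer has been accepted."""
    from selenium.webdriver.common.by import By

    if driver.find_elements(By.CSS_SELECTOR, ACCEPTED_ANSWER):
        blocks = driver.find_elements(By.CSS_SELECTOR, ACCEPTED_ANSWER + " code")
    else:
        blocks = driver.find_elements(By.CSS_SELECTOR, "code")
    return [block.text for block in blocks]  # .text is the element's visible text
```

Printing is then a plain loop: `for block in collect_code_blocks(driver): print(block)`.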

Here we call the elements_is_present method to check whether an accepted answer is present. If it is, we get its <code> block; otherwise we get all the <code> elements from the page. We use .text to print out each element’s text.

Lastly, we need to close the browser:
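A sketch of the cleanup. Calling `driver.quit()` directly is enough; the hypothetical `run_and_close` wrapper below is one way to make sure the browser also closes when a scraping step raises:

```python
def run_and_close(driver, task):
    """Run task(driver) and always quit the browser afterwards."""
    try:
        return task(driver)
    finally:
        # quit() closes every window and ends the driver session;
        # close() would only close the current window.
        driver.quit()
```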

It is a good idea to clear the screen after a new search is made. Add this after the driver.get():
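A sketch of the call, assuming this first variant targets Windows (the Unix variant follows separately). `clear_screen_windows` is a hypothetical wrapper name:

```python
import os


def clear_screen_windows():
    """Clear the terminal on Windows by running the console's cls command."""
    os.system("cls")
```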

If you are using UNIX:
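On Unix-like systems the command is `clear`. A sketch that picks the right command from `os.name`, so the same script runs on both platforms; `clear_command` and `clear_screen` are hypothetical helper names:

```python
import os


def clear_command(os_name=os.name):
    """Return the shell command that clears the terminal on this platform."""
    return "cls" if os_name == "nt" else "clear"  # "nt" means Windows


def clear_screen():
    os.system(clear_command())
```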

So here it is: a simple example of how you could use Selenium to get started building something. Here is the complete code:
