Web Scraping in Python for Novices, Part 2: Using Selenium Web Drivers

Vinodnethichinna
Published in Analytics Vidhya · 4 min read · Jun 13, 2020

In our previous article, we discussed how to do web scraping for beginners; if you haven't seen it yet, click here to check it out. I recommend reading that article first in order to understand Part 2 better. Now we are moving one step ahead and going to learn how to click buttons on a webpage while scraping. The classic use case is scraping a news website: as you have probably noticed, these sites often have "LOAD MORE" or "Click here for more articles" buttons.

To click those buttons we need to install a web driver. In this article I use the Chrome web driver (ChromeDriver), which you can download here. Please download the version that matches the Chrome version installed on your machine. If you are unsure of your Chrome version, you can find it by following the steps below.

Open Google Chrome -> click the three dots in the top-right corner -> Help -> About Google Chrome. You will see the version there; download the matching driver accordingly.

After successfully downloading ChromeDriver, create a new folder named Webdrivers on the Windows C drive and extract the archive into it. Now we need to add this path to the environment variables. If you are not sure how to do this, please follow the steps below.

  1. Copy the ChromeDriver location (e.g. C:\Webdrivers).
  2. Click This PC -> Properties -> Advanced System Settings -> Environment Variables.
  3. Select Path -> Edit -> New, and paste the ChromeDriver location from step 1 as a new entry.
  4. Click OK, and you are all set to go.
  5. Now open Command Prompt, type chromedriver, and press Enter.

You should see ChromeDriver start up and report its version (something like "Starting ChromeDriver … on port 9515"). If not, something went wrong; please go back through the steps one by one.

We are going to use a Jupyter Notebook and the selenium library; you can install the library by running the below command in the notebook.
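The install command is shown below; inside a Jupyter Notebook cell, prefix it with `!` so it runs as a shell command:

```shell
pip install selenium
```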

After installing selenium, we need to import the required dependencies below.

The below line of code will open an empty browser window that is controlled by automated software.
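Assuming the chromedriver folder is on your PATH as set up above, this sketch launches a blank Chrome window under Selenium's control:

```python
from selenium import webdriver

# Launches Chrome via the chromedriver binary found on PATH;
# the window will show a banner saying it is controlled by automated test software.
driver = webdriver.Chrome()
```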

Next, provide the URL of the target site we will use to demonstrate clicking buttons while web scraping.

Here we pass our target site's URL to the driver; after running the below command, the target site opens in the browser window.
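The article's original target site isn't shown, so the URL below is a placeholder; substitute the news site you want to scrape:

```python
from selenium import webdriver

driver = webdriver.Chrome()              # assumes chromedriver is on PATH

# Placeholder URL -- replace with your actual target site
url = "https://example.com/news"
driver.get(url)                          # the page opens in the automated browser
```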

Now we need to find the button that fetches more records (for example, a "Load more" button). Go to that button on the page, right-click, and choose Inspect to see its HTML, CSS, JS, and other details. Take the class name of the target button and locate the element with the code below (there are various other locators besides class name, such as tag name or id, each with slightly different syntax). The click() method clicks the button: executing the command below clicks it automatically, and you will also see a print message. This message is just to make sure the button has been clicked, for testing purposes.
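Putting the steps together: the class name below ("load-more") is a hypothetical example, so use the class you found in the Inspect panel of your own target page. The By-style locator is the Selenium 4 API; on older Selenium 3 installs the equivalent call is `driver.find_element_by_class_name(...)`.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()                    # assumes chromedriver is on PATH
driver.get("https://example.com/news")         # placeholder target site

# "load-more" is a hypothetical class name -- copy the real one
# from the Inspect panel of your target button
load_more = driver.find_element(By.CLASS_NAME, "load-more")
load_more.click()
print("Button clicked")                        # confirmation, for testing only
```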

Now we are able to click buttons on a webpage using Python. That's all for this article; I hope you all enjoyed reading it. In our next article, we will discuss how to convert the scraped data into a pandas DataFrame and load it from Python into a database (e.g. MongoDB).

Please feel free to share, or comment if there are any mistakes or queries.

Thank you.

#Python #WebScraping #Chromedriver #Selenium #LearningPython #BeginnersGuide


Technical, enthusiastic, and organized postgraduate student with great attention to detail and strong analytical skills.