Web Scraping with Selenium

Neslihan Avsar
3 min read · Dec 12, 2022


Web scraping is the programmatic extraction of data from websites, letting you perform in seconds tasks that would take hours by hand. It is an important skill for data scientists, because the internet is a nearly endless source of data.

While we can use the BeautifulSoup and requests modules to scrape static pages built with HTML and CSS, these tools cannot execute JavaScript, so they are of little use on websites whose content is rendered dynamically in the browser.

In this blog we are going to use Selenium for web scraping. Selenium is extremely useful for extracting data from exactly these dynamic websites.

Selenium is a Python library that controls a real browser through a WebDriver. It automates the browser with the code we write and can simulate virtually every interaction with a page. Here’s a step-by-step guide, with a basic example: extracting web elements from https://www.selenium.dev/documentation/webdriver/elements/.

Install and Imports

pip install selenium

import selenium
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

Install WebDriver

We need the WebDriver executable so that Selenium can automatically open a browser and access the website of our choice. In this example we will use the Google Chrome browser; Selenium also supports Internet Explorer, Firefox, Safari, and Opera.

After the web driver is downloaded, copy its path: right-click the file and select Properties to find its exact location. (With Selenium 4.6 and later, the bundled Selenium Manager can download a matching driver automatically, so webdriver.Chrome() with no path also works.)

from selenium.webdriver.chrome.service import Service

chrome_driver_path = r"C:\Users\username\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chrome_driver_path))

Access the website with Python

driver.get("https://www.selenium.dev/documentation/webdriver/elements")
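Once the page has loaded, the driver object exposes basic properties of the page. A small sketch (it assumes a driver created as above; the helper name describe_page is just for illustration):

```python
def describe_page(driver):
    """Print the title and final URL of the page currently loaded in the driver."""
    print(driver.title)        # contents of the page's <title> tag
    print(driver.current_url)  # the URL after any redirects
```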

Find the specific location

Let’s access the information we want to find.

We need to find the location of the element to extract the information we want to retrieve. The browser’s Inspect tool gives us this information: right-click the element and select Inspect, as shown below.

Accessing elements from Selenium webpage

In the Inspect panel, we can see the ‘File Upload’ element’s markup as follows:

<div class="section-index">
  <a href="/documentation/webdriver/elements/file_upload/">File Upload</a>
</div>

To locate this element in the document object model (DOM), use the ‘by class name’ locator with Selenium.

element = driver.find_element(By.CLASS_NAME, "section-index")

To get the information from all the <p> elements inside this section, we can use the ‘by tag name’ locator with Selenium.

In the Inspect panel:

<p>Ways to identify one or more specific elements in the DOM.</p>

(Note that if a section contains no <p> elements, the loop simply prints nothing.)

elements = element.find_elements(By.TAG_NAME, 'p')
for e in elements:
    print(e.text)

You can make the script wait as long as you want with the code below before closing the browser; 5 seconds, for example:

time.sleep(5)

Type driver.close() to close the current browser window; driver.quit() ends the entire session and closes all windows.

driver.close()

The final code is as follows.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

chrome_driver_path = r"C:\Users\username\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chrome_driver_path))
driver.get("https://www.selenium.dev/documentation/webdriver/elements/")

element = driver.find_element(By.CLASS_NAME, "section-index")
elements = element.find_elements(By.TAG_NAME, 'p')

for e in elements:
    print(e.text)

time.sleep(5)
driver.close()
