Scraping a Dynamic Website, Selenium Part-II
In Part I, I scraped doordash.com for restaurants’ menus with one approach. Here is my second approach.
Summary of the 1st Approach
The scraper navigated to the URL (doordash.com), clicked on a food item (this opened a page of food stores in New York), clicked on each store, scraped the menus, returned to the stores’ page, and repeated this loop until it had scraped 10,000–10,050 menus. When needed, it clicked the next-page button.
Part-II
The 2nd Approach (This One)
This time I wanted to enter a manual search string for a location along with the script name in the terminal, so that I would not have to open the script to edit it. When the scraper reaches its target, it gives a completion message, saves the data as a CSV file, and stops the script.
Here is my solution:
1- import the required libraries/modules
# importing required libraries
import ctypes
import time
from typing import List
import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver import Chrome, ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium_move_cursor.MouseActions import move_to_element_chrome
import sys
2- open the URL in a headless browser
# the URL
url = 'https://www.doordash.com/en-US'
# defining the driver and adding the "--headless" argument
opts = Options()
opts.add_argument('--headless')
driver = Chrome('chromedriver', options=opts)
# open the URL
driver.maximize_window()
driver.get(url)
driver.implicitly_wait(220)
time.sleep(5)
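As an aside, newer Selenium (4.x) deprecates passing the driver path positionally, and recent 4.x releases can locate chromedriver on their own; a minimal equivalent sketch, assuming Selenium 4, would be:
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument('--headless')   # run Chrome without a visible window
driver = Chrome(options=opts)     # recent Selenium 4 resolves the driver binary itself
driver.get('https://www.doordash.com/en-US')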
3- focus the current window
# to focus the current window
driver.switch_to.window(driver.window_handles[0])
time.sleep(10)
4- the following code sets the default search string for the location of the food stores
# default search string for the location
search_query = 'New York'
5- the code below lets us search food stores for a manually entered location.
# manual search string: sys.argv interacts with the terminal,
# and the 2nd argument is the search string for the location
if len(sys.argv) >= 2:
    search_query = sys.argv[1]
print(search_query)
We will type this location name (e.g. New York, Washington) right after the name of the Python script in the terminal.
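For example, assuming the script is saved as doordash_scraper.py (a hypothetical file name), the terminal call would look like this:
python doordash_scraper.py "New York"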
6- enter the location name into the search input
# send location to search element
driver.find_element_by_css_selector("input[class='sc-eZXMBi ghseVd']").send_keys(search_query)
time.sleep(3)
# click on search button
driver.find_element_by_css_selector("button[class='sc-Ehqfj dtSIZX']").click()
driver.implicitly_wait(120)
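These auto-generated class names (“sc-…”) tend to change whenever the site is rebuilt; a more resilient sketch, assuming the search box is the only input element on the landing page, would wait for it explicitly instead of sleeping:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

# wait up to 30 seconds for the search box to appear, then type the query
search_box = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.TAG_NAME, 'input'))
)
search_box.send_keys(search_query)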
7- define the lists to store the names and prices
# lists
names = []
prices = []
8- get the number of stores on the page. This time there is only one page, which contains all the stores.
time.sleep(5)
stores = driver.find_element_by_xpath("//div[@class='sc-kRCAcj exCpGm']\
//div[@class='sc-zDqdV iEhWpB']//span[@class='sc-bdVaJa hQDtnE']")
number = stores.text
The number of stores is given inside this text, so we’ll extract the digits from it.
strNo = [int(s) for s in number.split() if s.isdigit()]
i = strNo[-1]  # the store count is the last number found in the text
print(i)
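For instance, if the element’s text were something like the line below (a made-up example), the comprehension would yield [224] and i would become 224:
number = '224 Stores Near You'   # hypothetical banner text
strNo = [int(s) for s in number.split() if s.isdigit()]   # -> [224]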
9- Now, the loop starts.
for x in range(0, i, 1):
Here, “i” is the number of stores extracted from the string.
10- search for stores on the page (the rest of the code runs under the above “for loop”, so it stays indented)
    div2 = driver.find_element_by_xpath("//div[@class='sc-jFpLkX lmxKtV'][1]")
    element: List[WebElement] = div2.find_elements_by_xpath("//div[@class='sc-EHOje eOpoWF']\
//span[@class='sc-bdVaJa bTYYIJ']|//div[@class='sc-EHOje eOpoWF']\
//span[@class='sc-bdVaJa lncAvd']")
11- click on each store, one per iteration of the “for loop”, to open it
    time.sleep(10)
    WebDriverWait(driver, 60)
    strnm: WebElement = element[x]
    pos = strnm.location
    y = pos.get('y') - 70
    driver.execute_script(f"window.scrollTo(0, {y})")
    print(f'{x}- ', strnm.text)
    time.sleep(4)
    move_to_element_chrome(driver, strnm, display_scaling=100)
    actions = ActionChains(driver)
    actions.send_keys_to_element(strnm, Keys.ENTER).perform()
    time.sleep(7)
    driver.implicitly_wait(360)
    time.sleep(10)
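If the selenium_move_cursor package is unavailable, a rough fallback (just a sketch of mine, not the method used here) is to scroll the element into view and click it directly:
    # fallback: scroll the store to the middle of the viewport and click it
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", strnm)
    strnm.click()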
12- after clicking each store, feed the page source to BeautifulSoup and scrape the menus
    result = driver.page_source
    time.sleep(11)
    soup = BeautifulSoup(result, 'html.parser')
    div = soup.find('div', class_='sc-eAyhxF cMeCVt')
    if div is not None:
        time.sleep(25)
        # loop over each menu-item block in the store page
        for item in div.find_all('div', class_="sc-bscRGj hsYyqR"):
            pros = item.find('span', class_="sc-bdVaJa gImhEG")
            print('writing (', pros.text, ') to disk')
            names.append(pros.text)
            rates = item.find('span', class_="sc-bdVaJa eEdxFA")
            # if there is no price for the food, append 'N/A' to the 'prices' list
            if rates is not None:
                print('price: ', rates.text)
                rate = rates.text
            else:
                print('N/A')
                rate = 'N/A'
            prices.append(rate)
        length: int = len(names)
(I explained these lines in Part I.)
13- if the scraping target is reached, save the results as CSV, give a completion message, and exit
        # if the menu record reaches the target, save the data, show a
        # completion message box, and exit the script
        if length > 150:
            # save to a dataframe
            df = pd.DataFrame({'Name': names, 'Price': prices})
            # export as a csv file
            df.to_csv('doordash_menus.csv')
            driver.close()
            ctypes.windll.user32.MessageBoxW(0,
                f'Congratulations! We have successfully scraped {length} menus.',
                'Project Completion', 1)
            sys.exit()
        else:
            driver.back()
            continue
    else:
        driver.back()
        continue
# Project Succeeded
(Keep all of this code in sequence, indented under the “for loop” as shown.)
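One caveat: ctypes.windll exists only on Windows, so the message box call would fail elsewhere; a plain print (a minimal substitute sketch) could replace it on other platforms:
print(f'Congratulations! We have successfully scraped {length} menus.')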
In the next approach, I would like to do this without BeautifulSoup and save the output as JSON.
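As a preview, pandas can already write JSON directly from the same dataframe (a sketch, reusing the df defined above):
df.to_json('doordash_menus.json', orient='records')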
Happy Coding.