Selenium is a free, open-source automated testing framework used to validate web applications across different browsers and platforms, but it is also used as a scraping tool, much like Scrapy and Beautiful Soup.
In this article, we will be looking at how we can use Selenium to scrape all the Location Data from Google Maps using the URL of the location.
So without further ado, let’s get started…
Quick Note: Any data collected from websites may be subject to copyright, meaning it should not be reused without the owner’s consent, and it should certainly not be used for commercial purposes. The main objective of this article is to show how to collect data as a coding exercise, as well as a way to build datasets for research and/or personal projects.
Here’s the data we will be scraping:
1. Location Data
- Average Rating
- Total Reviews
- Address of Location
- Phone Number
- Website URL
- Open and Close Time for each Day
- Busy Percentage for each hour of the Day
2. Reviews Data
- Reviewer’s Name
- Review Date
- Review Text
- Review Rating
Let’s Get Started
Step 0: Bird’s Eye View
Let’s first get a visual overview of the steps we are going to code:
- Open the location page on Maps and scrape the location data,
- Get the open and close times for each day,
- Get the busy-percentage data for each day,
- Click the “More Reviews” button to go to the location’s all-reviews page,
- Click the “More” button under long reviews so they load completely on the page,
- Finally, scrape the review details from the website.
Step 1: Installation
- Create and activate a Python virtual environment, then install Selenium:
$ pip install selenium
- Now download the Google Chrome WebDriver (ChromeDriver), the program that actually launches and controls the browser instance that Selenium drives.
- Note: download the same version as your Chrome browser.
- Add the downloaded executable (chromedriver.exe on Windows) to your current working folder.
Step 2: Import Essential Libraries
- Create a new “main.py” file in your working directory and import these essential libraries:
Step 3: Create a main class and initialize
- Let’s first initialize our main scraping class, which will contain all the upcoming functions.
- The __init__ function is the constructor; it is called automatically and initializes the necessary parameters.
Step 4: Get Location Data
- Now we will create a function that scrapes the location data: average rating, total reviews, address, phone number, and website URL.
- The self.driver.find_element_by_… calls are Selenium functions that find the HTML elements with the given class or id name and store them in variables; we can then read the .text property of those variables to get the respective values. (Note that in Selenium 4 these helpers were replaced by driver.find_element(By.CLASS_NAME, …) and friends.)
Step 5: Get Open & Close Times
- Now we’ll create two functions: one that clicks the open & close time button, and another that collects each day with its respective open and close times.
- The class “lo7U087hsMA__row-header” contains all the days, and “lo7U087hsMA__row-interval” contains the respective open and close times.
Step 6: Get Busy Percentage for each day
- This function gets all the days and, for each day, finds the location’s busy percentage for each hour.
- The variable “a” is a list of all the days. We loop through “a”, find all the hourly entries available for that day, and store them in list “b”; we then loop through “b”, extract the busy percentage for each hour, and append it to our final data list.
Step 7: Click the all reviews button
- Now that we have scraped all the location data, it’s time to move on to the all-reviews page, where we will scrape the reviews data.
- To do that, we will find the “All reviews” button in the HTML and use Selenium’s .click() function to click it, which redirects us to that page.
- Selenium’s WebDriverWait function tells Selenium to wait until that element has loaded into the HTML before interacting with it.
Step 8: Load all reviews
- Now that Selenium is on the all-reviews page, we have to load all the reviews before scraping anything, by scrolling down to the bottom. Like most modern websites, this page is implemented using AJAX, which means the rest of the reviews are only loaded into the HTML when you scroll down to look at them.
- Let’s create a scroll-page function that scrolls and loads all the reviews before we proceed to scrape them.
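A sketch of that function; it scrolls the window for simplicity, though on Google Maps the reviews actually live in a scrollable pane, so you may need to set that element’s scrollTop via execute_script instead:

```python
import time


def scroll_reviews(driver, rounds=5, pause=2.0):
    """Scroll to the bottom repeatedly so AJAX loads more reviews."""
    for _ in range(rounds):
        # Jump to the bottom, then give new reviews time to load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
```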
- The above code scrolls the page 5 times: it brings the scroll bar to the bottom, waits for new reviews to load, and then scrolls to the bottom again.
Step 9: Expand long reviews
- To see the long reviews, we have to click the “More” button under each review so the full text loads into the HTML.
- So let’s create an expand-all-reviews function that finds all these “More” buttons on the already loaded page and clicks them to load the entire reviews.
- The variable “element” is the list of all those buttons present on the loaded page.
Step 10: Scrape Reviews Data
- Now that everything has been loaded, we will create a function that scrapes the reviews data: each reviewer’s name, review text, posted date, and rating.
- Now that we have all our functions created, let’s write the main function that simply calls them one by one and executes the entire scraping process.
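One generic way to sketch that main function, assuming each step is a function that takes the driver (wire in your own function names in place of the steps list):

```python
def run_scraper(url, driver, steps):
    """Open the location URL, then call each scraping step in order.

    steps is a list of callables such as [get_location_data,
    get_open_close_times, ...]; results are collected in order.
    """
    driver.get(url)
    return [step(driver) for step in steps]
```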
- You can run your code from the terminal using “python main.py”.
- And with this, we have written our entire Python-Selenium script, which, given just a location’s URL, can scrape its entire data from Google Maps.
- Feel free to ask your doubts and queries regarding this article in the comments section.
- Connect with me on LinkedIn.