Automated Extraction of Business or Location Information from Google Maps using Python and Selenium

1. Introduction

Google Maps is a vast repository of business information, including name, address, phone number, website, hours of operation, reviews, and ratings. This data can be valuable for a variety of purposes, such as market research, competitive analysis, and location-based marketing.

This article describes how to programmatically extract business data from Google Maps using Python and the Selenium library. Selenium is a web automation framework that allows you to control a web browser from your code.
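As a minimal illustration of what controlling a browser from code looks like (a sketch separate from the scraper built below), the following opens a page in Chrome and prints its title:

# Minimal Selenium example: open a page and print its title.
from selenium import webdriver

driver = webdriver.Chrome()  # Selenium 4.6+ can locate a suitable driver automatically
driver.get("https://www.google.com/maps")
print(driver.title)
driver.quit()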

Requirements

To run the script below, you will need the following:

  • Python 3
  • Selenium
  • Chrome web browser

Getting Started

Install the required Python packages:

pip install selenium
pip install webdriver_manager

For Selenium to control the Chrome browser, it requires ChromeDriver. WebDriver Manager automatically downloads and installs the correct version of ChromeDriver.
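To confirm the setup works before building the scraper, you can start and immediately close a managed Chrome session (a minimal sketch using the same webdriver_manager calls the scraper relies on):

# Verify that WebDriver Manager can provision ChromeDriver and start Chrome.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

service = Service(ChromeDriverManager().install())  # downloads the matching driver
driver = webdriver.Chrome(service=service)
driver.quit()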

2. GoogleMapScraper: The Core Class

At the heart of our script is the GoogleMapScraper class, designed specifically to extract business data from Google Maps. Key attributes of this class include:

  • output_file_name: The name of the output CSV file.
  • headless: A flag that controls whether the browser window is visible while the script runs.
  • driver: The Selenium WebDriver instance that performs all page interactions.
  • unique_check: A list of IDs used to detect and skip duplicate entries.

3. Web Browser Configuration

The config_driver method sets up the Chrome browser for our tasks. The ability to run in 'headless' mode, i.e. without a visible window, is crucial for server-side use or wherever a display is unnecessary.

import csv
import os
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


class GoogleMapScraper:
    def __init__(self):
        self.output_file_name = "google_map_business_data.csv"
        self.headless = False
        self.driver = None
        self.unique_check = []

    def config_driver(self):
        options = webdriver.ChromeOptions()
        if self.headless:
            options.add_argument("--headless")
        options.add_argument('--ignore-ssl-errors=yes')
        options.add_argument('--ignore-certificate-errors')

        s = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=s, options=options)

4. Dynamically Loading and Scrolling Pages

Google Maps dynamically loads business details, revealing only a subset at a time. The load_companies function overcomes this by programmatically scrolling the results panel, calling the extraction routine after each scroll until the end of the list is reached.

A note of caution: Google Maps class names and XPath locators change over time, so verify them in your browser's developer tools before relying on them in your code.

def load_companies(self, url):
    print("Getting business info", url)
    self.driver.get(url)
    time.sleep(5)
    panel_xpath = '//body[1]/div[3]/div[8]/div[9]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[2]'
    scrollable_div = self.driver.find_element(By.XPATH, panel_xpath)

    flag = True
    i = 0
    while flag:
        print(f"Scrolling to page {i + 2}")
        self.driver.execute_script(
            'arguments[0].scrollTop = arguments[0].scrollTop + 6500', scrollable_div)
        time.sleep(2)

        # Google Maps shows this banner once no more results can be loaded
        if "You've reached the end of the list." in self.driver.page_source:
            flag = False

        self.get_business_info()
        i += 1
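Relying on the English "end of the list" banner ties the loop to one locale. A locale-independent stopping rule, sketched below under the assumption that scrollable_div is the same panel element as above, is to compare the panel's scroll height before and after each scroll and stop once it no longer grows:

# Sketch: stop scrolling when the panel's scroll height stops growing.
# Assumes `driver`, `scrollable_div`, and `time` as in load_companies above.
last_height = driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
while True:
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
    time.sleep(2)
    new_height = driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
    if new_height == last_height:  # nothing new was loaded
        break
    last_height = new_height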

5. Detailed Business Data Extraction

The get_business_info function is the workhorse of our script, extracting the name, star rating, review text, and review date for each entry. The process entails:

  • Waiting for the review elements to be present on the page.
  • Iterating over each review container and pulling out its fields.
  • Counting full and empty star icons to derive the rating, and parsing the date text.

def get_business_info(self):
    # Wait for the review elements to load before starting to scrape
    WebDriverWait(self.driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, 'jJc9Ad')))

    try:
        for business in self.driver.find_elements(By.CLASS_NAME, 'jJc9Ad'):
            name = business.find_element(By.CLASS_NAME, 'd4r55').text
            try:
                review = business.find_element(By.CLASS_NAME, 'wiI7pd').text
            except NoSuchElementException:
                review = "No review found"

            # Derive the rating by counting full and empty star icons
            rating = "No rating"
            try:
                stars_element = WebDriverWait(business, 10).until(
                    EC.presence_of_element_located(
                        (By.XPATH, './/*[contains(@class, "kvMYJc")]')))

                full_star_images = stars_element.find_elements(
                    By.XPATH, './/img[contains(@class, "vzX5Ic") and not(contains(@src, "rate_empty"))]')
                empty_star_images = stars_element.find_elements(
                    By.XPATH, './/img[contains(@src, "rate_empty")]')

                print(f"Full stars count: {len(full_star_images)}")
                print(f"Empty stars count: {len(empty_star_images)}")

                total_stars = len(full_star_images) + len(empty_star_images)
                rating = total_stars - len(empty_star_images)
                print(f'Rating: {rating} stars')
            except (NoSuchElementException, TimeoutException):
                print("Stars element not found for this business.")

            # Extract the review date, if present
            date = "No date found"
            date_elements = business.find_elements(By.CLASS_NAME, 'rsqaWe')
            if date_elements:
                date_text = date_elements[0].text
                print(f'Date text: {date_text}')
                date_parts = date_text.split(", ")
                date = date_parts[1] if len(date_parts) > 1 else date_text
                print(date)
            else:
                print('Date element not found.')

            # Convert the rating to a string so it can be concatenated into the ID
            unique_id = "".join([name, str(rating), review, date])
            if unique_id not in self.unique_check:
                data = [name, rating, review, date]
                self.save_data(data)
                self.unique_check.append(unique_id)

            print(unique_id)

    except NoSuchElementException as e:
        print(f"An error occurred: {e}")

6. Reliable Data Storage

Once extracted, the data is appended to a CSV file by the save_data function. Duplicates are avoided upstream: get_business_info builds a unique ID for each entry and only passes data to save_data when that ID has not been seen before.

def save_data(self, data):
    header = ['ID', 'Client_Name', 'Rating', 'Reviews', 'Date']
    file_exists = os.path.isfile(self.output_file_name)
    with open(self.output_file_name, 'a', newline='', encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        if not file_exists:
            writer.writerow(header)
        # Use the running count of unique entries as the row ID
        writer.writerow([len(self.unique_check)] + data)
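To sanity-check the output, the resulting file can be read back with the standard csv module (a minimal sketch assuming the default output_file_name):

# Sketch: print the first few rows of the generated CSV.
import csv

with open("google_map_business_data.csv", newline='', encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f)):
        print(row['Client_Name'], row['Rating'], row['Date'])
        if i >= 4:
            break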

7. Sample Implementation

The script is demonstrated here with Times Square as a sample, but swapping in a different URL adapts it to virtually any location on Google Maps.

url = "https://www.google.com/maps/place/Times+Square/@40.7579787,-73.9881175,17z/data=!4m8!3m7!1s0x89c25855c6480299:0x55194ec5a1ae072e!8m2!3d40.7579747!4d-73.9855426!9m1!1b1!16zL20vMDdxZHI?entry=ttu"

business_scraper = GoogleMapScraper()
business_scraper.config_driver()
business_scraper.load_companies(url)
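Because headless defaults to False, a browser window opens while the script runs. To scrape without a visible window, set the flag before configuring the driver:

business_scraper = GoogleMapScraper()
business_scraper.headless = True  # run Chrome without a visible window
business_scraper.config_driver()
business_scraper.load_companies(url)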

8. Conclusion

This automated script is not just a utility; it is an enabler for businesses, researchers, and data enthusiasts. Its applications are wide-ranging, from enriching feature sets to integrating with downstream data processing tools. As the data-driven world expands, tools like this will form a basis for collecting information.

Thanks for reading. I hope you enjoyed it.
