Automated Extraction of Business or Location Information from Google Maps using Python and Selenium

1. Introduction

Google Maps is a vast repository of business information, including name, address, phone number, website, hours of operation, reviews, and ratings. This data can be valuable for a variety of purposes, such as market research, competitive analysis, and location-based marketing.

This article describes how to programmatically extract business data from Google Maps using Python and the Selenium library. Selenium is a web automation framework that allows you to control a web browser from your code.
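As a minimal illustration of what controlling a browser from code looks like (a sketch separate from the scraper built below), the following opens a page in Chrome and prints its title:

# Minimal Selenium example: open a page and print its title.
from selenium import webdriver

driver = webdriver.Chrome()  # Selenium 4.6+ can locate a suitable driver automatically
driver.get("https://www.google.com/maps")
print(driver.title)
driver.quit()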

Requirements

To run the script below, you will need the following:

  • Python 3
  • Selenium
  • Chrome web browser

Getting Started

Install the required Python packages:

pip install selenium
pip install webdriver_manager

For Selenium to control the Chrome browser, it requires ChromeDriver. WebDriver Manager automatically downloads and installs the correct version of ChromeDriver.
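To confirm the setup works before building the scraper, you can start and immediately close a managed Chrome session (a minimal sketch using the same webdriver_manager calls the scraper relies on):

# Verify that WebDriver Manager can provision ChromeDriver and start Chrome.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

service = Service(ChromeDriverManager().install())  # downloads the matching driver
driver = webdriver.Chrome(service=service)
driver.quit()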

2. GoogleMapScraper: The Core Class

At the heart of our script is the GoogleMapScraper class, designed specifically to extract business data from Google Maps. Key attributes of this class include:

  • output_file_name: The name of the output CSV file.
  • headless: A flag that controls whether the browser window is visible while the script runs.
  • driver: The Selenium WebDriver instance that performs all page interactions.
  • unique_check: A list of IDs used to detect and skip duplicate entries.

3. Web Browser Configuration

The config_driver method sets up the Chrome browser for our tasks. The ability to run in 'headless' mode, i.e. without a visible window, is crucial for server-side use or wherever a display is unnecessary.

import csv
import os
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


class GoogleMapScraper:
    def __init__(self):
        self.output_file_name = "google_map_business_data.csv"
        self.headless = False
        self.driver = None
        self.unique_check = []

    def config_driver(self):
        options = webdriver.ChromeOptions()
        if self.headless:
            options.add_argument("--headless")
        options.add_argument('--ignore-ssl-errors=yes')
        options.add_argument('--ignore-certificate-errors')

        s = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=s, options=options)

4. Dynamically Loading and Scrolling Pages

Google Maps dynamically loads business details, revealing only a subset at a time. The load_companies function overcomes this by programmatically scrolling the results panel, calling the extraction routine after each scroll until the end of the list is reached.

A note of caution: Google Maps class names and XPath locators change over time, so verify them in your browser's developer tools before relying on them in your code.

def load_companies(self, url):
    print("Getting business info", url)
    self.driver.get(url)
    time.sleep(5)
    panel_xpath = '//body[1]/div[3]/div[8]/div[9]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[2]'
    scrollable_div = self.driver.find_element(By.XPATH, panel_xpath)

    flag = True
    i = 0
    while flag:
        print(f"Scrolling to page {i + 2}")
        self.driver.execute_script(
            'arguments[0].scrollTop = arguments[0].scrollTop + 6500', scrollable_div)
        time.sleep(2)

        # Google Maps shows this banner once no more results can be loaded
        if "You've reached the end of the list." in self.driver.page_source:
            flag = False

        self.get_business_info()
        i += 1
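Relying on the English "end of the list" banner ties the loop to one locale. A locale-independent stopping rule, sketched below under the assumption that scrollable_div is the same panel element as above, is to compare the panel's scroll height before and after each scroll and stop once it no longer grows:

# Sketch: stop scrolling when the panel's scroll height stops growing.
# Assumes `driver`, `scrollable_div`, and `time` as in load_companies above.
last_height = driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
while True:
    driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
    time.sleep(2)
    new_height = driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
    if new_height == last_height:  # nothing new was loaded
        break
    last_height = new_height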

5. Detailed Business Data Extraction

The get_business_info function is the workhorse of our script, extracting the name, star rating, review text, and review date for each entry. The process entails:

  • Waiting for the review elements to be present on the page.
  • Iterating over each review container and pulling out its fields.
  • Counting full and empty star icons to derive the rating, and parsing the date text.

def get_business_info(self):
    # Wait for the review elements to load before starting to scrape
    WebDriverWait(self.driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, 'jJc9Ad')))

    try:
        for business in self.driver.find_elements(By.CLASS_NAME, 'jJc9Ad'):
            name = business.find_element(By.CLASS_NAME, 'd4r55').text
            try:
                review = business.find_element(By.CLASS_NAME, 'wiI7pd').text
            except NoSuchElementException:
                review = "No review found"

            # Derive the rating by counting full and empty star icons
            rating = "No rating"
            try:
                stars_element = WebDriverWait(business, 10).until(
                    EC.presence_of_element_located(
                        (By.XPATH, './/*[contains(@class, "kvMYJc")]')))

                full_star_images = stars_element.find_elements(
                    By.XPATH, './/img[contains(@class, "vzX5Ic") and not(contains(@src, "rate_empty"))]')
                empty_star_images = stars_element.find_elements(
                    By.XPATH, './/img[contains(@src, "rate_empty")]')

                print(f"Full stars count: {len(full_star_images)}")
                print(f"Empty stars count: {len(empty_star_images)}")

                total_stars = len(full_star_images) + len(empty_star_images)
                rating = total_stars - len(empty_star_images)
                print(f'Rating: {rating} stars')
            except (NoSuchElementException, TimeoutException):
                print("Stars element not found for this business.")

            # Extract the review date, if present
            date = "No date found"
            date_elements = business.find_elements(By.CLASS_NAME, 'rsqaWe')
            if date_elements:
                date_text = date_elements[0].text
                print(f'Date text: {date_text}')
                date_parts = date_text.split(", ")
                date = date_parts[1] if len(date_parts) > 1 else date_text
                print(date)
            else:
                print('Date element not found.')

            # Convert the rating to a string so it can be concatenated into the ID
            unique_id = "".join([name, str(rating), review, date])
            if unique_id not in self.unique_check:
                data = [name, rating, review, date]
                self.save_data(data)
                self.unique_check.append(unique_id)

            print(unique_id)

    except NoSuchElementException as e:
        print(f"An error occurred: {e}")

6. Reliable Data Storage

Once extracted, the data is appended to a CSV file by the save_data function. Duplicates are avoided upstream: get_business_info builds a unique ID for each entry and only passes data to save_data when that ID has not been seen before.

def save_data(self, data):
    header = ['ID', 'Client_Name', 'Rating', 'Reviews', 'Date']
    file_exists = os.path.isfile(self.output_file_name)
    with open(self.output_file_name, 'a', newline='', encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        if not file_exists:
            writer.writerow(header)
        # Use the running count of unique entries as the row ID
        writer.writerow([len(self.unique_check)] + data)
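To sanity-check the output, the resulting file can be read back with the standard csv module (a minimal sketch assuming the default output_file_name):

# Sketch: print the first few rows of the generated CSV.
import csv

with open("google_map_business_data.csv", newline='', encoding="utf-8") as f:
    for i, row in enumerate(csv.DictReader(f)):
        print(row['Client_Name'], row['Rating'], row['Date'])
        if i >= 4:
            break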

7. Sample Implementation

The script is demonstrated here with Times Square as a sample, but swapping in a different URL adapts it to virtually any location on Google Maps.

url = "https://www.google.com/maps/place/Times+Square/@40.7579787,-73.9881175,17z/data=!4m8!3m7!1s0x89c25855c6480299:0x55194ec5a1ae072e!8m2!3d40.7579747!4d-73.9855426!9m1!1b1!16zL20vMDdxZHI?entry=ttu"

business_scraper = GoogleMapScraper()
business_scraper.config_driver()
business_scraper.load_companies(url)
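Because headless defaults to False, a browser window opens while the script runs. To scrape without a visible window, set the flag before configuring the driver:

business_scraper = GoogleMapScraper()
business_scraper.headless = True  # run Chrome without a visible window
business_scraper.config_driver()
business_scraper.load_companies(url)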

8. Conclusion

This automated script is not just a utility; it is an enabler for businesses, researchers, and data enthusiasts. Its applications are wide-ranging, from enriching feature sets to integrating with downstream data processing tools. As the data-driven world expands, tools like this will form a basis for collecting information.

Thanks for reading. I hope you enjoyed it.
