Scraping Best Buy Store Locations

So, what is the easiest way to get a CSV file of all Best Buy store locations in the USA?

Buy the Best Buy store data from our data store

There are over 950 Best Buy stores in the USA, and you can buy a CSV file containing the address, city, zip code, latitude, and longitude of each location from our data store for $50.

Figure 1: Best Buy store locations. Source: Best Buy Store Locations dataset

If you would rather scrape the locations yourself, read on.

Scraping the Best Buy store locator using Python

We will keep things simple for now and scrape Best Buy store locations for just one zip code.

Python is great for web scraping, and we will use a library called Selenium to fetch the raw HTML source of Best Buy's store locator for zip code 94536 (Fremont, CA area).

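```python
# Using Selenium to extract Best Buy store locator's raw HTML source
import time

from bs4 import BeautifulSoup  # used in the parsing steps below
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

test_url = 'https://www.bestbuy.com/site/store-locator'

option = webdriver.ChromeOptions()
option.add_argument("--incognito")

# Path to your local chromedriver binary (placeholder; replace with your own)
chromedriver = r'chromedriver_path'
# Selenium 4 takes the driver path through a Service object, not a positional argument
browser = webdriver.Chrome(service=Service(chromedriver), options=option)

try:
    browser.get(test_url)
    # This snippet never types zip code 94536 into the search box, so which stores
    # appear depends on the location the page uses by default.
    time.sleep(5)  # crude wait for the page's JavaScript to render
    html_source = browser.page_source
finally:
    # quit() ends the whole WebDriver session, not just the current window
    browser.quit()
```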

Using BeautifulSoup to extract Best Buy store details

Once we have the raw HTML source, we can parse it with a Python library called BeautifulSoup.
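To make the parsing step concrete without a third-party dependency, here is a minimal sketch using only Python's standard-library html.parser. It is an illustrative stand-in for BeautifulSoup's find_all(..., class_=...) followed by get_text(), not how BeautifulSoup works internally, and the HTML snippet is a hypothetical store card rather than Best Buy's real markup. It also shows why text taken from an element with several child tags comes out glued together (the address output later in this article shows "31350 Courthouse DrUnion City"); in BeautifulSoup the analogous option is get_text(separator=", ", strip=True).

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> whose class list contains target_class.

    Illustrative stdlib stand-in for BeautifulSoup's
    find_all(tag, class_=target_class) + get_text(); assumes well-nested HTML.
    """

    def __init__(self, tag, target_class, separator=""):
        super().__init__()
        self.tag = tag
        self.target_class = target_class
        self.separator = separator
        self.depth = 0      # > 0 while we are inside a matching element
        self.parts = []     # stripped text fragments of the current match
        self.results = []   # one joined string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child tag inside the current match
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.target_class in classes:
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(self.separator.join(self.parts))

    def handle_data(self, data):
        # Whitespace-only fragments are dropped; the rest are stripped
        if self.depth and data.strip():
            self.parts.append(data.strip())


# Hypothetical store-card markup: street and city live in separate child spans
SAMPLE_HTML = """
<div class="location-card">
  <span class="loc-address"><span>31350 Courthouse Dr</span><span>Union City, CA 94587</span></span>
</div>
"""

glued = ClassTextCollector("span", "loc-address")
glued.feed(SAMPLE_HTML)
print(glued.results)      # ['31350 Courthouse DrUnion City, CA 94587']

separated = ClassTextCollector("span", "loc-address", separator=", ")
separated.feed(SAMPLE_HTML)
print(separated.results)  # ['31350 Courthouse Dr, Union City, CA 94587']
```

Matching on a single class name (loc-address) rather than the full class string keeps the lookup working even if the element carries extra classes.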

Figure 2: Inspecting the source of the Best Buy store locator.

  • We will extract the store names, addresses, and phone numbers.
  • The extracted data will still need some cleaning to split it into store name, address line 1, address line 2, city, state, zip code, and phone number, but that is just basic Python string manipulation, and we leave it as an exercise for the reader.
```python
# extracting Best Buy store names
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_source, "html.parser")
# Match on a single CSS class with class_; the original passed a set plus a
# multi-class string with a trailing space, which is fragile.
store_name_list_src = soup.find_all('h2', class_='location-card-heading')
store_name_list = []
for val in store_name_list_src:
    button = val.find('button')
    # skip headings without a button instead of hiding every error with a bare except
    if button is not None:
        store_name_list.append(button.get_text(strip=True))
print(store_name_list)
# Output
# ['Union City', 'Milpitas', 'Mountain View', 'Dublin', 'El Camino Real', 'San Carlos', 'Santana Row', 'The Plant', 'Blossom Hill', 'Emeryville', 'Pleasant Hill', 'Colma', 'San Francisco (13th and Harrison St.)', 'Slatten Ranch', 'Tracy']
```

The next step is extracting addresses. Going back to Chrome's Inspect view, we see that each address sits in a span with the class name loc-address, so we use BeautifulSoup's find_all method to collect them into a list.

```python
# extracting Best Buy addresses
addresses_src = soup.find_all('span', class_='loc-address')
address_list = [val.get_text() for val in addresses_src]
print(address_list[:10])
# Output
# ['31350 Courthouse DrUnion City, CA 94587', '63 Ranch DrMilpitas, CA 95035', '2460 E Charleston RdMountain View, CA 94043', '4820 Dublin BlvdDublin, CA 94568', '715 E El Camino RealMountain View, CA 94040', '1127 Industrial RdSan Carlos, CA 94070', '3090 Stevens Creek BlvdSan Jose, CA 95128', '181 Curtner AveSan Jose, CA 95125', '5065 Almaden ExpySan Jose, CA 95118', '3700 Mandela PkwyOakland, CA 94608']
```
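Because each address comes back with the street and city glued together, one way to do the cleanup left to the reader is a pair of regular expressions. This is a heuristic sketch, assuming every address ends in ", ST 12345" and that the street ends where a lowercase letter runs straight into an uppercase one. It splits at the last such boundary, so a city such as McAllen would be split wrongly. The split_address name is our own illustration, not part of any library.

```python
import re

# "<street><city>, <ST> <ZIP>" with street and city glued together
ADDRESS_RE = re.compile(r"^(?P<rest>.+), (?P<state>[A-Z]{2}) (?P<zip>\d{5}(?:-\d{4})?)$")
# Zero-width position where a lowercase letter runs straight into an uppercase one
GLUE_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_address(raw):
    """Split e.g. '63 Ranch DrMilpitas, CA 95035' into its parts, or return None.

    Heuristic: the street/city boundary is the LAST lowercase->uppercase
    transition, so cities like 'McAllen' will be split incorrectly.
    """
    match = ADDRESS_RE.match(raw.strip())
    if match is None:
        return None
    rest = match.group("rest")
    boundaries = [m.start() for m in GLUE_RE.finditer(rest)]
    if not boundaries:
        return None
    cut = boundaries[-1]
    return {
        "street": rest[:cut].strip(),
        "city": rest[cut:].strip(),
        "state": match.group("state"),
        "zip": match.group("zip"),
    }


print(split_address("715 E El Camino RealMountain View, CA 94040"))
# {'street': '715 E El Camino Real', 'city': 'Mountain View', 'state': 'CA', 'zip': '94040'}
```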

We extract each store's phone number the same way.

```python
# extracting Best Buy phone numbers
# NOTE: the tag and class name below are placeholders, not taken from the page;
# inspect a store card in Chrome DevTools to find the real ones.
phone_number_src = soup.find_all('span', class_='loc-phone')
phone_number_list = [val.get_text() for val in phone_number_src]
print(phone_number_list)
# Output
# ['Phone Number510-441-2130', 'Phone Number408-942-0201', 'Phone Number650-903-0591', 'Phone Number925-829-7041', 'Phone Number408-738-8680', 'Phone Number650-622-0050', 'Phone Number408-241-6040', 'Phone Number408-297-0701', 'Phone Number408-979-1591', 'Phone Number510-420-0323', 'Phone Number925-988-0256', 'Phone Number650-756-8711', 'Phone Number415-626-9682', 'Phone Number925-513-4995', 'Phone Number209-832-2166']
```
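Each phone number carries a "Phone Number" prefix, probably a label meant for screen readers (an assumption; we have not checked the markup). A minimal sketch that keeps only the NNN-NNN-NNNN part, assuming US numbers in that format; clean_phone is a hypothetical helper name:

```python
import re

# US-style number as it appears in the scraped text, e.g. 510-441-2130
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")


def clean_phone(raw):
    """Return the NNN-NNN-NNNN part of e.g. 'Phone Number510-441-2130', else None."""
    match = PHONE_RE.search(raw)
    return match.group(0) if match else None


raw_numbers = ["Phone Number510-441-2130", "Phone Number408-942-0201"]
cleaned = [clean_phone(n) for n in raw_numbers]
print(cleaned)  # ['510-441-2130', '408-942-0201']
```

Searching for the number pattern, rather than stripping a fixed prefix, keeps working if the label text changes.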

Converting into a CSV file

You can combine the lists above into a pandas DataFrame and save it as a CSV file.

```python
import pandas as pd

df = pd.DataFrame({"store_name": store_name_list,
                   "address": address_list,
                   "phone_number": phone_number_list})
# index=False keeps pandas' 0..n-1 row index out of the CSV
df.to_csv("best_buy_store_locations.csv", index=False)
print(df.head())
# Output
#        store_name                                      address              phone_number
# 0      Union City      31350 Courthouse DrUnion City, CA 94587  Phone Number510-441-2130
# 1        Milpitas                63 Ranch DrMilpitas, CA 95035  Phone Number408-942-0201
# 2   Mountain View  2460 E Charleston RdMountain View, CA 94043  Phone Number650-903-0591
# 3          Dublin             4820 Dublin BlvdDublin, CA 94568  Phone Number925-829-7041
# 4  El Camino Real  715 E El Camino RealMountain View, CA 94040  Phone Number408-738-8680
```
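Note that pd.DataFrame raises a ValueError if the three lists have different lengths, and even equal lengths do not prove that row i of each list refers to the same store, since the store-name loop above silently skips any heading it cannot read. Here is a stdlib-only sketch that fails loudly on a length mismatch and writes the rows with the csv module. stores_to_csv_text is a hypothetical helper; a more robust design would extract one record per store card instead of three parallel lists.

```python
import csv
import io


def stores_to_csv_text(names, addresses, phones):
    """Return CSV text for parallel lists, refusing to write misaligned data.

    Hypothetical helper: the three lists must describe the same stores in the
    same order, so a length mismatch is treated as an error rather than padded.
    """
    if not (len(names) == len(addresses) == len(phones)):
        raise ValueError(
            f"length mismatch: {len(names)} names, "
            f"{len(addresses)} addresses, {len(phones)} phone numbers"
        )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["store_name", "address", "phone_number"])
    writer.writerows(zip(names, addresses, phones))
    return buffer.getvalue()


csv_text = stores_to_csv_text(
    ["Union City"],
    ["31350 Courthouse DrUnion City, CA 94587"],
    ["510-441-2130"],
)
print(csv_text)
# To save it:
# with open("best_buy_store_locations.csv", "w", newline="") as f:
#     f.write(csv_text)
```

The csv module quotes the address automatically because it contains a comma.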

Scaling up to a full crawler for extracting all Best Buy store locations in the USA

  • Once you have a scraper that can extract data for one zip code or city, you will have to iterate through all US zip codes.
  • How many requests you need depends on how much coverage you want, but for a national chain like Best Buy you are looking at running the above code 100,000 times or more to ensure that no region is left out.
  • Once you scale up to thousands of requests, the bestbuy.com servers will start blocking your IP address outright, or you will be flagged and start getting CAPTCHAs.
  • To improve your chances of successfully fetching data for the whole USA, you will have to implement:
      • rotating proxy IP addresses, preferably residential proxies;
      • rotating user agents;
      • an external CAPTCHA-solving service such as 2captcha or anticaptcha.com.
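Put together, the crawl loop might look like the sketch below. It is an outline under stated assumptions, not our production crawler: fetch_stores is a hypothetical callable you write yourself (for example, wrapping the Selenium and BeautifulSoup code above so that it uses the given proxy and user agent), it is assumed to return a list of dicts with an "address" key and to raise when a request fails or is blocked, and CAPTCHA solving is left out. Neighboring zip codes return overlapping stores, so results are deduplicated by address.

```python
import itertools
import time


def crawl_all_zip_codes(zip_codes, fetch_stores, proxies, user_agents,
                        max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Visit every zip code, rotating proxy and user agent on each request.

    fetch_stores(zip_code, proxy, user_agent) is a hypothetical callable you
    supply; it should return a list of dicts with an "address" key and raise
    an exception when a request fails or is blocked.
    Returns (unique_stores, zip_codes_that_never_succeeded).
    """
    proxy_cycle = itertools.cycle(proxies)
    agent_cycle = itertools.cycle(user_agents)
    stores_by_address = {}  # nearby zip codes return overlapping stores
    failed = []

    for zip_code in zip_codes:
        for attempt in range(max_retries):
            proxy, agent = next(proxy_cycle), next(agent_cycle)
            try:
                results = fetch_stores(zip_code, proxy, agent)
            except Exception:
                if attempt + 1 < max_retries:
                    # exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                    sleep(base_delay * 2 ** attempt)
                continue
            for store in results:
                stores_by_address.setdefault(store["address"], store)
            break
        else:
            # every attempt raised: remember the zip code for a later pass
            failed.append(zip_code)

    return list(stores_by_address.values()), failed


def fake_fetch(zip_code, proxy, user_agent):
    # Stand-in for a real scraper; always "finds" the same two stores
    return [{"address": "31350 Courthouse Dr, Union City, CA 94587"},
            {"address": "63 Ranch Dr, Milpitas, CA 95035"}]


# The proxy URL and user agent string here are hypothetical placeholders
stores, failed = crawl_all_zip_codes(["94536", "94587"], fake_fetch,
                                     proxies=["http://proxy-1:8080"],
                                     user_agents=["ua-1"],
                                     sleep=lambda seconds: None)
print(len(stores), failed)  # 2 []
```

Passing fetch_stores and sleep in as parameters keeps the loop testable without touching bestbuy.com, and the zip codes that still fail after max_retries are returned so they can be retried in a later pass.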

After following all the steps above, you will realize that our price ($50) for web-scraped location data covering all Best Buy stores is one of the most competitive in the market.

Originally published at https://www.specrom.com.

We cover cutting-edge natural language processing, machine learning, and AI-powered strategies for extracting web data at big data scale.

Jay M. Patel

Cofounder/principal data scientist at Specrom Analytics (specrom.com), natural language processing and web crawling/scraping expert. Personal site: JayMPatel.com