Image by Amazon.com

Find the best AirPods deal on Prime Day via web scraping, step by step, in 5 minutes: learn how to scrape an Amazon page

Jerry Ziyuan Yan
5 min read · Oct 13, 2020


How do you find the best deal on Amazon via simple web scraping? This article uses the Amazon product search page for Apple AirPods as a guide to show you how to scrape data from the web in less than 5 minutes.

I’m writing this piece the night before the 2020 Amazon Prime Day. Like everyone else, I’m anxiously waiting and hoping to get a significant discount on a lot of must-haves. How do we effectively monitor price changes and get the most up-to-date information on the products we are waiting for?

Let’s build a simple web-scraping Python script to help us do that. (The code will be available in my GitHub repository.)

Step I: Open Amazon and search for an item of interest.

Image by Amazon.com

In this case, I need to buy new AirPods. Copy the URL from the browser.

Step II: Import packages in a Jupyter Notebook or another Python IDE

First, import the Requests and BeautifulSoup libraries into the workspace. The Requests library helps us request HTML data from the web server. BeautifulSoup is a powerful library that enables us to clean the HTML we pull and locate specific items within it.

Image by Author

I am also importing Pandas and NumPy for data manipulation.

Image by Author
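As a minimal sketch, the imports from the two screenshots above look roughly like this:

# Requests pulls raw HTML from the web server; BeautifulSoup parses it.
import requests
from bs4 import BeautifulSoup

# Pandas and NumPy handle the data cleaning and manipulation later on.
import pandas as pd
import numpy as np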

Then copy the URL from the browser and paste it into the requests.get() method. This will pull the HTML data from the Amazon.com web server.

If you wonder what the HTML data looks like, you can print it using r.text.
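A rough sketch of this step (the search URL below is a placeholder for whatever you copied from your browser, and the User-Agent header is an assumption; Amazon tends to reject requests that do not look like they come from a browser):

# Placeholder: paste the search URL you copied from your browser.
url = "https://www.amazon.com/s?k=airpods"

# A browser-like User-Agent helps avoid being blocked outright.
headers = {"User-Agent": "Mozilla/5.0"}

r = requests.get(url, headers=headers)
print(r.text)  # the raw HTML of the search page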

Very messy data. We need to use the BeautifulSoup library to remove some tags. Let’s initiate a BeautifulSoup object in the code below.

Image by Author
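Something along these lines, reusing the response object r from the previous step:

# Parse the raw HTML string into a navigable BeautifulSoup tree.
soup = BeautifulSoup(r.text, "html.parser")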

Step III: Inspect the page to find all relevant data tags on the webpage

Image by Amazon.com

Use Ctrl + Shift + I to inspect the title of any product on the page.

Image by Amazon.com

The highlighter will help you find the <div class="…"> element. Copy the class name and paste it into the soup.find_all() method. This method will find all the product data on the page.

Image by Author
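A sketch of that call; the class name below is a placeholder, since Amazon changes its class names regularly, so copy whatever you see in your own inspector:

# Replace 'PLACEHOLDER-CLASS' with the class name copied from the inspector.
results = soup.find_all("div", class_="PLACEHOLDER-CLASS")
print(len(results))  # how many product listings were found on the page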

You can use the prettify() method to view more structured code. Here, I’m looking at the second item on the page using the slicer [1].

Image by Author
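For example, with the results list from find_all above:

# Print the second listing in a readable, indented form.
print(results[1].prettify())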

Next, let’s scrape the discount price and other data.

Here I would like to scrape the discounted price. The highlighter shows that it sits inside the following tag:

<span class="a-offscreen">$124.00</span>

Image by Amazon.com

All we need to do is copy the class name into the select_one() method. We can print out the text using the code below.

Image by Author
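Roughly like this, using the a-offscreen class shown in the tag above:

# The discounted price sits inside a <span class="a-offscreen"> element.
price = results[1].select_one("span.a-offscreen")
print(price.text)  # e.g. "$124.00"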

We do this for all the fields of interest: Product Name, Discount Price, Market Price, Rating, and Number of Reviews.
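For instance, pulling the product name from the same listing might look like this (the h2 selector is an assumption to verify in your own inspector):

# The product title usually sits inside the listing's <h2> tag (assumption).
name = results[1].select_one("h2").get_text(strip=True)
print(name)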

Step IV: Collect price and other data for ALL product listings on the page

Finally, we can iterate through all the listed products on the page using a simple for loop.

Image by Author
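A minimal sketch of such a loop. The a-offscreen class comes from the inspection above; the selectors for the title, list price, rating, and review count are assumptions you would confirm in your own inspector:

records = []

for item in results:
    # Product title (assumed to live in the listing's <h2> tag).
    title_tag = item.select_one("h2")

    # Current (discounted) price and the struck-through list price.
    # 'a-offscreen' comes from the inspection above; 'a-text-price' is an assumption.
    discount_tag = item.select_one("span.a-price > span.a-offscreen")
    market_tag = item.select_one("span.a-price.a-text-price span.a-offscreen")

    # Rating and number of reviews (placeholder selectors to verify).
    rating_tag = item.select_one("span.a-icon-alt")
    reviews_tag = item.select_one("span.a-size-base")

    records.append({
        "product": title_tag.get_text(strip=True) if title_tag else None,
        "discount_price": discount_tag.get_text(strip=True) if discount_tag else None,
        "market_price": market_tag.get_text(strip=True) if market_tag else None,
        "rating": rating_tag.get_text(strip=True) if rating_tag else None,
        "num_reviews": reviews_tag.get_text(strip=True) if reviews_tag else None,
    })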

All this does is go through each listing and grab the information we are interested in.

Step V: Put everything together and find the best deal

In the end, I would like to create a Pandas DataFrame to clean and visualize our data, put each column into its correct format, and handle any null values. Then we can find the deal with the most generous discount.

Discount = Market Price - Current Discount Price

Image by Author

Here, I am doing some data engineering to create a new discount column and clean up the data. Finally, I sort the data based on the discount amount.
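A sketch of that cleanup, assuming the records list built in the loop above and prices formatted like "$124.00":

df = pd.DataFrame(records)

# Strip the '$' and ',' characters and convert the price columns to numbers.
for col in ["discount_price", "market_price"]:
    df[col] = (
        df[col]
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .astype(float)
    )

# Listings without a struck-through list price have no discount to measure.
df = df.dropna(subset=["discount_price", "market_price"])

# Discount = Market Price - Current Discount Price
df["discount"] = df["market_price"] - df["discount_price"]

# Sort so the most generous discount comes first.
df = df.sort_values("discount", ascending=False)
print(df.head())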

Here is the final result:

Image by Author

So which one is the best deal based on the discount?

We can see that the AirPods with a wireless charging case currently have the highest discount, at $52.80. The second-best deal is the AirPods Pro, with a discount of $50.

Part VI: Conclusion

In this article, we looked at using the BeautifulSoup and Requests libraries to scrape Amazon.com for AirPods.

  1. We opened the URL of interest.
  2. We imported packages in the Jupyter notebook.
  3. Then, we inspected the page to find all relevant data tags on the webpage.
  4. After that, we collected price and other data for ALL product listings on the page.
  5. Finally, we worked on data engineering and looked at the best deal based on discount. The top 2 deals are the AirPods with a wireless charging case and the AirPods Pro.

Code and a more detailed analysis can be found in my GitHub repository: link.
