Scrape, Clean and Store Zillow Apartment Data — Part II
Store data scraped from Zillow in a BigQuery table and view.
8 min read · Jan 17, 2023
Now that we have the relevant data from Part I, we can work on our final product: a BigQuery table to be used for analysis.
Recapping Part I
The steps we’ve completed so far are:
- Making a request to our base URL, with a header set to avoid triggering a CAPTCHA
- Identifying the elements that contain the data we require
- Looping through elements that contain address, price and space
- Increasing the page count to account for all returned rows
- Storing the output in a list of dicts
- Converting that list to a data frame
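As a rough reminder of the shape of that loop, here is a minimal sketch. It is not the Part I code: `fetch_page` and `fake_fetch` are hypothetical stand-ins for the request and element parsing, and the sample rows are made up. Only the paging logic and the list-of-dicts output mirror the steps above.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def collect_listings(fetch_page: Callable[[int], List[Row]], max_pages: int = 50) -> List[Row]:
    """Request pages 1, 2, 3, ... and stop at the first page with no rows."""
    listings: List[Row] = []
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:  # an empty page means every result has been seen
            break
        listings.extend(rows)
    return listings


# Made-up sample data standing in for parsed search results.
FAKE_PAGES: Dict[int, List[Row]] = {
    1: [
        {"address": "Example Lofts | 123 Main St, Springfield, IL", "price": "$1,500+/mo", "space": "1 bd"},
        {"address": "456 Oak Ave, Springfield, IL", "price": "$2,100/mo", "space": "2 bds"},
    ],
    2: [
        {"address": "789 Pine Rd, Springfield, IL", "price": "$1,250/mo", "space": "Studio"},
    ],
}


def fake_fetch(page: int) -> List[Row]:
    """Hypothetical replacement for the real request-and-parse step."""
    return FAKE_PAGES.get(page, [])


listings = collect_listings(fake_fetch)
print(len(listings))
```

Passing `listings` to `pandas.DataFrame` then gives one row per listing, with `address`, `price` and `space` as columns.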
In this part, we’re going to concentrate on deep cleaning our data and loading it into BigQuery.
The broad steps we’ll take are:
- Format fields in our data frame
- Create a new field, “apartment_name”, derived from address
- Load to BigQuery
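To make these steps concrete before diving in, here is a minimal sketch of what each one could look like. Several things in it are assumptions rather than the article's actual code: the raw string formats (`"$1,500+/mo"`, `"Name | street address"`), the helper names `parse_price`, `split_address` and `load_to_bigquery`, and the `table_id` value. The BigQuery helper is defined but not called, since it needs the `google-cloud-bigquery` and `pyarrow` packages plus valid credentials.

```python
import re
from typing import Optional, Tuple


def parse_price(raw: str) -> Optional[int]:
    """Pull the first dollar amount out of a string such as '$1,500+/mo'."""
    match = re.search(r"\$\s*(\d[\d,]*)", raw)
    if match is None:
        return None  # e.g. 'Contact for price'
    return int(match.group(1).replace(",", ""))


def split_address(raw: str, sep: str = "|") -> Tuple[Optional[str], str]:
    """Split 'Name | street address' into (apartment_name, address).

    Assumes named buildings put the name before `sep`; rows without
    the separator get no apartment name.
    """
    if sep in raw:
        name, address = raw.split(sep, 1)
        return name.strip(), address.strip()
    return None, raw.strip()


def load_to_bigquery(df, table_id: str) -> None:
    """Load a pandas DataFrame into a BigQuery table.

    `table_id` is a placeholder of the form 'project.dataset.table'.
    """
    # Imported lazily so the helpers above run without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.load_table_from_dataframe(df, table_id)
    job.result()  # block until the load job finishes


print(parse_price("$1,500+/mo"))
print(split_address("Example Lofts | 123 Main St, Springfield, IL"))
```

With a data frame `df`, the first two helpers could be applied column-wise, for example `df["price"].map(parse_price)`, before handing the cleaned frame to the load step.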