Scrape, Clean and Store Zillow Apartment Data — Part II

Store data scraped from Zillow in a BigQuery table and view.

Zach Quinn
Pipeline: Your Data Engineering Resource



Now that we’ve gathered the relevant data in Part I, we can work on creating our final product: a BigQuery SQL table to be used for analysis.

Recapping Part I

The steps we’ve completed so far are:

  • Making a request to our base URL and applying a header to avoid triggering a captcha
  • Identifying the elements that contain the data we require
  • Looping through elements that contain address, price and space
  • Increasing the page count to account for all returned rows
  • Storing the output in a list of dicts
  • Converting that list to a data frame
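
As a refresher, the page-by-page collection can be sketched roughly like this. It is a minimal sketch, not the exact Part I code: `collect_listings`, `fake_fetch` and the sample rows are hypothetical names and data, and `fake_fetch` stands in for the real request-and-parse step so the example runs without touching Zillow.

```python
from typing import Callable, Dict, List

Listing = Dict[str, str]


def collect_listings(
    fetch_page: Callable[[int], List[Listing]], max_pages: int = 50
) -> List[Listing]:
    """Walk result pages until one comes back empty, accumulating rows."""
    rows: List[Listing] = []
    page = 1
    while page <= max_pages:
        page_rows = fetch_page(page)
        if not page_rows:
            break  # an empty page means we ran out of results
        rows.extend(page_rows)
        page += 1  # increase the page count to reach the next batch
    return rows


# Hypothetical stand-in for "request the page, parse address/price/space".
_FAKE_PAGES = {
    1: [
        {"address": "12 Maple St, Baltimore, MD", "price": "$1,500+/mo", "space": "1 bd"},
        {"address": "48 Oak Ave, Baltimore, MD", "price": "$1,850/mo", "space": "2 bd"},
    ],
    2: [
        {"address": "7 Pine Rd, Baltimore, MD", "price": "$1,200/mo", "space": "Studio"},
    ],
}


def fake_fetch(page: int) -> List[Listing]:
    return _FAKE_PAGES.get(page, [])


listings = collect_listings(fake_fetch)  # a list of dicts, one per listing
```

From there, passing `listings` to `pd.DataFrame(...)` gives the data frame we’ll be cleaning.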

In this part, we’re going to concentrate on deep cleaning our data and loading it to BigQuery.

The broad steps we’ll take are:

  1. Format fields in our data frame
  2. Create a new field, “apartment_name,” derived from address
  3. Load to BigQuery
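
To make those three steps concrete before we dig in, here is a rough sketch of how they might fit together. Treat it as an illustration under assumptions rather than the final code: the helper names, the sample rows, and the idea that a building name precedes the street address with a “|” separator are all hypothetical, and `table_id` is a placeholder for your own `project.dataset.table`. The load step relies on `load_table_from_dataframe` from the `google-cloud-bigquery` client, which also needs `pandas` and `pyarrow` installed.

```python
import re
from typing import Any, Dict, List, Optional


def clean_price(raw: str) -> Optional[int]:
    """Pull the first number out of a price string; None if there isn't one."""
    match = re.search(r"\d[\d,]*", raw)
    return int(match.group().replace(",", "")) if match else None


def split_address(raw: str) -> Dict[str, Any]:
    """Assumes 'Building Name | Street, City, ST ZIP' (hypothetical format).

    Rows without a '|' get apartment_name=None and keep the full address.
    """
    name, sep, street = raw.partition("|")
    if not sep:
        return {"apartment_name": None, "address": raw.strip()}
    return {"apartment_name": name.strip(), "address": street.strip()}


def clean_rows(rows: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Steps 1 and 2: format fields and derive apartment_name."""
    cleaned = []
    for row in rows:
        record = split_address(row["address"])
        record["price"] = clean_price(row["price"])
        record["space"] = row["space"].strip()
        cleaned.append(record)
    return cleaned


def load_to_bigquery(rows: List[Dict[str, Any]], table_id: str) -> None:
    """Step 3: load cleaned rows into a BigQuery table."""
    # Imported lazily so the cleaning helpers run without these packages.
    import pandas as pd
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default Google Cloud credentials
    job = client.load_table_from_dataframe(pd.DataFrame(rows), table_id)
    job.result()  # wait for the load job to finish


# Hypothetical sample rows shaped like the Part I output.
sample_rows = [
    {
        "address": "Maple Court Apartments | 12 Maple St, Baltimore, MD 21201",
        "price": "$1,500+/mo",
        "space": " 1 bd ",
    },
    {
        "address": "48 Oak Ave, Baltimore, MD 21202",
        "price": "Contact for price",
        "space": "Studio",
    },
]

cleaned = clean_rows(sample_rows)
```

Keeping the cleaning helpers separate from the load function means you can check the output locally before anything is written to BigQuery.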
