Zillow Data Science Interview

Per its investor relations material, Zillow holds data on 110 million homes in its living database and recorded 6.3 billion visits to its site in 2018.

Vimarsh Karbhari
Acing AI
4 min read · Aug 14, 2018


According to Fortune, Zillow ranked sixth among the best places to work in technology in 2018. The Zillow websites have 188 million unique users. Anyone who has bought or sold a home will know its Zestimate technology, which provides quick, accurate estimates to a seller or a buyer. Zillow is a data science company: data is not leverage for the product, it is the product itself.

Photo by João Silas on Unsplash
Zillow Business: Investors Zillow Group

Zillow is a gigantic spatial database. The GIS team within Zillow works on interesting problems such as spatial ETL, normalization of geospatial data, and establishing geospatial relationships between data points. Very few companies in the world have these kinds of problems to solve. The data science teams rely heavily on Python and R as their primary languages. They also use the Turi library (formerly Dato and GraphLab); Turi was acquired by Apple in 2016.

Interview Process

The interview process starts with a recruiter phone screen. If that goes well, Zillow usually tests preliminary data science skills with a take-home test, which might include modelling or problems related to the Zestimate. This is a very good way to test how a candidate thinks through data science problems. The take-home test is followed by a technical phone screen. If you get past all these rounds, there is an onsite interview, usually with three or four team members. The process is a good mix of technical and data science related skills.

Important Reading

ML at Zillow. Source: Overview of Data Science at Zillow
  1. Data Science/ML tools at Zillow: Tools Overview
  2. Zillow Recommendations Engine: Recommendations at Zillow
  3. Behind the scenes: In Depth Data Science at Zillow

AI/Data Science Related Questions

  • Implement a program to provide in real-time a list of the top 100 most viewed properties in the last hour. Properties are represented by a ‘zpid’ or Zillow property ID, which is a unique string.
  • What do you think are the factors taken into consideration to calculate the Zestimate score?
  • Describe an example of how you created a working predictive model in detail.
  • Explain overfitting and describe what steps can be taken to reduce overfitting.
  • In SVM why is there a need to maximize the margin between support vectors?
  • Calculate the median absolute error and percent of estimates within 5%, 10% and 20% of the sales price of a property.
  • Find N most frequently used 3-page sequences in a list of pages.
  • Write a program to generate the Fibonacci sequence.
  • Given points on the Cartesian plane, return the K points closest to the origin (0,0).
  • Explain Euclidean distance.
  • What are the different techniques to remove outliers?
  • Describe a way to detect anomalies in a given dataset.
  • Implement a KNN classifier.
  • How do you split your data between training and validation?
  • Describe a model to analyze user’s browsing patterns on the Zillow website.
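
To make the accuracy question above concrete, here is a minimal Python sketch of the median absolute error and the share of estimates within 5%, 10% and 20% of the sale price. The function name accuracy_report and the shape of its output are assumptions made for illustration; this is not Zillow's own methodology, just one straightforward reading of the question.

```python
from statistics import median


def accuracy_report(estimates, sale_prices, thresholds=(0.05, 0.10, 0.20)):
    """Summarize how close estimates are to actual sale prices.

    Hypothetical helper: names and output format are illustrative only.
    """
    if not estimates or len(estimates) != len(sale_prices):
        raise ValueError("need two non-empty lists of equal length")

    # Absolute error in currency units, and error relative to the sale price.
    abs_errors = [abs(e - s) for e, s in zip(estimates, sale_prices)]
    pct_errors = [abs(e - s) / s for e, s in zip(estimates, sale_prices)]

    report = {
        "median_abs_error": median(abs_errors),
        "median_abs_pct_error": median(pct_errors),
    }
    # Fraction of estimates whose relative error is at or below each threshold.
    for t in thresholds:
        key = f"within_{int(round(t * 100))}pct"
        report[key] = sum(p <= t for p in pct_errors) / len(pct_errors)
    return report


if __name__ == "__main__":
    print(accuracy_report([100, 208, 327, 500], [100, 200, 300, 400]))
```

The median is used rather than the mean so that a handful of badly mispriced homes does not dominate the summary.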
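
For the "top 100 most viewed properties in the last hour" question, one possible approach is a sliding window: keep a queue of recent view events plus a running count per zpid, and drop events as they age out. The sketch below assumes view events arrive in non-decreasing timestamp order and that everything fits in memory on one machine; the class name TopViewed and its methods are hypothetical, and a production system would likely need something more elaborate.

```python
import heapq
from collections import Counter, deque


class TopViewed:
    """Sliding-window view counter (illustrative sketch, single process)."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.events = deque()    # (timestamp, zpid), oldest first
        self.counts = Counter()  # zpid -> views inside the window

    def record_view(self, zpid, timestamp):
        # Assumes timestamps are non-decreasing across calls.
        self.events.append((timestamp, zpid))
        self.counts[zpid] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that are a full window old or older.
        cutoff = now - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, old_zpid = self.events.popleft()
            self.counts[old_zpid] -= 1
            if self.counts[old_zpid] == 0:
                del self.counts[old_zpid]

    def top(self, now, k=100):
        # Return up to k (zpid, views) pairs, most viewed first.
        self._evict(now)
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    tracker = TopViewed()
    for ts, zpid in [(0, "a"), (10, "a"), (20, "b"), (3000, "b")]:
        tracker.record_view(zpid, ts)
    print(tracker.top(now=3000, k=2))
```

Recording a view costs amortized constant time, and each query costs O(n log k) over the n distinct properties currently in the window. An interviewer may well push on how this would be sharded or approximated at scale.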
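
The "K points closest to the origin" question ties in with the one on Euclidean distance: the distance from (x, y) to (0,0) is sqrt(x² + y²), and because the square root preserves ordering, comparing x² + y² is enough. A short sketch, assuming points are given as (x, y) tuples (the function name k_closest is illustrative):

```python
import heapq


def k_closest(points, k):
    """Return the k points nearest to the origin, closest first."""
    # Squared Euclidean distance ranks points the same way as the true distance.
    return heapq.nsmallest(k, points, key=lambda p: p[0] * p[0] + p[1] * p[1])


if __name__ == "__main__":
    print(k_closest([(3, 3), (1, 2), (-1, 0), (5, -1)], 2))
```

heapq.nsmallest avoids sorting the whole list, which matters when K is much smaller than the number of points.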

Reflecting on the Questions

Because data at Zillow is not leverage but the product itself, that emphasis is reflected in the questions above, which mix programming and data science. The take-home test is also a very interesting way to assess fit from a data science perspective. Some good practice and hard work can surely land you a job at the world’s most popular real estate site.


The sole motivation of this blog article is to learn about Zillow and its AI technologies and to help people get into the company. All data is sourced from public online sources. I aim to make this a living document, so any updates and suggested changes can always be included. Please provide relevant feedback.
