Improving the Zestimate: An Experiment

Published in Be The Fox, Dec 11, 2018

Update 11/11/21: This article was written originally in December of 2018 and, given the recent shutdown of Zillow’s home buying efforts, I felt it was important and relevant to revisit.

The team at FoxyAI conducted this experiment, which improved the accuracy of predicted home values for 60% of the low-end homes tested, to highlight the value of incorporating a home’s current condition into an Automated Valuation Model (AVM). After all, how can you possibly value a home without knowing whether it has been well maintained or is a complete disaster?

In fact, the first sentence under “Disadvantages” on the Wikipedia page for AVM says, “The disadvantages are that they do not take into account the property condition, as a physical inspection of the property does not occur and therefore the valuation produced assumes an average condition which may not reflect current reality.”

This is why we started FoxyAI: to unlock the condition of a home using AI and computer vision. When photos are available, it is imperative that the property condition be included in the valuation model, and the only way to do this at scale, across millions of properties, is to use AI.

Whether you use a 3rd party AVM or a custom in-house model, contact us to learn more about how our computer vision can improve the accuracy of your valuations.

FoxyAI set out to improve residential real estate valuations by incorporating image data. To do this, FoxyAI research developed FoxyNet, the Convolutional Neural Network that powers house2vec. House2vec takes a raw image and returns an image feature vector, embedded in a high dimensional space, containing information on the quality and condition of the property for use in valuation models, among many other applications.

We decided to pair house2vec with arguably the most famous and divisive state-of-the-art valuation model, Zillow’s Zestimate. The results of several experiments will be discussed in a multi-part series.

Step 1: Collect Data

To begin our research, we first collected Zestimates, list prices, and photos for properties listed for sale in Massachusetts. Once we had a robust set of data, we began training a new model. We built this new model by combining the house2vec image feature vector with the Zestimate to produce a new predicted sale price in order to infer whether our embedding space of condition and quality could improve an existing model’s output.
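As a rough illustration of what "combining the house2vec image feature vector with the Zestimate" could look like at the data level, the sketch below builds one feature row per listing. The article does not say how multiple photos were aggregated or which downstream model was used, so the mean pooling, the function names, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
from statistics import fmean


def mean_pool(photo_vectors):
    """Average per-photo embedding vectors into one listing-level vector.

    Mean pooling is an assumption; the article does not describe how
    photos were aggregated.
    """
    if not photo_vectors:
        raise ValueError("listing has no photo vectors")
    dim = len(photo_vectors[0])
    if any(len(v) != dim for v in photo_vectors):
        raise ValueError("all photo vectors must have the same length")
    return [fmean(v[i] for v in photo_vectors) for i in range(dim)]


def build_feature_row(zestimate, photo_vectors):
    """Concatenate the Zestimate with the pooled image features."""
    return [float(zestimate)] + mean_pool(photo_vectors)


# Made-up vectors standing in for house2vec output (real ones are
# high dimensional); the Zestimate value is also just a placeholder.
photos = [[0.2, 0.8, 0.1], [0.4, 0.6, 0.3]]
row = build_feature_row(200_000, photos)
print(row)
```

A row like this, paired with the eventual sale price as the target, is the kind of input a downstream regression model would be trained on.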

Step 2: Wait

The next step was to wait for these properties to sell. Zillow is notoriously slow at updating the status of its listings. After three months, approximately 1,000 properties were listed as sold on Zillow. At that point, we compared both the predicted sale price we had produced months earlier and the Zestimate against each property’s actual sale price.

Overall Results

We broke out our results into brackets of low-end, mid-range, and high-end properties as follows:

Low end: <$175k

Mid-range: $175k to $750k

High end: >$750k

In the low-end bracket, the house2vec + Zestimate model predicted sales price outperformed Zillow’s Zestimate 60% of the time.
In both the mid-range and high-end brackets, the house2vec + Zestimate model predicted sales price outperformed Zillow’s Zestimate 40% of the time.
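A minimal sketch of how per-bracket figures like these can be computed is shown below. It assumes that "outperformed" means a smaller absolute error against the sale price, that brackets are assigned by sale price, and that ties count as losses; the article does not spell out any of these, and the listings are hypothetical.

```python
from collections import defaultdict


def bracket(price):
    """Map a price to the article's brackets (boundary handling is assumed)."""
    if price < 175_000:
        return "low-end"
    if price <= 750_000:
        return "mid-range"
    return "high-end"


def win_rates(listings):
    """Per bracket, the share of listings where the combined model's
    prediction is closer to the sale price than the Zestimate."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for zestimate, combined, sale in listings:
        b = bracket(sale)  # assumption: bracket by sale price
        totals[b] += 1
        if abs(combined - sale) < abs(zestimate - sale):
            wins[b] += 1
    return {b: wins[b] / totals[b] for b in totals}


# Hypothetical (zestimate, combined prediction, sale price) rows.
listings = [
    (150_000, 160_000, 158_000),  # low-end, combined model closer
    (120_000, 140_000, 125_000),  # low-end, Zestimate closer
    (400_000, 390_000, 410_000),  # mid-range, Zestimate closer
    (900_000, 950_000, 940_000),  # high-end, combined model closer
]
rates = win_rates(listings)
print(rates)
```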

Let’s dive into some specific examples of where FoxyAI outperformed Zillow and where Zillow outperformed FoxyAI.

We learn the most from studying our failures, so let’s first look at an example where the Zestimate outperformed our combined model and try to understand why we weren’t able to improve the prediction.

In the following example, Zillow beat our combined model and managed an impressive prediction within 3.5% of the sale price. Foxy’s combined model was 43.5% off of the sales price. Let’s dig into this listing and try to figure out what happened. (Note: Zillow has since updated their Zestimate to incorporate the new sales price.)

Zillow’s Zestimate: $106,265

Foxy + Zestimate Predicted Price: $194,832.92

Sale Price: $110,000

ZID: https://www.zillow.com/homedetails/56156106_zpid/

Zillow’s Zestimate: $106,265 | Foxy + Zestimate Predicted Price: $194,832.92 | Sale Price: $110,000 (Source: Zillow)

Upon initial inspection of the listing, we can see there are a substantial number of individual property level features listed, as well as a decent set of property photos. Also, this property is a condo located in a professionally managed community. The community features a pool, tennis courts, a clubhouse, and professionally landscaped grounds. The image set includes pictures of these luxury amenities.

We believe these community amenity photos threw off our combined model. This case is not very common, but listings sometimes include images of things outside the actual property being sold. If we showed a person a set of photos containing a house, a large pool, and an additional structure, without telling them that these features belong to a larger private community, it is reasonable to assume they would guess a higher property value. We believe this is exactly what happened here.

In our second example, FoxyAI outperformed the Zestimate. Foxy’s combined model was 3.87% off of the final sales price vs. Zillow’s 18.01%, an impressive improvement over the Zestimate alone. (Note: Zillow has since updated their Zestimate to incorporate the new sales price.)

Zillow’s Zestimate: $201,246

Foxy + Zestimate Predicted Price: $171,635.20

Sale price: $165,000

ZID: https://www.zillow.com/homedetails/56209192_zpid/
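The percentages quoted for the two examples can be checked with a few lines of arithmetic. The article does not state the formula it used; the sketch below computes the gap two ways, relative to the estimate and relative to the sale price. The quoted figures (3.5% and 43.5%, 18.01% and 3.87%) match the first convention, which is an inference from the numbers rather than something the article says.

```python
def pct_gap(estimate, sale_price, base):
    """Absolute gap between estimate and sale price, as a percent of `base`."""
    return abs(estimate - sale_price) / base * 100


# Figures taken from the two listings discussed in the article.
examples = {
    "example 1": {"zestimate": 106_265, "combined": 194_832.92, "sale": 110_000},
    "example 2": {"zestimate": 201_246, "combined": 171_635.20, "sale": 165_000},
}

results = {}
for name, ex in examples.items():
    for model in ("zestimate", "combined"):
        est = ex[model]
        vs_estimate = pct_gap(est, ex["sale"], base=est)
        vs_sale = pct_gap(est, ex["sale"], base=ex["sale"])
        results[(name, model)] = (vs_estimate, vs_sale)
        print(f"{name} {model}: {vs_estimate:.2f}% of estimate, "
              f"{vs_sale:.2f}% of sale price")
```

Measured against the sale price instead, the gaps come out to roughly 3.4% (Zestimate) and 77.1% (combined) for the first listing, and 22.0% (Zestimate) and 4.0% (combined) for the second. The ranking of the two models is the same under either convention.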

Foxy + Zestimate Predicted Price: $171,635.20 | Zillow’s Zestimate: $201,246 | Sale Price: $165,000 (Source: Zillow)

You can see from the listing that we have a robust set of photos, and according to the listing description, the property was recently renovated. So why did we outperform the Zestimate and why did we predict a price lower than the Zestimate?

The listing says the property was recently renovated and it appears to be in good condition. As mentioned previously, our network is trained to understand both quality and condition.

Upon inspection of the photos, one could argue that the kitchen and bathroom floor tile are “builder grade” quality. Similarly, the kitchen countertops, although granite, appear to be Ubatuba, which falls in the low end on the granite price spectrum and would also be considered “builder grade”. There is little landscaping to speak of, and while the interior features on Zillow list hardwood floors, we can clearly see there is, in fact, no hardwood at all.

FoxyAI picks up on these subtleties and factors them into its predicted price.

Real estate pricing is inherently subjective. What I think is nice granite, you might consider ugly, but at the end of the day, the value is ultimately what a buyer is willing to pay. FoxyNet is a CNN trained on millions of property photos. The beauty of a deep neural network is that it “remembers” these nuances.

When compared to the first example which has a substantial number of features listed, this listing has relatively few property-level features. Aside from some basic information like square feet, the number of bedrooms, lot size, etc., there is little data for the Zestimate to factor into its price estimate.

We believe that the quality and condition data provided by house2vec is ultimately why FoxyAI’s combined model outperforms the Zestimate alone in this case. House2vec is especially helpful for listings that are missing categorical data, because it supplies information the Zestimate doesn’t seem to take into account.

Both of the preceding examples were of properties in the low-end bracket, where FoxyAI performed the best.

What we learned from the low-end bracket applies equally to the mid-range and high-end brackets. However, we found that listings in those brackets were more likely to have a robust set of property-level features, which we believe is why we outperformed the Zestimate only 40% of the time there.

Research Limitations

Let’s discuss the limitations of this research, ideas for future research and how what we learned is relevant to property price prediction.

There are several limitations to this research. The first is that the sample size is small. Today’s machine learning models need data and lots of it. After splitting the data into training and test sets, we have even fewer than 1,000 listings on which to train our downstream model.
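To make the sample-size point concrete, here is a minimal sketch of a shuffled train/test split over roughly 1,000 listings. The 80/20 ratio, the fixed seed, and the function name are assumptions; the article does not say how the data was actually split.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder IDs standing in for the ~1,000 sold listings.
listing_ids = range(1000)
train, test = train_test_split(listing_ids)
print(len(train), len(test))  # 800 200
```

With a split like this, only about 800 listings remain for fitting the downstream model, which is very little data for a model that takes a high-dimensional image vector as input.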

The second limitation is that we are working with relatively few model features. Zillow’s valuation model likely takes into account hundreds if not thousands of features, but it is all boiled down into the Zestimate. That compression causes loss of information.

Editor’s Note: This article was originally published in December 2018 and has been updated.


Written by Vin Vomero