Quantifying Uncertainty

Ian Wong · Open House · August 10, 2015

It’s an Uncertain World.

Opendoor provides fair market offers to homeowners so they can sell their homes to us with confidence. In this post, we’ll discuss how understanding the prediction error curve of our algorithms has guided our valuation process for these offers.

The Challenge of Perfect Pricing

Underlying our service at Opendoor is a collection of algorithms that infer the market value of houses. These algorithms are designed with one thing in mind: to price residential real estate with total accuracy. We believe that homes, often our largest financial asset, should be valued analytically and fairly.

But modeling the prices of residential real estate is not an easy task. Some of the challenges are:

  • No two housing transactions are alike (i.e., the data are heterogeneous). Even sales of the same house at different points in time can differ widely in price.
  • The market reveals only sparse observations, which means “big data” won’t just magically rescue the modelers. We have to apply sound statistics to borrow strength from the entire pool of observations (see the sketch after this list).
  • Dealing with real estate data can be… difficult. Data cleaning, entity resolution, and feature engineering become important issues.
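
To make “borrowing strength” concrete, here is a minimal sketch (with made-up numbers, not Opendoor data) of one classic technique, partial pooling: shrink each neighborhood’s mean sale price toward the overall market mean, trusting the local estimate more as its sample size grows.

```python
import numpy as np

# Hypothetical sale prices (in $000s) grouped by neighborhood;
# some neighborhoods have only a handful of observed sales.
sales = {
    "maple_park": [310, 295, 330, 305, 320, 300, 315],
    "riverside":  [450, 480],   # sparse: two sales
    "old_town":   [275],        # sparse: one sale
}

grand_mean = np.concatenate([np.array(v, float) for v in sales.values()]).mean()
# Rough within-neighborhood variance (from neighborhoods with >1 sale),
# and an assumed between-neighborhood variance tau2 (illustrative only).
within_var = np.mean([np.var(v) for v in sales.values() if len(v) > 1])
tau2 = 1000.0

for name, v in sales.items():
    n, local_mean = len(v), np.mean(v)
    w = tau2 / (tau2 + within_var / n)  # trust the local mean more as n grows
    pooled = w * local_mean + (1 - w) * grand_mean
    print(f"{name:11s} n={n} raw={local_mean:7.1f} pooled={pooled:7.1f}")
```

The single-sale neighborhood gets pulled strongly toward the market mean, while the well-observed neighborhood barely moves.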

To overcome these challenges, our data scientists and valuation experts work closely together to ensure the accuracy of our offers, and to distill domain insights to make our models better.

Are we there yet?

As we improve our models, a natural question arises: To what extent have the algorithms learned the domain? And as a corollary: To what extent should we involve human oversight for a specific valuation?

Many off-the-shelf metrics are available to summarize the predictive performance of algorithms on historical transactions, e.g., R², mean squared error (MSE), and mean absolute error (MAE). But these metrics reduce the error distribution to a single number. In reality, there is a wide range of houses on the market, and our algorithms perform differently depending on the type of housing stock. The plot below shows the hypothetical performance of our algorithms over the population of houses:

[Figure: distribution of error across the population of houses]
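
To illustrate why a single summary statistic can mislead, here is a hypothetical simulation (the segments and error magnitudes are invented) in which two very different error profiles blend into one MAE that describes neither.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated relative pricing errors for two hypothetical segments:
# cookie-cutter suburban homes (easy) vs. unique custom homes (hard).
easy_err = rng.normal(0.0, 0.04, size=5000)   # ~4% typical error
hard_err = rng.normal(0.0, 0.12, size=1000)   # ~12% typical error
all_err = np.concatenate([easy_err, hard_err])

def mae(e):
    return np.mean(np.abs(e))

print(f"Overall MAE:      {mae(all_err):.3f}")   # one blended number...
print(f"Easy-segment MAE: {mae(easy_err):.3f}")  # ...hides this gap
print(f"Hard-segment MAE: {mae(hard_err):.3f}")

# The full error distribution carries more information than any summary:
for q in (0.5, 0.9, 0.99):
    print(f"{q:.0%} of homes priced within "
          f"{np.quantile(np.abs(all_err), q):.1%} of sale price")
```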

Of course, our objective is zero pricing error: we want to shift the distribution all the way to the left until it’s a delta function at zero. But as we approach this goal, understanding this curve becomes very useful. For instance, it helps us identify houses that require input from human experts, both in training and in production. It also directs our attention to cases where the algorithms’ predictions have high variance and could benefit most from improvement. Which homes have higher prediction error than others, and how are they related?
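
As a hypothetical illustration of routing uncertain valuations to experts: suppose each valuation came with a per-home estimate of its expected error. A simple policy might then look like the sketch below. The threshold, names, and data are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Valuation:
    home_id: str
    predicted_price: float
    expected_abs_error: float  # per-home error estimate (see next sketch)

# Assumed policy: auto-approve confident valuations, route the rest.
REVIEW_THRESHOLD = 0.08  # 8% expected relative error (illustrative)

def route(v: Valuation) -> str:
    relative_error = v.expected_abs_error / v.predicted_price
    return "human_review" if relative_error > REVIEW_THRESHOLD else "auto"

valuations = [
    Valuation("A-100", 310_000, 12_000),   # ~3.9% -> auto
    Valuation("B-200", 475_000, 55_000),   # ~11.6% -> human_review
]
for v in valuations:
    print(v.home_id, route(v))
```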

To be precise, suppose we have a pricing model f that, for a home with features x and sale price y, produces the prediction f(x). Our first-order concern is accuracy, i.e., that f(x) is close to y. Beyond that, we would like to understand how far off our prediction is likely to be for a given home x, i.e., the likely magnitude of |y − f(x)|. In terms of the chart above, this means inferring where along the error curve our prediction falls.
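
The next post describes the method we actually shipped; as a generic sketch of the idea, one common approach is to fit a second model that predicts the absolute residual |y − f(x)| from the same features, using held-out data so the error estimates are not optimistically biased. All features, data, and model choices below are illustrative, assuming scikit-learn is available.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic homes: two features, with noise that grows with "uniqueness".
n = 4000
X = rng.uniform(0, 1, size=(n, 2))   # columns: [sqft_scaled, uniqueness]
noise_sd = 10 + 60 * X[:, 1]         # more unusual homes are harder to price
y = 200 + 150 * X[:, 0] + rng.normal(0, noise_sd)

X_fit, X_res, y_fit, y_res = train_test_split(X, y, random_state=0)

# Step 1: fit the price model.
price_model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Step 2: fit an error model on held-out absolute residuals.
abs_resid = np.abs(y_res - price_model.predict(X_res))
error_model = GradientBoostingRegressor(random_state=0).fit(X_res, abs_resid)

# Every new home now gets both a price and an expected error.
new_homes = np.array([[0.5, 0.05],   # typical home
                      [0.5, 0.95]])  # unusual home
print("predicted price: ", price_model.predict(new_homes).round(1))
print("expected |error|:", error_model.predict(new_homes).round(1))
```

The per-home expected error could then feed a routing rule like the one sketched earlier.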

In the subsequent post, Nelson will discuss one method that we’ve productionized to quantify our uncertainty. We think these techniques are broadly useful beyond home pricing at Opendoor.

Humans and algorithms are collaborating more and more to deliver new experiences to end-users. A siloed machine learning unit is a thing of the past. As teams embed intelligent algorithms into their products and operations, questions like “How good is the model?”, “How should we prioritize our machine learning work?”, and “When should humans intervene?” come up time and time again. We hope these posts spur discussion on how to answer these questions.

This is joint work with the data science and operations teams here at Opendoor. We’d like to also acknowledge Brad Klingenberg and Leo Pekelis for reading drafts of this post.

Originally published at labs.opendoor.com on August 10, 2015.
