Accurately Valuing Homes with Deep Learning and Structural Inductive Biases

Stu (Michael) Stewart
Published in Open House · Sep 8, 2020
A schematic-style geometric re-interpretation of a neural network for valuing homes.
If Mondrian had invented Backpropagation.

Setting the Stage

At Opendoor, we buy and sell thousands of homes every year. We make offers to purchase even more homes — hundreds of thousands annually. To generate valuations for such a quantity of houses we need a system that combines human and machine intelligence. Enter AVMs.

AVMs

AVM, or automated valuation model, is the name given to a Machine Learning model that estimates the value of a property, usually by comparing that property to similar nearby properties that have recently sold (comparables or comps). Comps are key: an AVM evaluates a property relative to its comps, assimilating those data into a single number quantifying the property’s value.

The idea of using a model to predict home prices is hardly new. (Boston Housing Dataset, anyone?) So Opendoor’s use of an AVM — we have lovingly dubbed ours OVM — probably isn’t unusual. What might be unusual, though, is OVM’s centrality to our business. Many companies have an “AI strategy,” but at Opendoor, “AI” (scare quotes intended) is the business. We don’t buy or sell a single home without consulting OVM — for a valuation, for information regarding comparable properties, or for both. Humans and models must work hand-in-hand for the business to flourish.

OVM has existed in some form for all but a few months of Opendoor’s history. Recently, we launched our latest-and-greatest model, which generated a step-function improvement in our predictive accuracy by melding human intuition about home valuations with deep learning algorithms. But in order to understand why this is significant, a brief history lesson is in order.

Our Previous Work

For the past several years, OVM has relied on a pipeline-style ML system in which separate models handle different aspects of the home valuation process. The old system looked a bit like this:

OVM (handcrafted model pipeline)

  1. Select comps
  2. Score said comps based on their “closeness” to the subject property
  3. “Weight” each comp relative to the others
  4. “Adjust” the (observed) prices of each comp
  5. Estimate “uncertainty” via multiple additional models
Multistage pipeline with component models for each individual step of the prediction process.
Handcrafted OVM, our previous model.

While the system reflects a natural human process for valuing homes (a desirable property), there are a few downsides to this approach. Namely, having so many separate models increases complexity and also prevents us from jointly optimizing things like mean and variance predictions, or comp weights and comp adjustments.

In short, the prior model effectively leveraged human intuition about home valuations, but was very complex and did not adequately share information between the various problems it sought to solve. In addition, it did not handle high-cardinality categorical information (such as postal codes) in a very thoughtful manner.

The Promise of Deep Learning

The deep learning revolution has been propelled forward by a battery of factors; few stand more prominently than the magic of Stochastic Gradient Descent (SGD). For our purposes, the ability to define an end-to-end system, in which gradients flow freely through all aspects of the valuation process, is key: for instance, attempting to assign weights to comps without also considering the requisite adjustments (a shortcoming of the prior algorithm) leaves useful information on the table. This shortcoming was top-of-mind as we built out our current framework.

The new system, written in PyTorch, looks a bit like this:

OVM (deep learning)

  1. Select comps
  2. Give everything to a neural net and hope it works!
Neural network based OVM, which takes in subject and comp data, and produces a price estimate, using only one model.
DeepOVM produces a price (and price-variance) estimate using only a single model.

Deep learning is often pilloried for an alleged overabundance of complexity. For our purposes, however, deep learning presented a much more straightforward solution. But we must ask: What are the characteristics of this new system that allow it to retain the positives from our earlier models while also addressing their shortcomings?

Trust the Process

In residential real estate, we know the causal mechanism by which homes are valued, a powerful backstop typically unavailable in computer vision or Natural Language Processing (NLP) applications. That is, a home is priced by a human real estate agent who consults comps (those same comps again) and defines/adjusts the listing’s price based on:

  • the recency of the comps (which factors in home price fluctuations)
  • how fancy/new/desirable the comps are relative to the subject property

List price is not the only input to the close price, to be sure, but it is by far the most important one. Comp prices are more than correlative: a nearby comp selling for more than its intrinsic value literally causes one’s home to be worth more, as no shopper will be able to perfectly parse the underlying “intrinsic value” from the “noise (error)” of previous home sales. This overvaluation propagates causally into the future prices of other nearby homes.

A model with an inductive bias that mirrors this data-generating process is well positioned to succeed as an AVM.

The Measure of a Map

Our old system did a good job synthesizing the aforementioned human intuition about real estate. Put differently, its inductive bias is well suited to the problem at hand given what we know about the data-generating process. It was less effective, however, at leveraging important categorical information about a home and its comps (such as postal code, county, etc.) that are less well-treated by older ML algorithms. It also failed to leverage data in an end-to-end manner, thereby unnecessarily restricting information flow between components of the system.

Let’s investigate the former weakness first: How should we structure our neural network to utilize categorical embeddings while not losing sight of the known data-generating process?

Embeddings’s the word

Deep learning, through categorical feature embeddings, unlocks an extremely powerful treatment of high-cardinality categorical variables that is ill-approximated by ML stalwarts like linear models and Gradient Boosted Trees. The benefit of these embeddings is on full display in the NLP community, where embeddings have revolutionized the field.

Real estate has surprising similarities to NLP: high-cardinality features such as postal code, census block, school district, etc. are nearly as central to home valuations as words are to language. By providing access to best-in-class handling of categorical features, a deep learning based solution immediately resolved a primary flaw of our system. Better yet, we didn’t need to lift a finger, as embedding layers are a provided building-block in all modern deep learning frameworks.
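
For illustration, here is a minimal PyTorch sketch of such an embedding layer. The feature (postal code), its cardinality, and the embedding size are made-up values, not our production configuration.

```python
import torch
import torch.nn as nn

# Hypothetical example: embed a high-cardinality categorical feature.
# The cardinality and embedding size below are illustrative, not tuned values.
NUM_POSTAL_CODES = 40_000   # distinct postal codes seen in training
EMBED_DIM = 32              # learned dense representation per postal code

postal_embedding = nn.Embedding(NUM_POSTAL_CODES, EMBED_DIM)

# Integer-encoded postal codes for a mini-batch of three listings.
postal_ids = torch.tensor([1204, 87, 39_999])
dense_features = postal_embedding(postal_ids)  # shape: (3, 32)
```

The resulting dense vectors can be concatenated with the rest of a listing’s features and learned jointly with the rest of the network.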

Engineering a Network

The final unresolved defect of our prior algorithm was its inability to jointly optimize parameters across sub-model boundaries. For instance, the model that assigned comp weights did not “talk” to the model that predicted the dollar-value of the comp-adjustment that would bring said comp to parity with the subject listing.

A modular framework, such as PyTorch, cleanly resolves this fault, as well. We can define sub-modules of our network to tackle the adjustment and weighting schemes for each comp, and autograd will handle backward-pass gradient propagation within and between the sub-modules of the net.
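
To make that concrete, here is a minimal sketch, with hypothetical module and layer names, of a comp-weight head and a comp-adjustment head sharing one trunk inside a single network, so that one loss drives gradients through both:

```python
import torch
import torch.nn as nn

class CompScorer(nn.Module):
    """Toy network with separate heads for comp weighting and comp adjustment.

    Because both heads live in one module, a single backward pass updates the
    heads and the shared trunk jointly, which the old multi-model pipeline
    could not do.
    """

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())
        self.weight_head = nn.Linear(hidden_dim, 1)      # un-normalized comp weight (logit)
        self.adjustment_head = nn.Linear(hidden_dim, 1)  # relative price adjustment

    def forward(self, pair_features: torch.Tensor):
        # pair_features: one feature vector per (subject, comp) pair
        h = self.trunk(pair_features)
        return self.weight_head(h), self.adjustment_head(h)
```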

Yet, we must keep in mind a key constraint: The inductive bias of our network should hew closely to the causal pricing mechanism or else the human-interpretability of the algorithm will be compromised.

There are several approaches to model this process (while enabling joint-optimization). We’ve had success with many model paradigms presently popular in NLP and/or image retrieval / visual search. These include:

  1. Transformer-style network architectures that accept a variable-length sequence of feature vectors (perhaps words or houses) and emit a sequence or single number quantifying an output
  2. Siamese Networks that compare, for example, images or home listings and produce a number/vector quantifying the similarity between any two of them
  3. Triplet loss frameworks for similarity detection (and, more recently, contrastive-loss approaches spiritually similar to triplet loss)
  4. Embedding lookup schemes such as Locality Sensitive Hashing that efficiently search a vector-space for similar entities to a query-vector of interest

The process of valuing a home is similar to NLP for one key reason: a home “lives” in a neighborhood just as a word “lives” in a sentence. Using the local context to understand a word works well; it is intuitive that a comparable method could succeed in real estate.

Image retrieval hinges on querying a massive database for images similar to the query image — a process quite aligned with the comparable-listing selection process.
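
As one hedged illustration of the retrieval flavor, the sketch below trains a hypothetical listing encoder with PyTorch’s built-in triplet margin loss, pulling a subject listing’s embedding toward a strong comp and away from a weak one. The encoder architecture, feature dimension, and margin are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical listing encoder; in practice it could consume tabular data,
# photos, and text. All dimensions here are illustrative.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
triplet_loss = nn.TripletMarginLoss(margin=1.0)

subject = torch.randn(8, 128)    # anchors: subject listings
good_comp = torch.randn(8, 128)  # positives: strong comparables
poor_comp = torch.randn(8, 128)  # negatives: weak comparables

loss = triplet_loss(encoder(subject), encoder(good_comp), encoder(poor_comp))
loss.backward()  # the encoder learns an embedding space where good comps sit nearby
```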

Which model works best will depend on the specifics of the issue one is trying to solve. Building a world-class AVM involves geographical nuance as well: an ensemble of models stratified by region and/or urban/suburban/exurban localization may leverage many or all of the above methodologies.

The Problem at Hand

With our network(s) we must be able to answer two key questions:

  1. How much more (or less) expensive is some comp listing than the listing of interest?
  2. How much weight (if any) should be assigned to said comp, relative to the other comps?

Zooming In

Let’s make our problem more concrete: assume for the sake of argument that, after evaluating one’s problem, a transformer appears to be well suited to the project specifications.

We can define a module, then, that takes data from both the listing of interest (the subject listing) and from the comps selected for said listing.

The module might take in tabular data (about the listing’s features), photos, satellite imagery, text, etc. It may also use contextual information, including information about the other comps available for the given listing of interest — a transformer’s self-attention aligns well with this notion of contextual info. Said module is responsible for outputting two kinds of quantities:

  1. An estimate of the relative price difference between a given (subject, comp) listing pair
  2. A “logit” (un-normalized weight) characterizing the relative strength of a comp

Because the comp weights should sum to one, a normalization scheme (perhaps softmax, sparsemax, or a learned reduction-function) is employed after the weights are computed. Recall that the comparable properties have already recently sold (never mind active listings for now), so their close prices are known. That close price, augmented by the price delta computed in (1), is itself a powerful predictor of the close price of the subject listing.
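
To ground this, the sketch below shows one hypothetical way such a module could look in PyTorch: self-attention over per-(subject, comp) feature vectors, one head for the price delta, one head for the logit, and a softmax over the comp dimension. Layer sizes, feature dimensions, and the choice of softmax are illustrative rather than our production settings.

```python
import torch
import torch.nn as nn

class CompTransformer(nn.Module):
    """Illustrative sketch: self-attention over (subject, comp) pair features
    yields a relative price delta and an un-normalized weight for each comp.
    """

    def __init__(self, feature_dim: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feature_dim, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.delta_head = nn.Linear(feature_dim, 1)  # (1) relative price difference
        self.logit_head = nn.Linear(feature_dim, 1)  # (2) un-normalized comp weight

    def forward(self, pair_features: torch.Tensor):
        # pair_features: (batch, n_comps, feature_dim), one row per (subject, comp) pair
        h = self.encoder(pair_features)
        deltas = self.delta_head(h).squeeze(-1)                          # (batch, n_comps)
        weights = torch.softmax(self.logit_head(h).squeeze(-1), dim=-1)  # sums to one per subject
        return deltas, weights
```

Swapping softmax for sparsemax or a learned reduction would only change the final normalization step.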

These transformer-based techniques from NLP work well because each comp can be viewed as a draw from a relatively homogeneous bag of possible comparable properties. In this capacity, comps are quite similar to words in the context of a language model: atomic units that together form a “sentence” that describes the subject listing and speculates regarding its worth.

Deciding which words (comps) to place in that sentence, though, is a tricky problem in its own right.

Capping the Pipe

Once the aforementioned quantities are in hand, the valuation process reduces immediately to a standard regression problem:

  1. The observed comp close prices are adjusted via the values proposed by our network
  2. These adjusted close prices are reduced, via a weighted-average-like procedure, to a point estimate of the subject’s close price
  3. Your favorite regression loss can then be employed, as usual, to train the model and learn the parameters of the network
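
Continuing the hypothetical CompTransformer sketch from above, these three steps might look like the following; the multiplicative adjustment and the L1 loss are illustrative choices, not the only reasonable ones.

```python
import torch
import torch.nn.functional as F

def price_loss(deltas, weights, comp_close_prices, subject_close_prices):
    adjusted = comp_close_prices * (1.0 + deltas)       # 1. adjust observed comp prices
    estimate = (weights * adjusted).sum(dim=-1)         # 2. weighted-average reduction
    return F.l1_loss(estimate, subject_close_prices)    # 3. standard regression loss

# Toy usage with made-up shapes: 16 subject listings, 10 comps each.
model = CompTransformer()
pair_features = torch.randn(16, 10, 64)
comp_prices = torch.rand(16, 10) * 500_000 + 100_000    # observed comp close prices
subject_prices = torch.rand(16) * 500_000 + 100_000     # labels: subject close prices

deltas, weights = model(pair_features)
loss = price_loss(deltas, weights, comp_prices, subject_prices)
loss.backward()  # gradients flow end-to-end through weighting and adjustment alike
```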

Reaping the Benefits

We measure the accuracy of our models on many cohorts and subsets of listings. Across all of them, the neural network based ensemble architecture outperformed heritage OVM. We were extremely pleased to see (relative) error rates decline by 10% or more in many cases. We also massively reduced the number of models that we have to train, host, and maintain in production.

Summary

At Opendoor, we’ve updated our core home valuation algorithm to incorporate advances in deep learning without sacrificing domain-specific knowledge about the mechanism by which residential properties are valued. We saw a step function improvement in accuracy after implementing these ideas; the bulk of the improvement can be attributed to (1) end-to-end learning and (2) efficient embeddings of high-cardinality categorical features.

If you are interested in building the next generation of machine learning applications for real estate, we’d love to hear from you. Opendoor is hiring (remotely!) as well as in our SF, LA, and Atlanta offices.

And What of Photos?

A home, viewed from the road, with a “sold to opendoor” sign in the yard.

Selling is finance, buying is romance.
~ Opendoor Mantra

Humans interact (and fall in love) with homes through photos. It seems natural, then, that a deep learning model would leverage these images when comparing homes to one another during the appraisal process. After all, one of the great success stories of deep learning is the field of computer vision. Transitioning OVM to deep learning has the added benefit of making it much easier to incorporate mixed-media data, such as images and text (from listings, tax documents, etc.) into our core algorithm. But that, dear reader, is a topic for another blog post.

Until next time, may your GANs never suffer mode-collapse!
