Understanding how busy roads affect home values

Opendoor
Aug 15, 2019 · 7 min read

By: Zain Shah & Zach Gottesman

Originally published at Opendoor.com:

At Opendoor, we strive to make moving easier, providing sellers with more certainty through competitive, all-cash offers. We’re able to do this using a combination of human and machine intelligence. Specifically, we supplement data-driven home value predictions with local market knowledge from real estate experts.

Any realtor will tell you that one of the most significant factors in estimating a home’s value, if not the most significant, is its location. Simple examples of location factors include neighborhood amenities and walkability. For us, one of the most important factors, and one that is surprisingly difficult to compute, is whether a home is exposed to a busy road (and consequently to road noise) [1, 2, 3].

Here’s how we went about automatically including this information in our models, using both local real estate knowledge and data.

The challenge

In the past, our valuation model often disagreed with the offers made by our real estate experts on homes affected by busy roads. We found that our human valuators and our model made busy road adjustments very differently. In fact, busy road adjustments accounted for nearly 11% of the major valuation discrepancies between our model and local experts. Given the prevalence and magnitude of these automated valuation misses, we were determined to engineer a smarter busy road feature and enhance our existing road data.

Previous Approaches

Opendoor uses a freely available collaborative dataset known as OpenStreetMap for our road geometries. You can think of OpenStreetMap (OSM) as something like the Wikipedia for road data. OSM is fantastic for open source mapping projects and contains particularly comprehensive road geometry data, which makes it great for geospatial operations.

Unfortunately, OSM data is not designed to describe traffic volume. The best we can do with the OSM vector data alone is guess a road’s traffic volume from its road type. In other words, if a road is marked as a “motorway” (OSM-speak for a large highway), we assume high traffic; if it is marked as “residential”, we assume low traffic.
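A minimal sketch of that road-type guess, for illustration only. The tag values are real OSM `highway` values, but the traffic levels assigned to them and the function name are assumptions, not the mapping our model used.

```python
# Hypothetical mapping from OSM `highway` tag values to a coarse traffic guess.
# The levels are illustrative assumptions, not measured traffic volumes.
ASSUMED_TRAFFIC_BY_HIGHWAY_TAG = {
    "motorway": "high",
    "trunk": "high",
    "primary": "medium",
    "secondary": "medium",
    "tertiary": "low",
    "residential": "low",
}


def guess_traffic_level(highway_tag: str) -> str:
    """Guess traffic from the road type alone; unknown types stay unknown."""
    return ASSUMED_TRAFFIC_BY_HIGHWAY_TAG.get(highway_tag, "unknown")


print(guess_traffic_level("motorway"))     # high
print(guess_traffic_level("residential"))  # low
```

A lookup like this treats every road of a given type identically, so a busy "residential" main street and a quiet cul-de-sac receive the same guess.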

These OSM road labels are only intended to distinguish how a road should be rendered on a map. The labels do not necessarily represent traffic volume. We don’t want to attribute incorrect traffic densities to roads because it could inaccurately skew our valuation model’s prediction on a customer’s home.

Generally, our market operators agree with the busy road adjustments inferred from OSM when evaluating roads classified as freeways and highways. However, that agreement breaks down when we examine the adjustments our model makes for homes on arterial roads, connector routes, and main streets in neighborhood subdivisions. The latter road types’ traffic patterns are much more nuanced and more difficult to get from publicly available data.

Pitfalls of Other Datasets

We investigated several alternative traffic datasets, including data from the Department of Transportation, traffic data from private companies, and noise data from the Bureau of Transportation. To our dismay, none of these datasets contained the data we needed.

Our use case demands high coverage for all sorts of roads, even small roads in residential subdivisions. What we need is a dataset granular enough to describe the small road networks people live within, which is where the majority of our mis-valuations occurred.

Be the dataset you wish to see in the world

Opendoor sits on a treasure chest of internal data about home values and the adjustments made when comparing a home to other comps. If you’re not familiar with making adjustments for comparable homes, see our guide on how Opendoor calculates home values.

Not only do we have local real estate experts deliberating on thousands of home values in each market, but we also have direct data about market dynamics from being one of the largest home sellers in the markets we operate in.

The offers we’ve made on our customers’ homes contain valuable information from our experts on which roads affect home values the most, but this data is restricted to home-level information and is therefore difficult to generalize to new offers.

However, with a trusted dataset of road geometries, we can actually generalize this info to offers on new homes we’ve never seen before, so long as we can figure out which road is responsible for a given adjustment. This way we won’t repeat the same mistake for a new home, so long as it’s on a road we previously made an offer on. Even better, this data would only improve over time as our offers dataset grows.

We need adjacency

To build this dataset, we need to better understand which road a home’s adjustment was made for, as well as which homes should be affected by a given road. In other words, we don’t actually know which road is responsible for a given home selling $10K less than it should, so we need to infer this information.

In the past, we decided whether a home was affected by a busy road based solely on its distance to the nearest busy road, subject to a cutoff. This distance-plus-cutoff approach does not work well for either dense or sparse collections of homes.

For instance, if home A is close to a big highway, but home B is sandwiched between home A and the highway, we could still make a busy road adjustment for home A when we should not. Likewise, we miss out when a home is next to the highway but beyond our cutoff distance. With such an imprecise association between roads and homes, we dilute this valuable information by spreading it across all the roads nearby.
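Both failure modes can be reproduced with a toy version of the distance rule. The coordinates, the 60-metre cutoff, and the function name below are made up for illustration and are not taken from our pipeline.

```python
import math


def is_affected_by_cutoff(home, road_points, cutoff):
    """Old-style rule: flag a home if its nearest road point is within `cutoff`."""
    nearest = min(math.dist(home, point) for point in road_points)
    return nearest <= cutoff


# A highway sampled every 10 m along y = 0 (illustrative coordinates in metres).
highway = [(float(x), 0.0) for x in range(0, 101, 10)]

home_b = (50.0, 20.0)  # directly beside the highway
home_a = (50.0, 45.0)  # behind home B, shielded from the highway
home_c = (20.0, 70.0)  # large lot in a sparse area, nothing between it and the highway

CUTOFF = 60.0  # assumed cutoff, for illustration only

print(is_affected_by_cutoff(home_b, highway, CUTOFF))  # True  (correct)
print(is_affected_by_cutoff(home_a, highway, CUTOFF))  # True  (false positive)
print(is_affected_by_cutoff(home_c, highway, CUTOFF))  # False (missed)
```

No single cutoff fixes both errors here: lowering it below 45 m to exclude home A pushes home C even further out of reach.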

The more accurately we can attribute a certain valuation difference to a specific road, the more valuable our dataset will be. At first, it may seem reasonable to look at the street name on a home’s address, or any road touching the lot the house sits on. Unfortunately, that wouldn’t account for cases where a highway runs alongside a home’s backyard. There may be a field in between, but the traffic could still be an ear-and-eye-sore.

To this end, we settled on defining the association as one of adjacency: only homes “adjacent” to a road are affected by it. For this, we need a robust association that can account for any road a home might be adjacent to, in any direction.

Voronoi Diagrams

We determine adjacency using a Voronoi diagram. A Voronoi diagram partitions a space based on a set of points; it is the dual of the Delaunay triangulation of those points, which is how it is commonly computed. The diagram divides the space into one region per point, so that everything in a given region is closer to its point than to any of the other points.
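The defining property can be stated in a few lines: a location belongs to the Voronoi region of whichever input point is nearest to it. This is only a sketch of the definition with made-up coordinates, not a way to construct the diagram.

```python
import math


def voronoi_owner(query, sites):
    """Return the index of the site whose Voronoi region contains `query`."""
    return min(range(len(sites)), key=lambda i: math.dist(query, sites[i]))


sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

print(voronoi_owner((1.0, 1.0), sites))  # 0
print(voronoi_owner((9.0, 1.0), sites))  # 1
print(voronoi_owner((2.0, 8.0), sites))  # 2
```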

This fulfills our requirements nicely because we want to know whether a home is closer to a given road than anything else is, irrespective of a specific distance cutoff. If home A sits between a busy road and home B, then home A’s Voronoi region will touch the road, while home B’s will not.

Zooming Out

Our adjacency calculation is performed by collecting all the road geometry coordinates and home coordinates into a set of points for a given market. We then generate a Voronoi diagram from the set of all points (road and home coordinates).
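One way to assemble that combined point set while remembering where each point came from is sketched below. The `LabeledPoint` type, the `collect_points` function, and the input shapes are assumptions made for illustration.

```python
from typing import NamedTuple


class LabeledPoint(NamedTuple):
    x: float
    y: float
    kind: str       # "home" or "road"
    source_id: str  # hypothetical home or road identifier


def collect_points(homes, roads):
    """Merge home and road coordinates into one labeled list.

    homes: {home_id: (x, y)}
    roads: {road_id: [(x, y), ...]}  # vertices of each road geometry
    """
    points = [LabeledPoint(x, y, "home", home_id) for home_id, (x, y) in homes.items()]
    for road_id, coords in roads.items():
        points.extend(LabeledPoint(x, y, "road", road_id) for x, y in coords)
    return points


points = collect_points(
    {"h1": (1.0, 2.0)},
    {"r1": [(0.0, 0.0), (5.0, 0.0)]},
)
print(len(points))  # 3
```

Keeping the label next to each coordinate makes it possible to tell, for any two neighboring regions in the diagram, whether they pair a home with a road.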

From this Voronoi diagram, we look at all the home points and compute whether their Voronoi region touches a Voronoi region of a road point. If a home region touches a road region, then that home is said to be adjacent to that road, because the region of space closer to that road than anything else is also closer to the home than anything else.
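The sketch below illustrates this adjacency test on a toy layout. It relies on the fact that two Voronoi regions touch exactly when their points are joined in the Delaunay triangulation, and it finds those joins by brute force (checking every triple of points for an empty circumcircle). That is only practical for a handful of points; a library routine such as `scipy.spatial.Delaunay` would be the realistic choice, and the original text does not say which implementation was used. All names and coordinates are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def voronoi_neighbors(points):
    """Index pairs (i, j) whose Voronoi regions touch, i.e. Delaunay edges."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        # A triple is a Delaunay triangle if no other point lies inside its circle.
        is_empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if is_empty:
            edges.update({(i, j), (i, k), (j, k)})
    return edges


def adjacent_roads(homes, roads):
    """homes: {home_id: (x, y)}; roads: {road_id: [(x, y), ...]} -> {home_id: {road_id}}"""
    labels, points = [], []
    for home_id, xy in homes.items():
        labels.append(("home", home_id))
        points.append(xy)
    for road_id, coords in roads.items():
        for xy in coords:
            labels.append(("road", road_id))
            points.append(xy)
    result = {home_id: set() for home_id in homes}
    for i, j in voronoi_neighbors(points):
        (kind_i, id_i), (kind_j, id_j) = labels[i], labels[j]
        if kind_i == "home" and kind_j == "road":
            result[id_i].add(id_j)
        elif kind_i == "road" and kind_j == "home":
            result[id_j].add(id_i)
    return result


homes = {
    "front_left": (7.0, 4.0),
    "front_right": (13.0, 4.0),
    "side_left": (1.0, 6.0),
    "side_right": (19.0, 6.0),
    "back": (10.0, 9.0),  # shielded from the highway by the other homes
}
roads = {"highway": [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]}

adjacency = adjacent_roads(homes, roads)
print(adjacency["front_left"])  # {'highway'}
print(adjacency["back"])        # set()
```

One caveat of this construction: points on the outer edge of the point set have unbounded Voronoi regions, so a home at the edge of a market can end up neighboring a distant road. Capping the allowed distance for such pairs is one possible safeguard, though that is an assumption rather than something described here.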

At a high level, we generate our dataset by doing the following:

  1. For a given OSM road geometry, we find homes adjacent to the road.
  2. We then look for previous human adjustments on any of those adjacent homes.
  3. From the previous human adjustments, we compute the average road adjustment value.
  4. We label the OSM road geometry with the average road adjustment value (which we can then use for future offers on any homes adjacent to that specific road geometry).
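The four steps above can be sketched as a small aggregation, assuming the home-to-road adjacency has already been computed. The function names, identifiers, and dollar amounts are hypothetical, and the plain mean stands in for whatever averaging is used in practice.

```python
from collections import defaultdict
from statistics import mean


def label_roads(adjacent_roads, human_adjustments):
    """Average past human adjustments over the homes adjacent to each road.

    adjacent_roads:    {home_id: {road_id, ...}}
    human_adjustments: {home_id: adjustment_in_dollars}
    """
    per_road = defaultdict(list)
    for home_id, adjustment in human_adjustments.items():
        for road_id in adjacent_roads.get(home_id, ()):
            per_road[road_id].append(adjustment)
    return {road_id: mean(values) for road_id, values in per_road.items()}


def road_labels_for_home(home_roads, road_labels):
    """Look up the labeled roads adjacent to a home we have not seen before."""
    return {road_id: road_labels[road_id] for road_id in home_roads if road_id in road_labels}


adjacency = {"h1": {"main_st"}, "h2": {"main_st"}, "h3": {"elm_st"}, "h4": set()}
adjustments = {"h1": -10000, "h2": -6000, "h3": -2000}

labels = label_roads(adjacency, adjustments)
print(labels["main_st"])  # -8000
print(road_labels_for_home({"main_st", "oak_ave"}, labels))  # {'main_st': -8000}
```

Roads with no previously adjusted homes simply receive no label, so coverage grows only as more offers accumulate on homes adjacent to them.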

When Data Improves, the Customer Wins

So, how did this new busy road data concoction affect the performance of our valuation model? As mentioned before, this dataset grows over time as we collect more human adjustment data for any given market. Keeping that in mind, it should not be surprising we found this data is most beneficial for the markets we’ve operated in the longest, such as Phoenix and Dallas.

With a quick competitive offer on any eligible home, Opendoor can empower anyone with the freedom to move. Knowing how homes near high traffic roads perform on the market allows us to operate more efficiently and reach more customers, which at the end of the day is what matters most.

Interested in using real estate data to build more transparency into the biggest financial purchase of a lifetime? Join our team or learn more about the challenges we’re solving.
