A Case Study of Simulation Based Inference

Serena Zhang
Published in Open House · Feb 21, 2020

Opendoor faces many critical decisions throughout the buying and selling process. How do we value a home? How much should we charge as a fee? What should we set as the listing price when reselling a home?

One common tool to approach these decisions is A/B testing. However, long feedback loops and the low number of real estate transactions make it extremely hard for Opendoor to iterate quickly. For example, to determine the business impact of a recent improvement to our Hazard model, we would need to wait months for homes to be resold before seeing results.

Another tool is observational analysis, where one derives statistics directly from historical data. However, not only does this approach struggle to distinguish correlation from causation, but many of the business questions cover new territory (e.g. a new pricing algorithm) and thus lack prior data from which to make inferences.

On the spectrum of tradeoffs between cost and accuracy lies another tool — simulation-based inference.

From left to right in the following graph, the listed tools enable increasingly accurate results while incurring a higher cost. For example, A/B tests give the highest confidence in results but take longer to conduct and have a higher risk of financial impact on the business.

As a fast-growing startup, we need to validate our ideas quickly and make smart business decisions. To this end, a tool like simulation offers a nice, balanced tradeoff. In this blog post, we will walk through an example of simulation-based inference at Opendoor, where we applied a simulation framework to opportunity-size our cost prediction model — an essential component in our fee-setting process.

A Case Study

Opendoor provides sellers with certainty and a streamlined transaction process. To cover our costs, we need to charge a fee. To achieve our goal of serving the most customers, we aim to set the lowest fee possible for each home while still maintaining our margin goal.

A key component of setting fees is the cost prediction model, which predicts future costs (e.g. utility bills, cleaning costs) for each home based on information such as square footage. Suppose the team proposes investing in a more accurate cost prediction model. Intuitively, this proposal sounds appealing, but to justify the investment, we need to estimate how much business value it would drive.

How do we evaluate success?

We simulate transactions in order to determine our overall margin and volume. We start by generating n transactions, where each has a true cost cᵢ. Next, we assign each transaction a predicted cost generated by our current model (let’s call it Model 1). If we set fee = predicted cost, the margin we get from each transaction is simply marginᵢ = feeᵢ - true costᵢ. Suppose we have a conversion function Z that gives the probability that customer i sells to Opendoor as Z(i) = pᵢ; the expected margin of the iᵗʰ transaction is then E(marginᵢ) = marginᵢ * pᵢ.

To get the overall margin and volume of n transactions:

def cal_margin_volume(fee, costs, conversion_probabilities):
    """
    :param fee: array of fees (a function of the cost prediction)
    :param costs: array of true costs
    :param conversion_probabilities: array of probabilities that each customer will sell
    """
    volume = conversion_probabilities.sum()
    margin = (conversion_probabilities * (fee - costs)).sum() / volume
    return volume, margin
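As a quick sanity check, here is the function applied to a toy example (all numbers are made up for illustration):

```python
import numpy as np

def cal_margin_volume(fee, costs, conversion_probabilities):
    # Overall expected volume and average margin per converted sale.
    volume = conversion_probabilities.sum()
    margin = (conversion_probabilities * (fee - costs)).sum() / volume
    return volume, margin

# Three simulated transactions (hypothetical numbers):
costs = np.array([10_000.0, 12_000.0, 8_000.0])  # true costs c_i
fees = np.array([11_000.0, 12_500.0, 9_000.0])   # fee_i = predicted cost
probs = np.array([0.5, 0.4, 0.6])                # conversion Z(i) = p_i

volume, margin = cal_margin_volume(fees, costs, probs)
# volume: expected number of sales; margin: expected margin per converted sale
```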

Running this simulation with Model 1’s predicted costs yields a margin and volume pair (m₁, v₁). Similarly, we can simulate (m₂, v₂) by using the predicted cost from the new model (let’s call it Model 2).

We define a margin and volume space where each position corresponds to a specific set of margin and volume values and plot (m₁, v₁) as point 1 and (m₂, v₂) as point 2.

We see that point 2 has a higher margin while point 1 brings higher volume. Suppose both exceed the minimum margin threshold — which is better for our business?

As it turns out, two effects drive the differences between point 1 and point 2:

A. The inherent negative relationship between margin and volume

B. The true margin and volume gain from a more accurate model

To understand effect A, let’s take a look at Opendoor’s fee economics. The higher the fee, the less risk Opendoor takes on when reselling, and thus the higher the margin. However, a higher fee makes a seller less likely to accept. This creates the inherent negative relationship between margin and volume. For example, one could argue that Model 2 can also achieve v₁ if we simply reduce fees for everyone.

Effect B is what we truly want to measure: if we keep volume constant, do we get additional margin, and how much? To answer this, we need to distinguish effect A from effect B. We can achieve this with simulation yet again, by mimicking how Opendoor trades off margin and growth in reality: when Opendoor decides to incentivize growth, it subsidizes fees, reducing fees across all transactions by a fixed amount K.

By adding or subtracting a constant K (let’s call this constant a “trade-off knob”) from fees and repeating the simulation for various values of K, we obtain a set of margin and volume pairs. Connecting these points in the margin and volume space gives us a simulated “margin vs. volume” trade-off curve.
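A sketch of this sweep, assuming a hypothetical logistic conversion model and synthetic cost data (the actual conversion function and data generation process are not specified in this post):

```python
import numpy as np

def cal_margin_volume(fee, costs, conversion_probabilities):
    volume = conversion_probabilities.sum()
    margin = (conversion_probabilities * (fee - costs)).sum() / volume
    return volume, margin

def conversion(fee, costs):
    # Hypothetical user model: the probability of selling falls as the fee
    # (relative to the true cost) rises; any decreasing function would do.
    return 1.0 / (1.0 + np.exp(0.5 * (fee - costs) / 1_000.0))

rng = np.random.default_rng(0)
costs = rng.normal(10_000.0, 2_000.0, size=1_000)       # synthetic true costs
predicted = costs + rng.normal(0.0, 500.0, size=1_000)  # Model 1 predictions

curve = []
for K in np.linspace(-1_000.0, 1_000.0, 9):  # sweep the trade-off knob
    fee = predicted - K                      # K > 0 subsidizes the fee
    probs = conversion(fee, costs)
    curve.append(cal_margin_volume(fee, costs, probs))
# Each (volume, margin) pair is one point on the simulated trade-off curve:
# larger subsidies raise volume and lower margin, tracing out the frontier.
```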

Left: The simulated trade-off curve of model 1. Right: Model 2 moves the trade-off curve to the right

Note that we have only changed the “trade-off knob” when simulating these points, without changing the underlying fee-setting model. If we switch to the new model and repeat this process, we get a new curve. In fact, each model corresponds to a unique margin vs. volume trade-off curve, which can be viewed as our efficiency frontier (or, more formally, a form of production possibility frontier (PPF)).

In this case study, the new model pushes the efficiency frontier further right, where we can trade off margin and volume more efficiently. We can also estimate the additional margin gain keeping volume constant (or vice versa) by comparing the two frontiers, which ultimately makes deciding on modeling investments much easier.
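Comparing the two frontiers at constant volume amounts to interpolating one curve at the other’s operating point. A minimal sketch with made-up frontier points:

```python
import numpy as np

# Hypothetical simulated frontiers: (volume, margin) points per model,
# sorted by volume. In practice these come from the trade-off-knob sweep.
curve1 = np.array([[100, 3_000.0], [150, 2_000.0], [200, 1_000.0]])
curve2 = np.array([[110, 3_200.0], [160, 2_400.0], [210, 1_500.0]])

def margin_at_volume(curve, v):
    # Linear interpolation of margin at a target volume
    # (volumes must be sorted in increasing order for np.interp).
    return np.interp(v, curve[:, 0], curve[:, 1])

v_target = 150  # hold volume constant at Model 1's operating point
gain = margin_at_volume(curve2, v_target) - margin_at_volume(curve1, v_target)
# gain > 0 means Model 2 earns more margin at the same volume.
```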

A General Recipe for a Simulation Framework

As every business problem has its nuances, we’d like to generalize the above case study. Below we summarize three essential components of a simulation-based inference framework:

  • Data generation process: This represents what we think the data looks like in the real world. Oftentimes, we can leverage historical data.
  • Policy: This defines the rules of the algorithm or simply the model output.
  • User Model: This is our best guess of user activity as a function of policy. For example, a conversion model that predicts the probability a user will convert given certain features.
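Putting the three components together, a skeleton of such a framework might look like the following (every function here is an illustrative placeholder, not Opendoor’s actual implementation):

```python
import numpy as np

def generate_data(n, rng):
    # 1. Data generation process: draw true costs from a distribution
    #    fit to (or resampled from) historical data.
    return rng.normal(10_000.0, 2_000.0, size=n)

def policy(costs, rng):
    # 2. Policy: here, a noisy cost prediction used directly as the fee.
    return costs + rng.normal(0.0, 500.0, size=len(costs))

def user_model(fee, costs):
    # 3. User model: conversion probability as a function of the policy.
    return 1.0 / (1.0 + np.exp((fee - costs) / 1_000.0))

def simulate(n=1_000, seed=0):
    rng = np.random.default_rng(seed)
    costs = generate_data(n, rng)
    fee = policy(costs, rng)
    probs = user_model(fee, costs)
    volume = probs.sum()
    margin = (probs * (fee - costs)).sum() / volume
    return volume, margin
```

Swapping out any one component — a new policy, a different user model — and re-running the simulation is what lets us compare scenarios before committing to an experiment.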


Simulation-based inference is a useful tool that has benefited Opendoor in various areas, such as model opportunity sizing, repair scoping strategies, and better understanding the housing market. If you are interested in Opendoor, head to our careers page to learn more!

Special Thanks

Special thanks to Nelson Ray and Chris Said for their comments and contributions!
