“What-If” Analysis

Henry M
Dandy Engineering, Product & Data Blog
Mar 29, 2022 · 5 min read

A common business request is “explain why metric <x> decreased unexpectedly this period.” This article presents a methodical, generalized framework for diagnosing and quantifying such phenomena.

Have you ever been approached with a question like: “Can you help me understand why metric <x> decreased unexpectedly?” Even if we can identify a few explanatory factors, a follow-up question may be, “How do we gauge the relative importance of effect A vs effect B?” In this article, we present a framework for answering these types of questions.

The framework encompasses the following steps, which we will make more concrete in the rest of this article:

  1. Start by asking: How large of a change was expected, if any? Sometimes, this may be framed as an “expected range of variation.”
  2. If the observed change is outside the “expected range of variation,” what are some possible explanatory factors? For example, average order value (AOV) may go down week-over-week because people are ordering more of a cheaper product type, or they are ordering cheaper product variants within product types (in the example below, individual fruits serve as our product lines and we might have organic vs. non-organic variants at different price points within each product line).
  3. To quantify the impact of a particular explanatory factor (say, customers placing more orders in cheaper product lines), simulate a “what-if” scenario as if that explanatory factor were the only thing changing, all else being equal.
  4. Do this “what-if” analysis for each explanatory variable to measure its relative impact on the overall metric.

The (Hypothetical) Business Context

Imagine you work for Fruits-R-Us, the leading fruit distributor in the United States. Your boss, the CEO, approaches you one day and says, “Last week’s overall revenue did not increase relative to the previous week’s, yet I know for a fact that our customer base grew. What’s going on?”

The first step is to assess what the CEO’s expectations were. Clearly, he expects revenue to increase week-over-week because order counts increase week-over-week. There’s a third variable here: the average order value (AOV = revenue / order count). For AOV, a fair assumption might be that customer ordering behavior doesn’t change from week to week, and therefore, the expected change in AOV is 0.

From there, your next response might be, “Let’s break down our revenue by product line and see if we can spot any problem fruits.” So you whip up a quick pivot table of revenue, order counts, and AOV by product line for each week.
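
To make that step concrete, here is a rough sketch of how such a pivot might be built with pandas. The product lines, order counts, and revenue figures below are hypothetical placeholders, not the actual Fruits-R-Us numbers.

```python
import pandas as pd

# Hypothetical order-level data -- made-up product lines and figures for illustration only.
orders = pd.DataFrame({
    "week":         ["W1"] * 3 + ["W2"] * 3,
    "product_line": ["Apples", "Bananas", "Cherries"] * 2,
    "order_count":  [100, 150, 50, 80, 220, 30],
    "revenue":      [1200.0, 900.0, 850.0, 976.0, 1342.0, 516.0],
})

# Pivot revenue and order counts by product line for each week,
# then derive AOV = revenue / order count.
pivot = orders.pivot_table(
    index="product_line", columns="week",
    values=["revenue", "order_count"], aggfunc="sum",
)
aov_by_line = pivot["revenue"] / pivot["order_count"]
aov_overall = pivot["revenue"].sum() / pivot["order_count"].sum()

print(aov_by_line)   # per-line AOV rises in W2...
print(aov_overall)   # ...yet the blended AOV falls
```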

There are a few interesting things to highlight here. First, we see that revenue did decrease while the number of orders increased week-over-week. Naturally, that implies that AOV decreased (since AOV = revenue / order count).

You report your initial findings to the CEO:

Expected revenue change: > 0%
Actual revenue change: -2.8%

Expected order count change: > 0%
Actual order count change: +5%

Expected AOV change: 0%
Actual AOV change: -7.4%
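
As a quick sanity check, the AOV change follows directly from the other two figures, since AOV = revenue / order count:

```python
# Since AOV = revenue / order count, the AOV change is implied by the other two changes.
revenue_change = -0.028   # -2.8% actual revenue change
order_change = 0.05       # +5% actual order count change

aov_change = (1 + revenue_change) / (1 + order_change) - 1
print(f"{aov_change:+.1%}")   # -> -7.4%
```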

The CEO responds, “That’s interesting, but why did AOV decrease?”

The astute observer will note that AOV actually increased within every product line, yet the overall AOV decreased. This is a phenomenon called Simpson’s paradox. The simple answer is that the distribution of fruits being ordered changed from week to week, and it is now skewed toward the less expensive fruits. You bring this conclusion to the CEO, and he says…
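
A toy example with made-up numbers shows the mechanics: every product line’s AOV rises, yet a shift in order mix toward the cheaper line drags the blended AOV down.

```python
# Toy example with hypothetical figures: (order_count, per-line AOV) for two weeks.
week_1 = {"premium fruit": (50, 20.00), "budget fruit": (50, 5.00)}
week_2 = {"premium fruit": (30, 21.00), "budget fruit": (90, 5.50)}

def blended_aov(lines):
    orders = sum(n for n, _ in lines.values())
    revenue = sum(n * aov for n, aov in lines.values())
    return revenue / orders

print(blended_aov(week_1))  # 12.50
print(blended_aov(week_2))  # 9.375 -- lower overall, even though each line's AOV rose
```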

“That makes a lot of sense. But how can I tell how much of the AOV decrease is explained by the change in order distribution?”

The “What-If” Framework

You go back to the drawing board, and a few thoughts come to mind.

  1. If the order distribution had stayed the same, and the new AOVs were still achievable, then revenue would be much higher. So we could calculate a hypothetical “what-if” scenario where we hold the order distribution from Period 1 fixed while applying the AOVs from Period 2. This would then tell us how much of an AOV increase we should have expected if the order distribution had not changed.

Let’s run the numbers!

Indeed, it does seem that revenue would have been much higher if the order distribution had remained the same. In fact, the AOV would have increased from $9.40 in Period 1 to $10.23 in Period 2 (+8.8%).
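
In code, this scenario amounts to pairing Period 1’s order mix with Period 2’s per-line AOVs. A minimal sketch, reusing the made-up figures from the toy example above rather than the real data:

```python
# Hypothetical per-line order counts and AOVs (same made-up figures as the toy example).
p1_orders = {"premium fruit": 50, "budget fruit": 50}
p1_aov    = {"premium fruit": 20.00, "budget fruit": 5.00}
p2_aov    = {"premium fruit": 21.00, "budget fruit": 5.50}

def blended_aov(order_mix, per_line_aov):
    """Blended AOV when `order_mix` is paired with `per_line_aov`."""
    revenue = sum(order_mix[line] * per_line_aov[line] for line in order_mix)
    return revenue / sum(order_mix.values())

baseline = blended_aov(p1_orders, p1_aov)   # Period 1 as it actually happened
what_if  = blended_aov(p1_orders, p2_aov)   # Period 1 mix held fixed, Period 2 AOVs applied
print(f"{what_if / baseline - 1:+.1%}")     # isolated effect of the per-line AOV increases
```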

To recap the stats so far:

Expected AOV change: 0%
Actual AOV change: -7.4%
Effect from Period 2 per-product-line AOVs being higher: +8.8%

We can see that AOV actually decreased by 7.4%, while it should have increased by 8.8% if the order distribution had stayed the same. The logical conclusion is that the gap, -16.2% (the actual -7.4% minus the hypothetical +8.8%), should be explained by the negative effect of the order distribution skewing toward less expensive products.

We can verify that using another “what-if” scenario!

  2. If the per-product-line AOVs had stayed the same while the order distribution changed, the change in the overall AOV would tell us the net effect of the order distribution becoming less favorable.

Not surprisingly, the AOV is quite a bit lower, but now we can say precisely how much lower it would have been. AOV would have decreased from $9.40 in Period 1 to $7.95 in Period 2 (-15.4%). It’s not exactly the -16.2% we were expecting, but we can’t expect a perfect attribution with this kind of analysis (there are some nuances in how the two effects are interdependent).
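
The mirror-image scenario pairs Period 2’s order mix with Period 1’s per-line AOVs. Same hypothetical figures and helper as the sketch above:

```python
# Same hypothetical figures and helper as the previous sketch.
p1_orders = {"premium fruit": 50, "budget fruit": 50}
p2_orders = {"premium fruit": 30, "budget fruit": 90}
p1_aov    = {"premium fruit": 20.00, "budget fruit": 5.00}

def blended_aov(order_mix, per_line_aov):
    revenue = sum(order_mix[line] * per_line_aov[line] for line in order_mix)
    return revenue / sum(order_mix.values())

baseline = blended_aov(p1_orders, p1_aov)   # Period 1 as it actually happened
what_if  = blended_aov(p2_orders, p1_aov)   # Period 2 mix applied, Period 1 AOVs held fixed
print(f"{what_if / baseline - 1:+.1%}")     # isolated effect of the order-mix shift
```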

We summarize our findings to the CEO:

Expected AOV change: 0%
Actual AOV change: -7.4%
Effect from Period 2 per-product-line AOVs being higher: +8.8%
Effect from Period 2 order distribution being skewed toward cheaper products: -15.4%

Thus, it’s not incorrect to say that AOV decreased by 7.4% and that this was due to the order distribution changing. However, the more precise explanation is that there are two countervailing effects: the change in order distribution drove a -15.4% change in AOV, which was partially offset by per-product-line AOVs increasing, contributing +8.8% to the overall AOV. Together, these two effects approximately net out to the observed -7.4% change in AOV from Period 1 to Period 2.
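
As a quick check on the numbers above, the two effects combine only approximately, whether added or compounded; the small leftover is the interaction between the mix shift and the per-line AOV changes:

```python
aov_effect = +0.088   # effect of per-product-line AOVs being higher
mix_effect = -0.154   # effect of the order mix skewing toward cheaper products
actual     = -0.074   # actual overall AOV change

additive   = aov_effect + mix_effect                    # -6.6%
compounded = (1 + aov_effect) * (1 + mix_effect) - 1    # -8.0%
print(f"{additive:+.1%}, {compounded:+.1%} vs actual {actual:+.1%}")
```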

Conclusion

“What-if” analysis can be a simple yet powerful tool for attributing complex, interdependent effects and isolating the most impactful ones.

At Dandy, we are actively evaluating a tool called Sisu Data that performs a very similar sort of analysis at scale. Sisu can test millions of explanatory factors and combinations of factors in a short period of time, which would save our Data and business teams time when searching for the right explanatory factors.

We hope that the technique outlined in this article will help you frame the analysis the next time you are asked to explain why a KPI moved unexpectedly!
