Measuring the measurer: what’s the value of experimentation & measurement?
Experimentation & Measurement (E&M) has become increasingly popular in the past decade. Major tech organisations — Facebook, Google, Microsoft, Amazon, and more — all reported running tens of thousands of experiments every year to test their products and measure their impact quantitatively (Microsoft has a very good write-up on what they learnt during the process).
E&M capabilities come in many forms: an online controlled experiment framework (e.g. one that can run A/B tests), a team of econometrics analysts, a system capable of performing machine learning-aided causal inference, and so on.
While the existence of E&M capabilities enables one to understand the value of a proposition, the value of E&M capabilities themselves is less well understood. In other words, it’s hard to measure the measurer. Inevitably this leads to questions when organisations prioritise their upcoming goals:
Why should we invest in experimentation & measurement when we can put more money into advertising or pricing?
What value does it bring?
Is it worth more than what proposition X will bring?
Being able to quantify the benefit of E&M capabilities will no doubt help many teams and organisations to put forward a strong business case to justify much-needed investment.
Three ways of valuing E&M
Here, we can think of at least three approaches to value experimentation & measurement capabilities:
1. It helps us to recognise the value of a product / service / experience.
Similar to how software engineers write unit and integration tests to ensure new code works and interacts well with existing code, running experiments allows us to verify that a new proposition works well with other propositions and creates value. If it doesn’t, we can pinpoint this particular proposition as the one causing issues, and roll it back before it causes more widespread damage.
2. It helps us to optimise what we offer.
A study by Browne and Jones identified 29 categories of changes that can make or break a customer’s experience in e-commerce, each category with many different possibilities. The old-school method of tweaking one thing at a time and waiting for customer feedback won’t scale, and introduces noise from the ever-changing market. Modern E&M capabilities, on the other hand, allow us to compare a large number of variants against each other and efficiently select the best one.
3. It helps us to better prioritise our work.
Traditional wisdom dictates that it’s difficult to make good decisions in times of uncertainty. Without E&M capabilities, we can only rely on back-of-envelope estimates or gut feel to value propositions before we prioritise them, and such valuations come with large uncertainty. E&M enables us to reduce that uncertainty, and subsequently make better decisions.
Quantifying the value
It’s relatively easy to quantify the value under items 1 and 2. The value of item 3 is less clear: what’s the value of better prioritisation, expressed in terms of the value of the things we’re working on?
Before we get to that (spoiler alert: it’s quite a bit), let’s discuss what better prioritisation actually means.
Say we’ve just created a start-up. We’ve got four projects — apple, orange, banana, and grapes — that we can work on, but only the capacity to do two of them. The sensible thing to do would be to rank these four items under a certain metric that we care about, and pick the top two projects.
In an ideal world, we’d have an oracle that tells us the exact value of the four items. Unfortunately we don’t live in an ideal world: the exact (or true) values are usually not available, and we have to resort to estimated values that carry some uncertainty. We can think of obtaining an estimate as sampling from a random distribution centred on the project’s true value, with the noise representing our uncertainty.
If the uncertainty is large, as is often the case when we guesstimate poorly, a lower (true) value item is more likely to leapfrog a higher value item and be selected by chance, as is the case for project apple in the illustration above. The estimated value of the projects we then select might look good on paper, but it shouldn’t be what we care about: what matters is the combined true value of the items we end up working on.
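To make the leapfrogging effect concrete, here is a minimal simulation, assuming (purely for illustration) Gaussian noise and two hypothetical projects with true values 10 and 12. It estimates how often the lower-value project’s noisy estimate beats the higher-value one’s:

```python
import random

random.seed(42)

def leapfrog_probability(v_low, v_high, noise_sd, n_trials=100_000):
    """Estimate how often a lower-true-value project's noisy estimate
    beats a higher-true-value project's estimate."""
    wins = 0
    for _ in range(n_trials):
        # An estimate = true value + Gaussian noise (an assumption, for illustration)
        est_low = random.gauss(v_low, noise_sd)
        est_high = random.gauss(v_high, noise_sd)
        if est_low > est_high:
            wins += 1
    return wins / n_trials

# Large noise: the worse project wins almost two times in five
print(leapfrog_probability(10, 12, noise_sd=5))    # ~0.39
# Small noise: the worse project almost never wins
print(leapfrog_probability(10, 12, noise_sd=0.5))  # ~0.002
```

With noisy guesstimates the ranking is close to a coin flip; with tight estimates the true ranking almost always survives.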
On the other hand, if the uncertainty is small, which we argue E&M capabilities help to achieve:
… we’ll see the estimated values sit much closer to the true values, and hence if we rank the items by their estimated value, the ranking will look much more like one based on their true value.
A lower value item can still leapfrog a higher value one, but an item with a higher true value (say project banana) now has a higher chance of being selected by virtue of that value. This is good at the macro level, as the combined true value of the items we have picked is now higher.
The value of E&M capabilities is then the difference between the combined true value of the projects chosen under lower noise (banana and grapes in this case) and that of the projects chosen under a noisier estimate (apple and grapes). Expressing the value of E&M capabilities in terms of the value of the projects themselves enables us to communicate the capabilities’ impact quickly.
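The whole toy example can be sketched in a few lines. The true values below are made-up numbers and the noise is assumed Gaussian (neither comes from the paper); the simulation picks the top two projects by noisy estimate and averages the combined true value of what gets picked, under a large and a small noise level:

```python
import random

random.seed(0)

# Hypothetical true values for the four projects (arbitrary units)
TRUE_VALUES = {"apple": 8, "orange": 9, "banana": 11, "grapes": 14}

def expected_selected_value(noise_sd, capacity=2, n_trials=50_000):
    """Average combined TRUE value of the `capacity` projects whose
    NOISY estimates rank highest."""
    total = 0.0
    for _ in range(n_trials):
        estimates = {p: v + random.gauss(0, noise_sd)
                     for p, v in TRUE_VALUES.items()}
        chosen = sorted(estimates, key=estimates.get, reverse=True)[:capacity]
        total += sum(TRUE_VALUES[p] for p in chosen)
    return total / n_trials

guess_value = expected_selected_value(noise_sd=6)  # back-of-envelope guesses
em_value = expected_selected_value(noise_sd=1)     # E&M-backed estimates
print(em_value - guess_value)  # prints a positive gap: the value of E&M here
```

The gap between the two averages is the value of E&M for this (toy) portfolio, measured in the same units as the projects themselves.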
The general case
Of course, all organisations are different — not all of them will have four potential projects and a capacity to complete two. Some will make worse guesstimates to start with, and others may acquire E&M capabilities that are better. What’s needed is a general framework where different teams can input their circumstances, and get an estimated improvement in value and risk associated with acquiring E&M capabilities.
This is what we set out to do. We have formalised the above using a hierarchical model, and drawn on the wealth of existing results in Order Statistics and Bayesian Inference to produce estimates. The details of the calculations won’t fit in a blog post, so if you’re interested, why not have a look at our research paper, which has recently been peer-reviewed and published in the proceedings of the IEEE International Conference on Data Mining 2019?
Bryan wears two hats. He works part-time as a Machine Learning Scientist at ASOS.com, and part-time towards a PhD at Imperial College London (in the StatML CDT, jointly with Oxford). The two hats look similar though: in both jobs he is looking into how we can improve our experimentation & measurement capabilities, enabling us to make less biased data-informed decisions more quickly.