Hundreds of thousands of businesses run experiments. Some run tens of thousands of online randomized controlled trials (e.g. A/B tests). Others leverage less formal but much larger-scale pilots of new business lines or value propositions. Inside different organizations, experimentation may be the purview of product, marketing, data, strategy/innovation, or engineering teams, or of a standalone Center of Excellence.
Whatever the context, any business that runs experiments defines the value of that practice on a spectrum ranging from a very narrow view of potential value to a very broad one.
My bias as an expert in the space is that where an organization falls on this spectrum reflects its maturity in experimentation. But regardless of the reason behind the mindset, or how self-aware the organization is of it, these differing views of value have massive implications: for the professionals running the experiments, for the team culture that emerges, and for the long-term impact experimentation has on the business.
Let’s briefly describe a few different points on this spectrum, from the narrowest recognition of value to the broadest:
- Only “winning” tests drive real value
- Only tests that reject the null hypothesis, i.e. clearly “win” or “lose”, create value
- All test outcomes drive real value
- All test outcomes drive real value AND the capability to conduct experiments creates strategic and competitive advantage
Only “winning” tests drive real value
This is the most simplistic, but still most prevalent mindset. Professionals and organizations who think that only the wins matter, that no value is created unless a hypothesis ends up successfully driving growth or efficiency, are most likely to undervalue exploration in the pursuit of rapid exploitation. They become obsessed with the “win rate” of their experimentation endeavor, optimizing their behavior towards things that seem like blue chip “sure shots” and attempting to filter out unlikely winners before they reach the test stage.
There are a number of issues that arise from this approach:
- Less willingness to take risks or explore outlier odds. If value is only created by tests that win, then the cost of planning, executing, and analyzing non-winning tests is value destroyed. This framework leads teams to be risk-averse and overly selective with the “long shots” they take.
- Hence, a drive towards conformity. If we want to maximize our win rate, we will orient to ideas that are more obvious in their potential to win… and if they’re obvious to us, they’re probably obvious to the competition too. Teams in this framework will often explicitly seek to exploit “best practices” and established heuristics as a top priority, destroying the potential to create competitive advantage via differentiation.
- Win rate is unsustainable. If a business looks at win rate as the goalpost for success, over time the practice of experimentation will look less and less valuable, because win rate will inevitably regress to the mean. In online experimentation, a rough industry rule of thumb of a ~33% win rate (with a hypothetical credible interval of 10%–45%) seems universal amongst teams who have run experiments for several years, and it has yet to be disproven (to my knowledge, at least).
- It is empirically suboptimal. The cost/benefit profile of business experiments, even the long-shot ideas, is so asymmetrical, and so convex, that the math clearly suggests an optimal path: run as many experiments as possible.
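To make that asymmetry concrete, here is a minimal expected-value sketch. All the numbers below (cost, win rate, payoff) are hypothetical assumptions chosen for illustration, not figures from any study; the point is only the shape of the math: when the downside of a trial is capped at its small cost and the upside is many multiples of that cost, every additional trial adds value in expectation.

```python
# Hypothetical, illustrative numbers -- not empirical figures from any study.
cost_per_test = 1.0      # cost to plan, run, and analyze one experiment
win_rate = 0.125         # a "long shot": only 1 in 8 tests wins
avg_win_payoff = 40.0    # but winners pay off asymmetrically large

# Expected value of a single trial: the downside is capped at the small
# cost of the test, while the upside is many multiples of that cost.
ev_per_test = win_rate * avg_win_payoff - cost_per_test  # 0.125 * 40 - 1 = 4.0

# Total expected value grows linearly with the number of trials run,
# so the optimal policy under this payoff function is simply: run more.
def portfolio_ev(n_tests: int) -> float:
    return n_tests * ev_per_test

print(portfolio_ev(10))   # 40.0
print(portfolio_ev(100))  # 400.0
```

Even with a win rate a "win rate fixated" team would consider embarrassing, the portfolio's expected value is positive per trial, so filtering out long shots before the test stage only shrinks the total.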
Only tests that reject the null hypothesis create value
The broadening here usually comes from a single main addition: the recognition of value in risk mitigation. Often in experimentation, we’ll test an idea that had pre-existing support (or even a mandate) from the organization, and find that it actually would’ve caused notable detriment to growth, efficiency, or other metrics the business values. Because we ran an experiment first, we’ve spotted the regrettable impact and avoided a risk we would’ve otherwise walked right into.
Beyond obvious risk mitigation, the notion that “learning is valuable” is a key mindset shift that has positive impacts to team culture too. Removing the singular emphasis on winning creates necessary trust and safety for innovation and creativity, allowing teams to try “bigger swings” without fear of real or political cost.
While still imperfect, this frame makes huge progress on a number of the issues that come with “win rate” measurements. As I’ll explain in a moment, “flat” tests that fail to reject the null hypothesis are no bogeyman; but even the motivation to avoid them encourages the kind of risk-taking and exploration that goes beyond conformity and steers back towards more strategically valuable approaches.
All test outcomes drive real value
This evolution is more nuanced, and may require a tangible increase in experimentation capabilities to be valid.
The one type of test outcome we’ve failed to appreciate up to this point is the “flat” test, which shows no difference (i.e. fails to reject the null hypothesis). The catch, though, is that this is the most common outcome in business experimentation. Most ideas we try will fail to move the needle. Time and again, meta-analyses of online business experiments find that the median relative effect is near zero: examples abound from A/B testing vendors, large samples of eCommerce websites, and individual businesses as well.
Even these “flat” tests can create real value if the cost of running an experiment is less than the cost of a full “launch” of the intervention in question. In some contexts, this is obvious — a pilot of a new business line is by definition smaller in scale and cost than just brazenly moving forward. With small features in digital products, this becomes much more granular, dependent on a number of DevOps and cultural considerations. (e.g. is experiment code held to a lower standard than production code since it is intended to be temporary and shown to fewer users? Are experimental treatments subject to fewer rounds of approvals and pre-launch research?)
In any case, developing the processes, tools, and rules necessary to decrease the cost of running an experiment is incredibly valuable. To return to the implications of the payoff function in business experimentation, the optimal approach to maximizing rewards is to run as many experiments as possible, which means driving the cost per trial as low as possible. Put simply, cheap experiments are worth more than good ideas.
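A back-of-the-envelope comparison shows why. The budget, costs, and win rates below are hypothetical assumptions for illustration only: under a fixed budget, a high-velocity program of cheap tests surfaces more winners than a slow program of expensive, heavily vetted ones, even when the vetting doubles the win rate.

```python
# Hypothetical budget and parameters, purely for illustration.
BUDGET = 100.0

def expected_winners(cost_per_test: float, win_rate: float) -> float:
    """Expected number of winning tests a fixed budget can buy."""
    n_tests = int(BUDGET // cost_per_test)  # how many trials the budget affords
    return n_tests * win_rate

# Strategy A: expensive, heavily vetted "sure shots" with a high win rate.
vetted = expected_winners(cost_per_test=10.0, win_rate=0.5)  # 10 tests -> 5.0 winners
# Strategy B: cheap, high-velocity long shots with half the win rate.
cheap = expected_winners(cost_per_test=1.0, win_rate=0.25)   # 100 tests -> 25.0 winners

print(vetted, cheap)  # 5.0 25.0
```

The cheap-and-fast strategy finds five times as many winners per budget cycle, despite the lower win rate: velocity, not selectivity, dominates the outcome.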
Importantly, moving to this stage of maturity creates a growth loop: cheaper experiments enable more experiments, and more experiments mean finding more winners. That flat and losing tests can create value doesn’t negate that winning tests are still the most valuable; but for the reasons covered earlier, the way to find more winners is not fixating on win rate, it’s reaching the stage where you can maximize experiment velocity.
All test outcomes drive real value AND the capability to conduct experiments creates strategic and competitive advantage
This final realization comes from broadening one’s perspective beyond the organization and into the marketplace. When your capacity to run experiments exceeds that of the competition (and you make more experiment-driven decisions than they do), a number of outcomes emerge:
- You try and implement more ideas that are difficult to imitate, creating competitive advantage via differentiation (exactly the opposite effect of experimenting with a win rate fixation)
- You gather and validate more real insights about your market and its needs, learning things the competition has not or will not
- You have a greater capability for agility, innovation, and navigating uncertainty than your competitors.
As Intuit founder Scott Cook writes:
“The bigger and more novel the idea, the less likely it is to survive the gauntlet of bosses who must all agree — bosses who are most comfortable with what they know and only know the past. I don’t think it’s an accident that when software companies grow large, they have until recently become less and less innovative — think Microsoft or IBM or others.
What I’ve seen at some firms I admire is something quite different. I call it decision by experiment…
I wondered why Google beat Yahoo! at search. A Yahoo! executive told me that Google succeeded by installing the system and culture to decentralize decision-making to decision by experiment. Google’s chief economist said that Google runs 3,000 to 5,000 experiments a year in search — when you use Google you’re in those experiments.”
While it’s possible for organizations to mature along this spectrum as they dip their toes into running experiments and glean these insights firsthand along the way, too many remain stuck at the very beginning, fixated on win rate. The less-savvy corners of the “conversion rate optimization” industry certainly do the space no favors in perpetuating this. But the companies who have built their empires with experimentation in the gas tank are playing a very different game: a game of strategic advantage, fueled by the realization that every experiment, done correctly, creates value, and that the aim is always to run more experiments, for cheaper.