To Get More Impact From A/B Testing, Listen To Nassim Taleb
Why win rates don’t matter, velocity is everything, and the strategy is all made up
Applications to business experimentation (or A/B testing) from a 2012 article by Nassim Nicholas Taleb entitled “Understanding is a Poor Substitute for Convexity”. Adapted from a series of LinkedIn posts published on my personal profile in 2021.
More people running experimentation programs or A/B testing need to listen to Nassim Nicholas Taleb — the ever-controversial but best-selling author of books like “The Black Swan”, “Antifragile”, and “Fooled By Randomness”.
Professor Taleb has written some very helpful pointers on maximizing the ROI of experimentation — they are his “SEVEN RULES OF ANTIFRAGILITY IN RESEARCH”.
And although the mathematical language and terms borrowed from options theory may be foreign to the marketing and product leaders who run A/B tests day-to-day, I think I can help “translate” them into clear, specific terms for our field.
Taleb wrote these seven rules in a 2012 article for Edge.org entitled “Understanding is a Poor Substitute for Convexity (Antifragility)”.
(It may be helpful to read the original article first, or reference it alongside my commentary here.)
In the article, he argues that advances in technology, and in “opaque” fields like medicine, are the result neither of luck nor of careful scientific planning and direction. (That is, directing our efforts using some “grand narrative” to explain why an idea is worth researching, or in our context testing, doesn’t actually drive outcomes.)
Have I lost you already? To restate: the return on investment of our testing programs is the result neither of “chance” nor of a meticulously researched roadmap.
(I can smell the controversy brewing… mmmmm, controversy)
Taleb challenges the common wisdom that success in experimentation is all about conducting thorough research to determine where best to focus.
In such developments as the industrial revolution (and more generally outside linear domains such as physics), there is very little historical evidence for the contribution of fundamental research compared to that of tinkering by hobbyists.
Instead, the best way to plan and execute our roadmaps is to leverage a mathematical property called “convexity”. Every A/B testing program is inherently convex.
Let’s look at exactly what that means.
The magic of experimentation lies in its payoff function: when you win — you win big. When you lose, you lose very little.
(That is, the gains and losses are asymmetrical.)
This is the mathematical property of convexity that Nassim Nicholas Taleb says governs the benefits of research (in our case, of running A/B tests or similar randomized, controlled experiments). It’s a seemingly simple realization that actually has huge consequences on what our strategy in experimentation should be.
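To make the asymmetry concrete, here is a minimal simulation sketch. It assumes a hypothetical “ship only winners” rule: if a variant’s true lift is positive we keep it, otherwise we roll it back and are out only a fixed test cost. The idea distribution (mean 0, standard deviation 1), the cost of 0.1, and the function names are all illustrative assumptions, not data from any real program.

```python
import random
import statistics


def ship_if_winner(true_lift: float, test_cost: float) -> float:
    """Payoff of one test under a hypothetical ship-only-winners rule.

    If the variant's true lift is positive we ship it and keep the lift;
    otherwise we roll it back and are out only the fixed test cost.
    """
    return max(true_lift, 0.0) - test_cost


def simulate(n_tests: int = 100_000, test_cost: float = 0.1, seed: int = 42):
    """Compare the average idea with the average payoff of testing it."""
    rng = random.Random(seed)
    # Assumption: ideas are neutral on average (mean lift 0, std dev 1).
    lifts = [rng.gauss(0.0, 1.0) for _ in range(n_tests)]
    avg_idea = statistics.fmean(lifts)
    avg_payoff = statistics.fmean(ship_if_winner(x, test_cost) for x in lifts)
    return avg_idea, avg_payoff


avg_idea, avg_payoff = simulate()
print(f"average idea: {avg_idea:+.3f}, average payoff of testing it: {avg_payoff:+.3f}")
```

Under these assumptions the average idea is roughly zero, while the average payoff of testing lands near +0.3 (the expected positive part of a standard normal, about 0.399, minus the 0.1 cost). That gap is Jensen’s inequality at work: for a convex payoff, the average of the payoffs beats the payoff of the average idea. The sketch also assumes we observe the true lift; real tests measure it with noise, which shrinks (but does not erase) the advantage.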
Let’s think for a second about what the potential downsides of experimentation actually are. If you build and launch a test that loses, what’s the cost? There’s the actual cost of building the experiment; there may be a few conversions lost during the course of the test; and there’s the time and data we “spent” on this idea instead of on the next one.
Those first two are extremely cheap, all things considered. Which leaves us with the following fascinating little nugget: the biggest cost in experimentation is the opportunity cost of not running the next idea. (Put that one in your back pocket, as we’ll revisit it later.)
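A back-of-the-envelope way to see why those first two costs are small: the loss is capped by how long the test runs and how much traffic the variant gets. The sketch below uses a hypothetical cost model and made-up inputs (build cost, traffic, conversion rate, value per conversion); it deliberately leaves out the opportunity cost.

```python
def losing_test_cost(
    build_cost: float,
    visitors_per_day: float,
    test_days: int,
    variant_share: float,
    baseline_cr: float,
    relative_drop: float,
    value_per_conversion: float,
) -> float:
    """Rough direct cost of a losing A/B test (hypothetical model).

    Direct cost = build cost + conversions lost on the variant's share of
    traffic while the test ran. Opportunity cost is deliberately left out.
    """
    variant_visitors = visitors_per_day * test_days * variant_share
    lost_conversions = variant_visitors * baseline_cr * relative_drop
    return build_cost + lost_conversions * value_per_conversion


# Hypothetical inputs: $2,000 build, 10k visitors/day, 14 days, 50/50 split,
# 3% baseline conversion rate, variant 5% worse, $50 per conversion.
cost = losing_test_cost(2_000, 10_000, 14, 0.5, 0.03, 0.05, 50)
print(f"direct cost of the losing test: ${cost:,.0f}")
```

With these made-up numbers the direct cost comes to about $7,250, and it stops accruing the moment the test ends. The upside of a winner, by contrast, keeps paying out for as long as it stays in production.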
Convexity also means that informal “tinkering”, or trial and error, can lead to long-term gains, and in complex systems such tinkering will vastly outperform careful scientific planning and direction over time. Convex systems are what Taleb calls “antifragile”: they benefit from uncertainty and disorder.
How can you allow for more uncertainty in your testing roadmap?
A/B Testing with Fat Tails
(See: Azevedo et al., 2019.)
Your “win rate” in A/B testing is almost meaningless.
The problem is that not all “wins” are the same size. And in a testing program, almost all of the positive impact comes from a select few outliers — the really big wins.
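A toy comparison shows why. The two programs below are entirely hypothetical: ten tests each, three winners each, so identical win rates. Summing the winning lifts is only a rough proxy for impact (real lifts compound and are measured on different metrics), but it is enough to show the point.

```python
def win_rate_and_impact(lifts: list[float]) -> tuple[float, float]:
    """Return (win rate, summed lift of the winners) for test lifts in %.

    Summing lifts is a rough proxy; real lifts compound and vary by metric.
    """
    wins = [x for x in lifts if x > 0]
    return len(wins) / len(lifts), sum(wins)


# Two hypothetical programs: ten tests each, three winners each.
program_a = [1.0, 1.0, 1.0, -0.5, -0.5, 0.0, -0.2, 0.0, -0.1, -0.3]
program_b = [0.2, 0.3, 9.0, -0.5, -0.5, 0.0, -0.2, 0.0, -0.1, -0.3]

for name, lifts in (("A", program_a), ("B", program_b)):
    rate, impact = win_rate_and_impact(lifts)
    print(f"program {name}: win rate {rate:.0%}, summed winning lift {impact:.1f}%")
```

Both programs report a 30% win rate, yet program B’s single outlier gives it more than three times the summed lift of program A (9.5% vs. 3.0%).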
You’re probably familiar with the 80/20 rule, or with “The Long Tail”. Both are examples of “power laws”, and in A/B testing the skew can be even more extreme.
At Cro Metrics, we ran the numbers and were honestly pretty shocked to see that 90.7% of the positive impact we’ve driven for clients came from just 5% of the experiments we’ve run. Bing’s distribution is even more skewed — some 75% of their revenue impact from testing came from just 2% of their experiments.
This is a “fat-tailed” distribution: the few “Black Swan” outliers (the really big wins) skew the entire distribution. And crucially, those Black Swans are, by definition, nearly impossible to predict.
Once you understand this distribution of outcomes in A/B testing, it’s pretty apparent why trying to increase win rate or carefully engineer your roadmap with fancy predictions is an expensive and unsuccessful exercise.
Along with convexity, it’s one of the mathematical principles underpinning Taleb’s rules for research.
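If you want to build intuition for how much the tail shape matters, here is a toy simulation. It compares an assumed thin-tailed distribution of test outcomes (Gaussian) with an assumed fat-tailed one (a random sign times a Pareto magnitude with alpha = 1.5), and measures what share of total positive impact comes from the top 5% of tests. Both distributions and all parameters are illustrative choices; they are not fitted to the Cro Metrics or Bing figures above.

```python
import math
import random


def top_share(impacts: list[float], top_fraction: float = 0.05) -> float:
    """Fraction of total positive impact coming from the top `top_fraction` of all tests."""
    ranked = sorted(impacts, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    total_positive = sum(x for x in ranked if x > 0)
    top_positive = sum(x for x in ranked[:k] if x > 0)
    return top_positive / total_positive


def thin_tailed(rng: random.Random) -> float:
    # Assumed thin-tailed outcome: standard normal lift.
    return rng.gauss(0.0, 1.0)


def fat_tailed(rng: random.Random, alpha: float = 1.5) -> float:
    # Assumed fat-tailed outcome: random sign times a Pareto-distributed magnitude.
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * (rng.paretovariate(alpha) - 1.0)


rng = random.Random(7)
n = 100_000
thin_share = top_share([thin_tailed(rng) for _ in range(n)])
fat_share = top_share([fat_tailed(rng) for _ in range(n)])
print(f"thin-tailed: top 5% of tests -> {thin_share:.0%} of positive impact")
print(f"fat-tailed:  top 5% of tests -> {fat_share:.0%} of positive impact")
```

In the thin-tailed world the top 5% of tests account for roughly a quarter of the positive impact; in the fat-tailed world the same 5% account for well over half. The heavier the tail, the more your results hinge on catching a handful of outliers you could not have named in advance.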
Decrease Your Cost per Experiment
Cheap experiments are worth more than good ideas.
That’s the subtitle of Michael Schrage’s 2014 book “The Innovator’s Hypothesis”, and it is a perfect summary of Taleb’s first “rule of antifragility in research”.
I also love the way Schrage puts it on page 39 of his book: “If your organization would rather run twenty comprehensive experiments next month than two simple experiments tomorrow, you’re destined for innovation dysfunction.”
Taleb’s first rule is that “convexity is easier to attain than knowledge”. That is, we stand to gain much more from decreasing our cost per experiment than from trying to predict or prioritize “better” ideas. The former is feasible and inexpensive; the latter is expensive, if you even believe it to be possible in the first place.
Convexity, recall, is the shape of the payoff function of experimentation: our downside is very limited, and our upside is nearly unlimited. And in a fat-tailed distribution, where a few outliers make up almost all of the positive impact, shrinking that downside is far easier and more profitable than trying to predict where the needles are in the haystack.
If your program conducts extensive pre-experiment research, examine the value of doing so. Can you demonstrate a correlation between that research and actual experiment outcomes? Is the correlation large enough, and the cost of the research low enough (in dollars and in time), that conducting it beats letting more ideas through to the experiment stage?
Here’s the two-word answer to an optimal experimentation strategy: velocity first.
More than any other factor, maximizing the number of experiments you’re able to run will determine the magnitude of impact/ROI you’re able to drive.
Taleb explains why in his 2nd rule of “antifragility in research” — I’ll quote him directly here:
“Following point (1) and reducing the costs per attempt, compensate by multiplying the number of trials and allocating 1/N of the potential investment across N investments, and make N as large as possible… A large exposure to a single trial has lower expected return than a portfolio of small trials.”
(Keep in mind the fat-tailed distribution of experiment outcomes — the overwhelming majority of our impact comes from a few unpredictable outliers.)
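A simple way to see why many small trials beat one big bet under fat tails: if each independent trial has some small chance p of being an outlier win, the chance of catching at least one grows quickly with N. The 2% below is a hypothetical figure chosen for illustration, and the sketch assumes cheaper trials keep the same outlier odds (it ignores, for example, the loss of statistical power from smaller tests).

```python
def p_at_least_one_outlier(p_outlier: float, n_trials: int) -> float:
    """Chance that n independent trials include at least one outlier win."""
    return 1.0 - (1.0 - p_outlier) ** n_trials


# Hypothetical: each trial has a 2% chance of being a big outlier.
for n in (1, 10, 50, 100):
    print(f"{n:>3} trials -> {p_at_least_one_outlier(0.02, n):.0%} chance of an outlier")
```

That works out to roughly 2%, 18%, 64%, and 87% for 1, 10, 50, and 100 trials. Spreading the same budget across more, cheaper trials is how you buy more of those lottery tickets.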
Velocity is also a great way to measure your program’s health. Unlike win rate, velocity is within your direct control: it depends on the strength of your processes, your experiment design, and your culture of experimentation.
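Tracking velocity can be as simple as counting launches per week over a fixed window. The sketch below assumes you keep a log of experiment launch dates (the dates shown are hypothetical); you might instead count concluded tests, or use a rolling window, depending on what your team wants to steer by.

```python
from datetime import date


def velocity_per_week(launch_dates: list[date], start: date, end: date) -> float:
    """Experiments launched per week in the window [start, end], inclusive."""
    days = (end - start).days + 1
    launched = sum(1 for d in launch_dates if start <= d <= end)
    return launched / (days / 7)


# Hypothetical launch log; the window below is February 2021 (28 days).
launches = [date(2021, 2, 1), date(2021, 2, 8), date(2021, 2, 9), date(2021, 2, 22), date(2021, 3, 1)]
print(velocity_per_week(launches, date(2021, 2, 1), date(2021, 2, 28)))
```

Four launches fall inside the 28-day window, so this prints 1.0 experiment per week. Watching that number month over month is a more actionable health check than watching win rate.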
Figuring out how to run more experiments faster is a top priority of the tech giants/FAANG companies, and you can regularly find advancements published in any number of scholarly journals.
One great example, since it’s been open-sourced, is Facebook’s machine-learning-driven adaptive experimentation platform, Ax (ax.dev; the name is short for “Adaptive Experimentation”). I highly recommend listening to Eytan Bakshy’s talk at F8 on the topic.
Process Beats Ideas
Which would you rather have — good ideas, or a good process?
(Personally, I’ll take the latter every time. Even the best idea dies with poor execution.)
There’s a common wisdom in venture capital to invest in people, not in business plans. What looks good on paper can fall through at any moment. A smart team who can opportunistically adapt, quickly try multiple ideas and approaches, and validate the right one is far more likely to succeed than any singular idea.
That’s the illustration Nassim Taleb aptly uses to talk about two winning principles in experimentation — “serial optionality”, and “nonnarrative research”. The former essentially says to keep your roadmaps very flexible, and very short-term. The latter explains that the arc of your testing program is far more likely to look like random outcomes than any sort of grand narrative.
So much of what passes for “advice” in CRO is focused on pre-experiment research, UX best practices, roadmapping and prioritization exercises — in other words, focused on enabling good ideas. Shouldn’t we all be talking much more about enabling good process? That’s change management, cross-org leadership, getting alignment from the C-suite on down.
Because we’re in the business of hunting for outlier successes, teams that set aside work on ideas in favor of work on their process (i.e., building a mature culture of experimentation) will reap disproportionate rewards.
(Some Parting Thoughts)
Of course, I’m being pretty absolutist here. But I do so because the “party line” in the field is quite absolutist in the other direction:
“If you’re running an experiment without a research-backed hypothesis, you’re wasting your time, and maybe worse!”
I don’t have anything against qualitative research methods or other approaches to informing ideation and prioritization. I do have something against them when they become dogma that slows down the actual pace of experimentation. Allocate your time and budget wisely.
Success in experimentation will fundamentally depend on your ability to do two things:
- Run more experiments.
- Decrease your cost per experiment.
This is not to say that experiment quality doesn’t matter, or that you can skip statistical rigor and sufficient resourcing for each test. Accuracy is table stakes.
But the next time somebody is selling you on a prioritization model, ask how much it will cost to execute versus just building the next experiment idea. And when they tout their win rate, ask to see the distribution of outcomes, because chances are the win rate doesn’t matter at all.
Best of luck!