A/B testing is not a panacea, but it is fantastic at what it’s supposed to do. Some people have offered very valid criticisms of particular parts of it, but I fear that some of the poorer descriptions of its limitations are motivating people to throw the baby out with the bathwater.
What follows is intended for up-and-coming designers who are new to A/B testing and may be wary of it because of some recurring arguments I consider off-base.
False Premise #1
A/B testing is a poor substitute for product vision
“You’re not going to A/B test your way to Shakespeare.” —Brian Chesky
This is like saying a bathroom scale isn’t a good alternative to exercise and a healthy diet. Of course it isn’t! A scale is not an “approach” like the other two, but rather a tool that lets you see how well they’re doing. Likewise, A/B testing is a design tool, not a methodology.
Paying attention to the results of your decisions does not mean you should go on “strategy autopilot”. Quite the opposite: it gives you the feedback loop you need to calibrate your actions against where you want to go.
False Premise #2
A/B testing means you’re going to wind up with a sterile, boring user experience
“You can achieve a shallow local maximum with A/B testing — but you’ll never win hearts and minds.” —Jeff Atwood
Just because split testing involves numbers does not mean it favors cold, calculated design. It favors effective design, and appealing to people’s hearts is one of the best ways to be effective.
The warmth, compassion and persuasiveness of the content you produce is entirely under your control. Nothing about A/B testing forces you to be a victim of uninspired decision making.
False Premise #3
A/B testing is just for trivial stuff like background colors and button text
“Ultimately, a site that is built with the logic and consistency of a clear design vision, will always trump a site that has been built with every element timidly placed and nervously tested.” —Martin Gittins
First of all, sometimes “small” stuff like button text can have a HUGE impact, so let’s not conflate minor changes with minor successes.
More importantly, though, it is simply incorrect to consider A/B testing applicable only to individual screen elements and their attributes. You can literally test two completely different screens (or features! or entire sites!) against each other.
To get reliable test results, it’s widely regarded as good practice to change only one variable at a time, but how broad that variable is (a button label, a page layout, an entire site) is entirely up to you. Just be aware that the broader the change, the less you’ll learn about which particular parts of it are making it work well or not.
False Premise #4
A/B testing means you wind up with a tame, neutered feature set
“The first big issue is that when you design for metrics, it’s easy to become risk-averse.” —Andrew Chen
This is the exact opposite of my experience. A/B testing provides a framework for experimentation, where bold and daring alternatives are given the opportunity to prove their worth.
There are scads of stories where a lower-ranking person came up with a wackadoo recommendation that, due solely to the company’s culture of experimentation, was received with a “sounds crazy to me, but let’s give it a shot” and wound up performing astoundingly well.
If only one version of a particular design will ever see the light of day, much greater importance is placed on “getting it right” up front, usually through a design-by-committee approval process that, ironically, DOES lead to tame, neutered feature sets.
False Premise #5
A/B testing can tell you what, but it can’t tell you why
“This story, to me, explains a significant problem with designing by metrics. You get rapid feedback on whether an approach works, but none whatsoever on why.” —Cennydd Bowles
This premise is actually correct, but it falsely implies that A/B testing somehow comes up short because it doesn’t do both.
A thermometer can only tell me what my temperature is, and not why I have a fever. That doesn’t mean it’s not valuable for what it CAN do.
A/B testing is not a tool for generating design ideas; it’s a tool for validating them. If you’re using split testing as a crystal ball for UX insights, you’re using it incorrectly.
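To make “validating” a little more concrete, here is a minimal Python sketch of one common way to read a split test on conversion counts: a two-proportion z-test. The function name and the example numbers are hypothetical, and real testing tools may use different statistics (sequential or Bayesian methods, for instance), so treat this as an illustration rather than a recipe.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Hypothetical helper: compare conversion rates of variant B against A.

    Returns (lift, z, p_value), where lift is the absolute difference in
    conversion rate (B minus A) and p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Made-up counts: 200 of 5,000 visitors converted on A, 250 of 5,000 on B.
    lift, z, p = two_proportion_z_test(200, 5000, 250, 5000)
    print(f"lift={lift:.3f} z={z:.2f} p={p:.3f}")
```

Notice what the output contains: whether B outperformed A, by how much, and how unlikely that gap would be by chance alone. Nothing in it says *why* B won; that part still has to come from the qualitative work that inspired B in the first place.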
False Premise #6
A/B testing means you have to stop innovating
“… Then, stop optimizing and return to other kinds of analysis to figure out the next steps. Conduct interviews. Do user testing. Give surveys, ask questions.” —Joshua Porter
You don’t have to stop iterating to be innovative. All of the “alternatives” mentioned above can and should be happening in parallel with A/B testing; in fact, qualitative research is what a good designer bases their “B” concepts on!
If your strategy is to throw shit at the wall and see what sticks, it’s true that the winner of a split test will be, by definition, shit. However, split testing can also identify a winner between two incredibly valuable, well-considered competitors. You are what you eat.
False Premise #7
A/B testing will embarrass you if/when something you make doesn’t achieve what you expected
[I’m not going to name names on this one, but I’ve seen it pop up a lot]
Get used to it, buddy. It’s not fun, but we’re humans and are wrong all the freaking time. Plus, consider the alternative: if something’s underperforming, how irresponsible is it to avoid finding out simply because doing so would make you feel bad?
Intentionally turning a blind eye to the outcomes of our decisions is not designing with confidence, but with arrogance. Embrace the possibility of not only realizing your mistakes, but learning from them.
Every A/B test is an opportunity to hold ourselves accountable for the results our work should produce, and to raise the bar for our industry as a whole. Lean into it, and good things will come.
1. Many of these quotes are from people I respect and who produce work I admire. That’s why their arguments got on my radar to begin with. Some quotes are even taken from articles I largely agree with. I simply see the possibility of people arriving at unhelpful conclusions based on them, and thought I’d add my two cents.
2. While I used “A/B testing” and “split testing” interchangeably for variety’s sake, I intentionally kept “multivariate testing” at arm’s length, as that’s a whole can of worms unto itself.