Why your A/B tests suck — the importance of Theory

“Why is there an orange gif on my website?” I screamed as I stormed into Web Guy’s cubicle. After spending months worrying about the tone, language, structure, links and every single element on the website, I wasn’t going to take a moving arrow inside my signup button sitting down. “We aren’t selling to a bunch of cats. And our visitors aren’t just going to click the shiniest moving thing.”

“We did an A/B test,” he told me, with the triumphant eyes of someone who had hit a home run on his first go. “And it won.” With a sample of 250 visitors over 2 days? I don’t think so!

A/B tests are beautiful. They help us build, understand and test which parts of our theories work and which parts don’t. They let us back our assumptions with hard data. They come with such cute, shiny graphs that, no matter how inconclusive, look absolutely stunning in a presentation. And for less than the price of your average cup o’ joe, you can add a complete arsenal of A/B testing tools to your utility belt.

There’s just one thing your A/B testing tool doesn’t do for you: build the theory. And without a theory behind your test, your A/B test isn’t a scientific process anymore. It’s just throwing random buttons at your users and seeing what sticks.

The shitty part about A/B tests is that, at the end of the day, they’re tools and nothing more. Having a test tube in your hand doesn’t give you the ability to find a cure for cancer.

When your experiments are backed by a theory, though, you know exactly what you’re testing and what you’re trying to learn from it. So the next time you create a landing page, you can stand up in front of marketing and design and explain why orange is the right color for the button. Scientifically!


The Art of Forming a Theory

Theories are built when you try to explain why something works the way it does, and then bring back evidence to support your claim.

Whether you’re trying to understand what improvements you could make to your website, which subject lines work better in your emails, or which audiences really resonate with the stories you tell, the most important part of your A/B test campaigns happens before you even fire up your testing tool.

The first step to forming a powerful theory is to ask the right question. An A/B test won’t tell you which problem you’re trying to solve; you have to know that before you run one.

Step 1: Frame the Problem Statement

The second-worst reason to run an A/B test is that your boss asked you to. The absolute worst reason of all time is that you already paid top dollar for a tool.

The first step to designing a meaningful experiment is to ask yourself why this A/B test is important. What’s wrong? What is (or could be) broken with the status quo? What data do you have to back this belief?

Step 2: Build out your Hypotheses

Once you understand the problem, the next step is to frame a possible solution to it. The solution you propose, based on your logic and assumptions, is called a Hypothesis. The purpose of your A/B test is to check that hypothesis against real data and either accept or reject it.

A typical hypothesis takes the form: “Placing an orange button above the fold will increase our signups.” It names the change you will make, the metric you expect it to move, and the direction you expect it to move in.

The level of detail in your hypotheses determines the success of your campaign.

Hypotheses are built on previous theories and scientific arguments. Before you start the test, explain why you believe the orange button will have any effect at all on your signups.

Your genius doesn’t lie in just running the test — it lies in coming up with the right theory.

Step 3: Control and Test for the Status Quo

When researchers run their cool drug tests, they give a bunch of guys plain old sugar pills. That way, they know whether people end up feeling better because they took the real pill or because they only “thought” they took the real pill. That’s called the placebo effect, and controlling for it saves drug makers a lot of money in wasted marketing and lawsuits.

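The placebo in your experiment is your control group: the status quo, or the A in your A/B test. Alongside it sits your Null Hypothesis, the antithesis of your hypothesis, which says that your change makes no difference compared with the status quo. Your test either gives you enough evidence to reject that null hypothesis, or it doesn’t.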
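Comparing B against the status quo is what makes the result interpretable. As a minimal sketch, not tied to any particular testing tool, a standard pooled two-proportion z-test in Python checks whether B’s signup rate differs from A’s by more than chance would explain. The function name and the signup counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test of the null hypothesis: B converts at the same rate as control A.

    conv_* are signups, n_* are visitors. Returns (z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate,
    # so pool the data from A and B to estimate it.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a difference at least this large if the null were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: 250 visitors split evenly, 10 signups on A vs 15 on B.
z, p = two_proportion_z_test(conv_a=10, n_a=125, conv_b=15, n_b=125)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these made-up counts, B’s 12% against A’s 8% looks like a 50% lift, yet the p-value comes out around 0.29, far above the common 0.05 threshold. At this sample size the test simply cannot reject the null hypothesis, no matter how good the graph looks.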

A common mistake we all make on an overenthusiastic data high is to design an experiment with two new alternatives, test them against each other and hail the victor. The problem is that neither alternative may actually be any better than what you already have.

Even worse, when you bring two completely new competitors into the ring, you add a whole new level of variability to your test and lose the baseline you need to tell whether either of them actually moved the needle.

Only once you’ve framed a solid theory and explanation should you even start working on your variants.

Set a time frame for each A/B test up front, then record the result against its hypothesis, marking it as a Win or a Lose.
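The time frame shouldn’t be a guess. A common way to set it is a power calculation: decide the smallest lift worth detecting, compute how many visitors each variant needs, and divide by your daily traffic. Below is a minimal Python sketch of the standard normal-approximation sample-size formula for comparing two proportions; the function names, baseline rate, target rate and traffic figures are assumptions for illustration, not numbers from this post.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(p_base: float, p_target: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed in EACH arm to detect a change from p_base to p_target
    with a two-sided test at significance level alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_base + p_target) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_base * (1 - p_base) + p_target * (1 - p_target))) ** 2
    return ceil(numerator / (p_target - p_base) ** 2)


def days_needed(n_per_variant: int, daily_visitors: int, variants: int = 2) -> int:
    """Days of traffic required when visitors are split evenly across variants."""
    return ceil(n_per_variant * variants / daily_visitors)


# Hypothetical inputs: 8% baseline signup rate, hoping to reach 10%,
# with 125 visitors a day (the 250-visitors-in-2-days pace from the story).
n = sample_size_per_variant(0.08, 0.10)
print(n, days_needed(n, daily_visitors=125))
```

With these assumed numbers the formula asks for a little over 3,200 visitors per variant, which at that traffic level means about 52 days, not two. Smaller lifts or lower baseline rates push the number up quickly, which is exactly why the time frame belongs in the plan before the test starts.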

Schedule your tests so that only one experiment is running at a time.

Step 4: Back or Break your Theories with Data

The purpose of your A/B test is to learn something from it. If your tests end up backing your hypotheses, you can do your victory dance — it might not exactly explain the space-time continuum, but you’ve still framed and tested a theory, and you should be proud of it.

A/B tests are not a marketing strategy. They are TESTS. If you don’t learn something new from each one, you aren’t doing it right.

Of course, the results of your experiment could completely break your assumption and throw your reasoning out the window. But even then, you learn something: that your reasoning was wrong.

If your A/B tests show you that the old blue button was just as good as the orange one, don’t get upset. Just think about the scientists who spend over 90% of their lives trying to see if a bunch of rats do somersaults.
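Marking results as a Win or a Lose can be as simple as keeping one small log entry per hypothesis. The Python sketch below assumes a hypothetical `Hypothesis` record, a hypothetical `record_result` helper and a 0.05 significance threshold. It logs “no significant difference” as a Lose, because the data did not back the hypothesis, even when B turned out no worse than A.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical log entry: what you expect, why, and how it turned out."""
    statement: str
    rationale: str
    status: str = "Running"


def record_result(h: Hypothesis, p_value: float, observed_lift: float,
                  alpha: float = 0.05) -> Hypothesis:
    """Mark a hypothesis as Win or Lose once its time frame is over."""
    # A Win needs a significant result in the predicted direction (here: lift > 0).
    # "No significant difference" and "B did worse" are both logged as Lose:
    # in either case the data did not back the hypothesis.
    h.status = "Win" if p_value < alpha and observed_lift > 0 else "Lose"
    return h


# Hypothetical entry: the orange button looked better, but not significantly so.
h = Hypothesis(
    statement="An orange button above the fold will increase signups",
    rationale="Higher contrast draws the eye to the call to action",  # assumed rationale
)
record_result(h, p_value=0.29, observed_lift=0.04)
print(h.status)  # Lose
```

Keeping the losers in the same log as the winners is what stops you from rerunning the same failed idea six months later.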

Mark the failed experiments too, so you know what didn’t work.

[Screenshot: my entire thought flow describing my series of A/B tests and their statuses in germ.io]

germ is a tool that lets you stage your ideas and build actionable plans out of them. I use germ.io to plan all my ideas and campaigns, partly because I’m a co-founder here, but also because it works all the magic I need: it captures my thoughts, builds detailed discussions around them and adds the next steps. If you liked my thought flow here and would like to build your own ideas around just about anything, go ahead and sign up for germ (it’s free).

(Edited to reflect screenshots from our latest user interface)
