Real-Life A/B Testing in Startups

Simon Deichsel
Published in Project A Insights
May 9, 2017 · 5 min read

These days, everybody has heard about A/B testing, and nearly everyone has at least taken part in setting up an A/B test or looked at its results. It seems very straightforward at first sight, but as always, God (or, if you prefer, the devil) is in the details.

The marketing departments of A/B testing suite vendors tell the world that firing off A/B tests is a breeze, and they all provide tools that promise to make you a conversion superhero in minutes.

However, in reality, when you just use everyone's favorite tool, take the built-in editor, and move some elements around on your page, your A/B test will likely cause more trouble than it is worth instead of helping you find even the tiniest conversion uplift. As always, good things only happen when somebody has thought deeply about them.

In this article you will find guidelines for A/B testing. In the end, a good A/B testing strategy is not so different from a product lifecycle in turbo boost mode.

Idea Selection

At the beginning, the most important thing is finding the right ideas. If you work in a startup and have fewer than 10k visits per day, you are forced to test one hypothesis after another, always using your entire traffic and starting at the beginning of the funnel. Make sure that you gather plenty of relevant testing ideas. Away with ownership of ideas, let a thousand flowers grow! Ask everyone in your company to send in ideas, ideally by organizing a competition for the best testable conversion optimization idea. A shared spreadsheet helps you avoid duplicates and provides some guidance, but ultimately the selection of the first candidate will come down to the gut feeling of the jury that makes the decision. André Morys from Web Arts suggests checking these questions in order to select your hypotheses:

  1. Is the visual contrast of the hypothesis big enough to be noticed by users?
  2. Is the hypothesis bold enough to actually change behavior?
  3. Are there psychological findings supporting the effects of the hypothesis?

Designing and Iterating

Once you have settled on an idea, you can start a mini design sprint to flesh out the details. Get four or five people together and sketch with pen and paper, creating iterations in slots of 5–10 minutes until you come to a conclusion and settle on two versions of a final wireframe. Then let a designer create a fully fleshed-out design. Work on the wording until you feel it is perfect.

Now, test.

Leave the comfort zone of your office and show your picture of the website to people. We found that using an iPad makes a simple JPEG feel much more like a website than printing it out does. Once you have gotten feedback from five people, improve your design accordingly. Test again. Do this until you stop learning anything new or relevant to the test from people's responses. Do not forget to design a mobile view, or exclude mobile traffic if you can afford to.

Implementation and Quality Check

Now you are ready for implementation. If you do not have solid, advanced front-end development skills, let a developer do the work, even if you are using a tool that offers a WYSIWYG editor. Be sure to do real cross-browser and cross-platform quality assurance, and have a final proofreading done by someone who has never seen the test before.

Measuring Success

Before you start the test, you should decide on one main KPI that you want to push. If your traffic is really low, focus on a micro conversion; otherwise you will have to wait ages until you can trust your results. Use an A/B testing calculator to estimate the required duration of your test in advance; a minimal sketch of such a calculation follows below. As a general rule, never judge an outcome if your test ran for less than a week: weekend effects are quite strong, so you need to show that your idea works on every day of the week.
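
As an illustration, here is a minimal sketch of such a duration estimate, based on the standard sample-size formula for a two-proportion z-test. The baseline conversion rate, hoped-for uplift, and traffic figures are hypothetical examples, not numbers from a real test.

```python
# A minimal sketch of an A/B test duration estimate using the standard
# sample-size formula for a two-proportion z-test. All input numbers
# below are hypothetical examples.
from math import ceil, sqrt

from scipy.stats import norm

def required_sample_size(p_base, rel_uplift, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect a relative uplift."""
    p_var = p_base * (1 + rel_uplift)
    p_bar = (p_base + p_var) / 2
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance level
    z_beta = norm.ppf(power)           # desired statistical power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_base * (1 - p_base)
                                 + p_var * (1 - p_var))) ** 2
    return ceil(numerator / (p_var - p_base) ** 2)

# Example: 3% baseline conversion, hoping for a 15% relative uplift,
# with 2,000 daily visitors split evenly between two variants.
n = required_sample_size(0.03, 0.15)
print(f"{n} visitors per variant, roughly {ceil(2 * n / 2000)} days")
```

With low-traffic numbers like these, the estimate quickly runs into weeks, which is exactly why focusing on a micro conversion with a higher baseline rate shortens the wait.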

Now it is time to start your test. As I have written before, I am a fan of splitting a test into identical subgroups (called AA/BB testing), as this helps you judge at a glance whether the effects you are seeing are real (the graphs of the subgroups converge) or still heavily influenced by chance (the graphs of the subgroups are more than one standard deviation apart). If both B variants have been performing worse than the original by more than one standard deviation and the first week has passed, it is probably wise to stop the test, even if the calculated duration has not yet been reached. After all, you are looking to increase a KPI; you are not interested in finding out the truth.

If some people in your company are confused by the AA/BB setup, it is easy to sum up the subgroups to generate an alternative view that looks just like a normal A/B test, where you can use all the standard statistical methods without losing any traffic; the snippet below shows this.
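
For instance, collapsing the subgroups and running an ordinary significance test might look like the following sketch. The per-subgroup counts are hypothetical, and I assume you can export conversions and visitors per subgroup from your testing tool.

```python
# A minimal sketch of collapsing an AA/BB test into a plain A/B view.
# All numbers below are hypothetical (conversions, visitors) pairs.
from statsmodels.stats.proportion import proportions_ztest

a1, a2 = (120, 5000), (131, 5100)  # the two identical A subgroups
b1, b2 = (150, 4950), (142, 5050)  # the two identical B subgroups

# Summing the identical subgroups yields the standard A/B comparison,
# so the extra split wastes no traffic.
conversions = [a1[0] + a2[0], b1[0] + b2[0]]
visitors = [a1[1] + a2[1], b1[1] + b2[1]]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```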

(Side-note: always exclude your own IP address range from your testing tool.)
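
Most testing tools offer an IP-exclusion setting for this. If yours does not, one alternative is a server-side guard that skips the testing snippet for office visitors; here is a minimal sketch assuming a Flask app, with a placeholder IP range and script URL.

```python
# A minimal sketch of excluding an office IP range server-side, so the
# A/B testing script is never injected for internal visitors. The Flask
# setup, IP range, and script URL are hypothetical placeholders.
import ipaddress

from flask import Flask, render_template_string, request

app = Flask(__name__)
OFFICE_RANGE = ipaddress.ip_network("203.0.113.0/24")  # placeholder range

PAGE = """<!doctype html>
<html><head>
{% if include_test %}<script src="/static/ab-test.js"></script>{% endif %}
</head><body>Hello!</body></html>"""

@app.route("/")
def index():
    visitor_ip = ipaddress.ip_address(request.remote_addr or "0.0.0.0")
    # Only load the testing script for visitors outside the office range.
    return render_template_string(PAGE,
                                  include_test=visitor_ip not in OFFICE_RANGE)
```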

Final Steps

It is always a good idea to note down the results of your test in a separate table, so that you can learn from the failures and successes of the past. Be sure to include as much raw data as you can, as this will reduce the likelihood of anyone doubting your results later on.

Once you have reached a result, quickly plan the next test. In theory, you could start doing so while another test is still running, but in practice the choice of your next test often depends on the results of the ongoing one.

If you fail to produce significant uplifts in your tests, rest assured: you are not the only one. Most A/B tests fail, but it is very rare to read about these failures in public. Keep calm and carry on testing.
