You’ve got a great idea for an A/B test.
You’ve done your homework, designed a killer variation, and are ready to test. Your team launches it and everyone’s excited.
And guess what, your variation is winning.
By a pretty substantial margin at that. Your first instinct may be to raise the mission-accomplished banner and launch your winning variation at 100%.
Has your test reached the Minimum Traffic Threshold?
Depending on your service (consumer vs. enterprise), the traffic you’re testing can be highly variable, which can have a profound impact on your test results.
For example, at about.me I’ll occasionally see a tweet like this:
Gabrielle’s intention was to sign up that day; it really didn’t matter which test group she was in.
This can cause wild fluctuations in A/B test results, both positive and negative, and if you’re not careful it can produce false-positive or false-negative conclusions.
Minimum Traffic Threshold (MTT)
One way to reduce this risk is to identify the Minimum Traffic Threshold for the page or flow you’re testing, using a technique called A/A testing:
- Create a test with two variations that are exactly the same.
- Drive a small, but equal amount of traffic into each variation and watch the results.
- Continue to ramp up traffic until both groups reach parity on the metric you’re testing (note the total traffic in each group when they converge).
- Repeat steps 1–3, but this time allocate more traffic so you reach parity faster.
Did the tests reach parity at the same number of subjects? If so, that’s your MTT.
Any test you run on that page or flow should reach that traffic threshold before you declare a winner.
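The steps above are easy to simulate. Here’s a minimal sketch of an A/A test, assuming a hypothetical 10% baseline conversion rate and using only Python’s standard library. Both “variations” are identical, so any apparent lift is pure noise, and you can watch it shrink as traffic ramps up:

```python
import random

random.seed(42)

TRUE_RATE = 0.10  # assumed baseline conversion rate (illustrative only)

def simulate_group(n, rate):
    """Simulate n visitors, each converting with probability `rate`."""
    return sum(1 for _ in range(n) if random.random() < rate)

# Ramp traffic up and watch the two identical variations converge.
for n in [100, 1000, 10000, 100000]:
    a = simulate_group(n, TRUE_RATE) / n  # observed rate, variation A
    b = simulate_group(n, TRUE_RATE) / n  # observed rate, variation B
    lift = (b - a) / a * 100
    print(f"n={n:>6}  A={a:.2%}  B={b:.2%}  apparent lift={lift:+.1f}%")
```

At small n the “lift” between two identical pages can easily run into double digits; the traffic level where it settles near zero is a rough stand-in for your MTT.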
But be careful: each page or flow can have a different MTT. For example, the about.me homepage has so much variability that its MTT is 40k visitors, but in other parts of the signup flow the MTT can be less than 500.
Just to show how wild these fluctuations can be, here’s an example from one of my recent MTT tests.
After about a day, here are the results:
As you can see, Variation #1 has won: a 37.7% improvement, statistically significant at 99%. The only problem is that both variations are identical.
After letting the test run for a bit longer, here are the results:
As you can see from the graph, there was a ton of variability in the early days of the experiment. As more people were added to the groups, the results began to converge.
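You can reproduce this “early winner that evaporates” effect by checking an A/A test day by day. The sketch below assumes a hypothetical 10% conversion rate and 500 visitors per variation per day, and applies a standard two-proportion z-test at each check. Because the variations are identical, any day flagged as significant is a false positive:

```python
import math
import random

random.seed(7)

RATE = 0.10  # same true conversion rate in both identical variations

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on conversion counts."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se if se else 0.0

conv_a = conv_b = n = 0
for day in range(1, 11):
    for _ in range(500):  # 500 visitors per variation per day (illustrative)
        n += 1
        conv_a += random.random() < RATE
        conv_b += random.random() < RATE
    z = z_score(conv_a, n, conv_b, n)
    flag = "  <- 'significant' at 95%" if abs(z) > 1.96 else ""
    print(f"day {day:>2}: n={n} per variation, z={z:+.2f}{flag}")
```

Peeking at results every day like this inflates the odds of catching a spurious significant result before you’ve reached your MTT, which is exactly why the threshold matters.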
These results corroborated an earlier MTT test, which put the threshold at around 8,500 visitors per variation.
Having these baselines for your A/B tests will allow you to make better, more accurate assessments of your results and reduce the risk of false positives and false negatives.
If you have questions about MTT testing or other growth-related topics, email me at ryanfujiu at gmail dot com. Or, follow me on Twitter :)