But Why Should I Ever Run the Same Successful A/B Test Twice?

In my last article, I introduced what might happen if you run two A/B tests simultaneously, and with it the concept of interfering tests. In sum, interfering tests involve variables which work better or worse together, depending on how they are manipulated. For example, when you change both the order of the article categories in your app and your marketing channels (and therefore the users you attract), some category orderings will pair better or worse with specific marketing channels or user groups. Considering this question, however, led me to wonder what happens when interfering variables change over time. That is, for how long do A/B tests stay relevant as I run more tests and my product evolves? The answer is actually fairly straightforward when approached like a scientist. Let’s dive in!

Photo by: Matthew Henry

The Problem

You may never have even thought about running the same test more than once. After you find statistically significant results, you improve your product and move on. However, now that I have you thinking about the complexity of running multiple tests on an evolving product, it shouldn’t feel like too much of a stretch to consider that tests may produce very different results when run in a different order, on different product versions, or simply at different times. So, once you have run A/B tests, evolved your product and business strategy, and witnessed exciting growth, you will need to start considering which tests to run again.

The Solution

I have landed on a method for thinking about your A/B tests that handles this decision-making process effectively.

Firstly, whenever you run an A/B test, you should record not just the results but also the environment you ran it in, just as a scientist would for an experiment. That is, what was the state of any relevant elements of your product and its users?
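As a rough illustration of what such a record could look like, here is a minimal Python sketch. The `ABTestRecord` class, its field names, and the example values are all hypothetical and not taken from any real tool or dataset. The point is only that the environment is stored right next to the result.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ABTestRecord:
    """One finished A/B test plus the environment it ran in (hypothetical schema)."""
    name: str
    ended_on: date
    winner: str
    # Snapshot of anything that could interfere with the result,
    # e.g. the marketing channels in use or the set of article categories.
    environment: dict = field(default_factory=dict)


# Illustrative values only, not real results.
record = ABTestRecord(
    name="category_order",
    ended_on=date(2020, 1, 15),
    winner="variant_b",
    environment={
        "marketing_channels": ("newsletter", "search_ads"),
        "article_categories": ("news", "sports", "tech"),
    },
)
```

Which factors belong in `environment` is a judgment call. A reasonable starting point is anything you have tested before or suspect could interact with the variable under test.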

In the future, you should not simply think of the A/B tests you’ve run as a list of tests with check marks by them. Rather, reviewing them should be like researching relevant scientific studies, which may or may not apply to your current product and business.

An Example in Science

Imagine that you are Australian and reading about a study which concluded that consuming saturated fats leads to heart disease in Americans. You would need to ask yourself, ‘What does this study tell me about the effect of consuming saturated fats on Australians?’ The answer, as you are likely already aware, is that since we accept that there is no substantial biological difference between people of different nationalities, this study can be applied to Australians the same as it would be to Americans. Notice that this is because the variables of nationality and the effect of saturated fat on one’s health are non-interfering.

What does this example have to do with A/B testing?

Well, you should treat your old A/B tests as if they were run on products different from your own, the same way that the Australian should treat a study done on Americans. This is because your product has changed since the test was run: sometimes in ways which are not significant (like in the above example) and other times in ways that make the results no longer transferable to your current product.

The important question that you need to ask yourself is whether the results of your old tests can be applied to your current product or situation, and if not directly, how relevant are the results? That is, what exactly are they telling you and what more would be gained from running the same test again? Or a slight variation of the test?
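One way to make this question a little more concrete is to compare the environment an old test ran in with the current one, and to flag the test only when a factor you believe interferes with it has changed. The sketch below assumes environments are stored as plain dictionaries. The function names and the example factors are hypothetical, and deciding which factors count as interfering remains a judgment you have to make yourself.

```python
def changed_factors(old_env: dict, current_env: dict) -> set:
    """Return the names of environment factors whose values differ."""
    keys = old_env.keys() | current_env.keys()
    return {k for k in keys if old_env.get(k) != current_env.get(k)}


def should_rerun(old_env: dict, current_env: dict, interfering: set) -> bool:
    """Flag a test for a rerun if any factor assumed to interfere has changed."""
    return bool(changed_factors(old_env, current_env) & interfering)


# Illustrative environments only.
old_env = {"marketing_channels": ("newsletter",), "app_theme": "light"}
current_env = {"marketing_channels": ("search_ads",), "app_theme": "dark"}

# Assumption: category order interacts with who we attract, not with the theme.
print(should_rerun(old_env, current_env, {"marketing_channels"}))  # True
print(should_rerun(old_env, current_env, {"article_categories"}))  # False
```

A check like this only narrows down the list of candidates. Whether a flagged test is worth the cost of running again still depends on what you expect to learn from it.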

An Example in A/B Testing

Returning to the example mentioned earlier, imagine that you ran an A/B test which determined the ideal order of article categories in your app. Then, your app’s set of articles and user base changed completely over time. Whether this was due to A/B testing and the resulting updates to your marketing channels or simply a gradual change in the market, this dramatic change has almost certainly interfered with your old test, since you are now looking at an almost completely different product. It’s safe to assume that your old results may no longer be relevant, and, if you want to know the best way to order your categories, it’s time to run a similar experiment. Of course, once you find your new category order, you will have to ask yourself again about the relevance of all of your other A/B tests which were run in the environment of your prior ideal ordering. I think you are starting to see how data analytics quickly becomes a full-time job…

So, if you need help, please don’t hesitate to reach out to me or my agency Permutable for advice or data consulting services!




We run the analytics & experimentation agency Permutable, and write about what we learn as we work.

Frederick Lancia


ML Data Scientist at Hunome | Freelance Data Analyst
