A/B testing: How we scaled from one experiment per year to two per week

Simone Congiu
Published in Team Taxfix
Apr 30, 2020 · 5 min read

Simone Congiu, growth product manager, shares the planning and prioritization framework that helped our team scale A/B testing. Learn how our Product Growth Team laid the foundation for efficient and reliable experiments.

When I joined in May 2019, my goal was to improve the conversion rate of our acquisition funnel through experimentation. Luckily, the company was very interested in taking an experimental approach. So my challenge was not convincing management that A/B testing was important, but figuring out how to get started.

Getting started with A/B testing

Before starting to test, we made sure to address these three key points:

  • Traffic: We needed traffic. Many companies make the mistake of investing in experimentation before reaching the traffic necessary to test confidently (a rough sample-size check follows this list). For us, the only platform where we could launch continuous testing was the mobile app, since it had a much bigger user base than our website or web app.
  • Process and prioritization: We wanted to have a process in place to correctly design experiments and prioritize them. Every experiment has an opportunity cost of time and traffic, so we wanted to make sure we were getting the biggest impact for our effort.
  • Infrastructure: We wanted to have complete trust in the data. Having the right infrastructure, in terms of tools and tracking, helps avoid the risk of launching experiments and not being able to take action on them.
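
To make the traffic point concrete, here is a back-of-the-envelope sample-size check. It uses the standard two-proportion normal approximation at a two-sided alpha of 0.05 and 80% power; the baseline rate and lift are illustrative numbers, not our actual figures.

```python
from math import ceil, sqrt

def sample_size_per_variant(baseline_rate, relative_lift):
    """Users needed per variant to detect a relative lift in a conversion
    rate, using the two-proportion normal approximation with a two-sided
    alpha of 0.05 (z = 1.96) and 80% power (z = 0.84)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha, z_beta = 1.96, 0.84
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return ceil(n)

# Illustrative numbers, not ours: detecting a 5% relative lift on a 40%
# registration rate needs roughly 9,500 users per variant.
print(sample_size_per_variant(baseline_rate=0.40, relative_lift=0.05))
```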

How to define and prioritize experiments

At that time, the Product Growth Team was just myself, a product designer, and a developer. Since we were a small but mighty unit, efficiency was a top concern.

We focused our energy on the acquisition funnel to see if we could increase the number of users who register after downloading our app. To get this project rolling, we asked ourselves: what experiments are we going to run, and how?

We had a lot of ideas but needed hard facts to help us prioritize them. That summer, we conducted exploratory user research to understand which issues customers faced in our app that prevented them from registering.

We identified some trends in the answers, which became our hypotheses. To help classify each hypothesis, we followed this useful template (sketched in code after the list):

  • Hypothesis: why we think a user doesn’t complete the desired task at a defined point in the flow.
  • Solution: how we think we could solve the issue stated in the hypothesis.
  • KPI: the metrics we think we can have an impact on.
  • Prediction: the pay-off. If we implement the solution, we expect to improve the KPI by a certain percentage; this is our expected impact.
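
As a sketch, each backlog entry from this template maps to a small record. The fields mirror the template above; the example values are made up, not a real hypothesis from our backlog.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One experiment-backlog entry, following the template above."""
    hypothesis: str    # why we think users drop at this point in the flow
    solution: str      # how we think we could solve it
    kpi: str           # the metric we expect to move
    prediction: float  # expected relative KPI improvement, e.g. 0.05 = +5%

# A made-up example entry:
example = Hypothesis(
    hypothesis="Users abandon registration because the form asks too much upfront",
    solution="Defer optional fields until after sign-up",
    kpi="download-to-registration conversion rate",
    prediction=0.05,
)
```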

After listing all the hypotheses we had, we applied the RICE model, a scoring model for prioritization:

  • Reach: how many users we have at a certain point in the flow.
  • Impact: the expected improvement we assume the test will have, as stated in the “prediction” stage above.
  • Confidence: the level of trust we have in the experiment, which is influenced by data and insights from user research.
  • Effort: days of development and design time.

We also took into consideration an additional metric, given the particular nature of our app and the many steps a user takes:

% of Drop: how many users we lose at a particular point in the journey. This helps us understand the size of the drop-off we want to address.

We used the standard RICE formula to prioritize:

Score = (Reach × Impact × Confidence) / Effort
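
In code, the scoring reduces to a one-liner. The numbers below are hypothetical, and treating % of Drop as a side metric is a simplification of how we actually weighed it.

```python
def rice_score(reach, impact, confidence, effort):
    """Standard RICE score: higher means test sooner.

    reach      -- users who hit this point in the flow per period
    impact     -- expected relative KPI improvement (the prediction above)
    confidence -- 0..1, how much the data and research back the hypothesis
    effort     -- person-days of design and development
    """
    return reach * impact * confidence / effort

# Hypothetical numbers: 50,000 users reach the step, we predict a 5% lift,
# we are 70% confident, and the change costs 4 person-days.
print(rice_score(reach=50_000, impact=0.05, confidence=0.7, effort=4))  # 437.5

# % of Drop rides along as extra context on the size of the opportunity;
# exactly how we weighted it is simplified away here.
drop_rate = 1 - 32_000 / 50_000  # e.g. 36% of users lost at this step
```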

We then built quick prototypes of the flow and tested them with users to get more insights into what was and wasn’t working. Finally, we moved to the solution phase and built various iterations of the experiments we wanted to launch.

Finding the right infrastructure for A/B testing

Knowing what to experiment on is an important step, but without the right technical infrastructure in place we were still blocked. The first problem we had to tackle was deciding which tool was best for running experiments in our mobile app. We evaluated the options based on these criteria:

  • Platform: since the majority of our user base uses the mobile app, we decided to focus on a tool that allowed us to test there without extending the scope to omnichannel testing.
  • Speed: we wanted to start ASAP, so it was important to decide on a tool that would be easy to implement.
  • Data integration: given the complexity of our app, we wanted a tool that allowed total integration with the hundreds of events we have in our database.
  • Price: A/B test tools range in price from a few hundred to thousands of euros per month. Since we were just starting out, we set our sights on a thrifty option.

Based on these criteria, we decided to go with Firebase, a tool we were already using for analytics. We spent the next couple of weeks running A/A tests and troubleshooting. It was a challenging time, because we were eager to start testing our hypotheses but tracking issues were slowing us down. Finally, at the end of November, we were confident in the reliability of our results and we were ready to test.
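
As a sketch of what that sanity-checking looks like: an A/A test serves the identical experience to both variants, so a “significant” difference in conversion points to a tracking or bucketing problem rather than a real effect. A minimal check, assuming a standard two-proportion z-test and made-up counts:

```python
from math import erf, sqrt

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rate between
    two groups, via the pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Made-up counts for two identical variants:
p = two_proportion_p_value(conv_a=4_020, n_a=10_000, conv_b=3_985, n_b=10_000)
print(f"p = {p:.2f}")  # healthy tracking: p well above 0.05
```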

First A/B test launched on Firebase

We had good success with the first experiment, which brought a significant improvement in our registration conversion rate. After months of refining hypotheses and setting up the infrastructure, we could finally see the fruit of our work.

Our A/B testing today

Fast forward to today: we now have two teams focused on improving conversion rates with A/B testing. The Product Growth Team has grown to five people, and we’re still growing! We’re running two experiments in our mobile app each week and finding new ways to increase this number even further.

The challenges we are facing now are very different from where we started almost a year ago. With our planning and prioritization framework in place, we now have time to focus on the technical challenges of scaling tests and establishing a process where everyone can test comfortably and correctly.

Interested in joining the Product Team? Check out our open roles.
