Getting strategic about your A/B tests: my experience developing an experimentation portfolio

Product decisions are tough and expensive. That’s why so many product teams take a data-driven approach and turn to experimentation to test the waters and reduce the risk of poor decisions. At HelloFresh, we developed a strong experimentation culture with strategies for running conversion and retention experiments, and we automated the process of selecting the right metrics and extracting learnings from raw data.

However, taking a step back, I first wanted to share some thoughts on why this approach works and why we are so focused on experimentation velocity. HelloFresh has millions of users and high traffic, so we gather a lot of data quickly, allowing experiments to reach statistical significance in a matter of weeks and, in some cases, days. That drives down the cost of a single experiment and makes it financially justifiable to experiment often.
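To make that concrete, here is a minimal sketch of how daily traffic translates into time-to-significance for a simple two-proportion test. The traffic and conversion numbers are made up for illustration, not HelloFresh figures:

```python
from statistics import NormalDist

def days_to_significance(base_rate, relative_lift, daily_visitors,
                         alpha=0.05, power=0.80):
    """Rough duration of a two-proportion test: standard sample-size
    formula per variant, converted to days under a 50/50 split."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)          # users needed per variant
    return 2 * n / daily_visitors   # both variants share the traffic

# Hypothetical: detecting a 5% relative lift on a 10% conversion baseline.
print(days_to_significance(0.10, 0.05, daily_visitors=50_000))  # ~2.3 days
print(days_to_significance(0.10, 0.05, daily_visitors=5_000))   # ~23 days
```

With enough traffic, even a modest lift becomes detectable in days rather than weeks, which is exactly what keeps the per-experiment cost low.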

Depth of the funnel determines experiment velocity

The idea of the funnel suggests that it becomes narrower at each step, since potential users drop off. At the same time, we gather data and learn about users as they travel along the user journey. At the beginning of the journey we have high traffic but almost no information about the user, so at this step we need to diversify our portfolio of hypotheses and quickly test as many ideas as we possibly can. By the same token, as we travel further along the funnel, we can improve the quality of the ideas we experiment with and test less, but in a smarter way. The cost of a decision grows as we experiment on the deeper steps of the user journey.
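To illustrate (the funnel shape and required sample size below are invented assumptions), the same sample-size requirement translates into very different experiment durations depending on how deep in the funnel the experiment sits:

```python
# Hypothetical funnel: share of daily entrants still present at each step.
daily_entrants = 100_000
funnel = [
    ("landing page",    1.00),
    ("signup",          0.30),
    ("first order",     0.10),
    ("paying, month 2", 0.03),
]
required_sample = 50_000  # users an experiment needs (held fixed for comparison)

for step, reach in funnel:
    days = required_sample / (daily_entrants * reach)
    print(f"{step:>15}: {days:5.1f} days to allocate users")
```

In this toy funnel the same experiment takes half a day on the landing page but over two weeks at the paying-customer step, before any maturation period is counted.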

Let’s look at an example:

Experiment A tests different layouts of a landing page. It takes only 3 days for the experiment to reach statistical significance and another week to mature. So the cost of a wrong decision can be expressed as 10 days’ worth of opportunity cost.

Experiment B tests a new subscription model for paying customers. Assuming the company has fewer paying users than prospects landing on its website, we can estimate that reaching statistical significance will take about 3 weeks of allocating users, plus an additional month to gather retention data and extrapolate the results to future user behaviour. That leaves us with 7 weeks’ worth of opportunity cost. This estimate does not even include revenue potentially lost from paying customers in case of a mistake, which makes the cost of the decision higher still.
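Spelled out as back-of-the-envelope arithmetic, using only the timelines from the example above:

```python
# Timelines from the two experiments, expressed in days.
exp_a_days = 3 + 7        # Experiment A: 3 days to significance + 1 week to mature
exp_b_days = 3 * 7 + 30   # Experiment B: ~3 weeks of allocation + ~1 month of retention data

print(exp_a_days)  # 10 days of opportunity cost
print(exp_b_days)  # 51 days, roughly 7 weeks -- before counting revenue at risk
```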

Applying that logic, the way you structure experiments should depend on how deep into the user journey the experiment is located and how much ROI per user that step of the journey accounts for. In the upper parts of the funnel it makes sense to prioritize velocity and variety of hypotheses, whereas further along, the focus should shift to more carefully researched experiment ideas with higher incremental value.

Getting over the local maxima

As rewarding as experimentation is, it has a point of diminishing returns. In A/B testing this phenomenon is called “reaching the local maximum”. After that point, tactical and oftentimes smaller optimizations don’t bring as much value as they did at the beginning of the experimentation journey. That does not render all of our experimentation efforts useless; it just means you should start thinking about more innovative, riskier, bolder approaches to solving your problem. Introducing a complete redesign or significantly changing the user journey are great examples of such experiments. Their benefit is that they allow you to gather qualitative data and shape the trajectory along which you develop your product further.

So, when starting A/B testing, it’s crucial to understand what innovations and optimizations were done in the past, to determine where you are relative to your local maximum. From there, you can decide whether it makes sense to iterate on the current solution, collecting smaller wins and benefiting from the compound effect of many small experiments, or whether it’s time to try something radically new.
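That compound effect is easy to underestimate. A quick sketch with a hypothetical win size makes the point:

```python
# Hypothetical: ten consecutive experiments, each shipping a 2% lift.
lift, wins = 0.02, 10
print(f"{(1 + lift) ** wins - 1:.1%}")  # ~21.9% total, not 20% -- small wins compound
```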

Seeing the big picture

I believe that one quality that really shows experience and seniority is the ability to think about long-term strategy and treat the experiment roadmap like a portfolio. Like any good portfolio, the experiment portfolio should be diversified: there should be a balance of short- and long-term investments (tactical and strategic experiments), and the assets should be weighed against risks and ROI. We should account for the fact that some investments will not pay off, or will take a long time to realize their full value. We can visualize this as a matrix with expected ROI on one axis and risk of hypothesis failure on the other. The risk estimate can be derived from factors such as:

  1. How big is the delta between the current experience and the one being tested

A good experimentation roadmap has a healthy balance of experiments from three quadrants: High Risk, High ROI (strategic investments), Low Risk, High ROI (“unicorn” investments) and Low Risk, Low ROI (optimizations). Usually the majority of experiments will fall into the High Risk, High ROI or Low Risk, Low ROI quadrants. That is why it’s important to do continuous discovery work to find the occasional “unicorn” experiment.
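One way to make that matrix operational is to score each hypothesis and bucket it into a quadrant. Here is a minimal sketch; the backlog items, scores and the 0.5 threshold are all invented for illustration:

```python
from collections import Counter

# Hypothetical backlog: (hypothesis, risk score 0-1, expected ROI score 0-1).
backlog = [
    ("landing page layout",    0.1, 0.2),
    ("new subscription model", 0.8, 0.9),
    ("checkout copy tweak",    0.2, 0.1),
    ("referral flow revamp",   0.3, 0.8),
]

def quadrant(risk, roi, threshold=0.5):
    """Bucket a hypothesis into one of the four risk/ROI quadrants."""
    risk_label = "High Risk" if risk >= threshold else "Low Risk"
    roi_label = "High ROI" if roi >= threshold else "Low ROI"
    return f"{risk_label}, {roi_label}"

mix = Counter(quadrant(risk, roi) for _, risk, roi in backlog)
for q, count in sorted(mix.items()):
    print(f"{q}: {count}")
# A balanced roadmap keeps strategic bets, optimizations and the
# occasional "Low Risk, High ROI" unicorn all represented.
```

Even a rough scoring pass like this makes it visible when the roadmap has drifted toward all-optimizations or all-moonshots.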

In conclusion, meaningful experimentation requires thoughtful portfolio-balancing skills and the ability to strategize around the learnings. Every single bet can be viewed as a strategic move, so it’s incredibly valuable to be able to step back and see a holistic view of the hypotheses you are testing and the bets you are placing. It takes skill and practice to develop that perspective, and the best way to learn it is by doing continuous discovery and experimentation and rebalancing the portfolio based on the learnings. So, happy A/B testing!
