Lessons from Hypothesis-Driven Development

Simon McManus
Published in Trainline’s Blog · 6 min read · Jan 23, 2023

Product decisions are often based on gut instinct. It’s easy to make a change, but making changes that deliver the expected business value is much trickier. How can we ensure we are building the right thing and get validation as early as possible?

The principle of hypothesis-driven development is to apply the scientific method to product development: define success criteria, then form testable hypotheses about how to meet them.

Over the past year, the Conversion Rate Optimisation (CRO) team has worked in a hypothesis-driven way. It has encouraged a culture of validation and allowed us to measure the impact of every change we make. It has given us tight feedback loops for learning about our users and our changes.

In this post, I will share some of the lessons we have learned along the way.

The Problem / Opportunity

It is important to be clear about the problem you are trying to solve. Once you have agreed on a problem/opportunity you can start to think about the hypotheses.

The Hypothesis

A simple hypothesis is easier to test, and the results are easier to read. We have found that the best hypotheses take this shape:

Given [insight], changing [xyz] will result in [expected outcome].

This ensures they are data-driven, calling out both the change and the expected result. To be clear about what we are trying to test in an experiment, we write the hypothesis in every ticket.
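As a rough illustration (the field names here are our own, not a formal Trainline template), that hypothesis shape can even be captured as a small structured record, so the insight, the change and the expected outcome are all stated before any build work starts:

```typescript
// Illustrative shape for the hypothesis we paste into every ticket.
// Field names are hypothetical; the point is that the insight, the change
// and the expected outcome are all made explicit up front.
interface Hypothesis {
  insight: string;         // the data point that motivates the change
  change: string;          // the single thing we are changing
  expectedOutcome: string; // the measurable result we expect to see
}

const example: Hypothesis = {
  insight: "50% of users never scroll the purchase button into view",
  change: "moving the purchase button higher up the page",
  expectedOutcome: "more tickets sold per session",
};

// Renders the "Given [insight], changing [xyz] will result in [expected outcome]" sentence.
const asSentence = (h: Hypothesis): string =>
  `Given ${h.insight}, ${h.change} will result in ${h.expectedOutcome}.`;

console.log(asSentence(example));
```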

Try not to make too many changes in one go, as it becomes much more difficult to understand cause and effect.

In an ideal world, we would be able to isolate the effects of each change by testing just one thing at a time. This is rarely practical: first, some changes are so small on their own that they are unlikely to move the needle in any meaningful way; second, testing one change at a time is slow even when it is feasible.

It is a delicate balancing act. Always remember, moving fast without measurement might mean you’re moving in the wrong direction.

Learning

When your primary goal is learning, winning experiments become an inevitable side effect. You should expect to learn and iterate on your experiments.

With so much learning, it is important to find ways to keep track. We have found decision trees help to communicate learnings between different experiments. You can learn more in this post from Spotify:

https://spotify.design/article/from-gut-to-plan-the-thoughtful-execution-framework

Keep an eye out for signals. Even a negative result shows that a particular part of the experience has an impact and is worth further iteration. It might even be worth doing the opposite of what you tried in the first place.

Metrics and Measures

Creating experiments and features is a small part of the process. Knowing which experiments to run, and how to read their results, is just as important.

The design and implementation of every test will make a big difference in how you read its results. Involving the whole team in design, build and analysis ensures you get the best possible insights.

Each experiment will have a primary metric by which you measure success. You will also need secondary (or guardrail) metrics to track unintended side effects. For example, a positive change in conversion might be negative for advertising revenue. You should watch both.
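As a sketch of that idea (the metric names, thresholds and the `shouldShip` helper are illustrative, not a description of our real tooling), an experiment definition can pair its primary metric with explicit guardrails so a win on one cannot quietly hide a loss on another:

```typescript
// Illustrative experiment definition: one primary metric plus guardrail
// metrics that must not regress beyond an agreed threshold.
interface MetricCheck {
  name: string;
  maxAllowedDrop: number; // e.g. 0.02 = must not fall more than 2% vs control
}

interface ExperimentDefinition {
  key: string;
  primaryMetric: string;
  guardrails: MetricCheck[];
}

const purchaseButtonExperiment: ExperimentDefinition = {
  key: "purchase-button-position",
  primaryMetric: "conversion_rate",
  guardrails: [
    { name: "advertising_revenue_per_session", maxAllowedDrop: 0.02 },
  ],
};

// Ship only if the primary metric improves and no guardrail drops too far.
// `observedChange` maps metric names to relative change vs control (0.03 = +3%).
// A real analysis would also check statistical significance before shipping.
function shouldShip(
  definition: ExperimentDefinition,
  observedChange: Record<string, number>
): boolean {
  const primaryImproved = (observedChange[definition.primaryMetric] ?? 0) > 0;
  const guardrailsHeld = definition.guardrails.every(
    (g) => (observedChange[g.name] ?? 0) >= -g.maxAllowedDrop
  );
  return primaryImproved && guardrailsHeld;
}

console.log(
  shouldShip(purchaseButtonExperiment, {
    conversion_rate: 0.015,
    advertising_revenue_per_session: -0.01,
  })
); // true: conversion up 1.5%, ad revenue within the -2% guardrail
```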

There are also other factors to consider, such as code and product complexity. If removing a feature has no negative impact on your metrics, removing it could improve your ability to move fast.

Besides primary and secondary metrics, it is important to understand how behaviour changes. The more data you have, the more you can learn. It’s worth spending time up front thinking about the data you will need when analysing the results; you cannot add it once the experiment has stopped.

Processes

The best hypotheses are data-driven, e.g.:

Given that 50% of users don’t scroll the purchase button into their viewport, moving the purchase button higher up the page will result in more tickets sold.

Sometimes we do not have the data to define a hypothesis in this way. The best way around this is to run an experiment to gather it; the data can then drive your future hypotheses.
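For example, the scroll insight in the hypothesis above has to come from somewhere. A minimal sketch of gathering it, assuming a browser environment with an illustrative element id and a stand-in `trackEvent` analytics call, is to record whether the purchase button ever enters the viewport:

```typescript
// Sketch: record whether the purchase button is ever scrolled into view.
// The element id and event name are illustrative; trackEvent stands in for
// whatever analytics call you already have.
const trackEvent = (name: string): void => {
  console.log("track", name);
};

function observePurchaseButtonVisibility(): void {
  const button = document.querySelector("#purchase-button");
  if (!button) return;

  const observer = new IntersectionObserver((entries) => {
    if (entries.some((entry) => entry.isIntersecting)) {
      trackEvent("purchase_button_viewed"); // fires once, the first time the button becomes visible
      observer.disconnect();
    }
  });

  observer.observe(button);
}

observePurchaseButtonVisibility();
```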

We use the data insights to prioritise the experiments and inform the next steps. Each week the team reviews every running experiment to:

  • Confirm the experiment is set up correctly
  • Confirm data is being received correctly
  • Decide whether we should iterate on, productionise, or clean up the experiment

Regardless of the result, that experiment will produce data that can help inform what we do next.

This diagram gives an overview of the lifecycle of experiments in the CRO team and how they feed into each other:

CRO Experimentation process

We have felt the benefits of investing in our tooling and processes. Running experiments correctly, and reading their results, needs to be cheap and easy. Good tooling will have a significant impact on your ability to learn fast. The things which made a significant difference for us were:

  • Creating an experimentation template in Jira for all our experiments
  • Creating an experiment checklist to confirm correct setup
  • Building custom dashboards to show the status of all our running experiments
  • Enabling the ability to view and share control/variant experiments via a URL, as sketched below
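That last item can be as simple as honouring a query parameter. This is a sketch only; the `experiment` and `forceVariant` parameter names and the `activateVariant` function are illustrative rather than a description of our actual tooling:

```typescript
// Sketch: honour a query parameter so anyone can view and share a specific
// variant via a URL, e.g. /search?experiment=purchase-button-position&forceVariant=B
const activateVariant = (experimentKey: string, variant: string): void => {
  console.log(`Forcing ${experimentKey} into variant ${variant}`); // stand-in for the real override
};

function applyForcedVariantFromUrl(): void {
  const params = new URLSearchParams(window.location.search);
  const experiment = params.get("experiment");
  const variant = params.get("forceVariant");

  if (experiment && variant) {
    activateVariant(experiment, variant); // overrides the normal random bucketing for this session
  }
}

applyForcedVariantFromUrl();
```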

Side effects

Being so driven by business-focused metrics can lead to unintended consequences. We made extra investments in these areas to ensure we had a positive impact:

Experiment Quality / Completeness

We make experiments as small as possible to prove or disprove the hypothesis. We try not to invest too much time in building them; after all, they might be wrong.

We do still need to ensure they are complete and of good enough quality to provide a valid test. What that means depends on the experiment and the hypothesis. Simple, clear and explicit hypotheses help. From a user’s perspective, the experience must be production quality up to the point where you collect the data.

Take a painted door test to see whether users click a button, for example: the button should look and behave like a real button, but once they click it, it might not do what they expect.
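A painted-door button might look something like this sketch (the element id, event name and copy are all illustrative): the click is real and is recorded, but the feature behind it does not exist yet.

```typescript
// Sketch of a painted-door test: the button looks and behaves like a real
// button and every click is recorded, but the feature behind it is not built.
const trackEvent = (name: string): void => {
  console.log("track", name); // stand-in for your analytics call
};

const button = document.querySelector<HTMLButtonElement>("#new-feature-button");

button?.addEventListener("click", () => {
  trackEvent("new_feature_button_clicked"); // the data point the experiment exists to collect
  // Be upfront with the user once the measurement has been taken.
  window.alert("This feature isn't available just yet. Thanks for your interest!");
});
```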

We also baked accessibility into our definitions of ready and done to ensure it is factored into every experiment.

Testing

When working on large code bases, automated tests and good coverage are important. We have lower expectations for experiment code: we ask that all acceptance criteria have appropriate tests, and when an experiment confirms its hypothesis, we improve the test coverage.

Clean-up

The nature of this experimentation approach means you can make lots of changes in a short time. Without good processes in place for clean-up, you can quickly build up a mess of code that is no longer used at all. We have a policy of doing clean-up as soon as possible. Even if we plan to iterate on an experiment, we try to remove the code and bring it back when we are ready to iterate.
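One pattern that keeps this clean-up cheap, shown here as a sketch with stand-in functions rather than any specific feature-flagging library, is to keep all of an experiment’s code behind a single flag check:

```typescript
// Sketch: keep all experiment code behind one flag check so clean-up is a
// small, obvious change. `isVariantActive` stands in for your flagging tool;
// the render functions are placeholders.
const isVariantActive = (experimentKey: string): boolean => {
  return false; // stand-in for the real allocation lookup
};

const renderControlCheckout = (): void => console.log("control checkout");
const renderVariantCheckout = (): void => console.log("variant checkout");

if (isVariantActive("purchase-button-position")) {
  renderVariantCheckout(); // all experiment-only code lives behind this branch...
} else {
  renderControlCheckout(); // ...so deleting the experiment touches one obvious place
}
```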

We also found that it’s important to try to always leave the code in a better state than you found it. Continuous improvement ensures future experimentation is easier, even when experiments don’t win.

Summary

Hypothesis-driven development maximises what you learn from every change. Your hypothesis might not be supported, but you will always learn something. The sooner you learn and course-correct, the sooner you can have the desired effect.

The SEO team at Trainline

Thanks to the CRO team and the wider Trainline community for helping to develop these processes and proofread this post.
