Building products for large-scale experimentation

Posted on November 3, 2016 by Dave Pier

At Skyscanner we have been running hundreds of AB tests to learn how to improve the site for our users. To build our experiments faster we have developed an in-house system that separates code changes from experiment variants, and in so doing provides a massive increase in flexibility. In essence, we have turned our whole site into a set of Lego blocks that can be combined in an almost infinite number of ways, and anyone in the company can control those combinations from anywhere in the world.

Until a few months ago we built our AB tests in the standard fashion. We would use our experiment platform, Dr Jekyll, to assign users to a particular variant of an experiment, and each variant of the experiment would be directly linked to a section of code. If a given user was in the control group they experienced the standard site; if they were in one of the variants they received an altered site, and we could track the difference in behaviour. While this works well for areas of investigation that are well bounded, it is quite inflexible for new areas where we will run multiple rounds of iteration, each round building on the learning of the previous one.
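To make the distinction concrete, here is a minimal sketch of that traditional pattern. All names are hypothetical (Dr Jekyll's real API is not public); the point is that each variant is hard-wired to its own code branch.

```typescript
// Stand-in for the experiment platform's assignment call (hypothetical).
type Variant = "control" | "variantA" | "variantB";

function assignVariant(userId: string, experiment: string): Variant {
  const variants: Variant[] = ["control", "variantA", "variantB"];
  // Deterministic hash so a user always lands in the same bucket.
  const hash = [...userId + experiment].reduce(
    (h, c) => (h * 31 + c.charCodeAt(0)) >>> 0, 0);
  return variants[hash % variants.length];
}

// The experiment variant is tied directly to a code path.
function renderBookingPanel(userId: string): string {
  switch (assignVariant(userId, "booking-panel-redesign")) {
    case "variantA":
      return "altered panel A"; // experiment code path
    case "variantB":
      return "altered panel B"; // another hard-wired code path
    default:
      return "standard panel";  // control
  }
}

console.log(renderBookingPanel("user-42"));
```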

In order to allow AB experimentation to scale while maintaining our lean/agile culture, we have built an extra layer of flexibility into Dr Jekyll. We can now tie our code segments to configurables. A config can be thought of as a link between the main body of the code and the parcel of data it contains. This parcel might be a whole module of code needed in an experiment, or it might simply be a boolean value or a string of text. We initially built these configs to let us change strings and values throughout the product for different markets and different situations. However, tying code segments to configs, and tying multiple configs to a single experiment variant, allows an order of magnitude more flexibility in how we build for experimentation.
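A hedged sketch of what that config layer might look like, with invented names and values: a variant is no longer a code branch but a bundle of config values, and each independent code segment reads only its own config.

```typescript
type ConfigValue = boolean | string | number;

// Each experiment variant maps to a set of config overrides (hypothetical).
const variantConfigs: Record<string, Record<string, ConfigValue>> = {
  control: {},
  newLayout: {
    "booking.showStarRatings": true,
    "booking.headerText": "Choose your provider",
  },
};

function getConfig(variant: string, key: string, fallback: ConfigValue): ConfigValue {
  return variantConfigs[variant]?.[key] ?? fallback;
}

// A code segment knows only about its config, not about the experiment
// that happens to be driving it.
function renderProviderList(variant: string): string {
  const showStars = getConfig(variant, "booking.showStarRatings", false);
  return showStars ? "provider list with star ratings" : "plain provider list";
}

console.log(renderProviderList("newLayout")); // -> "provider list with star ratings"
console.log(renderProviderList("control"));   // -> "plain provider list"
```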

[Diagram: configs allow multiple small independent code segments to be combined into a single experiment variant.]

If we now modularise our code such that each change we might want to make in an experiment is independent from any other, then we create the Lego blocks we need to build experiments. Let's look at an example of where this becomes useful. We wanted to redesign our booking panel from a price-centric layout to one that prioritised information and alternative booking options. There were a number of changes that we felt we needed to make:

  1. Collapse the itinerary information
  2. Allow the provider list to show our new star ratings
  3. Move the itinerary information to the top of the panel
  4. Expand the previously closed provider list

In the traditional approach to building AB experiments it is tempting to build the single preferred option and compare only one variant with control. If it improves metrics then great: give yourself a pat on the back and ship it. If metrics go down, then what happens? There is no way to know which of the changes had the effect. Do you start stripping back the changes to one controlled change at a time, or make more changes until something works? In the new system we can build each of these changes as a separate config, combine them in a single experiment, and control for each of the changes (taking the appropriate statistical considerations for multiple tests). In this particular example we wanted to check 4 variations, but the possible combinations would have allowed us to test 12. As it turned out, when we saw the final version in the browser we decided to test a variant that we had not intended to build but that was possible to create with no additional development effort, thanks to the available combinations, and this was the one that was eventually shipped to production.
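Purely for illustration, here is how those four booking-panel changes might look as independent boolean configs, so that named variants, including the unplanned one, are just different selections of the same blocks. All names are invented for this sketch.

```typescript
interface BookingPanelConfig {
  collapseItinerary: boolean;   // change 1
  showStarRatings: boolean;     // change 2
  itineraryOnTop: boolean;      // change 3
  expandProviderList: boolean;  // change 4
}

const control: BookingPanelConfig = {
  collapseItinerary: false,
  showStarRatings: false,
  itineraryOnTop: false,
  expandProviderList: false,
};

// Each variant is a different selection of the same Lego blocks,
// including combinations nobody planned to build up front.
const variants: Record<string, BookingPanelConfig> = {
  control,
  allFourChanges: {
    collapseItinerary: true,
    showStarRatings: true,
    itineraryOnTop: true,
    expandProviderList: true,
  },
  // Assembled for free from existing blocks, no new development needed.
  unplannedButFree: { ...control, showStarRatings: true, expandProviderList: true },
};

function describePanel(cfg: BookingPanelConfig): string {
  return [
    cfg.itineraryOnTop ? "itinerary on top" : "itinerary below providers",
    cfg.collapseItinerary ? "itinerary collapsed" : "itinerary expanded",
    cfg.expandProviderList ? "provider list open" : "provider list closed",
    cfg.showStarRatings ? "with star ratings" : "without star ratings",
  ].join(", ");
}

console.log(describePanel(variants.unplannedButFree));
```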

Since implementing the config layer we have found numerous use cases. MVP experiments that are inherently risky can be derisked by starting broad but shallow, then building additional functionality in further layers of configs as the data from each round of experimentation allows us to refine our ideas. We can also use configs as feature flags, by turning features on while disconnecting them from the underlying experiment. This allows market-by-market flexibility that can be controlled independently from the core site.
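As a sketch of that feature-flag use case (market codes and values are invented), once a config is detached from any experiment it can be read directly, so each market is switched independently:

```typescript
// Hypothetical per-market flag for a feature whose experiment has finished.
const starRatingsByMarket: Record<string, boolean> = {
  UK: true,   // shipped after a winning experiment
  DE: true,
  JP: false,  // held back, e.g. pending localisation
};

function starRatingsEnabled(market: string): boolean {
  // No experiment assignment involved: the config is read directly,
  // so each market can be toggled without touching the core site.
  return starRatingsByMarket[market] ?? false;
}

console.log(starRatingsEnabled("UK")); // true
console.log(starRatingsEnabled("JP")); // false
```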

An additional benefit of the modular config approach is that it abstracts the complexity of experiment design away from the development of features. Developers can now build and test modules independently, without needing to worry about which 5 changes need to hang together for a given variant. If we want to extend the experiment in the future we simply add another config, until the feature is creating the user benefit we hoped for in the first place.
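One way to picture that decoupling, again with hypothetical code, is a module whose tests inject its config directly, with no experiment wiring at all:

```typescript
// The module under test takes its config as a plain argument.
function renderItinerary(cfg: { collapsed: boolean }): string {
  return cfg.collapsed ? "itinerary (collapsed)" : "itinerary (expanded)";
}

// Plain assertions for brevity; in practice these would live in a test
// framework. Note no experiment or variant setup is required.
console.assert(renderItinerary({ collapsed: true }) === "itinerary (collapsed)");
console.assert(renderItinerary({ collapsed: false }) === "itinerary (expanded)");
console.log("itinerary module tests passed");
```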

Similarities with multivariate testing

This approach is similar to multivariate testing, but deliberately limited to specific combinations of code segments/changes. Multivariate testing runs ALL combinations of changes together in order to determine which combination produces the optimal effect. An example would be changing a button's placement, string and colour: if there are 3 versions each of the placement, string and colour, that is 3 × 3 × 3 = 27 combinations to test. The system we are describing here allows us to run a multivariate test if we wish, BUT it also allows the more modular AB testing described above. The primary purpose is not to throw every possible combination at the wall and see what sticks, but rather to reduce the time and cost between learning from one experiment and implementing the next iteration with a directed hypothesis.
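To make the contrast concrete, here is an illustrative sketch (all values invented) of a full multivariate grid next to the kind of curated, hypothesis-driven subset this system favours:

```typescript
const placements = ["top", "middle", "bottom"];
const labels = ["Book now", "See deals", "Continue"];
const colours = ["blue", "green", "orange"];

// Full factorial: every combination, 3 x 3 x 3 = 27 variants.
const fullGrid = placements.flatMap(p =>
  labels.flatMap(l => colours.map(c => ({ placement: p, label: l, colour: c }))));
console.log(fullGrid.length); // 27

// Directed alternative: only the combinations a hypothesis actually calls for.
const curated = [
  { placement: "top", label: "Book now", colour: "blue" },  // control
  { placement: "top", label: "See deals", colour: "blue" }, // isolate the label
  { placement: "top", label: "Book now", colour: "green" }, // isolate the colour
];
console.log(curated.length); // 3
```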

