6 Lessons from rapid experimentation at the Financial Times
A year ago, the FT Apps team set out to answer the question: “What does a habit-forming app of the future look like?” Caroline Wilcock (User Researcher), Mark Limb (Product Designer) and I (Product Manager) wanted to understand how we could take the FT App, which our analytics showed has a threefold impact on user engagement compared with our website, and turbo-charge its value by making it an even ‘stickier’ product for FT subscribers. We saw that many of our users did not have a ‘daily habit’ with our app. As we provide a premium news product, we considered daily use to be the natural target frequency: FT subscribers should get some value out of their subscription every day, and the app is the natural channel for providing that consistent value. Working with our data analysts, we identified a huge LTV (lifetime value) opportunity if we could achieve a 10% increase in the number of subscribers with a daily app habit.
27 hours’ worth of user interviews, a diary study of 150 entries in which users recorded their habits with apps and their phones, a survey of 471 app users and a detailed quantitative analysis of app user behaviour later, we had developed a view of the features that encourage habitual app usage. Some of the highlights:
- consistent content packages (especially if personalised per user)
- multiple media formats for different mindsets (e.g. audio for multi-tasking, videos and images for lean-back or downtime moments)
- bite-sized content to easily suit pockets of time
- serendipitous but consistently available, personalised content discovery
- flexible habit triggers suited to user preferences, e.g. personalised or customisable emails, newsletters and push notifications
Why rapid experimentation?
We came away with many ideas to test in the context of our app, and we ran ideation workshops with editorial and other teams across the business to develop concepts for new app features. With so many feature concepts emerging, we decided that rapid experimentation, which would let us validate as many concepts as possible as quickly as possible, was the way to go. The idea was to:
- rapidly test the core concept (usually through A/B testing)
- understand whether it showed habit-forming potential: the ability to elicit repeat usage and bring users back to our app
- iterate it into a more comprehensive feature if it did and ‘bin’ it if it didn’t.
We agreed with the engineers to relax our usually high coding standards for these experimental features, on the basis that if a concept tested successfully, we would address any ‘tech debt’ before the feature was productionised and made available to all app users.
So what have I learned attempting rapid experimentation for the first time?
1. Diverge, then come back together
We brainstormed ideas in two cross-functional ideation workshops on the main opportunity areas we had identified, and developed several concepts based on our discovery insights.
We’re now seeing how these concepts can come together. For example, our concepts for audio and other alternative formats work well as part of a consistently available, personalised discovery area. Similarly, push notifications work well with consistent content formats.
2. Plan for failure
One mistake we made at the beginning was not considering what we would do if we didn’t see the expected outcome in our key experiment metrics. A disappointing result didn’t always mean the concept had failed; other factors were sometimes relevant to deciding whether the feature was worth iterating.
For example, when an experiment showed no statistically significant difference in the key metrics between the control and variant groups, we sometimes found that uptake of the new feature was low among users in the variant, but that users who did use it showed an uplift in engagement compared with those who did not. This suggested that feature onboarding, not the feature concept itself, was the issue.
A decision tree is extremely helpful for being prepared for ‘failure’. Set out what steps you would take, or what further information you would want, if you did or did not see the expected outcome.
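As an illustration, a pre-agreed decision tree like the one described above could be written down as a small function before a test launches. This is a hypothetical sketch: the `ExperimentResult` fields, the 20% “low uptake” threshold and the outcome labels are assumptions for illustration, not the criteria we actually used. Note too that comparing adopters with non-adopters inside the variant is observational (users who choose a feature may already be more engaged), so it is treated here as a prompt to investigate rather than as proof.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """Hypothetical summary of one A/B test; all fields are illustrative."""
    significant_uplift: bool  # variant beat control significantly on the key metric
    uptake_rate: float        # share of variant users who tried the feature (0 to 1)
    adopter_uplift: float     # relative engagement lift, adopters vs non-adopters


# Assumed threshold for "low uptake"; agree real values with analysts up front.
LOW_UPTAKE = 0.2


def next_step(result: ExperimentResult) -> str:
    """Walk a simple, pre-agreed decision tree and return the next action."""
    if result.significant_uplift:
        # Concept validated: iterate into a fuller feature and pay down tech debt.
        return "iterate"
    if result.uptake_rate < LOW_UPTAKE and result.adopter_uplift > 0:
        # Few users found the feature, but those who did engaged more:
        # suspect onboarding or discoverability rather than the concept.
        return "improve onboarding and retest"
    if result.adopter_uplift > 0:
        # Users found it and engaged somewhat more, but not enough to move
        # the top-level metric: ask users why before deciding.
        return "gather qualitative feedback"
    # No sign of value among adopters either.
    return "bin"


print(next_step(ExperimentResult(significant_uplift=False,
                                 uptake_rate=0.08,
                                 adopter_uplift=0.15)))
```

Writing the branches down before launch forces the team to agree in advance which result leads to which action, so a flat top-level metric triggers a planned follow-up rather than an ad hoc debate.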
3. Get from the ‘what’ to the ‘why’ by bringing quant and qual together
While rapid experimentation was giving us lots of good data, we also needed qualitative research to understand more about why things were happening.
For example, when an experiment with audio articles showed no top-level movement in key metrics between control and variant, we dug further and found some interesting patterns:
- Many users didn’t try the feature, or tried it only once. This explained why we saw no change in the metrics at the aggregate level
- However, deeper quantitative analysis showed that engagement increased for users who did use the feature. We used qualitative interviews and surveys to understand why more users did not use the feature, or did not use it more than once
- Interviews with users who had tried it revealed the core need for audio in the app: to enable multi-tasking while using our app
- Surveying the variant group (including users who didn’t use the feature) confirmed this core need, but highlighted how the feature’s current implementation and UX stopped it from meeting that need
4. Analytics is your friend
We have a relatively small Data Analytics team for all the business’s needs, including A/B testing. Running several experiments in quick succession required us to work very closely with them. We scheduled a quarterly session on our upcoming tests to set expectations and help them plan their workload; these sessions also gave the analysts an opportunity to provide valuable input on objectives, test approach, metrics and our decision trees.
We also discussed some modified ways of working, such as having one main analyst with an overall understanding of our tests, so that they had more context when conducting analysis and our insights were richer.
5. How small is too small?
We started with very quick tests; for example, one involved adding a link to our navigation menu as the entry point to the feature. Because our users tend to have very set habits in the app, we found that very small changes like this can produce inconclusive results unless the page or area gets huge traffic. So ask yourself “how small is too small?”, and work with your analysts to estimate how much traffic the test will get and calculate how long it will take to reach statistical significance.
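To make the traffic question concrete, the sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many users each arm of a test needs, and then how many days that takes. It assumes a 50/50 split, a binary per-user metric (for example, whether a user forms a daily habit) and a steady inflow of newly enrolled users reaching the tested area; the function names and the baseline rate, effect size and traffic figures in the example are hypothetical, not FT data.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each arm of a two-sided test of two proportions."""
    if p_control == p_variant:
        raise ValueError("effect size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p_control + p_variant) / 2            # pooled rate under the null
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_variant - p_control) ** 2)


def days_to_significance(p_control: float, p_variant: float,
                         new_users_per_day: int, **kwargs) -> int:
    """Days to enrol both arms, given users newly reaching the tested area daily."""
    n = sample_size_per_group(p_control, p_variant, **kwargs)
    return math.ceil(2 * n / new_users_per_day)


# Hypothetical: 30% of users have a daily habit; we hope to lift that to 33%.
print(sample_size_per_group(0.30, 0.33))
print(days_to_significance(0.30, 0.33, new_users_per_day=500))
print(days_to_significance(0.30, 0.33, new_users_per_day=50))
```

With these made-up numbers each arm needs roughly 3,800 users, so an area reached by 500 new users a day would take about two weeks, while a small entry point seen by only 50 new users a day would take around five months. Halving the effect you hope to detect roughly quadruples the sample needed, which is why very small changes so often end up inconclusive.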
6. Use fake doors where possible and appropriate
Fake door tests, in which users are offered the option to use a feature that has not yet been built, can be a very helpful indicator that users will actually use a feature and that it meets a true need, with minimal effort invested.
We placed a fake door test featuring three games options (crosswords, quizzes and sudoku puzzles) at the bottom of our app’s homepage, with a survey asking users why they wanted the option they selected. We had never tried this before and were nervous about users responding badly to being offered a feature and then discovering it didn’t exist. We managed this risk by launching the test at a time when we would be available to disable it if it generated significant negative sentiment.
In the event, we received over 2,000 survey responses and very few respondents (1%) complained about the approach. The vast majority enthusiastically told us why they wanted Games in our app. This helped us validate the potential value of Games for habit formation quickly and with minimal build effort. For technically complex features like this, fake door tests provide a ‘cheap’ way to confirm there is user value before taking on the complexity.
Overall, we found the rapid experimentation process to be extremely beneficial. We tested lots of ideas, learnt a lot about successful experimentation and, fairly quickly (over about six months), identified key habit-driving features that are enriching, and will continue to enrich, users’ experiences with our app.