How we set up an A/B test to study the best user experience for our users

A guide on how to set up A/B tests on Android to help improve user experience

Akqeel
99.co
Jul 20, 2022


Background

Our team was tasked with improving the shortlist experience for our users on the app’s listing details page. This feature is similar to the “Favourites” function you would generally find on e-commerce applications. Through user research, our team had discovered a few problems and user behaviours around the existing shortlisting function, and this was an opportunity to help address those issues. One of these problems was the visibility of the shortlist CTA on the listing details page:

Listing Details Page

As seen in the screenshot above, the shortlist CTA can “blend” into the listing photo, reducing its visibility. The hypothesis our team wanted to test in this experiment was whether the low visibility of the current shortlist CTA prevented users from shortlisting their favourite listings, and whether increasing its visibility would improve the shortlisting experience.

Processes and Team Structure

In the interest of time, our team took the approach of running the test on one mobile platform (Android) and rolling out the winning result to the other platform (iOS). One drawback of this approach is that user behaviour on iOS can differ from Android. In our case this works out, because we don’t see a big difference between iOS and Android users’ behaviour, but it is something we need to be mindful of when conducting future experiments.

In this guide I will walk through our team’s approach to setting up and running the experiment on Android; the approach for setting up and conducting experiments on iOS would be very similar. For this project we had help from the PM, designer, user researcher, data analyst, Android engineer and QA on the team with setting up and conducting the experiment.

Tooling 🛠

For our experiments our team decided to use the Remote Config and A/B Testing tools available on Firebase, and Segment for tracking the shortlist CTA usage.
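For reference, the Gradle setup for these tools could look roughly like the sketch below. The exact artifacts and versions are assumptions on my part, so do check the latest Firebase BoM and Segment releases for your own project.

```kotlin
// app/build.gradle.kts — a rough sketch of the dependencies involved (versions are assumptions)
dependencies {
    // The Firebase BoM keeps the Firebase library versions in sync
    implementation(platform("com.google.firebase:firebase-bom:30.3.1"))
    implementation("com.google.firebase:firebase-config-ktx")    // Remote Config
    implementation("com.google.firebase:firebase-analytics-ktx") // used by Firebase A/B Testing for measurement
    // Segment SDK for tracking the shortlist CTA usage
    implementation("com.segment.analytics.android:analytics:4.10.4")
}
```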

Implementation 💻

Shortlist CTA variants

For our shortlist experiment, our team came up with 3 different variants. The Control (or Baseline) variant is our existing shortlist CTA implementation, while version B and version C are new designs to be implemented. The approach we took to implement this feature was to:

  1. First, define the metrics for this experiment.
  2. Implement the two new variants on Android.
  3. Set up Firebase Remote Config to control the visibility of the three different variants.
  4. And finally, set up the A/B testing experiment on Firebase.

For each step we followed a sub-process, which I will go through next.

1. Defining the metrics

Before setting up the experiment, our team first needed to identify the metrics we’d need in order to accurately measure the performance of this experiment. Our team identified two metrics, one primary and one secondary. The primary metric was the shortlist CTA usage, and the secondary metric was the number of enquiries on this page. The primary metric would help us measure whether the change in the shortlist CTA layout has been effective, while the secondary metric would help us identify whether an increase in the number of shortlists negatively impacts the number of enquiries. Once this was finalised, the next step was to begin work on implementing the new variants.
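To make this concrete, here is a minimal sketch of how the two metrics could be reported through Segment. The event and property names below are purely illustrative (not our actual tracking plan), and it assumes the Segment singleton has already been initialised in the Application class.

```kotlin
import android.content.Context
import com.segment.analytics.Analytics
import com.segment.analytics.Properties

// Primary metric: shortlist CTA usage, tagged with the variant shown to the user.
fun trackShortlistClicked(context: Context, listingId: String, variant: String) {
    Analytics.with(context).track(
        "Listing Shortlisted",
        Properties()
            .putValue("listing_id", listingId)
            .putValue("shortlist_variant", variant) // lets us slice usage by variant
    )
}

// Secondary metric: enquiries on the listing details page.
fun trackEnquirySubmitted(context: Context, listingId: String, variant: String) {
    Analytics.with(context).track(
        "Listing Enquiry Submitted",
        Properties()
            .putValue("listing_id", listingId)
            .putValue("shortlist_variant", variant)
    )
}
```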

2. Implementing the new variants

Since the Baseline variant is the existing CTA implementation, only the two new variants (version B and version C) needed to be implemented. Before beginning the implementation of the new variants, I set up the high-level functions which control the visibility of the different variants and the control flow. This helped me better organise the codebase, simplify testing, and avoid the situation where multiple variants become visible simultaneously.
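A simplified sketch of that gating logic is shown below. The enum, function names and Remote Config values are illustrative rather than our actual identifiers; the important part is that exactly one variant’s CTA is ever visible, with the Baseline as the fallback.

```kotlin
import android.view.View
import androidx.core.view.isVisible

// Illustrative names — not the actual identifiers in our codebase.
enum class ShortlistVariant { BASELINE, VERSION_B, VERSION_C }

// Maps the raw Remote Config value to a variant, falling back to the existing CTA.
fun resolveShortlistVariant(remoteValue: String): ShortlistVariant = when (remoteValue) {
    "version_b" -> ShortlistVariant.VERSION_B
    "version_c" -> ShortlistVariant.VERSION_C
    else -> ShortlistVariant.BASELINE
}

// Shows exactly one CTA, which avoids multiple variants becoming visible at once.
fun applyShortlistVariant(
    variant: ShortlistVariant,
    baselineCta: View,
    versionBCta: View,
    versionCCta: View,
) {
    baselineCta.isVisible = variant == ShortlistVariant.BASELINE
    versionBCta.isVisible = variant == ShortlistVariant.VERSION_B
    versionCCta.isVisible = variant == ShortlistVariant.VERSION_C
}
```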

Default state
Scrolled state

Once this was set up, I began the UI implementation for variant B and set up the logic for the click events. We follow the MVVM architecture in our project, so the business logic for all three variants can be reused, with the only changes needed on the UI. This cut our implementation time down to roughly a third of what it would otherwise have been, and it also simplifies testing and bug fixes.
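As a rough illustration of how MVVM keeps the business logic shared, in the sketch below the ViewModel owns the shortlist action while each variant’s UI simply forwards its click event to it. ListingDetailsViewModel and ListingRepository are placeholder names, not our actual classes.

```kotlin
import androidx.lifecycle.ViewModel

// Placeholder data-layer contract for the sketch.
interface ListingRepository {
    fun toggleShortlist(listingId: String)
}

class ListingDetailsViewModel(
    private val repository: ListingRepository,
) : ViewModel() {

    // Called from whichever shortlist CTA is currently visible, so the
    // Baseline, version B and version C layouts all reuse the same logic.
    fun onShortlistClicked(listingId: String) {
        repository.toggleShortlist(listingId)
    }
}
```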

3. Setting up Firebase Remote Config

To speed things up further, our team decided to test the UI and business logic for version B while version C was still in development. To achieve this, I added a new key-value pair on Firebase Remote Config that controls which shortlist variant is shown on the listing details page, and set a condition so that this new key-value pair applied only to the dev version of our application. Once this was set up, both the QA and the designer on our team were able to test this variant and toggle between the Baseline and version B variants by switching the values on Firebase Remote Config.
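Reading that flag in the app can be as simple as the sketch below, which reuses the resolveShortlistVariant helper from the earlier sketch. The parameter name shortlist_cta_variant is illustrative, not our actual key.

```kotlin
import com.google.firebase.ktx.Firebase
import com.google.firebase.remoteconfig.ktx.remoteConfig

// Illustrative parameter name — not the actual key we used.
private const val KEY_SHORTLIST_CTA_VARIANT = "shortlist_cta_variant"

// Returns the variant to display; an empty or unknown value falls back to the Baseline CTA.
fun currentShortlistVariant(): ShortlistVariant =
    resolveShortlistVariant(Firebase.remoteConfig.getString(KEY_SHORTLIST_CTA_VARIANT))
```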

Firebase Remote Config

After a few rounds of QA and feedback from both our QA and designer, version B was ready to ship! We followed a similar approach for version C, and once it passed both design and functional QA, our feature was ready to be rolled out to production 🚀

4. Setting up the experiment on Firebase A/B Testing

Once the Android app update is submitted to the Google Play Store, there is a review process that takes a few hours before the update becomes available for download. In the meantime, we set up the experiment on Firebase A/B Testing and switched our Remote Config key-value pair over to the app’s production version. With all of this in place, Firebase would deliver one of the three variants to users on the new app update, and Firebase A/B Testing would start measuring the performance of the CTA for each variant.
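For the experiment assignment to take effect, the app needs to fetch and activate Remote Config values early, typically at application start. Here is a minimal sketch of that, with an assumed fetch interval and default value:

```kotlin
import android.app.Application
import com.google.firebase.ktx.Firebase
import com.google.firebase.remoteconfig.ktx.remoteConfig
import com.google.firebase.remoteconfig.ktx.remoteConfigSettings

class SampleApp : Application() {
    override fun onCreate() {
        super.onCreate()
        val remoteConfig = Firebase.remoteConfig
        // The fetch interval is an assumption; tune it to how quickly config changes need to be picked up.
        remoteConfig.setConfigSettingsAsync(
            remoteConfigSettings { minimumFetchIntervalInSeconds = 3600 }
        )
        // Default to the existing CTA until a value has been fetched for this user.
        remoteConfig.setDefaultsAsync(mapOf("shortlist_cta_variant" to "baseline"))
        // Asynchronously fetches and activates the variant assigned by Firebase A/B Testing.
        remoteConfig.fetchAndActivate()
    }
}
```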

Setting up Firebase A/B Testing (Remote Config experiment)

Experiment Duration and Results 🔬

Our team decided that around 1–2 weeks would be an acceptable duration for this experiment, after which we could act on its results. Throughout that period, Firebase A/B Testing continuously measures the variants and updates the leading one based on the data collected so far. Once the duration the team decided on has passed, or once Firebase A/B Testing has found a clear winner (whichever comes first), we can roll out the winning variant to all our users (including our users on iOS!) 🎉

Learnings and further improvements 📝

This was the first A/B testing experiment our current team has conducted, and we took away quite a few learnings, ranging from how to set up the tooling to coming up with the processes. We needed input and feedback from different stakeholders, including our PM, designer, user researcher, data analyst, developer and QA, and the processes we followed were very helpful in making this a fun and valuable exercise. We hope to conduct more experiments in the near future, with the aim of providing a seamless experience to our users!

Thanks to Jessica Bodo, Curtis Koh, Charlotte Lee, Kevin Wee, Aldryn Deschara Putra and Tsu Myat Thandar Htike for helping set up and run this experiment!
