A/B Testing as Easy as ABC with Amplitude

Julia Ferlin
tiket.com
5 min read · Oct 7, 2022

In the data-driven business world, A/B testing is often used to validate or compare newly developed products, features, and even individual elements to understand which version or variant better drives our goals. Product development nowadays tends to involve many iterations, each of which may require its own A/B test to fit into the development process, and the calculation and monitoring involved can be quite complicated.

A/B Testing Process. Source: https://beacon.by/scandiweb/ab-testing-that-works-in-2021

Here at tiket.com, we use Amplitude to simplify the whole A/B testing process, covering planning, delivery, and analysis. Yes, most of the A/B testing process can be handled in one place, and that place is Amplitude.

Amplitude Website

In a nutshell, Amplitude is a digital analytics tool similar to Google Analytics that boasts a wide range of features such as activity tracking, customer journeys, segmentation, retention analysis, experiment analysis, and more. This article focuses on how the Amplitude experimentation service helps the A/B testing process at tiket.com.

Planning

During the planning process, we often need evidence and an understanding of customer behaviour data to support our hypothesis prior to the experiment. Using Amplitude Analytics, we can create analyses of customer metrics to build a better understanding of what needs to improve and to define our experiment goals and expected impact.

Analytics features on Amplitude Analytics.

After gathering insights, we can design the context, success metrics, and variants for the A/B test to make it more meaningful. Designing experiments in Amplitude reduces the need for the disparate tools commonly used during a “normal” A/B testing process (analysis, documentation, etc.).

We can also define the success metric and the variants that statistical significance will be calculated against. The success metric is usually the main metric you hope to move by running the experiment. Amplitude lets us create an unlimited number of variants; however, be aware that adding too many variants makes it harder to reach statistical significance, so try to keep each experiment limited to a handful of variants at most.
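To build intuition for why extra variants slow things down, here is a rough, illustrative Python sketch (not Amplitude's own calculation) that estimates the users needed per variant for a conversion metric using the standard two-proportion sample size formula, with a simple Bonferroni-style correction splitting the significance level across the comparisons. The baseline rate, target rate, and power values are assumed for illustration.

```python
from scipy.stats import norm

def users_per_variant(p_control, p_variant, n_variants, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-proportion z-test.

    A Bonferroni-style correction splits alpha across the
    (n_variants - 1) comparisons against the control.
    """
    comparisons = max(n_variants - 1, 1)
    adj_alpha = alpha / comparisons          # stricter threshold per comparison
    z_alpha = norm.ppf(1 - adj_alpha / 2)    # two-sided critical value
    z_beta = norm.ppf(power)                 # critical value for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = (p_variant - p_control) ** 2
    return int((z_alpha + z_beta) ** 2 * variance / effect) + 1

# Hypothetical experiment: 5% baseline conversion, hoping to reach 6%.
for k in (2, 3, 5):
    print(f"{k} variants -> ~{users_per_variant(0.05, 0.06, k):,} users per variant")
```

Even in this simplified form, the sample required per variant grows as variants are added, which is why keeping experiments to a handful of variants helps them reach a conclusion faster.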

Delivery

While running an experiment, we often need to define the user segment that will be the target audience. The Rollout section in Amplitude lets us configure this setup easily.

Moreover, in the Allocation panel, you can define user segments that will see your experiment, specify the percentage of users who will be exposed to your experiment, and set the relative distribution weights of each variant.
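Conceptually, this allocation amounts to deterministically bucketing each user according to the exposure percentage and the variant weights. The Python sketch below is only a simplified illustration of that idea, not Amplitude's actual assignment logic; the experiment name, weights, and exposure rate are made up.

```python
import hashlib

def assign_variant(user_id, experiment, exposure=0.5, weights=None):
    """Deterministically bucket a user so the same user always gets the same result."""
    weights = weights or {"control": 1, "variant-a": 1}

    # Hash user + experiment into a stable number in [0, 1].
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF

    # Users outside the exposure percentage never see the experiment.
    if bucket >= exposure:
        return None

    # Split the exposed range proportionally to the variant weights.
    total = sum(weights.values())
    position = (bucket / exposure) * total
    for variant, weight in weights.items():
        if position < weight:
            return variant
        position -= weight
    return list(weights)[-1]  # guard against floating-point edge cases

print(assign_variant("user-123", "new-checkout-flow"))
```

In Amplitude itself, all of this is configured from the Allocation panel, so no code is needed; the sketch only shows what the exposure percentage and weights represent.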

Create user allocation

Defining the user segments eligible for the experiment can be done in Amplitude based on several criteria, such as device, behavioural patterns, or geography. User segments can even be created in Amplitude before the experiment is set up, using the Cohort feature.

Creating a cohort in Amplitude

A cohort creates segments by identifying users with similar behaviour based on the activity tracked by Amplitude, such as users who performed specific events within a given period, or other criteria that can easily be personalized.
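For intuition, a cohort of this kind can be thought of as a filter over tracked events. The pandas sketch below builds a hypothetical cohort of users who performed a given event within a date range; the sample data and column names (user_id, event_type, event_time) are assumptions for illustration, not Amplitude's schema.

```python
import pandas as pd

# Hypothetical event log with one row per tracked event.
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "event_type": ["search_hotel", "book_hotel", "search_hotel", "open_app"],
    "event_time": pd.to_datetime(
        ["2022-09-01", "2022-09-03", "2022-09-10", "2022-09-15"]
    ),
})

def build_cohort(events, event_type, start, end):
    """Return the set of users who performed `event_type` within [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (events["event_type"] == event_type) & events["event_time"].between(start, end)
    return set(events.loc[mask, "user_id"])

print(build_cohort(events, "search_hotel", "2022-09-01", "2022-09-30"))  # {'u1', 'u2'}
```

In Amplitude, the same logic is configured visually when creating the cohort, without writing any code.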

Analysis

Last but not least, we need to learn from the experiment. We measure and analyze the experiment results to decide whether the changes should be implemented or not.

The analysis usually takes a Data Analyst some time to calculate whether the result is statistically significant enough to declare a winning variant. But again, Amplitude eases this process by providing more than just significance results, in real time.

A/B testing analysis in Amplitude

Amplitude uses a 95% confidence level by default, but this is adjustable based on your experiment's needs.

In the Analysis panel, you will find the significance indicator, which tells you whether the experiment has reached statistical significance. If significance has not yet been achieved, Amplitude will show a message telling you that your test needs more data to be conclusive. Along with that, you will also be able to see other data as follows:

  • The number of users exposed to the variant.
  • The performance of the primary metric, relative to the baseline. For example, if your success metric value is 2 in your control, but 4 in your variant, this column will read “4 (+2)”.
  • The % lift, which represents the proportional change relative to the baseline. In the previous example, the lift is (4 − 2) / 2, so this column would read “+100%”.
  • The confidence interval of the variant.
  • The significance level that the variant has reached.
  • The number of additional users who would have to be exposed to the variant in order for it to reach statistical significance if it has not yet done so.

If you want to do a deeper-dive analysis of the Amplitude data, it can be exported into your data warehouse platform (such as BigQuery), where you can use the Exposure event type to flag the experiment audience.
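As a rough illustration of what such a deep dive might look like, here is a Python sketch that computes the lift and a two-proportion z-test from aggregated exposure and conversion counts. The counts are made-up numbers and this is a generic significance check, not a replica of Amplitude's statistics engine.

```python
from scipy.stats import norm

# Hypothetical aggregates computed from the exported exposure/conversion events.
control = {"exposed": 10_000, "converted": 500}   # 5.0% conversion
variant = {"exposed": 10_000, "converted": 570}   # 5.7% conversion

p_c = control["converted"] / control["exposed"]
p_v = variant["converted"] / variant["exposed"]

# Proportional change of the variant relative to the control (the "% lift").
lift = (p_v - p_c) / p_c

# Two-proportion z-test using the pooled conversion rate.
pooled = (control["converted"] + variant["converted"]) / (
    control["exposed"] + variant["exposed"]
)
se = (pooled * (1 - pooled) * (1 / control["exposed"] + 1 / variant["exposed"])) ** 0.5
z = (p_v - p_c) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided

print(f"lift = {lift:+.1%}, z = {z:.2f}, p-value = {p_value:.4f}")
print("significant at 95%" if p_value < 0.05 else "needs more data")
```

This is essentially the kind of conclusion the significance indicator in the Analysis panel summarizes for you automatically.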

Overall, Amplitude is a powerful analytics tool that assists our team in conducting analytical processes (not only A/B testing), and reducing the need for several different analytics tools helps us complete the process even faster.
