Deploying Only the Right Features: A/B Testing Service

Kirill Evseenko
Published in STARTteam
5 min read · Nov 29, 2022

In 2021, we at START felt the need for our own standalone A/B testing service with a user-friendly interface that would let us set parameters, create groups, and start or finish tests at specific times. Crucially, the service receives a request with a user ID, allocates the user to one of the currently running experiments, and sends the details back.

Spoiler alert: the Evrone team helped us develop this service. Learn how we did it below.

Just a Little Bit of Theory

A/B testing (aka split-run testing) helps you see how, if at all, changes in a service's functionality or design affect its users. For example, if you want to figure out which button color works better, you show some users, say, a red button instead of the usual green one during the test.

The tests involve a small group of users with defined characteristics. This helps you estimate whether the changes will catch on with the audience. A/B tests are rarely run shortly after the launch of a new feature, as the service would then lack control-group data. You should, however, be cautious when introducing new features to an older and more complex product, so that you don't lose the audience, and money, as a result.

You also need to consider the complexity of the changes you test. Changing a button's color is easy, and you see the result immediately: it's right under your nose. Less noticeable tweaks require a much longer experiment. Moreover, you should take into account user activity at different stages of the test: sometimes people jump right into a new feature but eventually return to their old, familiar usage patterns. Over a long enough run, this novelty spike does not affect the result statistically.

How Our Service Works

A/B test parameters, such as duration, platforms, traffic splits, audience (new or all users), and variable values, are configured in the admin panel.
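To make this concrete, here is a hypothetical experiment definition in Python, roughly mirroring those fields. The names and shape are illustrative, not our actual schema:

    # Hypothetical experiment definition mirroring the admin panel fields.
    # All field names are illustrative, not the service's actual schema.
    experiment = {
        "name": "red_buy_button",
        "starts_at": "2022-11-01T00:00:00Z",
        "ends_at": "2022-11-29T00:00:00Z",        # duration
        "platforms": ["ios", "android", "web"],
        "audience": "new_users",                  # new or all users
        "groups": [
            {"name": "control", "split": 0.5,
             "variables": {"buy_button_color": "green"}},
            {"name": "treatment", "split": 0.5,
             "variables": {"buy_button_color": "red"}},
        ],
    }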

The userbase is divided into 12 master buckets (segments). We use 12 because it is easily divided into 6, 4, 3, or 2 groups.

We need master buckets to:

  • Manage the overlap of A/B tests
  • Run a global control group (you can reserve a bucket that is excluded from A/B tests)
  • Gradually deploy features

More than one experiment can run on the same master bucket. For each group, we set the parameters needed for the particular experiment. Users get allocated to the right group, and that information goes back to the internal services.

The experiment service is designed so that a repeated request from the same user does not change their allocation to experiments or to groups within those experiments.
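The article does not spell out the allocation mechanism, but a common way to get this kind of stability is deterministic hashing of the user ID. A minimal sketch of that idea follows; the salts and helper names are our assumptions, not START's actual scheme:

    import hashlib

    NUM_MASTER_BUCKETS = 12  # divides evenly into 6, 4, 3, or 2 groups

    def master_bucket(user_id: str) -> int:
        """Map a user ID to one of the 12 master buckets. Hashing makes the
        assignment deterministic: a repeated request lands in the same bucket."""
        digest = hashlib.md5(f"bucket:{user_id}".encode()).hexdigest()
        return int(digest, 16) % NUM_MASTER_BUCKETS

    def experiment_group(user_id: str, experiment: str, splits: list[float]) -> int:
        """Pick a group index within an experiment from its split fractions.
        Salting with the experiment name decorrelates group assignment across
        experiments that share a master bucket."""
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        point = (int(digest, 16) % 10_000) / 10_000  # uniform in [0, 1)
        cumulative = 0.0
        for index, share in enumerate(splits):
            cumulative += share
            if point < cumulative:
                return index
        return len(splits) - 1  # guard against floating-point rounding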

Backend

The backend is written in Python and uses the FastAPI web framework. For the database and data access, we chose the familiar pair of PostgreSQL and SQLAlchemy.

By the way, SQLAlchemy helped us reduce the amount of logic processed in the backend. Usually, all sub-entities are pulled from the database together with their root structure, which in this case is an experiment. SQLAlchemy lets you determine which pieces of the experiment are actually needed and pull them from the database with a couple of queries.
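As a sketch of that idea, with deliberately simplified, hypothetical models, SQLAlchemy's selectinload fetches a relationship for all matched rows in one extra query, so a whole experiment tree arrives in a couple of round trips instead of one lazy query per row:

    from sqlalchemy import ForeignKey, select
    from sqlalchemy.orm import (DeclarativeBase, Mapped, Session,
                                mapped_column, relationship, selectinload)

    class Base(DeclarativeBase):
        pass

    class Experiment(Base):  # simplified, hypothetical model
        __tablename__ = "experiments"
        id: Mapped[int] = mapped_column(primary_key=True)
        is_running: Mapped[bool]
        groups: Mapped[list["Group"]] = relationship()

    class Group(Base):
        __tablename__ = "groups"
        id: Mapped[int] = mapped_column(primary_key=True)
        experiment_id: Mapped[int] = mapped_column(ForeignKey("experiments.id"))

    def load_running_experiments(session: Session) -> list[Experiment]:
        # Only the relationships named in .options() are loaded eagerly,
        # so you pull exactly the pieces of the experiment you need.
        stmt = (
            select(Experiment)
            .where(Experiment.is_running.is_(True))
            .options(selectinload(Experiment.groups))
        )
        return list(session.scalars(stmt).all())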

Load testing showed that the selected stack withstands the required load on the service and scales easily. Test results for one pod (wrk output):

      15 threads and 15 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    22.28ms    5.35ms   87.69ms   84.64%
        Req/Sec     45.05      7.96    60.00     83.58%
      Latency Distribution
         50%   21.14ms
         75%   24.12ms
         90%   27.98ms
         99%   44.76ms
      40563 requests in 1.00m, 4.87MB read
    Requests/sec:    675.17
    Transfer/sec:     83.08KB

The backend and frontend are connected via an API. A second API lets clients download user config values from the service, which are then used to compile test groups.
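A minimal sketch of what such an endpoint could look like in FastAPI, reusing the hypothetical hashing helper from the allocation sketch above; the route and response shape are assumptions, not the service's real contract:

    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/allocate/{user_id}")
    async def allocate(user_id: str) -> dict:
        """Assign the user to a master bucket and echo it back. The real
        service would also match the user against the running experiments
        stored in PostgreSQL and return the group variables as config values."""
        return {
            "user_id": user_id,
            "bucket": master_bucket(user_id),  # helper from the sketch above
            "experiments": [],                 # filled in by the real service
        }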

Dynamic Admin Panel

Originally, we planned to go the simplest route and make the admin panel static, with the page refreshing after each change. This quickly turned out to be the wrong decision.

We decided to add a step-by-step option for filling out the experiment form, because there are many parameters and entering them takes a lot of time. The feature lets you enter some of the parameters, save them, and add new values later.

The frontend was built with React, while the layout for the admin panel was made with Bootstrap. The GUI shows experiments (by name), deadlines, and filters on user attributes: gender, age, and traffic source. It also tells you which fields in the form are already filled in.

Overlaps

When you have many experiments running, some of them may overlap, influencing the same parameters. This is normal, but the analyst needs to keep it in mind to draw the right conclusions. If you can't see which experiments overlap, you may get strange data artifacts. Our service tracks overlaps and warns you about them.
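A sketch of one possible overlap check, assuming each experiment knows its master buckets and the variables it touches (the shape is illustrative): two experiments overlap when they share at least one bucket and at least one variable.

    def find_overlaps(experiments: list[dict]) -> list[tuple]:
        """Report experiment pairs that share a master bucket and a variable.
        Such pairs can influence the same metric for the same users."""
        overlaps = []
        for i, first in enumerate(experiments):
            for second in experiments[i + 1:]:
                shared_buckets = set(first["buckets"]) & set(second["buckets"])
                shared_vars = set(first["variables"]) & set(second["variables"])
                if shared_buckets and shared_vars:
                    overlaps.append(
                        (first["name"], second["name"], sorted(shared_vars))
                    )
        return overlaps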

Expanding Experiments

When an experiment on a limited sample demonstrates a statistically significant effect of a change, you want to deploy the feature to a larger userbase, sometimes even to everyone. To this end, we thought it would be handy to have a tool to gradually increase the initial sample size.

However, this is undesirable behavior in classical A/B testing. Until the experiment is over, its userbase should not change much, so as not to ruin the statistics.

Therefore, gradual deployment calls for a parallel entity, very similar to an experiment but with only one group, whose variables are the ones that proved best in the experiment.

But the difficult part is that this new entity must be defined in terms of the existing ones, reusing their components. Moreover, you need to guarantee a deterministic order of operations for different configurations, so as not to produce hard-to-catch bugs.

In the new service, we added an experiment type with dynamic deployment, which lets you change the sets of groups for different parts of the test in an already-running experiment.
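One way to picture such a rollout entity (a sketch with illustrative field names, not our actual schema): it carries a single winning group, and only its bucket coverage grows from stage to stage.

    # Hypothetical rollout definition: like an experiment, but with a single
    # winning group; only the bucket coverage changes between stages.
    rollout = {
        "name": "red_buy_button_rollout",
        "group": {"variables": {"buy_button_color": "red"}},  # the winner
        "stages": [
            {"from": "2022-12-01", "buckets": [0, 1, 2]},           # 3/12 of users
            {"from": "2022-12-08", "buckets": [0, 1, 2, 3, 4, 5]},  # 6/12
            {"from": "2022-12-15", "buckets": list(range(12))},     # everyone
        ],
    }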

Results

Previously, we split users into groups on the client side with a roughly 50/50 distribution. It was difficult to track overlaps or run several experiments at the same time. Furthermore, we had to maintain additional logic both within the service and on the client side.

The newly created platform for A/B testing enables us not only to run experiments but also to track their progress, deadlines, and possible overlaps, and to test several options within a single experiment. We also added the ability to dynamically deploy the tested changes to all users.

#START, #ABtesting, #Engineering, #Productmanagement, #Productdevelopment
