The importance of user distribution in online experiments (AB Testing)

Gian Giovani
Tokopedia Engineering
5 min read · Aug 29, 2023

Tokopedia has a strong culture of experimentation; we run many experiments across our properties. For example, when you open the Search Result Page in the Tokopedia app, there is a good chance you are part of several experiments running at the same time. With stakes this high, we always try to ensure that every component of our experimentation tool runs reliably. In this post, we will discuss how a good experimentation tool handles user distribution correctly.

What can possibly happen?

In a nutshell, the A/B testing tool’s job is to distribute users between two buckets: control and treatment. However, we noticed that this seemingly simple task is not so simple. Three issues can arise if the distribution is not done well:

  1. Biased Allocation
    Ideally, the group of users allocated to each bucket represents the diversity of the whole population in terms of demography, buying power, and other criteria important to Tokopedia. Biased allocation happens when a bucket does not reflect this diversity, for example when users with high buying power are over-represented in a particular bucket.
  2. Selection Bias
    This happens when a group of users with certain traits is selected, compromising the result of the experiment. If a group dominated by users with high buying power is selected, that bucket is almost guaranteed to perform better regardless of the treatment.
  3. Memory Effect
    Once an experiment is finished, it can leave lasting effects on users, good or bad. These effects can contaminate future experiments if the same group of people is distributed unevenly in the next experiment, or worse, if the group is reused.

Modulo operation is not random

It may seem that these issues are unlikely to occur, especially when the distribution is done randomly. However, we observed that the commonly used technique of applying a modulo operation with a static or dynamic divisor can suffer from all of them.

There are two common ways of using modulo to distribute users into buckets (a short sketch of both follows the list):

  1. Static slot
    The whole population is divided into 50 slots using user_id % 50, so one slot represents 2% of users. Buckets are formed by selecting slots until the desired population size is reached, e.g. a 10% experiment uses 5 slots.
  2. Dynamic slot
    The population is divided into an arbitrary number of slots according to the experiment’s needs, e.g. an experiment targeting 50% of users takes a modulus of 100 and allocates the first 50 slots. In this case, the divisor can differ between experiments.
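
To make the two schemes concrete, here is a minimal sketch in Python. The function names and the integer user ID are illustrative assumptions, not Tokopedia’s actual code:

def static_slot(user_id: int) -> int:
    # Static divisor: 50 slots, each covering 2% of the population.
    return user_id % 50

def dynamic_slot(user_id: int, num_slots: int) -> int:
    # Dynamic divisor: the number of slots varies per experiment.
    return user_id % num_slots

# A 10% experiment with static slots: pick 5 of the 50 slots.
in_static_experiment = static_slot(12345) in {0, 1, 2, 3, 4}

# A 50% experiment with dynamic slots: modulus 100, first 50 slots.
in_dynamic_experiment = dynamic_slot(12345, 100) < 50

Note that user 12345 lands in the same static slot every single time; nothing about this assignment is random.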

As you can see, simple modulus-based distribution has two problematic traits:
a. It is not random enough: a slot’s membership never or rarely changes. The randomness of the allocation relies entirely on the order of user_id values, which carries no randomness guarantee at all. This can lead to biased allocation.
b. Slot membership never changes with a static divisor, and even with a dynamic divisor some users always land in the same slots. Not only can the memory effect occur, but each slot can also develop its own set of behaviors and tendencies, which leads to selection bias.

Double Hashing: A better way to distribute users

So far we have gathered two important requirements for our experimentation tool: allocation has to be random, and it has to be well-rotated for every experiment. At Tokopedia we developed a technique that covers both: on the allocation side we still rely on the modulo operation, and for randomization a hashing function is used. The process is done twice, which is why we call this technique double hashing.

First Step

The population is divided into 100 buckets using a modulo operation. However, instead of using user_id directly, the user_id is hashed first to get better randomness.

user_bucket = hash(constant_prefix + user_id) % 100

In this step, buckets of users are randomly allocated according to the experiment’s target size.
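
A minimal sketch of this first step, assuming md5 as the hash function and a made-up prefix value (the post specifies neither):

import hashlib

CONSTANT_PREFIX = "rollence"  # hypothetical prefix, not the real value

def first_step_bucket(user_id: str) -> int:
    # Hash first, then apply modulo, so the bucket does not depend
    # on the ordering of raw user IDs.
    digest = hashlib.md5((CONSTANT_PREFIX + user_id).encode()).hexdigest()
    return int(digest, 16) % 100  # one of 100 population buckets

# A 10% experiment would claim 10 of the 100 buckets, e.g. buckets 0-9.
eligible = first_step_bucket("user-12345") < 10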

First step visualization

Second Step

We further divide the allocation from the previous step into 10,000 buckets. Again the user_id is hashed, this time with a prefix coming from the experiment.

user_bucket = hash(experiment_name + user_id) % 10000

These 10,000 buckets are then divided further to form the control and treatment (A/B) buckets.
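
Continuing the sketch above (same assumed hash function; the even 50/50 control/treatment split is also an assumption for illustration):

import hashlib

def second_step_bucket(experiment_name: str, user_id: str) -> int:
    # The experiment name acts as the prefix, so bucket membership
    # rotates from one experiment to the next.
    digest = hashlib.md5((experiment_name + user_id).encode()).hexdigest()
    return int(digest, 16) % 10000  # one of 10,000 fine-grained buckets

def assign_variant(experiment_name: str, user_id: str) -> str:
    # Split the 10,000 buckets evenly between control and treatment.
    bucket = second_step_bucket(experiment_name, user_id)
    return "control" if bucket < 5000 else "treatment"

print(assign_variant("search_ranking_v2", "user-12345"))

Because the prefix changes per experiment, the same user lands in different buckets across experiments, which gives us the rotation property we required.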

By using two steps of allocation and randomization, the intention is to avoid the three problems mentioned above and to minimize lasting effects between experiments, even when their populations overlap.

Second step visualization

An AA test to prove it

In the end, how well our tool or method works needs to be validated, ideally on a regular basis before every experiment is started. To prove that there is no bias in the experiment population, an AA test is usually used.

An AA test checks for statistical significance between two experiment buckets before any treatment is introduced. If we see no significant difference between the buckets, our allocation is free from detectable bias.
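
As an illustration, here is a small simulated AA check built on the sketch above. The metric distribution, sample size, and z-test threshold are assumptions for the demo, not part of Tokopedia’s actual validation:

import hashlib
import math
import random

def assign_variant(experiment_name: str, user_id: str) -> str:
    digest = hashlib.md5((experiment_name + user_id).encode()).hexdigest()
    return "A1" if int(digest, 16) % 10000 < 5000 else "A2"

# Everyone gets a metric drawn from the SAME distribution: no treatment.
random.seed(42)
metrics = {"A1": [], "A2": []}
for i in range(100_000):
    metrics[assign_variant("aa_check", f"user-{i}")].append(random.gauss(10.0, 2.0))

# Two-sample z-test on the bucket means (samples are large enough for z).
a, b = metrics["A1"], metrics["A2"]
mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
print(f"z = {z:.3f}")  # |z| < 1.96 means no significant difference at 95%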

Tool to “Roll” them all

The topic of online experimentation is always interesting to discuss; it combines product development, statistical, and scalability challenges. The reliability, scalability, and correctness of the system are very important to ensure that data-driven decisions can be made confidently.

At Tokopedia, the end-to-end experimentation journey, including allocation, AA tests, AB tests, post-analysis, and gradual feature rollout, runs on top of the Rollence Experimentation Platform: a homegrown tool that has reliably served our needs since 2019, as the successor of an earlier homegrown tool.
