Improving Our Randomization Algorithm
Online Experimentation Studies from Wish
Contributors: Qike (Max) Li, Samir Jamkhande
We published a new online experimentation study, Assign Experiment Variants at Scale in A/B Tests, and presented it at the RE.WORK Deep Learning Summit.
Randomized controlled experiments (A/B tests) are widely regarded as the gold standard for establishing causality. Yet the seemingly simple act of randomization turns out to be hard to get right. This blog post introduces the challenges we encountered with our randomized assignment, how we improved the algorithm, and how we evaluated it.
Tech companies run A/B tests to learn the impact of new product features. Randomization maps end users to experiment variants and balances user characteristics (both observed and unobserved) between the groups, so any outcome differences between control and treatment can be attributed to the feature under experiment. But is randomization as straightforward as flipping a coin and assigning users to group A or group B?
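To make the mechanics concrete, here is a minimal sketch of the deterministic, hash-based variant assignment that experimentation platforms typically use. The function name, bucket count, and the use of SHA-256 as the hash are illustrative assumptions for this sketch, not our production implementation:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, n_buckets: int = 1000) -> str:
    """Deterministically map a user to a variant for one experiment.

    Hashing the experiment id together with the user id keeps the assignment
    consistent across sessions while letting each experiment bucket users
    separately (illustrative sketch only).
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    # Interpret the first 8 bytes of the digest as a 64-bit integer bucket index.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_buckets
    return "treatment" if bucket < n_buckets // 2 else "control"
```

As the rest of this post explains, the choice of hash function in a scheme like this matters far more than it might appear.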
Our experience suggests that it is challenging to design a good randomization algorithm. Our original randomization algorithm could cause sample ratio mismatch, which indicates potential data quality problems that distort experiment outcomes and lead to wrong business decisions. Further, the assignments of different experiments were potentially correlated, which may introduce bias into experiments. Our study shows that the flaws in our original algorithm are largely rooted in the Fowler–Noll–Vo (FNV) hash function; in 2016, Yahoo reported similar issues with using FNV for randomization.
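As a rough illustration of how a sample ratio mismatch can be detected, a common check is a chi-square goodness-of-fit test on the observed group sizes. The function below is a hedged sketch using SciPy; the threshold and function name are assumptions, not the exact test from our study:

```python
from scipy.stats import chisquare

def has_sample_ratio_mismatch(control_count: int, treatment_count: int,
                              expected_ratio: float = 0.5,
                              alpha: float = 0.001) -> bool:
    """Flag a likely sample ratio mismatch with a chi-square goodness-of-fit test.

    A very small p-value means the observed split is unlikely under the intended
    ratio, pointing to a problem in the assignment pipeline rather than chance.
    """
    total = control_count + treatment_count
    expected = [total * expected_ratio, total * (1.0 - expected_ratio)]
    _, p_value = chisquare([control_count, treatment_count], f_exp=expected)
    return p_value < alpha
```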
We improved our original algorithm by applying SpookyHash. The new algorithm is not only four times faster than the original but also meets the statistical requirements: consistent assignment, uniform distribution of the generated random numbers used for bucket assignment, and independent assignment between experiments. In the blog post, we delve into the details of the algorithm and how we tested uniformity and independence.
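For readers who want a sense of how independence between experiments can be checked, one standard approach is a chi-square test of independence on the contingency table of joint assignments. The sketch below assumes that approach and uses SciPy; it is illustrative and not the exact evaluation from the study:

```python
import numpy as np
from scipy.stats import chi2_contingency

def assignments_look_correlated(assignments_a, assignments_b,
                                alpha: float = 0.001) -> bool:
    """Test whether two experiments' variant assignments are independent.

    Builds a contingency table of joint assignments (e.g. control/treatment in
    experiment A versus experiment B) and runs a chi-square test of independence;
    a small p-value suggests the two experiments' assignments are correlated.
    """
    variants_a = sorted(set(assignments_a))
    variants_b = sorted(set(assignments_b))
    table = np.zeros((len(variants_a), len(variants_b)), dtype=int)
    # Count how often each pair of variants occurs for the same user.
    for a, b in zip(assignments_a, assignments_b):
        table[variants_a.index(a), variants_b.index(b)] += 1
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha
```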
Thanks to Chao Qi for his contribution to this project. We are also grateful to Pai Liu and Shawn Song for their support, and Pavel Kochetkov, Lance Deng, Caroline Davey, Gus Silva, Iryna Shvydchenko, and Delia Mitchell for their feedback.
If you are interested in solving challenging problems in this space, join us! We are hiring for the Data Science team.