Online Experimentation Studies from Wish Part I

Curious about our study? View our methodology for A/B testing percentiles

Qike (Max) Li
Wish Engineering And Data Science
2 min read · Aug 26, 2021


by Qike (Max) Li & Bo Gao


We recently published our first online experimentation study: an A/B testing methodology for percentiles. Here, we provide a summary of the study; for more details, please refer to the full blog post.

What is this study about? This study illustrates how to conduct hypothesis testing to evaluate a new product feature’s impact on site performance, which is critical for most tech companies. For example, Amazon found that just 100 milliseconds of extra load time cost them 1% in sales.

What are the data science challenges? Site performance is typically measured with percentiles (e.g., P95) of response times. A/B testing percentiles is challenging for two reasons: 1) computing percentile statistics at scale is expensive; 2) most experimentation platforms randomize users to ensure independence, but the unit of analysis is the individual response time (e.g., for the 95th percentile response time), and response times from the same user tend to be correlated.
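To make the data structure concrete, here is a minimal simulation of what such clustered data can look like. Everything in it (user counts, distributions, the `response_ms` column name) is our own illustration, not data from the study:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Each user has a latent "speed" (device, network), so their response
# times share a common component and are correlated within user.
n_users, events_per_user = 1_000, 20
user_effect = rng.lognormal(mean=0.0, sigma=0.5, size=n_users)
rows = [
    {"user_id": u, "response_ms": 100 * user_effect[u] * rng.lognormal(0.0, 0.3)}
    for u in range(n_users)
    for _ in range(events_per_user)
]
df = pd.DataFrame(rows)

# The metric under test: the 95th percentile of all pooled response times.
p95 = np.percentile(df["response_ms"], 95)
print(f"P95 response time: {p95:.1f} ms")
```

Because each user contributes many response times that share a common component, observations are independent across users but not within a user, which is exactly what standard percentile tests ignore.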

What have we explored? We explored heuristic rules, a modified proportion Z-test, and cluster bootstrapping. These approaches are either inaccurate or computationally prohibitive. Ultimately, we employed a methodology that is both statistically valid and scalable.
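As one illustration of why the bootstrap route becomes expensive, below is a sketch of a user-level (cluster) bootstrap for the P95, reusing the simulated `df` from the sketch above. Resampling whole users preserves the within-user correlation, but every replicate re-scans all events, which does not scale to billions of response times:

```python
import numpy as np

def cluster_bootstrap_p95_se(df, n_boot=1_000, seed=1):
    """Standard error of the pooled P95 via a user-level (cluster) bootstrap."""
    rng = np.random.default_rng(seed)
    groups = {u: g.to_numpy() for u, g in df.groupby("user_id")["response_ms"]}
    users = np.array(list(groups))
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole users with replacement to preserve within-user correlation.
        sampled = rng.choice(users, size=users.size, replace=True)
        pooled = np.concatenate([groups[u] for u in sampled])
        estimates[b] = np.percentile(pooled, 95)
    return estimates.std(ddof=1)

print(f"Bootstrap SE of P95: {cluster_bootstrap_p95_se(df):.2f} ms")
```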

What is the solution? The employed solution overcomes both of the challenges above: it yields a well-controlled type I error rate (false-positive rate) and high power (true-positive rate). It is scalable because it is an analytical solution that requires only user-level aggregated summary statistics. Moreover, when estimating the variance of percentiles of response times, it accounts for the correlation between response times from the same user and is therefore accurate.
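This summary describes the solution’s properties rather than its formula, but one well-known analytical estimator with exactly these properties is the delta method for quantile metrics (Deng, Knoblich, and Lu, KDD 2018): estimate the variance of the empirical CDF at the percentile from per-user sums, then map it to the quantile scale through the density. The sketch below, again using the simulated `df`, illustrates that general technique; it is our reading of the approach, not necessarily the exact estimator employed at Wish:

```python
import numpy as np
from scipy.stats import gaussian_kde

def p95_variance(df, p=0.95):
    """Delta-method variance of the pooled P95 under user-level clustering."""
    x = df["response_ms"].to_numpy()
    q = np.quantile(x, p)

    # User-level summary statistics only: events at or below q, and event counts.
    g = df.groupby("user_id")["response_ms"]
    y = g.apply(lambda s: (s <= q).sum()).to_numpy()  # events <= q per user
    n = g.size().to_numpy()                           # total events per user
    k = y.size                                        # number of users
    r = y.sum() / n.sum()                             # empirical CDF at q (~ p)

    # Delta method for the ratio sum(y) / sum(n), treating users as i.i.d.
    var_r = (np.var(y, ddof=1)
             - 2.0 * r * np.cov(y, n)[0, 1]
             + r**2 * np.var(n, ddof=1)) / (k * n.mean() ** 2)

    # Map variance on the CDF scale to the quantile scale via the density at q.
    f_q = gaussian_kde(x)(q)[0]
    return var_r / f_q**2

print(f"Analytical SE of P95: {np.sqrt(p95_variance(df)):.2f} ms")
```

A two-sample z-test for the difference in P95 between treatment and control then needs this variance computed only once per arm, with no resampling.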

Experimentation is in the DNA of Wish’s data-driven culture. Data scientists at Wish are passionate about building a trustworthy experimentation platform. Stay tuned for more studies in this area.

Thanks to Chao Qi for his contributions to this project. We are also grateful to Pai Liu, Shawn Song, and Simla Ceyhan for their support, and to Pavel Kochetkov, Lance Deng, Caroline Davey, Leah Levine, and Raine Medeiros for their feedback on drafts of this post.

If you are interested in solving challenging problems in this space, join us! We are hiring for the data science team.
