This is very interesting.
Jan Florjanczyk

Our experiments typically have relatively large sample sizes, so our statistics are based on the Central Limit Theorem (CLT) and we need only the first and second moments. We also offer a view of our metrics with winsorization so that we can detect cases where the normality of the estimator might not hold.
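A minimal sketch of what a CLT-based test built only from first and second moments can look like: each group is reduced to a subject count, a sum, and a sum of squares, and a two-sided z-test is run on the difference in means. This is a textbook Welch-style z-test, not Airbnb's code. The names `Moments`, `z_test`, and `winsorize` are hypothetical, and the winsorization shown (capping only the upper tail at a nearest-rank quantile) is an assumed variant, since the actual thresholds are not stated.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    """First and second moments of a per-subject metric for one experiment group (hypothetical)."""
    n: int      # number of subjects in the group
    s1: float   # sum of per-subject values
    s2: float   # sum of squared per-subject values

    @classmethod
    def from_values(cls, xs):
        return cls(len(xs), float(sum(xs)), float(sum(x * x for x in xs)))

    def mean(self):
        return self.s1 / self.n

    def var(self):
        # Unbiased sample variance recovered from the raw moments (needs n >= 2).
        m = self.mean()
        return (self.s2 - self.n * m * m) / (self.n - 1)


def z_test(treatment, control):
    """Two-sided CLT-based z-test on the difference in group means."""
    diff = treatment.mean() - control.mean()
    se = math.sqrt(treatment.var() / treatment.n + control.var() / control.n)
    z = diff / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p


def winsorize(xs, upper_quantile=0.99):
    """Cap values above the upper quantile (nearest-rank); assumed upper-tail-only variant."""
    ordered = sorted(xs)
    idx = max(0, math.ceil(upper_quantile * len(ordered)) - 1)
    cap = ordered[idx]
    return [min(x, cap) for x in xs]
```

Running the same test on both the raw and the winsorized per-subject values is one way to produce a side-by-side view; a large gap between the two results hints that heavy tails may be undermining the normal approximation.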

For more information on our p-value computation, check out this simplified code snippet:

Note: We typically work with two types of metrics at Airbnb. All metrics are based on a simple numerator/denominator definition. For most metrics, we use the treatment population as the denominator in the calculation. The code above embodies this approach.
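As a hypothetical illustration of using the population as the denominator: every subject assigned to a group stays in that group's denominator, and a subject with no events contributes zero to the numerator. The event layout (a list of `(subject_id, value)` pairs) and the function names are assumptions made for this sketch, not Airbnb's schema.

```python
from collections import defaultdict


def per_subject_values(assigned_ids, events):
    """Sum event values per assigned subject; subjects with no events count as 0.

    Events for subjects outside assigned_ids are ignored (hypothetical layout).
    """
    totals = defaultdict(float)
    for subject_id, value in events:
        totals[subject_id] += value
    return [totals.get(s, 0.0) for s in assigned_ids]


def population_metric(assigned_ids, events):
    """Numerator = total event value; denominator = size of the assigned population."""
    values = per_subject_values(assigned_ids, events)
    return sum(values) / len(values)
```

The per-subject list returned by `per_subject_values` is exactly what feeds the first and second moments, so zero-activity subjects also pull down the variance estimate rather than being silently dropped.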

We also have “ratio-metrics”, which are a ratio of two other metrics. These are particularly useful for detecting effects on engagement (searches per searcher) or on conversion (searchers who go on to make a booking). For ratio-metrics, we use the Delta Method, which additionally requires the cross moment of the numerator and denominator (their product) for each subject in the experiment. This computation is more complex, requiring joins across events in the “subject summary” table.
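A minimal sketch of the textbook first-order Delta Method for a ratio metric, assuming each subject contributes a denominator value x (e.g. 1 if they searched) and a numerator value y (e.g. their search count). The variance of the ratio of means is approximated from the variances of x and y plus their covariance, and that covariance is the only place the per-subject cross moment (the sum of x * y) is needed. `RatioMoments`, `ratio_and_variance`, and `ratio_z_test` are hypothetical names; this is not Airbnb's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class RatioMoments:
    """Per-group sums for a Delta Method ratio metric (hypothetical).

    x is the per-subject denominator, y the per-subject numerator.
    """
    n: int
    sx: float
    sy: float
    sxx: float
    syy: float
    sxy: float  # cross moment: sum of x * y over subjects

    @classmethod
    def from_values(cls, xs, ys):
        return cls(
            n=len(xs),
            sx=float(sum(xs)),
            sy=float(sum(ys)),
            sxx=float(sum(x * x for x in xs)),
            syy=float(sum(y * y for y in ys)),
            sxy=float(sum(x * y for x, y in zip(xs, ys))),
        )


def ratio_and_variance(m):
    """Ratio of means and its first-order Delta Method variance (needs n >= 2)."""
    n = m.n
    mx, my = m.sx / n, m.sy / n
    var_x = (m.sxx - n * mx * mx) / (n - 1)
    var_y = (m.syy - n * my * my) / (n - 1)
    cov_xy = (m.sxy - n * mx * my) / (n - 1)  # the cross moment enters here
    r = m.sy / m.sx  # equals my / mx
    var_r = (var_y / mx**2 - 2 * my * cov_xy / mx**3 + my**2 * var_x / mx**4) / n
    return r, var_r


def ratio_z_test(treatment, control):
    """Two-sided z-test on the difference in ratio metrics between two groups."""
    rt, vt = ratio_and_variance(treatment)
    rc, vc = ratio_and_variance(control)
    diff = rt - rc
    z = diff / math.sqrt(vt + vc)
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p
```

Because `sxy` needs each subject's numerator and denominator side by side, computing it in a warehouse plausibly means joining the two event sources per subject, which fits the extra complexity described above.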
