Our experiments typically have large sample sizes, so our statistics rely on the Central Limit Theorem (CLT) and need only the first and second moments of each metric. We also offer a winsorized view of our metrics so that we can detect cases where estimator normality might not hold.
For more detail on how we compute p-values, check out this simplified code snippet:
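A minimal sketch of this kind of computation, assuming each group has already been reduced to a subject count, a sum, and a sum of squares. The names `GroupSummary`, `summarize`, and `z_test` are illustrative assumptions, not identifiers from our production code, and the test shown is a plain two-sample z-test under the CLT normal approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """First and second moments for one experiment group (hypothetical layout)."""
    n: int            # number of subjects in the group (the denominator)
    total: float      # sum of per-subject metric values
    total_sq: float   # sum of squared per-subject metric values

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def var(self) -> float:
        # Unbiased sample variance recovered from the two running sums.
        return (self.total_sq - self.n * self.mean ** 2) / (self.n - 1)


def summarize(values):
    """Collapse per-subject values into the moments the test needs."""
    return GroupSummary(
        n=len(values),
        total=float(sum(values)),
        total_sq=float(sum(v * v for v in values)),
    )


def z_test(control: GroupSummary, treatment: GroupSummary):
    """Return (difference in means, standard error, two-sided p-value)."""
    delta = treatment.mean - control.mean
    se = math.sqrt(control.var / control.n + treatment.var / treatment.n)
    z = delta / se
    # Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return delta, se, p_value
```

For example, `z_test(summarize(control_values), summarize(treatment_values))` returns the estimated effect together with its standard error and p-value. Because only counts, sums, and sums of squares are needed, the per-group summaries can be computed in a single aggregation pass.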
Note: We typically work with two types of metrics at Airbnb. All metrics are based on a simple numerator/denominator definition. For most metrics, we use the treatment population as the denominator in the calculation. The code above embodies this approach.
We also have “ratio-metrics”, which are a ratio of two other metrics. These are particularly useful for detecting effects on engagement (searches per searcher) or conversion (searchers who go on to make a booking). For “ratio-metrics”, we use the Delta Method, which requires the cross moment of the numerator and denominator metrics, computed per subject in the experiment. This computation is more complex, requiring joins across events in the “subject summary” table.
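To make the Delta Method step concrete, here is a minimal sketch, assuming each subject contributes a numerator value `y` and a denominator value `x`, and that these have already been aggregated into sums, sums of squares, and the sum of cross products. The function and argument names are illustrative assumptions, and the formula is the standard first-order Delta Method approximation for a ratio of means.

```python
def ratio_mean_and_variance(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Estimate R = mean(y) / mean(x) and the variance of that estimate.

    n       -- number of subjects in the group
    sum_x   -- sum of per-subject denominator values
    sum_y   -- sum of per-subject numerator values
    sum_xx  -- sum of squared denominator values
    sum_yy  -- sum of squared numerator values
    sum_xy  -- sum of per-subject products x * y (the cross moment)
    """
    mean_x = sum_x / n
    mean_y = sum_y / n

    # Unbiased sample variances and covariance from the running sums.
    var_x = (sum_xx - n * mean_x * mean_x) / (n - 1)
    var_y = (sum_yy - n * mean_y * mean_y) / (n - 1)
    cov_xy = (sum_xy - n * mean_x * mean_y) / (n - 1)

    ratio = mean_y / mean_x

    # First-order Delta Method:
    # Var(R) ~= (var_y - 2 R cov_xy + R^2 var_x) / (n * mean_x^2)
    var_ratio = (var_y - 2 * ratio * cov_xy + ratio * ratio * var_x) / (
        n * mean_x * mean_x
    )
    return ratio, var_ratio
```

The `sum_xy` term is why the join is needed: both component metrics have to be available for the same subject before their product can be summed. With a ratio and variance for each group, the difference between groups can be tested with the same normal approximation used for the simpler metrics.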