
A/B Testing’s Worst Enemy: Sample Ratio Mismatch

Diagnosis and Solutions

Andre Ye
Published in DataSeries
4 min read · Jun 4, 2020


Abraham Wald was born in 1902 in Klausenburg (now Cluj-Napoca, Romania), then part of the Austro-Hungarian Empire. A mathematician by instinct, he was brought on as a statistician for the Allied Forces in World War II. One problem in particular stood out: too many planes were being shot down as they passed over German-held territory. In response, the Allied Forces wanted to put more armor on the planes.

However, armor is heavy, and past a certain point adding more actually harms a plane’s chances of surviving. Since only a limited amount of armor could be added, the real question was where on the plane to put it.

After tallying where bullet holes appeared on planes, the Allied Forces rushed to place more armor on those areas. Wald, however, quickly pointed out a major flaw: the only data available came from the planes that returned. The places where returning planes had bullet holes did not need more armor, because those planes made it back even after being hit there. The planes that didn’t make it back, on the other hand, had most likely been hit in the places where the survivors were untouched, which is why they went down. Recognizing this, Wald concluded that the armor should go on the areas where the returning planes had not been hit.
