2015 in experimentation

(This is my annual review of everything I tweeted about AB-testing in 2015)

Experimentation seems so simple and straightforward. It’s doing different things and comparing the outcomes. It’s been around since the Old Testament. It’s the essence of the scientific method. Babies do it innately.

Still, just a glance at Wikipedia’s Design of Experiments page starts unraveling the illusion that this is easy. Colin McFarland from Skyscanner had a great article on common pitfalls in experimentation. Booking.com’s Erin Weigel wrote a very tactical reminder that what you test is your execution, not the concept. (And I also came across Peter Norvig’s Warning Signs in Experimental Design and Interpretation this year.)

P-values got their usual amount of criticism, of course. One psychology journal banned them. There was a great discussion of false discovery rates and misinterpretation of p-values in the Royal Society Open Science journal (with a more accessible write-up by Felix Schönbrodt). A hundred psychological experiments were replicated with mixed results, which has also been blamed on the p-value (the mixed results, that is, not the fact that they were tested). And that’s just dealing with significant results — here are all 509 (and counting) ways people have described their not-so-significant results in peer-reviewed journal articles.
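The false-discovery-rate point is easy to see in a toy simulation. This is my own sketch, not taken from any of the pieces linked above; the base rate, lift, and the assumption that only 10% of tested ideas actually work are all made-up illustrative numbers:

```python
import random
import math

random.seed(42)

def z_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test on conversion counts."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value via the normal CDF (math.erf)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_experiment(true_lift, n=10_000, base_rate=0.05):
    """Simulate one A/B test; the B arm gets a relative lift (0 = A/A)."""
    conv_a = sum(random.random() < base_rate for _ in range(n))
    conv_b = sum(random.random() < base_rate * (1 + true_lift) for _ in range(n))
    return z_test_pvalue(conv_a, n, conv_b, n)

# Assume (hypothetically) only 10% of tested ideas have a real 10% lift.
true_positives = false_positives = 0
for _ in range(1000):
    has_effect = random.random() < 0.10
    p = run_experiment(0.10 if has_effect else 0.0)
    if p < 0.05:
        if has_effect:
            true_positives += 1
        else:
            false_positives += 1

fdr = false_positives / (true_positives + false_positives)
print(f"significant wins: {true_positives + false_positives}, "
      f"false discovery rate: {fdr:.0%}")
```

With these numbers the test is underpowered, so the 5% of null experiments that fire by chance make up a large share of all “significant wins” — which is exactly why p < 0.05 is not the same thing as a 5% error rate on your shipped changes.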

But in good news for everyone doing online testing, someone finally had the good sense to test the same hypothesis both in the lab and online - with success!

If all that doesn’t make you humble before taking Lukas Vermeer’s “So you think you can test?” challenge, I don’t know what will.

Perhaps this scrutiny is just in time, as the world outside flight and hotel bookings has not only discovered that it is being experimented on, but is begging us to do it more. The Economist argued for using RCTs in policymaking too. One of the great advantages of AB-testing is that it takes politics out of business. Can it also take politics out of… politics?

However, a very well-written criticism of big ideas in international development also skewered Kremer’s randomized-controlled-trial approach to development aid. The gist of the criticism, though, is that Kremer ran a single experiment and then stuck with its result forever (and it later turned out the result might not be so replicable either). The article itself advocates exactly the incrementalism that AB-testing is typically criticised for, as the only working solution for bringing countries out of poverty.

Experimentation isn’t only saving the world; it’s also how we made what is possibly the best page on the internet.

Someone did find a use for online experimentation that is genuinely good for humanity, though: making online gaming more civil. Meanwhile, in the offline world, someone found that making slot machines more human leads to more gambling. The latter had been reviewed by an ethics board; the former had not.

The Road to Success is in Small Improvements, as Etsy’s Daniel Schauenberg declared in very big fonts.