The Importance of Experiments

Introduction

Chong Han Khai
Apr 25, 2020

The first reported randomized controlled trial (RCT) was conducted by James Lind more than 250 years ago to identify a treatment for scurvy. The approach was then popularized in the early 20th century by Ronald A. Fisher in his writings on experimental research. Since then, fields such as medicine, the social sciences, agriculture and many more have adopted it and developed new methodology, in both execution and statistics, to cater for different situations. Today the approach is more widely known as A/B testing, a term popularized by marketing firms and tech companies. I still prefer to call it an experiment, because an A/B test is just a subset of RCTs: one with two variants.

With the recent hype around RCTs, my question has always been: why is such an old concept trendy again? Why are more and more companies adopting the idea?

Why is experimentation trending?

The recent hype around experiments is due to the abundance of data since the rise of the internet. This data enabled huge tech companies such as Facebook, Airbnb and YouTube to scale to their current size, because it gives them a large advantage over their competitors. They have hundreds of millions of active users, and they realized that they could build new products and features by “listening” to those users. This enabled them to iterate through products quicker than their competitors.

As a matter of fact, experimenting has become a core exercise in building great tech products. However, there are so many pitfalls in running experiments online that most companies are perhaps better off trusting their intuition. But is this the case? As a small or medium-sized company without expertise in executing experiments online, should we just stick to intuition? The answer is NO, and here’s why.

Reduce the chance of making an error

One of the intentions of doing an experiment is to scientifically find out the difference between having vs not having a treatment. Treatments can range from changing the color of a button to having a completely new feature for a product.

Illustration of Metric Pre and Post Treatment

For example, the red dashed line in the graph above would not be visible to us without experimenting. Comparing the results pre and post treatment, we would judge that the new feature helped improve our target metric, but we would in fact be wrong: the increase happened for an unobserved reason, and the feature actually has a negative impact on the metric. This is known as a Type I error, and performing experiments reduces our chance of making such errors. We also reduce our chance of making a Type II error, where we think a treatment is bad or ineffective when it is in fact improving our target metric.
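To make the pre/post trap concrete, here is a minimal simulation sketch in Python. Every number in it is an assumption chosen for illustration (a baseline metric of 10, an unobserved background lift of +2.0 after launch, and a true feature effect of -0.5); none of them come from real data. It compares the naive pre/post estimate with the estimate from a randomized control group.

```python
import random
import statistics


def simulate(n_users=20000, true_effect=-0.5, background_lift=2.0, seed=0):
    """Compare a naive pre/post estimate with a randomized-experiment estimate.

    All values are hypothetical: the metric has a baseline mean of 10, an
    unobserved factor raises it by `background_lift` after launch, and the
    feature itself shifts it by `true_effect`.
    """
    rng = random.Random(seed)
    baseline = 10.0

    # Pre-period: nobody has the feature and the background lift has not happened.
    pre = [rng.gauss(baseline, 1.0) for _ in range(n_users)]

    # Post-period: every user gets the background lift; a coin flip decides
    # who also gets the feature.
    control, treatment = [], []
    for _ in range(n_users):
        value = rng.gauss(baseline + background_lift, 1.0)
        if rng.random() < 0.5:
            treatment.append(value + true_effect)
        else:
            control.append(value)

    # Naive estimate: users with the feature vs. the pre-period.
    pre_post_estimate = statistics.mean(treatment) - statistics.mean(pre)
    # Experiment estimate: users with the feature vs. the concurrent control group.
    experiment_estimate = statistics.mean(treatment) - statistics.mean(control)
    return pre_post_estimate, experiment_estimate


pre_post, experiment = simulate()
print(f"pre/post estimate:   {pre_post:+.2f}")
print(f"experiment estimate: {experiment:+.2f}")
```

Under these assumptions the pre/post estimate should come out close to +1.5, because it absorbs the background lift, while the experiment estimate should land close to the true effect of -0.5, because the control group experiences the same background lift and cancels it out.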

Making science a culture

It is not a coincidence that scientific evidence is at the core of decision making in most, if not all, of the big tech companies. At a certain scale, the culture of listening to HiPPOs (Highest Paid Person’s Opinions) or making decisions based on intuition will only bring harm.

Firstly, when all the decisions are made by a few people, they tend to spend less time thinking about each one. Secondly, decisions made from only a few people’s perspectives limit the potential of a product. Thirdly, when there are thousands of decisions, you will never find a way to consistently make the right call without scientific backing. In fact, it is common in big tech companies to see 9 out of 10 features not getting rolled out because they failed to do what they were hypothesized to do. These decisions are only possible because the companies have a scientific process in place for deciding what is good and what is not.

Hence, it is important to have a scientific framework, from the start, for guiding what should and should not be shipped. If this culture is not established from the beginning, the culture of making decisions based on intuition will take over and your ambition of scaling will be doomed.

Compounding effect of making wrong decisions

In the short term, a wrong decision moves the needle in the wrong direction. But in the long term it does so much more than that. To illustrate my point, here’s a thought experiment.

Illustration of every possible decision.

Imagine each box as a decision you make, and the green dashed line as the optimal path. The purpose of experimenting is to give you a smaller chance of deviating from the green dashed line. At the beginning, it is easy for intuition to tell us where the green dashed line is. For example, adding options for users to make payments via credit card will increase the number of paid users. However, as we optimize the product, we will find fewer and fewer opportunities like this. We will reach a point where we might be trying to see whether changing the aesthetics of a small button moves a certain metric. In situations like this, intuition can no longer keep you on the green path.

Just like interest, these wrong decisions have a compounding effect. The path you take also affects the boxes you will face in the future, i.e. how you think about the product and its future direction. You might miss a crucial turning point: the difference between a product used by tens of millions of users and one used by only hundreds of thousands, because greed for short-term gains led you to a decision that limited your product’s potential.
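The compounding part of this argument can be sketched with some simple arithmetic. The model below is a deliberate simplification with made-up numbers: each decision is assumed to be independent, a right call multiplies the metric by 1.02, and a wrong call multiplies it by 0.98. It ignores the path dependence described above (wrong calls changing which decisions you face next), so if anything it understates the gap.

```python
def expected_growth(p_correct, n_decisions, gain=0.02, loss=0.02):
    """Expected cumulative multiplier on a metric after n independent decisions.

    Each decision multiplies the metric by (1 + gain) when it is right and
    by (1 - loss) when it is wrong. All values are hypothetical.
    """
    # Expected multiplier of a single decision.
    per_decision = p_correct * (1 + gain) + (1 - p_correct) * (1 - loss)
    # Independent multipliers compound, so expectations multiply.
    return per_decision ** n_decisions


# Hypothetical hit rates: 90% right calls with experiments, 60% on intuition.
with_experiments = expected_growth(p_correct=0.9, n_decisions=50)
intuition_only = expected_growth(p_correct=0.6, n_decisions=50)
print(f"90% correct calls: x{with_experiments:.2f}")
print(f"60% correct calls: x{intuition_only:.2f}")
```

With these assumed numbers, 50 decisions at a 90% hit rate grow the metric by roughly 2.2x in expectation, versus roughly 1.2x at a 60% hit rate. A small per-decision difference becomes a large gap once it compounds.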

It takes years to master the art of experimentation

It might all look cool and perfect to have hundreds, if not thousands, of experiments running concurrently at companies like Facebook, Netflix and Airbnb. What we do not realize is how many resources and how many mistakes it took for them to get where they are now. As an employee at a small to medium-sized company, you might think that it is a waste of time to experiment on something that will obviously improve your product. However, when decisions are no longer easy to make and you ask for experiments, there are three barriers you will face immediately.

  1. Different people start giving their opinions and think they are right. You are in a stalemate because science is not the standard way decisions are made.
  2. You start stepping into the common experiment pitfalls, such as implementation errors in engineering, correlation between assignment in experiment A and experiment B, etc. (I can go on for hours listing the possible mistakes.) These mistakes are now more costly than before because you have more users.
  3. Executing experiments is slow and painful. The code behind your product is so complex that introducing experimental components is difficult or, in the worst case, breaks it. Your product was not built with experimentation in mind.
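As an illustration of the second barrier, one common way to avoid correlated assignment between experiments is to hash the user ID together with the experiment name, so that each experiment gets its own independent split. The sketch below is a minimal example of that idea, not a description of how any particular company does it; the function name `assign_variant`, the experiment names and the user IDs are all hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment_name, variants=("control", "treatment")):
    """Deterministically map a user to a variant of one experiment.

    Including the experiment name in the hashed key acts as a per-experiment
    salt, so a user's bucket in one experiment carries no information about
    their bucket in another.
    """
    key = f"{experiment_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Cross-tabulate assignments for two hypothetical experiments.
counts = Counter(
    (assign_variant(f"user-{i}", "exp_a"), assign_variant(f"user-{i}", "exp_b"))
    for i in range(10_000)
)
for pair, count in sorted(counts.items()):
    print(pair, count)
```

If the assignments are independent, each of the four combinations should hold roughly a quarter of the users. Hashing the user ID alone, without the experiment name, would put every user in the same bucket across all experiments, which is exactly the correlation the list above warns about.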

Conclusion

Overall, I have highlighted the pros of experimenting and the cons of not doing so in this blog post. I hope it helps people get a sense of the importance of experimenting, despite all the obviously beneficial “low hanging fruits” they are currently still picking.

As with all things, there are always trade-offs. The most difficult part is knowing when you should start experimenting, and how to tell whether you are doing it the right way and getting robust results. Start experimenting too early and you risk losing opportunities because you are slow; start too late and you risk losing customers and making lots of wrong decisions. There are also requirements before you can start experimenting, such as the number of users; without enough of them, it might take too long to collect enough data for a scientifically sound result.
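To get a feel for the user requirement, here is a rough sample-size sketch for comparing two conversion rates. It uses the standard normal-approximation formula for a two-sided test; the baseline rate, the minimum lift worth detecting, the significance level, the power and the daily traffic figure are all assumptions picked for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, minimum_detectable_lift,
                            alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect an absolute lift
    in a conversion rate (two-sided test, normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical scenario: 10% baseline conversion, detect a lift to 11%.
n = sample_size_per_variant(baseline_rate=0.10, minimum_detectable_lift=0.01)
daily_users = 500  # assumed traffic entering the experiment per day
days_needed = math.ceil(2 * n / daily_users)
print(f"users per variant: {n}")
print(f"days to collect at {daily_users} users/day: {days_needed}")
```

Under these assumptions the requirement is on the order of 15,000 users per variant, which at a few hundred users a day means waiting around two months for a single result. Halving the lift you want to detect roughly quadruples the sample size, which is why small products struggle to test small changes.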

Experiments are also not the only scientific process for making decisions. There are a lot of cases where an experiment is not even needed to make a scientifically sound decision. I did not cover how to decide when an experiment is required in this blog post, but I might do so in a future one.

Chong Han Khai

A data person who cares a lot about best practices and being scientifically correct.