When statistics is a matter of numbers…


When I was in university, I was forced to read papers every single day. The format was almost always the same: introduction, problem, methodology, data analysis, explanation, and conclusion. I've read all kinds of papers, from great research papers that fundamentally changed things to several crappy papers that made me think science is bullshit.

Don't get me wrong, I did not think the topics were bullshit. What usually bothered me was the way the writers drew their conclusions: either the methodology of the study itself, or the fancy statistics they used to analyse their data. In the end, they would get STATISTICALLY significant results.

The question remains: when statistics says something is statistically different, is it really different in real life?

Now that I work at a company that runs hundreds of experiments each month and relies heavily on significance levels to launch products, the question of how we should treat these numbers keeps bothering me. I've seen the exact same experiment produce two different results, yet people sometimes consider nothing beyond the significance level they've seen on the test when making their judgement.

Recently, I listened to a Freakonomics Radio podcast episode about drug trials. They explained how drug companies skew sample sizes and use different types of bootstrapping to hit significance levels so that they can sell the drugs. To me, this was another questionable use of statistics.
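To see how sample size alone can manufacture "significance", here is a small sketch with made-up numbers (my own hypothetical illustration, not taken from the podcast): the same tiny 0.5% lift looks like noise with 100 users per group, but becomes significant with 100,000, using a plain two-sample t statistic and a normal-tail approximation for the p-value.

```python
# Hypothetical illustration: an identical, tiny effect is "not significant"
# with a small sample and significant with a huge one.
import math
import random
import statistics

random.seed(42)

def two_sample_p(a, b):
    """Two-sample t statistic with a normal-tail approximation
    for the two-sided p-value (reasonable for large samples)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    # two-sided p-value from the standard normal tail
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

results = {}
for n in (100, 100_000):
    control = [random.gauss(100.0, 10.0) for _ in range(n)]
    treated = [random.gauss(100.5, 10.0) for _ in range(n)]  # a 0.5% lift
    results[n] = two_sample_p(control, treated)
    print(f"n={n:>7}  p={results[n]:.4f}")
```

The underlying effect is identical in both runs; only the sample size changes. A low p-value says the effect is unlikely to be pure noise; it says nothing about whether the effect is big enough to matter.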

I understand that statistics is the way, and maybe the only way, to analyse data. What makes me sad is how some people misuse statistics to attain 'significant' results and ignore the fact that statistics are prone to errors, especially when we use them without understanding what statistical significance means; or, even worse, when we ignore that it's just a method.

As humans, we have a tendency to prove that we are right, and in science, statistics is the proof we reach for. My worry is that it's easy to forget that statistics are just a tool, numbers and data that need to be interpreted, and the interpretation is the crucial part. What I often see, especially among people without a solid statistical background, is a tendency to take the number at face value. We sometimes forget to refer back to the methodology behind those numbers, and let our biases dictate the interpretation.

Another important point: we sometimes forget to take side factors (covariates) into consideration when drawing conclusions. Remember, we don't live in an idealised world where we can control every factor in our environment.
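A classic way an ignored covariate bites is Simpson's paradox. The sketch below uses hypothetical counts (patterned on the well-known kidney-stone example): a treatment that wins inside every subgroup loses in the pooled numbers, because the covariate, case severity, is unevenly distributed between the two arms.

```python
# Hypothetical counts illustrating Simpson's paradox: the treatment has a
# higher success rate in BOTH subgroups, yet a lower rate overall,
# because it was given mostly to the severe cases.
groups = {
    # group: (treated successes, treated total, control successes, control total)
    "mild cases":   (81, 87, 234, 270),
    "severe cases": (192, 263, 55, 80),
}

for name, (ts, tt, cs, ct) in groups.items():
    print(f"{name}: treated {ts / tt:.1%} vs control {cs / ct:.1%}")

t_rate = sum(g[0] for g in groups.values()) / sum(g[1] for g in groups.values())
c_rate = sum(g[2] for g in groups.values()) / sum(g[3] for g in groups.values())
print(f"overall: treated {t_rate:.1%} vs control {c_rate:.1%}")
```

The pooled numbers make the control arm look better only because severity is confounded with which arm a case landed in; a significance test on the overall rates would happily "confirm" the wrong conclusion.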

I don't want to sound too sceptical about statistics. My point is that statistics are just numbers. They are simply a tool.

Imagine you're cooking: statistics are the pans, grills, spatulas, mixers, and other utensils you use to produce the food. The final dish depends on how you treat the ingredients and the quality of the cooking process. When it's good (or bad), it can be tied back to those two aspects. Two dishes can look exactly the same but taste completely different. Same here: an experiment can be statistically significant, but whether the results are applicable in the real world is another question altogether.

PS. Thanks to @achishtie