You Should Not Trust Statistics

And not because they are wrong

Louis Josso
ILLUMINATION
4 min read · Feb 3, 2022

Photo from Lukas via Pexels

I’ve been studying mathematics for 10 years and working as a data scientist for more than 3 years now. I’ve never been as angry about statistics as in these pandemic times.

With the Covid epidemic, everybody became a statistician, deriving their own results from numbers found online. Don’t get me wrong: I totally encourage you to fact-check, do your own research, and form your own opinion.

What makes me angry is public figures, listened to by thousands of people, who share wrong numbers and flawed analyses, or twist numbers until they say the opposite of what they should. And a number of them are doing it on purpose!

Here are the three main reasons why you should be careful with the statistics people share:

Because humans are not good at estimating statistics

Let’s start with a small problem. You may have heard of it, but it remains a good example of how bad we are at probabilities:

In a class of 30 students, what is the probability that at least 2 students share the same birthday?

There is more than a 70% chance that at least 2 students share a birthday! This is called the birthday paradox. Most of our misunderstanding comes from underestimating the number of pairs being compared. The first student's birthday is compared against 29 others, the second against 28, and so on, so there are 29 + 28 + 27 + … + 1 = 435 pairs in total. With that many chances for a match, we start to understand why the probability is so high!

More information can be found here: What Is The Birthday Paradox
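The 70% figure and the 435 pairs can be checked with a few lines of Python. This is a minimal sketch that assumes birthdays are spread uniformly over 365 days and are independent of each other (no leap years, no twins); the function name `prob_shared_birthday` is my own.

```python
from math import comb

def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumption: birthdays are uniform over `days` days and independent.
    """
    # It is easier to compute the opposite event: all n birthdays differ.
    prob_all_distinct = 1.0
    for k in range(n):
        prob_all_distinct *= (days - k) / days
    return 1.0 - prob_all_distinct

pairs = comb(30, 2)              # number of distinct pairs among 30 students
p30 = prob_shared_birthday(30)

print(pairs)                     # 435
print(round(p30, 3))             # 0.706
```

Under the same assumptions, 23 students are already enough to pass the 50% mark.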

Okay, you might ask, how does this interfere with our understanding of everyday, Covid-related questions? Suppose I tell you that the percentage of Covid cases can increase among both the vaccinated and the non-vaccinated populations while the overall number of cases still decreases. Would you trust me?

Yet this is a totally realistic scenario, because everything depends on how many people are vaccinated and how many are not. If many people move from the high-risk group to the low-risk group, the overall figure can fall even though the rate rises within each group. Here is an example with some made-up numbers:
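As a sketch of such a scenario, take a population of 10,000 people observed at two moments. All the numbers below are invented for illustration and are not real Covid data; the names `before`, `after`, `rate` and `overall_rate` are my own.

```python
# Made-up numbers for illustration only; they are NOT real Covid data.
# Each entry maps a group to (population, cases).
before = {"vaccinated": (2_000, 10), "unvaccinated": (8_000, 400)}
after = {"vaccinated": (8_000, 80), "unvaccinated": (2_000, 120)}

def rate(population, cases):
    """Share of a group that is a Covid case."""
    return cases / population

def overall_rate(period):
    """Share of the whole population that is a Covid case."""
    total_population = sum(p for p, _ in period.values())
    total_cases = sum(c for _, c in period.values())
    return total_cases / total_population

for group in before:
    print(group, f"{rate(*before[group]):.1%} -> {rate(*after[group]):.1%}")
print("overall", f"{overall_rate(before):.1%} -> {overall_rate(after):.1%}")
# vaccinated 0.5% -> 1.0%
# unvaccinated 5.0% -> 6.0%
# overall 4.1% -> 2.0%
```

The rate goes up inside each group, yet total cases drop from 410 to 200, simply because most people moved from the group with the higher rate to the group with the lower one.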

Because Correlation is not Causation

This is the hardest part of applying statistics to real-world data: how do we tell correlation, causation, and coincidence apart?

100% of the people who drink water will die!

Yes, and so what? Here we can see immediately that one is not causing the other. But what happens when the data is more complex? Correlation means that two quantities evolve together. Causation means that one of them is actually caused by the other.

To take an example, consider the Ice Cream Case:

When ice cream sales rise, so do homicides. So does this mean ice cream causes us to commit violent crimes?

There is a real correlation here between homicides and ice cream sales. If it were causal, banning ice cream sales would reduce the homicide rate… But that reasoning ignores a third variable that drives both of them: the temperature on the days we are looking at. Studies have found that hotter days tend to see more homicides.
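The role of the third variable can be illustrated with a small simulation. This is a toy sketch with simulated data, not real crime or sales figures: the coefficients and noise levels are arbitrary assumptions, and "controlling for temperature" is done here by removing a simple linear fit on temperature from each series.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its best linear fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: temperature drives BOTH series; neither affects the other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [10 * t + rng.gauss(0, 30) for t in temperature]   # sales index
homicides = [0.2 * t + rng.gauss(0, 1) for t in temperature]   # toy index

raw = pearson(ice_cream, homicides)
controlled = pearson(residuals(ice_cream, temperature),
                     residuals(homicides, temperature))

print(f"raw correlation: {raw:.2f}")
print(f"after controlling for temperature: {controlled:.2f}")
```

The raw correlation comes out strongly positive, while the correlation left after accounting for temperature is close to zero, which is what we expect when a common cause explains the whole relationship.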

xkcd.com n. 925

Because statistics rest on hypotheses, and those are not always stated correctly

My primary school teacher always told us: You can’t compare potatoes and carrots! And yet, even in daily life, I often find myself comparing things that should not be compared.

Two groups of people are much harder to tell apart than vegetables. If I take 200 people in the street and split them into two groups of 100, are the two groups comparable? The answer should be yes. But what if one group contains 60 women and the other only 40? Are the two groups still comparable? Probably not.

Does having more vaccinated than non-vaccinated people in the hospital mean that you are more likely to end up in the hospital when you are vaccinated?

Of course not. As we already discussed, everything depends on the population you are looking at. To interpret this number, you have to compare the percentage of vaccinated people in the hospital with the percentage of vaccinated people in the whole population.
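A quick sketch with invented numbers makes the point concrete. Everything below is hypothetical (the population size, the 90% vaccination rate, and both hospitalisation risks), chosen only to show how the comparison works.

```python
# Made-up numbers for illustration only; NOT real hospital data.
population = 1_000_000
vaccinated = 900_000                       # assumption: 90% are vaccinated
unvaccinated = population - vaccinated

# Hypothetical hospitalisation risks, expressed per 1,000 people.
risk_per_1000 = {"vaccinated": 2, "unvaccinated": 10}

hosp_vaccinated = vaccinated * risk_per_1000["vaccinated"] // 1000
hosp_unvaccinated = unvaccinated * risk_per_1000["unvaccinated"] // 1000

share_in_population = vaccinated / population
share_in_hospital = hosp_vaccinated / (hosp_vaccinated + hosp_unvaccinated)

print(hosp_vaccinated, hosp_unvaccinated)   # 1800 1000
print(f"{share_in_population:.0%} of the population is vaccinated")   # 90%
print(f"{share_in_hospital:.0%} of hospital patients are vaccinated") # 64%
```

In this toy setting there are more vaccinated than non-vaccinated patients in the hospital (1,800 against 1,000), even though a vaccinated person is five times less likely to be hospitalised. Vaccinated people make up 90% of the population but only about 64% of the patients.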

Trust the tool, not the messenger.

What would you like the next subject to be?
If you liked this article, please leave a comment and some applause!
