p-values: Difficult to make sense of? Not anymore

Abu Abdul
3 min read · May 1, 2022

The concept of p-values remained a mystery to me for a long time. No matter how hard I tried to get hold of it, p-values never made sense.

Today, while reviewing some basic mathematics for machine learning, I was challenged by the same concept again. After some digging, I found that a p-value is just a way of measuring “chance”: it is the probability of seeing a result at least as good as yours if nothing but chance were at work. The higher the p-value, the easier it is for chance alone to explain your observation, like a lottery win, and the less confidently you can claim that you will win again.

Let’s take the lottery example a bit further. Suppose you are a super genius and you develop an algorithm that can predict the winning number of a lottery called LumboJumbo, worth $100 million, and you want to buy a ticket. But each ticket costs $100,000 and you don’t have that kind of money.

p-value: Chance vs. Algorithm, Null Hypothesis vs. Original Assumption

You find a solution to your investment problem: you go to your super-rich childhood friend DD and try to convince him to invest in the lottery, on the condition that the two of you split the winnings equally. DD is a bit skeptical given the size of the investment and asks about the probability of winning. You explain that, as a trial, you applied your algorithm to the last draw and got the number right. Upon hearing this, DD asks: what if you got that number by chance, and not as a result of a super-efficient algorithm? Obviously he is unaware of your capability, as well as of the power of machine learning. But here comes the concept of the p-value.

You start explaining to DD that you calculated a p-value. If the p-value is 0.30, it means that someone guessing purely by chance would do at least as well as your algorithm did 30% of the time, so luck is a perfectly plausible explanation. But if the p-value is 0.05 or less, pure chance would produce such a result 5% of the time or less. In that case you can reasonably claim that a systematic algorithm is at play, and since the result is unlikely to be down to chance, you can expect to get the right number again.
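To make DD’s question concrete, here is a minimal sketch of how such a p-value could be computed. The setup is my own assumption, not the real LumboJumbo rules: pretend the lottery draws a single digit from 0 to 9, so a blind guess is right 10% of the time. The function name `p_value_at_least` is also made up for illustration. The p-value is the probability of getting at least as many hits as the algorithm did if every guess were pure chance.

```python
from math import comb


def p_value_at_least(hits, draws, chance_per_draw):
    """Probability of `hits` or more correct guesses in `draws` draws
    when each guess is right only with probability `chance_per_draw`
    (that is, when the null hypothesis of pure chance is true)."""
    return sum(
        comb(draws, k)
        * chance_per_draw ** k
        * (1 - chance_per_draw) ** (draws - k)
        for k in range(hits, draws + 1)
    )


# Hypothetical toy lottery: one digit from 0 to 9, so chance = 0.1.
one_of_one = p_value_at_least(hits=1, draws=1, chance_per_draw=0.1)
three_of_ten = p_value_at_least(hits=3, draws=10, chance_per_draw=0.1)
four_of_ten = p_value_at_least(hits=4, draws=10, chance_per_draw=0.1)

print(f"1 hit in 1 draw:    p = {one_of_one:.3f}")
print(f"3 hits in 10 draws: p = {three_of_ten:.3f}")
print(f"4 hits in 10 draws: p = {four_of_ten:.3f}")
```

With these made-up numbers, one hit in one draw gives a p-value of 0.10 and three hits in ten draws gives roughly 0.07, so neither clears the 0.05 bar. Four hits in ten draws gives roughly 0.013, which is the kind of number that might make DD reach for his wallet.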

Another concept directly related to the p-value is the null hypothesis. For the sake of this article (and only this article in the whole world), the null hypothesis is another name for “Chance”. The higher the p-value, the more easily chance explains your win, and the less reason we have to doubt the null hypothesis. The null hypothesis is the opposite of your original hypothesis, the claim you are investigating: it assumes that there is no systematic relationship between the variables under investigation.
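One way to see what “Chance” as a null hypothesis looks like is to simulate it. The sketch below rests on assumptions of my own: a toy lottery that draws one digit from 0 to 9 (so a blind guess is right 10% of the time), an algorithm that got 4 of 10 past draws right, and a made-up function name `simulated_p_value`. It plays out many worlds in which nothing but chance is at work and counts how often a blind guesser does at least as well as the algorithm.

```python
import random


def simulated_p_value(hits, draws, chance_per_draw, n_sim=20_000, seed=0):
    """Estimate how often a blind guesser gets `hits` or more right
    in `draws` draws, by simulating the null hypothesis directly."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    at_least_as_good = 0
    for _ in range(n_sim):
        # One simulated world where only chance is at work:
        # each guess is right with probability chance_per_draw.
        lucky_hits = sum(rng.random() < chance_per_draw for _ in range(draws))
        if lucky_hits >= hits:
            at_least_as_good += 1
    return at_least_as_good / n_sim


# Hypothetical numbers: 4 correct predictions in 10 draws, 10% chance each.
estimate = simulated_p_value(hits=4, draws=10, chance_per_draw=0.1)
print(f"simulated p-value: {estimate:.3f}")
```

The fraction printed is an estimate of the p-value. For these numbers it should land close to the exact binomial answer of about 0.013, and it gets closer as `n_sim` grows.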

To support your original hypothesis that your LumboJumbo algorithm works, you need to reject the null hypothesis, which essentially requires a low p-value. So your original hypothesis about the effectiveness of your algorithm carries weight only if you get a low p-value (0.05 or less, by the usual convention), which leads to the conclusion that your dependent and independent variables have some kind of relationship. In other words, your independent variable (the algorithm in this case) appears to affect the dependent variable (the winning lottery number) in a way that chance alone does not readily explain.

In the end, if we get a low p-value, we say that we reject the null hypothesis; in other words, the event is unlikely to be a result of chance alone. And if we get a high p-value, we say that we fail to reject the null hypothesis. We can’t say that we accept the null hypothesis: a high p-value only means we lack the evidence to rule chance out. Either we reject it or we fail to reject it.
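The decision rule is small enough to write down as code. This is only a sketch: the function name `decide` is hypothetical, and the default cutoff `alpha=0.05` is the common convention used in this article, not a law of nature.

```python
def decide(p_value, alpha=0.05):
    """Turn a p-value into one of the only two allowed verdicts."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    # Deliberately no "accept" branch: a high p-value only means
    # chance has not been ruled out.
    return "fail to reject the null hypothesis"


print(decide(0.30))  # fail to reject the null hypothesis
print(decide(0.05))  # reject the null hypothesis
```

Notice that the function can only ever return two answers, and “accept the null hypothesis” is not one of them.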

So next time you play a ticket, get your p-values right: reduce the guessing, rule out chance, and aim for a low p-value before you call your bet systematic.
