“P-value”

Pranjal Vaidya · Published in The Startup · 6 min read · Jan 17, 2021

The p-value is a small statistical tool with enormous power! It is used in approving any new drug or therapy, and, in short, in almost all research as a validation tool. But what exactly does this value mean?

Despite having spent a considerable amount of time in academia, and despite asking “is it statistically significant?” regularly, I still get confused about the actual meaning behind the p-value.

What is the p-value?

I have come across many people who can recite the textbook definition of the p-value perfectly but cannot explain what it actually means in everyday English. Guilty of being one of them, here’s a small attempt to see if I can explain the meaning in the simplest possible language.

The easiest wrong interpretation of the p-value is that it tells us the probability that our results are due to random chance (i.e., happened by chance). Prof. Goodman says, “Almost all the people think it gives some direct information about how likely they are to be wrong, and that’s definitely not what a p-value does.”


The actual definition of the p-value is slightly more confusing. Before getting to what it does tell us, here is a list of what it does not tell us:

  1. P-value doesn’t measure if the results are right or wrong.
  2. P-value can’t tell us the magnitude of an effect.
  3. P-values can’t tell us the strength of the evidence.
  4. And most importantly, it doesn’t tell us the probability that the finding was the result of chance.

I came across one of the nicest and easiest interpretations of the p-value in an article by Andrade et al.:

Imagine conducting a randomized controlled trial (RCT) that compares a new antidepressant drug with a placebo. At the 8-week study endpoint, you find that 60% of patients have responded to the drug and 40% have responded to the placebo. The chi-square test yields a P value of 0.04 (BRAVO! it’s less than 0.05, SUCCESS!). You conclude that significantly more patients responded to the antidepressant than to the placebo, and that the new drug truly has an antidepressant effect. Does it really, though? The first conclusion is correct, but the second is iffy. A P-value, even a statistically significant one, does not determine the truth!
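To make that example concrete, here is a minimal sketch of the chi-square test in Python. The article gives only the percentages, so the patient counts below (50 per arm, hence 30 vs. 20 responders) are purely my own illustrative assumption; the exact p-value you get depends on them.

```python
from scipy.stats import chi2_contingency

# Contingency table: rows = arms, columns = [responders, non-responders]
table = [[30, 20],   # drug arm: 30/50 responders (60%) -- assumed counts
         [20, 30]]   # placebo arm: 20/50 responders (40%) -- assumed counts

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # roughly p ≈ 0.046 for this assumed table
```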


So, what is the right conclusion, and what is the right interpretation?

Imagine that the null hypothesis is true; that is, the new antidepressant is no different from placebo.

Now, if we conduct 100 RCTs that compare the drug with the placebo, the results definitely won't be identical each time. In one RCT, the drug would outperform the placebo; in another, the placebo would outperform the drug. Additionally, the magnitude by which one outperforms the other would vary from trial to trial.

So what exactly does P = 0.04 (i.e., 4%) mean here?

It means that if the null hypothesis is true (no difference in the placebo and the drug) and if you perform the study a large number of times and in the exact same manner, drawing random samples from the population on each occasion, then, on 4% of occasions, you would get the same or greater difference between groups (the drug vs. the placebo) than what you obtained on this one specific occasion.
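Here is a small simulation sketch of exactly that thought experiment. Every number in it is an illustrative assumption (a true response rate of 50% in both arms, 50 patients per arm, a gap of 10 responders as the “observed” result), so it will not reproduce the 0.04 from the example, but it shows where a p-value comes from: the fraction of null-hypothesis trials that produce a gap at least as large as the one we saw.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_arm = 50      # assumed number of patients per arm
true_rate = 0.5     # SAME response rate in both arms, because the null is true
observed_gap = 10   # 30 vs. 20 responders in our single real trial (assumed counts)

n_trials = 100_000
drug = rng.binomial(n_per_arm, true_rate, n_trials)      # responders per simulated trial
placebo = rng.binomial(n_per_arm, true_rate, n_trials)

# Fraction of null-hypothesis trials with a gap at least as large as ours (two-sided)
p_sim = np.mean(np.abs(drug - placebo) >= observed_gap)
print(f"simulated p-value ≈ {p_sim:.3f}")  # roughly 0.05-0.06 with these assumed numbers
```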

However, we did not perform the RCT a large number of times. We performed it only once! BUT, on this one single occasion, we obtained results that would be rare if the null hypothesis were true. Rather than believe we simply witnessed a rare event, we suspect that the finding is not really rare after all, and that is possible only if the null hypothesis is false. Hence we reject the null hypothesis, and the results are statistically significant! That means we are saying the drug is actually working!

Is 0.05 a magical number?

If the null hypothesis is rejected (P < 0.05), why can’t we conclude that, just as the drug outperformed the placebo in our study, the drug is truly superior to the placebo in the population from which the sample was drawn? The answer is that the P-value describes a probability, not a certainty. So, we can NEVER be certain that the drug is truly superior to the placebo in the population; we can merely be rather CONFIDENT about it.

Next, imagine that instead of obtaining P = 0.04, you obtained P = 0.14 in the hypothetical RCT described above. In this situation, we do not reject the null hypothesis based on the 5% threshold. So, can we conclude that the drug is no different from the placebo? Certainly not, and we definitely cannot conclude that the drug is similar to the placebo, either! After all, we did find a difference in the response rate between drug and placebo; it is just that this difference did not meet our arbitrary cut-off for statistical significance. So “not significantly different” does not mean “not different from” or “similar.”
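A quick sketch of that point: the same 60% vs. 40% observed difference, but with a smaller assumed sample (25 patients per arm), no longer clears the 0.05 cut-off. The counts are again my own assumptions and do not exactly reproduce the 0.14 above, but the message is the same: the observed difference hasn’t gone anywhere; only the significance label has changed.

```python
from scipy.stats import chi2_contingency

# Same 60% vs. 40% response rates, but only 25 patients per arm (assumed)
small_table = [[15, 10],   # drug arm: 15/25 responders (60%)
               [10, 15]]   # placebo arm: 10/25 responders (40%)

chi2, p, dof, expected = chi2_contingency(small_table, correction=False)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # roughly p ≈ 0.16: not significant,
                                                # yet the observed difference is identical
```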

Some researchers have recently started challenging the 0.05 significance threshold for the p-value. Others argue that, instead of the binary label “statistically significant,” the p-value should be treated as a continuous quantity, which would describe the results more appropriately than a bare significant/not-significant verdict. Let’s see what future p-values hold in the scientific community!

As a wrap-up, here are the 12 most common misconceptions about p-values, according to the article by Goodman et al.:

  • If P = .05, the null hypothesis has only a 5% chance of being true.
  • A nonsignificant difference (e.g., P ≥.05) means there is no difference between groups.
  • A statistically significant finding is clinically important.
  • Studies with P values on opposite sides of .05 are conflicting.
  • Studies with the same P-value provide the same evidence against the null hypothesis.
  • P = .05 means that we have observed data that would occur only 5% of the time under the null hypothesis.
  • P = .05 and P ≤.05 mean the same thing.
  • P values are properly written as inequalities (e.g., “P ≤.02” when P = .015)
  • P = .05 means that if you reject the null hypothesis, the probability of a type I error is only 5%.
  • With a P = .05 threshold for significance, the chance of a type I error will be 5%.
  • You should use a one-sided P-value when you don’t care about a result in one direction, or a difference in that direction is impossible.
  • A scientific conclusion or treatment policy should be based on whether or not the P-value is significant.

Hopefully, this little collection of information got you a bit closer to achieving a statistically significant understanding of the p-value :)

Coming up next: the details of different statistical tests!


Pranjal Vaidya is a PhD candidate at Case Western Reserve University.