P-Value, A Statistician’s Buzzword

‘The P-value was never intended to be a substitute for scientific reasoning’ ~Ron Wasserstein

Aarsh Chaube
DataX Journal
6 min read · Jul 14, 2020


So, what is this p-value that scientists keep talking about in their research? Why do we need it, and what are some of the gross misinterpretations that people commonly fall into?


I’m no math wizard, so I’ll illustrate this with an example; bear with me. Let’s say I run a music academy and hand out a monthly review form to my students, which they fill in and submit the next day. Suppose I found that 52% of the children believed that if the music lectures were a little longer, they’d feel satisfied with the money they’re paying. That raised an immediate question: do more (or fewer) children than that actually believe they’ll only be satisfied if the lectures were a little longer?

That’s when Hypothesis Testing comes into play. The idea is to set up a Null Hypothesis and an Alternative Hypothesis, which here read as follows:
H0 (null): the proportion of children that feel this way = 0.52
Ha (alternative): the proportion of children that feel this way ≠ 0.52 (more or fewer children feel so).

The next thing we do is set up a threshold, commonly known as the significance level (alpha), which we choose as the researcher (for now, alpha = 0.05). It expresses how strong the sample evidence must be against the null before we reject it. Now we run the review again and calculate a sample statistic (say we got 0.56 as our sample proportion).

This is where p-values come into play. They ask: if we assume the null is true, what is the probability of getting a sample statistic like the one we observed? So the p-value in our case is the probability of getting a statistic at least this far away (0.56) from the hypothesized proportion (0.52), if the null were true. If that probability is lower than our significance level, we reject the null hypothesis; conversely, if it’s higher, we don’t say we accept the null, we say we ‘fail to reject the null’ because we don’t have enough evidence against it.
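The computation above can be sketched with a one-sample proportion z-test. Note that the sample size below (n = 100) is an assumption for illustration; the article never states one.

```python
import math

def two_sided_p_value(p_hat, p_null, n):
    """Two-sided p-value for a one-sample proportion z-test.

    The standard error uses the null proportion, because the test
    computes probabilities under the assumption the null is true.
    """
    se = math.sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / se
    # Standard normal CDF via the error function; count both tails.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# Hypothetical sample of 100 students (not given in the article).
p = two_sided_p_value(p_hat=0.56, p_null=0.52, n=100)
print(round(p, 3))  # ≈ 0.424, well above alpha = 0.05: fail to reject the null
```

With such a small assumed sample, a 0.56 observation is entirely consistent with a true proportion of 0.52, which is exactly the "fail to reject" outcome described above.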

So, now that we’re clear on the concept, let’s talk about how people commonly misconstrue its meaning and manipulate the value in order to get significant results.

Misconceptions And Data Dredging

Misconception #1: The p-value is the probability of the null being true.
The p-value is just a number representing how much evidence there is against your null hypothesis. The hypothesis testing framework doesn’t assign probabilities to hypotheses. Another way to think about it: the p-value is the conditional probability of observing the statistic under the condition that the null is true, i.e. P(data | H0), not P(H0 | data). Assigning probabilities to hypotheses is addressed in Bayesian inference.

Misconception #2: Studies with the same p-values provide the same evidence against the null.


Different observed effects can produce the same p-value. For example, say a drug company wanted to introduce two different medicines to the market. They implemented a cross-over study design and split the subjects into two groups: Group 1 for pill A and Group 2 for pill B. After a week, the results of the two trials came back, one with a treatment effect of 8% and the other with 23%. Both have a p-value of 0.03, but they clearly mean different things: while one might be a drug for diabetes, the other might be for blood pressure.

Misconception #3: A non-significant result (p > 0.05) means there’s no difference between the groups.
A non-significant difference doesn’t completely support the null. It’s more accurate to say we didn’t have enough evidence, maybe because of shortcomings in the research design, maybe because we didn’t have enough samples, or maybe because we’re testing two clinically different things.

Misconception #4: Statistical significance = clinical relevance.
Clinical relevance is determined by the outcome and the size of the difference. Maybe the difference is statistically significant but too small to matter; the p-value alone tells us nothing about the magnitude of the effect.

Now it’s time to talk about yet another fascinating concept…

P-Hacking


An instance of statistics gone wrong… Hacking the p-value and data dredging mean the same thing. So, when is a p-value hacked? Essentially, when the analysis is chosen based on what makes the p-value significant rather than on what the best analysis plan is. In other words, it is misusing the data to find patterns that can be presented as statistically significant even when there is no underlying effect, which dramatically increases the risk of false positives (for more info: https://medium.com/data-science-community-srm/statistical-power-and-power-analysis-98cf4e10b064). This isn’t always malicious; it can come from a gap in a researcher’s analysis, a well-intentioned belief in a specific scientific theory, or an honest mistake. When scientists p-hack, they often put out research results that just aren’t real, and the consequences range from the small, like convincing people that ‘eating food rich in protein will certainly cause weight loss’, to the very serious, like ‘using a non-authenticated medicine will be a sure-shot rescue from an X virus’.
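How fast false positives inflate under repeated testing can be seen with a short calculation: if each of m independent tests of a true null has a 5% chance of a false positive, the chance of at least one false positive grows quickly (the counts below are illustrative):

```python
# Family-wise error rate for m independent tests of true nulls,
# each run at significance level alpha: 1 - P(no false positives).
def family_wise_error_rate(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

for m in (1, 10, 20):
    print(m, round(family_wise_error_rate(m), 3))
# With 20 independent tests, the chance of at least one spurious
# "significant" result is already about 64%.
```

This is why running many analyses and reporting only the significant one is so misleading: chance alone will eventually hand you a small p-value.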

One simple way to guard against this is to correct for the inflation in your family-wise error rate (type 1 error). One way to do this is to apply a

Bonferroni Correction
Instead of using the usual threshold (alpha = 0.05) for every test, take the usual threshold and divide it by the number of tests you’re doing.
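As a sketch of the rule, with hypothetical p-values from five tests (the values are made up for illustration):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

# Hypothetical p-values from five independent tests.
p_values = [0.004, 0.012, 0.03, 0.045, 0.21]
threshold = bonferroni_threshold(alpha=0.05, n_tests=len(p_values))
# threshold = 0.05 / 5 = 0.01
significant = [p for p in p_values if p < threshold]
print(significant)  # only 0.004 survives the correction
```

Note that without the correction, four of the five p-values would have looked "significant" at alpha = 0.05.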



Now that we’ve discussed the frequentist approach at some length, let’s talk a little about thinking Bayes.

A Brief Intro On Bayesian Approaches To Modelling


Roughly speaking, when we think about problems in a Bayesian way we need to do the following:

  • Before we start modeling, we start with some belief about the situation, which is called a Prior. Every parameter begins with a distribution that captures our beliefs.
  • Collect data and model.
  • Update our beliefs using the data to get what is called a Posterior.
  • Repeat steps 2 and 3, using the posterior from step 3 as our new prior.
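The update loop above can be sketched with the conjugate Beta-Binomial model applied to our proportion example (the counts, 56 of 100 hypothetical students, are assumptions for illustration):

```python
# Conjugate Beta-Binomial updating for a proportion.
# With a Beta(a, b) prior, after seeing s successes in n trials
# the posterior is Beta(a + s, b + n - s).
def update_beta(a, b, successes, n):
    return a + successes, b + (n - successes)

a, b = 1, 1                      # flat prior: every proportion equally plausible
a, b = update_beta(a, b, successes=56, n=100)
posterior_mean = a / (a + b)     # (1 + 56) / (2 + 100)
print(round(posterior_mean, 3))  # ≈ 0.559
```

If more review data came in next month, the Beta(57, 45) posterior would simply become the new prior, which is exactly the repeat step in the list above.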

Is there any alternative to the p-value?

The Bayes factor, which can offer evidence in favor of the null hypothesis as well: it is the ratio of the likelihood of the data under one hypothesis to its likelihood under the other (we touched on these conditional probabilities under Misconception #1).
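For two simple (point) hypotheses about a proportion, the Bayes factor is just a ratio of binomial likelihoods. The hypotheses and counts below are illustrative assumptions, reusing the music-academy numbers:

```python
import math

def binomial_likelihood(p, successes, n):
    """P(data | p) under a binomial model."""
    return math.comb(n, successes) * p**successes * (1 - p)**(n - successes)

# Hypothetical data: 56 of 100 students agree.
# Compare H1: p = 0.56 against H0: p = 0.52.
likelihood_h1 = binomial_likelihood(0.56, successes=56, n=100)
likelihood_h0 = binomial_likelihood(0.52, successes=56, n=100)
bayes_factor = likelihood_h1 / likelihood_h0
print(round(bayes_factor, 2))  # ≈ 1.38: only weak evidence for H1 over H0
```

A Bayes factor near 1 means the data barely discriminate between the two hypotheses, a more direct statement of evidence than a lone p-value.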

Conclusion

We’ve seen what the p-value is, its interpretation, and some frequentist methods of analysis, and I also gave a brief intro to Bayesian modeling. What I’ve discussed is barely a chunk of what inferential statistics is; all I can suggest is that, while modeling, you cover both aspects (frequentist and Bayesian) and then figure out for yourself which suits your analyses best.

References

Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol. 2008;45:135–40. doi:10.1053/j.seminhematol.2008.04.003.


Upcoming ML PhD at UoE | MSc Computer Graphics, Vision and Imaging @UCL | Ex-MLE @Unify