Comprehending the p-value in simple English with examples!

Chesta Dhingra
Geek Culture


In my experience, learning and understanding the p-value was difficult, and it took me a while to get a hold on the concept. That’s why I decided to write about it and explain it as simply as I can, with some relatable examples.

The p-value is an essential part of inferential statistics, where we draw inferences from sample data rather than from the entire population. In inferential statistics we are estimating something that we cannot measure directly. Take the Covid-19 vaccination trials as an example: the effectiveness of the vaccine was estimated from a limited number of trial participants, not from the entire population.

Estimates like these always carry some probability of error: a result that looks significant can arise purely from chance factors such as sampling error and random noise. The p-value quantifies how compatible our sample result is with chance alone. It is defined as the probability of obtaining results at least as extreme as those observed in the sample, assuming that only chance is at work. In other words, it is the probability of obtaining the observed difference (or a larger one) in the outcome measure, given that no difference exists between the treatments in the population. Because the p-value is a probability, it lies between 0 and 1.

Let’s take the example of a Drug X that is meant to reduce weight, and run an experiment to check its effectiveness. First, we take a random sample of people and divide them into two groups, group A and group B. Group A receives a placebo (no active ingredient) and is known as the control group, whereas group B receives Drug X. We record the individual weights in both groups at the start of the experiment and keep observing them until day 30. Suppose that at the end of the experiment the average weight change in group A is 0 kg, whereas group B has lost 1 kg on average.

Before we generalize these results to the population, we first need to ask whether the weight change of people taking the placebo and of people taking Drug X could actually be the same.

Here we define the null hypothesis, which states that there is no difference in average weight change between people who receive the placebo and people who receive Drug X. Then we ask: if the null hypothesis is true, what are the chances of observing a 1 kg difference in weight change between the Drug X group (group B) and the placebo group (group A) in our sample? To answer this, we use a statistical test such as a t-test or ANOVA, which gives us the p-value. The smaller the p-value, the stronger the evidence against the null hypothesis.
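To make the testing step a little more concrete, here is a minimal sketch in Python using only the standard library. It computes a Welch-style t statistic by hand and approximates the two-sided p-value with the normal distribution, which is only reasonable for fairly large samples. A statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used instead. The function name `welch_t_test_approx`, the group sizes, and the weight-loss numbers are all assumptions made up for illustration. They are not data from a real trial.

```python
import math
import random
import statistics


def welch_t_test_approx(sample_a, sample_b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    t_stat = (mean_b - mean_a) / std_err
    # Normal approximation to the t distribution (an assumption that is
    # only reasonable when both samples are fairly large).
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Simulated (made-up) weight loss in kg after 30 days, 50 people per group.
rng = random.Random(42)
group_a = [rng.gauss(0.0, 2.0) for _ in range(50)]  # placebo
group_b = [rng.gauss(1.0, 2.0) for _ in range(50)]  # Drug X

t_stat, p_value = welch_t_test_approx(group_a, group_b)
print(f"t = {t_stat:.2f}, approximate p-value = {p_value:.4f}")
```

The exact numbers depend on the simulated data, so treat the printed p-value as an illustration of the mechanics rather than as a result.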

If the p-value for the above example comes out to be 0.02, or 2%, the result can be read as follows:

If the null hypothesis were true (the two population means are equal), there would be a 2% chance of observing a difference as large as (or larger than) the one we measured in our sample. Put plainly: even if the weight change among people who receive Drug X is the same as among people who receive the placebo, there is still a 2% chance of seeing a difference in weight loss of 1 kg (or more) between the sample groups.
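One way to see this "if the null hypothesis were true" reading in action is a permutation test. If the drug does nothing, the group labels are interchangeable, so we can shuffle them many times and count how often a difference at least as large as the observed one shows up. The sketch below is one possible illustration, not the test used above. The function name `permutation_p_value`, its parameters, and the sample numbers are hypothetical, and the test is one-sided because we only look for extra weight loss in the drug group.

```python
import random
import statistics


def permutation_p_value(placebo, treated, n_shuffles=10_000, seed=0):
    """One-sided permutation p-value for mean(treated) - mean(placebo)."""
    rng = random.Random(seed)
    observed = statistics.fmean(treated) - statistics.fmean(placebo)
    pooled = list(placebo) + list(treated)
    n_placebo = len(placebo)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the labels carry no information,
        # so a random relabelling shows what chance alone can produce.
        rng.shuffle(pooled)
        diff = (statistics.fmean(pooled[n_placebo:])
                - statistics.fmean(pooled[:n_placebo]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Made-up weight losses in kg, for illustration only.
placebo_loss = [0.2, -0.5, 0.1, 0.4, -0.3, 0.0, 0.6, -0.4]
drug_loss = [1.1, 0.7, 1.5, 0.2, 1.3, 0.9, 1.8, 0.5]
print(permutation_p_value(placebo_loss, drug_loss))
```

The returned fraction plays the role of the p-value: the share of chance-only relabellings that look at least as extreme as what we actually saw.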

This 2% reflects sampling error, or random noise. When people are sampled at random, factors outside our control can influence the result. For example, group A might happen to contain people without the genes for a high metabolism, so they lose little weight, whereas group B might happen to contain people with a high metabolism, who lose weight faster. Such an imbalance arises purely by coincidence, and it is exactly this kind of chance variation that the p-value accounts for.
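The effect of sampling noise can also be simulated directly. In the rough sketch below, both groups are drawn from the same population, so the drug truly does nothing, and we count how often the groups still end up at least 1 kg apart by luck. The function name `chance_of_big_gap`, the normal distribution, and the spread of 2 kg are assumptions chosen for illustration only.

```python
import random
import statistics


def chance_of_big_gap(group_size, spread_kg, gap_kg=1.0,
                      n_experiments=2_000, seed=1):
    """Fraction of no-effect experiments where group B beats A by gap_kg."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the SAME population: no real drug effect.
        group_a = [rng.gauss(0.0, spread_kg) for _ in range(group_size)]
        group_b = [rng.gauss(0.0, spread_kg) for _ in range(group_size)]
        if statistics.fmean(group_b) - statistics.fmean(group_a) >= gap_kg:
            hits += 1
    return hits / n_experiments


for size in (10, 50, 200):
    print(size, chance_of_big_gap(group_size=size, spread_kg=2.0))
```

With small groups, a 1 kg gap appears by coincidence fairly often. With larger groups it becomes rare, which is why the same observed difference can give very different p-values depending on sample size.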

To conclude, this article explained the p-value using the Drug X example. We saw that the p-value ranges from 0 to 1, and that the closer it is to 0, the stronger the evidence against the null hypothesis. The p-value tells us the probability of obtaining the observed difference (or a larger one) in the outcome, given that no difference exists between the treatments of the two populations.

I hope you enjoyed reading it. Do share your critical feedback, and please follow me on Medium for more data science topics.
