Statistics 101: Hypothesis Testing and p-value — What’s the fuss about that!

Rohan Bali
Published in Analytics Vidhya
May 30, 2020

Before answering what hypothesis testing is, let’s answer why we need it!

Let’s say Jack owns a shopping website and wants to know what impact a graphical enhancement will have on it. How will he be able to test this? Hypothesis testing comes to Jack’s rescue.

Hypothesis testing gives us a structured way to make business decisions backed by statistical evidence, by conducting experiments/tests on various aspects of a business.

Hypothesis testing adds meaning to our findings. We begin by creating two hypotheses, known as the NULL hypothesis and the ALTERNATIVE hypothesis. How do we define them? It’s very simple. Consider that the environment you are working in is completely fine: no changes are being made to it, all the old standards apply, and hence it is very stable. The NULL hypothesis is this default assumption that nothing has changed. In Jack’s case, the NULL hypothesis is that the graphical enhancement has no effect on his website (all the money he paid the UI designer is wasted! Poor Jack). The ALTERNATIVE hypothesis says that the changes have some effect: things are not the same as before. For Jack, this might be good news. If the evidence supports the alternative hypothesis, it suggests that the changes made to the website do have an effect on the traffic. Whether that change is positive or negative is a different discussion (however, Jack can conduct a hypothesis test for that too!).

Now we know why we do hypothesis testing, what it is, and how to set up the NULL and ALTERNATIVE hypotheses. The next part is how to evaluate them: which hypothesis do the data support? Significance tests come in handy here.

The p-value (probability value) tells us how significant our result is. Based on the p-value, we either reject the null hypothesis or fail to reject it; we never “accept” it outright.

But what is a p-value? To understand the p-value, we need to understand the normal distribution. I will write a separate blog on it, but here is a brief summary: the normal distribution is a distribution of data in which the mean, median, and mode coincide. Half of the values lie to the right of the mean and half to the left, and it forms a bell-shaped curve.
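To make these properties concrete, here is a small sketch using Python’s built-in statistics.NormalDist. The parameters (a mean of 5 and a standard deviation of 2) are arbitrary values chosen for illustration, not anything from Jack’s data.

```python
from statistics import NormalDist, mean, median

# An illustrative normal distribution (mu and sigma are arbitrary choices)
dist = NormalDist(mu=5.0, sigma=2.0)

# Mean, median, and mode coincide for a normal distribution
print(dist.mean, dist.median, dist.mode)

# Half of the probability lies on each side of the mean
print(f"P(X <= mean) = {dist.cdf(dist.mean)}")

# The curve is symmetric: the area 1 unit below the mean equals the area 1 unit above it
below = dist.cdf(dist.mean - 1)
above = 1 - dist.cdf(dist.mean + 1)
print(f"P(X <= mean - 1) = {below:.4f}, P(X >= mean + 1) = {above:.4f}")

# In a large random sample, the sample mean and sample median land close together
sample = dist.samples(100_000, seed=42)
print(f"sample mean = {mean(sample):.3f}, sample median = {median(sample):.3f}")
```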

The p-value is the total probability covered by the red-shaded region in the figure: the area under the curve for results at least as extreme as the one we observed.
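Assuming the shaded region is the upper tail of a standard normal curve, its area can be computed from the cumulative distribution function (CDF): everything the CDF doesn’t cover up to the observed value is the tail. The function name and the observed z-score of 1.8 are hypothetical, chosen only to illustrate the idea.

```python
from statistics import NormalDist

def upper_tail_p_value(z: float) -> float:
    """Area under the standard normal curve to the right of z, i.e. P(Z >= z)."""
    return 1.0 - NormalDist().cdf(z)

# Hypothetical observed test statistic
observed_z = 1.8
print(f"p-value for z = {observed_z}: {upper_tail_p_value(observed_z):.4f}")
```

The further the observed value sits in the tail, the smaller the shaded area, and so the smaller the p-value.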

After setting up the null and alternative hypotheses, we need to set a significance level, denoted by the Greek letter alpha (α). Generally, a 5% significance level is used. The significance level is the threshold we compare the p-value against, and it is also the probability of rejecting the null hypothesis when it is actually true. If the p-value is greater than 0.05, we can’t reject the null hypothesis; otherwise, we reject it.
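The decision rule above fits in a few lines. This is a minimal sketch; the function name decide is hypothetical. A p-value exactly equal to alpha counts as a rejection, following the “otherwise” in the rule.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

for p in (0.20, 0.05, 0.01):
    print(f"p = {p:.2f} -> {decide(p)}")
```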

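The p-value is basically a conditional probability: the probability of getting a result at least as extreme as the one observed, given that the null hypothesis is true. In general, a conditional probability is the probability of an event ‘Y’ occurring given that event ‘X’ has already occurred. Read more about conditional probability over here.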
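The definition can be written as P(Y | X) = P(X and Y) / P(X). Here is a small sketch with a fair die, an example chosen for illustration and not from the original post. Knowing that the roll is even raises the chance of a six from 1/6 to 1/3.

```python
from fractions import Fraction

# Sample space of a fair six-sided die
outcomes = range(1, 7)

def prob(event) -> Fraction:
    """Probability of an event (a predicate on outcomes) for a fair die."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

def is_even(o: int) -> bool:
    return o % 2 == 0

def is_six(o: int) -> bool:
    return o == 6

# P(Y | X) = P(X and Y) / P(X), with Y = "roll a six" and X = "roll is even"
p_six_given_even = prob(lambda o: is_six(o) and is_even(o)) / prob(is_even)
print(p_six_given_even)  # 1/3
```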

Let’s now help Jack with his problem.

Null Hypothesis: There is no change in the traffic on his website after the graphical enhancements.

Alternative Hypothesis: The traffic on his website has increased after the graphical enhancements.

Significance Level: This is set at 5% (0.05).
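In the usual notation, Jack’s setup looks like this. It assumes, as in the next step, that traffic is measured by the mean time visitors spend on the site. The symbol μ is that mean after the enhancements, and μ₀ is its value before them; both symbols are introduced here for illustration.

```latex
H_0 : \mu = \mu_0 \qquad
H_1 : \mu > \mu_0 \qquad
\alpha = 0.05
```

Because the alternative only looks for an increase, this is a one-sided (right-tailed) test.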

Now, Jack needs to calculate the p-value.

Jack will now take a sample from the population. After getting the sample, we need to find the sample mean. In this case, it is the mean time visitors spend on his website, which serves as the measure of traffic.

The p-value here is P(sample mean at least as large as the observed sample mean | Null Hypothesis is true).

If this is greater than 0.05, we cannot reject the null hypothesis.

If this is less than or equal to 0.05, we reject the null hypothesis.
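Putting the steps together, here is a minimal sketch of a right-tailed one-sample z-test in Python. The data, the pre-redesign mean of 5.7 minutes (null_mean), and the function name are all hypothetical. It uses the test statistic z = (sample mean - null_mean) / (s / √n), where s is the sample standard deviation and n is the sample size. This normal approximation is reasonable for large samples. For small samples, a t-test (for example scipy.stats.ttest_1samp with alternative='greater') is the more usual choice.

```python
import math
from statistics import NormalDist, mean, stdev

def right_tailed_z_test(sample, null_mean, alpha=0.05):
    """Test H0: mu == null_mean against H1: mu > null_mean.

    Returns (sample_mean, z, p_value, reject_h0).
    """
    n = len(sample)
    sample_mean = mean(sample)
    # Standard error of the mean, estimated from the sample itself
    standard_error = stdev(sample) / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # p-value = P(sample mean at least this large | H0 is true)
    p_value = 1.0 - NormalDist().cdf(z)
    return sample_mean, z, p_value, p_value <= alpha

# Hypothetical data: minutes spent on the site by 30 visitors after the redesign
after_redesign = [5.1, 6.3, 4.8, 7.0, 5.9, 6.4, 5.5, 6.8, 6.1, 5.7,
                  5.2, 6.0, 4.9, 6.6, 5.8, 6.2, 5.4, 6.9, 5.6, 6.5,
                  4.7, 6.1, 5.3, 6.7, 5.9, 6.0, 5.0, 6.4, 5.8, 6.3]

x_bar, z, p, reject = right_tailed_z_test(after_redesign, null_mean=5.7)
print(f"sample mean = {x_bar:.2f}, z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0" if reject else "Cannot reject H0")
```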

This gives Jack the evidence he needs to decide whether the change in traffic on his website is due to the graphics change or not!

To give you a bit more of the theory: there are mainly three types of hypotheses. The one explained above is known as a statistical hypothesis. The other two are research and substantive hypotheses.

A research hypothesis is the assertion a researcher makes about the expected outcome of a study. It is established by the researcher before starting the experiment.

A substantive hypothesis is about the size of the change itself. It doesn’t rely on a p-value. Instead, it is concerned with the estimated effect size, which tells us how large the change we are dealing with actually is. Read more about substantive results here.
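One widely used effect-size measure is Cohen’s d: the difference between two means divided by their pooled standard deviation. The post doesn’t name a specific measure, so treat this as one possible choice. The before/after numbers below are hypothetical.

```python
import math
from statistics import mean, stdev

def cohens_d(before, after):
    """Standardized mean difference (after - before) using the pooled standard deviation."""
    n1, n2 = len(before), len(after)
    s1, s2 = stdev(before), stdev(after)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(after) - mean(before)) / pooled_sd

# Hypothetical minutes on site before and after the redesign
before = [5.2, 4.8, 5.5, 5.0, 4.9, 5.3, 5.1, 4.7]
after = [5.9, 5.6, 6.2, 5.8, 5.5, 6.1, 6.0, 5.7]

print(f"raw difference in means: {mean(after) - mean(before):.2f} minutes")
print(f"Cohen's d: {cohens_d(before, after):.2f}")
```

As a rough convention, d values around 0.2, 0.5, and 0.8 are often described as small, medium, and large effects.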

Well, that’s it!

Next up:

Statistics 101: Sampling Techniques-It’s gonna be a true representation!

Previous Blog:

Statistics 101: Grouped and Ungrouped Data- Let’s talk with data!

Rohan Bali

Data Analytics professional with majors in Computer Science Engineering. Enjoys problem-solving and propelling data-driven decisions.