Who’s Afraid of the Big Bad Hypothesis Test?

James Cochrane
Nov 28, 2018



Many people seem afraid of the Big Bad Hypothesis Testing concept. It is something that baffles the mind, and most would rather watch paint dry than try to go through the process of learning (or relearning) it properly. You needn’t fear the concept any longer. If you have struggled with hypothesis testing in the past, give this article a try. I think it will solidify your understanding of the concept.

You probably remember the term hypothesis testing from your Statistics 101 class that most (if not all) students are required to take. It’s one of those classes you likely slept through. If you’ve been away from school for a while, you remember the term but don’t remember the concept.

Many people never use it during their careers, or if they do, they don’t realize they are doing so. When you think back to why you were required to take the class, you become agitated at the thought. What a waste of time and money!

But, guess what? The field of data science is growing in popularity, and you’re going to need to brush up on hypothesis testing and other statistical concepts. It’s going to be part of your new data science job.

And yes, you will use it!

If you have any recollection of hypothesis testing, you likely remember it was confusing. Even when you seemed to grasp it, you lost it when you tried to recall it away from class. The subtlety of the concept is what gives people trouble. You aren’t trying to find the probability that the null hypothesis is correct. You are finding the probability of observing a sample at least as extreme as yours (that is, at least as far from the population mean), given the assumption that the null hypothesis is true.

This is a rather wacky concept when you think about it. The good news is it will become second nature if you are willing to muddle through it several times.

The biggest stumbling block with hypothesis testing is the whole p-value concept. The key to grasping p-values is to assume that the null hypothesis is true. Whenever you get confused, always go back to this assumption. Then, think about how likely it would be for your new sample to occur given that assumption. If that likelihood falls below the significance level (one minus the confidence level, so 0.05 for a 95% confidence level), you may reject the null hypothesis. Obviously, at that point, you don’t have to assume the null hypothesis is true anymore.
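The decision rule described above fits in a few lines of Python. This is only a minimal sketch: the function names `significance_level` and `reject_null` are made up for illustration, and the p-values passed in are placeholders rather than results from any study.

```python
def significance_level(confidence_level: float) -> float:
    """Convert a confidence level (e.g. 0.95) into a significance level."""
    return 1.0 - confidence_level


def reject_null(p_value: float, confidence_level: float = 0.95) -> bool:
    """Reject the null hypothesis when the p-value is below the significance level."""
    return p_value < significance_level(confidence_level)


# Placeholder p-values, chosen only to show both outcomes.
print(reject_null(0.01))  # True: 0.01 is below 0.05
print(reject_null(0.20))  # False: 0.20 is not below 0.05
```

Notice that the rule never says anything about the probability that the null hypothesis itself is true. It only compares the p-value to a cutoff chosen in advance.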

Let’s use an example to illustrate. Suppose an administrator of a university believes that current students enrolled in the school are taller than usual. The university has hesitated to fund a basketball team in the past since the average number of consecutive free throws made by students is a dismal one basket. If a student made one basket, it is unlikely they would make two or more in a row after that.

Due to the increased height of the students, the university decided to hire several researchers to observe basketball free throw activity in the gym. Each researcher heads to the gym at different times during the day. At the end of one month, the averages are reported. Suppose at the end of these observations, these researchers reported that the average number of free throws is three baskets in a row. Thus, if a student makes one basket, he or she will make two more on average.

Rejecting the Null Hypothesis

Can the university claim that the taller students are better players and go forward with funding a basketball team? For this to happen, the administration feels it needs to reject the null hypothesis at an agreed-upon confidence level. For this example, let’s assume a 95% confidence level, which is common across many organizations.

This example is an oversimplification, as playing basketball is much more than shooting free throws. However, the point is that the sample mean of three baskets in a row is far away from the population mean of one basket followed by a missed basket. We are assuming the conditions of the null hypothesis are true, i.e., that the average student makes one basket and misses the next. So how is it possible to have a sample in which the average student shoots three baskets in a row? Something is not right!

More About the P-Value

The p-value in this example is the probability (or the likelihood) of getting a sample average of three baskets in a row or more, given that the null hypothesis is true. This probability should be quite small when the null hypothesis is assumed to be true because the value lies on the outskirts, or tail, of the distribution that the null hypothesis predicts. If it were a common value in the null hypothesis universe, it would be much closer to the population mean. Otherwise, it wouldn’t be common.

Since the probability is tiny, a sample this extreme should almost never show up. For example, suppose we calculate the p-value to be .01 or 1%. This means that, when the null hypothesis is assumed to be true, only about one sample out of 100 should have an average of three or more baskets in a row. But that is exactly the kind of sample we observed: three baskets in a row is the average of our sample. Therefore, the null hypothesis universe is out of alignment somehow.
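To make the "null hypothesis universe" concrete, here is a rough simulation sketch in Python. Everything in it beyond the one-basket and three-basket averages is an assumption made for illustration: the article gives no sample size, so the sketch pretends five students were observed, and it models the null hypothesis as every free throw being an independent 50/50 shot (which gives an average of one made basket before a miss). The function names are hypothetical too.

```python
import random
from math import comb

P_MAKE = 0.5       # assumed: chance of making any single free throw under the null
N_STUDENTS = 5     # assumed sample size (the example does not state one)
OBSERVED_MEAN = 3  # sample mean reported by the researchers


def baskets_before_miss(rng: random.Random) -> int:
    """Count consecutive made baskets until the first miss."""
    made = 0
    while rng.random() < P_MAKE:
        made += 1
    return made


def simulated_p_value(n_samples: int = 20_000, seed: int = 0) -> float:
    """Share of null-universe samples whose mean is at least OBSERVED_MEAN."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_samples):
        total = sum(baskets_before_miss(rng) for _ in range(N_STUDENTS))
        if total >= OBSERVED_MEAN * N_STUDENTS:
            extreme += 1
    return extreme / n_samples


def exact_p_value() -> float:
    """Exact tail probability for the same assumed model.

    The students make at least 15 baskets in total exactly when the
    first 19 shots contain at most 4 misses, which is a binomial tail.
    """
    target = OBSERVED_MEAN * N_STUDENTS  # 15 made baskets in total
    shots = target + N_STUDENTS - 1      # 19 shots
    return sum(
        comb(shots, k) * (1 - P_MAKE) ** k * P_MAKE ** (shots - k)
        for k in range(N_STUDENTS)
    )


print(f"simulated p-value: {simulated_p_value():.4f}")
print(f"exact p-value:     {exact_p_value():.4f}")  # 0.0096
```

Under these made-up assumptions the exact p-value comes out just under 1%, so roughly one sample of five students in 100 would average three or more baskets in a row by luck alone. A larger assumed sample would make the p-value far smaller still.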

What this means is that the status quo of making one basket and then missing the next is no longer credible, since a sample this extreme is too unlikely to be written off as random noise. The status quo in this instance represents the null hypothesis. Since it no longer holds up, the university can feel confident in rejecting the null hypothesis.

There are several factors to consider concerning the example above. The first is that we cannot accept the alternative hypothesis even when we reject the null hypothesis. It is not enough to declare there’s a new sheriff in town, so to speak. All we have done is dismiss the old sheriff and leave “the town” sheriff-less. More studies are needed to determine if three baskets in a row is the new mean of the population. All we know from the above is that we can reject one basket as the mean at a 95% confidence level (or whatever confidence level the university agrees upon).

Another consideration is that we didn’t have all the components needed to calculate a proper p-value in the above example. We would need the sample size and the population standard deviation, and the university would have to publish the confidence level (90%, 95%, 99%, etc.). That said, it is usually acceptable to assume 95% when none is given, as that has become a standard across most industries.
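If those missing pieces were available, one common way to get the p-value is a one-sided z-test. The sketch below is only an illustration: the standard deviation of 4 and the sample size of 25 are invented numbers (the example supplies neither), `one_sided_z_test` is a hypothetical helper name, and the test assumes the sample mean is approximately normally distributed.

```python
from math import erfc, sqrt


def one_sided_z_test(sample_mean, null_mean, population_sd, n):
    """Return (z, p_value) for H0: mean == null_mean vs. H1: mean > null_mean."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail area of the standard normal
    return z, p_value


# Invented inputs: population_sd=4 and n=25 are NOT from the example.
z, p = one_sided_z_test(sample_mean=3, null_mean=1, population_sd=4, n=25)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 2.50, p-value = 0.0062
```

With these invented inputs the p-value is below 0.05, so the null hypothesis would be rejected at a 95% confidence level. Different inputs could easily lead to the opposite conclusion, which is why the missing components matter.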

Further, just because the p-value indicates statistical significance doesn’t mean the university will fund a basketball team. They may, but they may instead feel more data is needed. This first step could be a stepping stone for them to fund the gathering of that data. Statistical significance is also not the same as practical significance. For example, a manufacturer comparing two similar machines may find that the difference between them is statistically significant, yet customers may not notice any difference between the products the two machines produce.

What determines when the sample mean is close to or far away from the population mean? That is the purpose of choosing a confidence level. The concept has to do with sampling error. No matter how hard you try, you cannot sample from the population without some amount of error. That means we have to allow for a certain amount of give and take. If the p-value is less than the threshold that determines statistical significance, then the difference is unlikely to be just random noise. The process accounts for the error.
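One way to picture the "give and take" is as a cutoff: any sample mean beyond it counts as far away. Here is a small sketch using Python's standard library, again with invented inputs (a population standard deviation of 4 and a sample of 25 students, neither of which comes from the example) and a hypothetical function name. It assumes a one-sided test and an approximately normal sample mean.

```python
from math import sqrt
from statistics import NormalDist


def rejection_cutoff(null_mean, population_sd, n, confidence_level=0.95):
    """Smallest sample mean that counts as 'far away' in a one-sided test."""
    z_critical = NormalDist().inv_cdf(confidence_level)  # about 1.645 for 95%
    return null_mean + z_critical * population_sd / sqrt(n)


# Invented inputs: population_sd=4 and n=25 are assumptions for illustration.
cutoff = rejection_cutoff(null_mean=1, population_sd=4, n=25)
print(f"reject the null if the sample mean exceeds {cutoff:.2f}")
```

Raising the confidence level pushes the cutoff further from the population mean, while a larger sample pulls it closer, because a bigger sample leaves less room for sampling error.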

To sum up, make sure you assume the null hypothesis is true and structure your probabilities with that in mind. The p-value represents the chance of getting a sample mean at least as extreme as the new one, given the assumption that the null hypothesis is true. With this assumption, a sample like ours should hardly ever happen. If the null hypothesis is true, then most sample means should fall on or near the population mean. The fact that our new sample mean is not close to the old (population) mean suggests that something is wrong with the old mean. The null hypothesis contains that old mean, and therefore you can reject it.

James is a computer science turned data science geek who has 22+ years’ experience in the field. He is the owner of https://DataScienceReview.com, a website that helps people learn more about the field.