All You Need To Know About Hypothesis Testing

Yash Patil
Published in AlmaBetter
Jul 4, 2021

Introduction

Hypothesis testing, a core part of inferential statistics, is an essential step in your data science journey. It helps businesses make data-driven decisions. This blog covers what hypothesis testing is, the data scenarios where you would apply it, how to apply it, and more.

What is hypothesis testing?

If you have ever talked to someone who has written a PhD thesis, you know that a thesis is a theory or statement the author sets out to support with evidence. A hypothesis works in a similar way: it is a claim about a population parameter (such as a mean or a proportion) that we test using evidence from a sample. A few example claims:

  • On average, women take more selfies than men.
  • Customers with more than one mobile handset are more likely to churn.

Based on the claims above, we define null and alternate hypotheses.

Null Hypothesis

In inferential statistics, the null hypothesis generally states that there is no relationship (or no difference) between the variables under consideration. For example, for our claim that women take more selfies on average than men, the null hypothesis is that there is no relationship between selfies and gender, i.e. the average number of selfies is the same for women and men.

Alternate Hypothesis

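The alternate hypothesis, as the name suggests, is the alternative to the null hypothesis. It is the claim we collect evidence for: if the observed data would be very unlikely (a rare occurrence) if the null hypothesis were true, we take that as evidence in favor of the alternate hypothesis. For example, for our claim above, we look for evidence that the observed difference in selfies between women and men is too unlikely to have arisen by chance if gender and selfies were unrelated.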

Test Statistics

After we define our null and alternate hypotheses, we compute a test statistic to assess the validity of the null hypothesis. A test statistic measures how far the sample data fall from what the null hypothesis predicts. For example, when testing a mean, we take the difference between the sample mean and the hypothesized mean, divided by the standard error of the sample mean; the larger this value, the stronger the evidence against the null hypothesis. Consider the following claim:

  • “The average salary of a machine learning engineer is 100k.”
  • Null Hypothesis: mean = 100k
  • Alternate hypothesis: mean ≠ 100k

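If the sample mean is far enough above or below 100k, relative to the variation we would expect from sampling alone, we have sufficient evidence to reject the null hypothesis. A small difference is not enough, because a sample mean almost never equals the hypothesized value exactly.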
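To make the salary example concrete, here is a minimal sketch of a one-sample test for a mean using only the Python standard library. The function name one_sample_mean_test and the salary figures are hypothetical, made up for illustration. The p-value uses the standard normal distribution as a stand-in for Student's t distribution, which is only a rough approximation for small samples; in practice a library routine such as SciPy's scipy.stats.ttest_1samp would use the exact t distribution.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_mean_test(sample, mu0):
    """Test H0: population mean == mu0 against Ha: mean != mu0.

    Returns the t statistic and an approximate two-sided p-value. The p-value
    uses the standard normal distribution in place of Student's t, which is
    only a reasonable approximation for fairly large samples.
    """
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
    t = (x_bar - mu0) / se             # distance from mu0 in standard errors
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_two_sided


# Hypothetical salaries in thousands (made-up numbers, for illustration only)
salaries = [98, 112, 105, 91, 120, 103, 99, 110, 95, 117, 108, 101]
t, p = one_sample_mean_test(salaries, mu0=100)
print(f"t = {t:.2f}, approximate p-value = {p:.3f}")
```

The statistic is signed: a positive t means the sample mean lies above 100k, a negative one below it. Taking the absolute value makes the p-value two-sided, matching the alternate hypothesis mean ≠ 100k.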

Significance Level

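The significance level determines whether we retain or reject the null hypothesis. We compute a p-value, the probability (assuming the null hypothesis is true) of getting a test statistic at least as extreme as the one observed, and reject the null hypothesis when the p-value is at or below the significance level. The symbol α denotes the significance level. Its value depends on the context, but 0.05, 0.1 and 0.01 are the most commonly used.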
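The decision rule itself is a simple comparison. The sketch below (the function name decide and the p-value of 0.03 are hypothetical) shows that the same evidence can lead to different conclusions depending on the α we choose in advance.

```python
def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value is at or below the significance level alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# The same (hypothetical) p-value judged against common significance levels
p_value = 0.03
for alpha in (0.01, 0.05, 0.1):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
# alpha = 0.01: fail to reject H0
# alpha = 0.05: reject H0
# alpha = 0.1: reject H0
```

This is why α should be fixed before looking at the data: choosing it afterwards makes it easy to pick whichever threshold gives the desired answer.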

Type I Error

The choice of significance level often depends on how costly Type I and Type II errors are. A Type I error occurs when we reject the null hypothesis even though it is actually true. The probability of making a Type I error is α.

Type II Error

A Type II error occurs when we fail to reject (retain) the null hypothesis even though the alternate hypothesis is actually true. Its probability is usually denoted β.
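Both error rates can be seen directly by simulation. The sketch below assumes a two-sided z-test with a known population standard deviation, and all parameters (hypothesized mean 100, true mean 103, σ = 10, n = 30) are hypothetical choices for illustration. Samples drawn with the null hypothesis true estimate the Type I rate, which should land near α; samples drawn with the alternate true estimate the Type II rate.

```python
import math
import random
from statistics import NormalDist


def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > z_crit


def simulate_error_rates(mu0=100.0, true_mu=103.0, sigma=10.0, n=30,
                         alpha=0.05, trials=10_000, seed=0):
    """Estimate Type I and Type II error rates by repeated sampling."""
    rng = random.Random(seed)

    def draw(mu):
        return [rng.gauss(mu, sigma) for _ in range(n)]

    # Type I: H0 is true (data centred on mu0), but the test rejects it.
    type1 = sum(z_test_rejects(draw(mu0), mu0, sigma, alpha)
                for _ in range(trials)) / trials
    # Type II: Ha is true (data centred on true_mu), but the test retains H0.
    type2 = sum(not z_test_rejects(draw(true_mu), mu0, sigma, alpha)
                for _ in range(trials)) / trials
    return type1, type2


type1, type2 = simulate_error_rates()
print(f"Type I rate  ~ {type1:.3f} (should be close to alpha = 0.05)")
print(f"Type II rate ~ {type2:.3f}")
```

Rerunning with a smaller alpha lowers the Type I rate but raises the Type II rate, which is the trade-off that guides the choice of significance level.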

Right Tailed Test

If we test a population mean μ against a hypothesized value μ0 and the alternate hypothesis is Ha: μ > μ0, we perform a right-tailed test: the null hypothesis is rejected only when the test statistic falls far enough into the right tail of its distribution.

Left Tailed Test

If we test a population mean μ against a hypothesized value μ0 and the alternate hypothesis is Ha: μ < μ0, we perform a left-tailed test: the null hypothesis is rejected only when the test statistic falls far enough into the left tail of its distribution.

Two-Tailed Test

If we test a population mean μ against a hypothesized value μ0 and the alternate hypothesis is Ha: μ ≠ μ0, we perform a two-tailed test: the null hypothesis is rejected when the test statistic falls far enough into either tail of its distribution.
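The three kinds of test differ only in which tail area counts as the p-value. Here is a minimal sketch, assuming the test statistic z follows a standard normal distribution under the null hypothesis; the function name p_value and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value for a test statistic z that is standard normal under H0.

    tail: "right" (Ha: mu > mu0), "left" (Ha: mu < mu0) or "two" (Ha: mu != mu0).
    """
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)              # area to the right of z
    if tail == "left":
        return cdf(z)                  # area to the left of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # both tails, as extreme as |z|
    raise ValueError(f"unknown tail: {tail!r}")


for tail in ("right", "left", "two"):
    print(tail, round(p_value(1.8, tail), 4))
```

With z = 1.8, the right-tailed p-value is below 0.05 while the two-tailed one is above it, so at α = 0.05 the same data reject the null hypothesis in a right-tailed test but not in a two-tailed test. The tail should therefore follow from the claim being tested, not from the data.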

Single Population/Sample Hypothesis Testing

When you are given a single population or a single sample and compare it against a hypothesized value, the hypothesis test you perform is a one-sample test.
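As a one-sample example for a proportion rather than a mean, the sketch below tests the churn claim from earlier against a baseline rate. The figures (60 of 400 multi-handset customers churned, baseline churn rate 10%) and the function name are hypothetical. It uses the normal approximation to the binomial, which a common rule of thumb considers reasonable when n·p0 and n·(1 − p0) are both at least about 10.

```python
import math
from statistics import NormalDist


def one_sample_proportion_test(successes, n, p0):
    """Right-tailed z-test of H0: p == p0 against Ha: p > p0 (normal approximation)."""
    p_hat = successes / n                # observed proportion
    se = math.sqrt(p0 * (1 - p0) / n)    # standard error under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)    # right tail only
    return z, p_value


# Hypothetical: 60 of 400 multi-handset customers churned; baseline rate assumed 10%
z, p = one_sample_proportion_test(60, 400, 0.10)
print(f"z = {z:.2f}, p-value = {p:.5f}")
```

Note that the standard error uses the hypothesized p0, not the observed proportion, because the test statistic is computed under the assumption that the null hypothesis is true.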

The following chart shows which one-sample test to use in different scenarios:

Two Population/Sample Hypothesis Testing

When you are comparing two populations or two samples, the hypothesis test you perform is a two-sample test.
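The selfie claim compares two groups, so it calls for a two-sample test. Below is a minimal sketch of Welch's t statistic, one common two-sample test that does not assume the two groups have equal variances. The selfie counts and the function name welch_t are hypothetical, and the p-value again uses the standard normal as a rough large-sample stand-in for the t distribution; SciPy's scipy.stats.ttest_ind with equal_var=False and alternative="greater" would give the exact Welch test.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic for H0: mean(a) == mean(b).

    Unlike the pooled two-sample t-test, it does not assume equal variances.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical daily selfie counts (made-up numbers, for illustration only)
women = [3, 5, 2, 6, 4, 7, 3, 5, 4, 6, 2, 5]
men = [2, 3, 1, 4, 2, 3, 5, 2, 1, 3, 2, 4]
t = welch_t(women, men)
# Right-tailed test (Ha: women's mean > men's mean); the standard normal is a
# rough stand-in for the t distribution, acceptable only for large samples.
p_right = 1 - NormalDist().cdf(t)
print(f"t = {t:.2f}, approximate one-sided p-value = {p_right:.4f}")
```

The test is right-tailed because the claim is directional (women take more selfies), matching the right-tailed case described above.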

The following chart shows different two-sample tests in different scenarios:
