Introduction to Hypothesis Testing

Irfan Rahman
Beginner’s Guide for Data Science
Feb 3, 2019

Many business problems call for hypothesis testing. Turning an idea into a testable hypothesis helps us validate the problem before spending time and money building a solution to a problem that might not exist. Hypothesis testing is a form of statistical inference that assesses the evidence provided by data about a claim concerning a population.

Whenever you set up a hypothesis test in inferential statistics, there are two competing statistical hypotheses.

1. Null Hypothesis (denoted as Ho)- It is presumed to be true until the data provide sufficient evidence that it is not. The null hypothesis represents the status quo or "no effect" claim, which is why it always contains an equality sign (=, ≤ or ≥), for example μ = 23.

2. Alternate Hypothesis (denoted as Ha)- The claim we are looking for evidence to support. It contradicts Ho, for example μ ≠ 23, μ > 23 or μ < 23.

Steps in Hypothesis testing-

a. Specify the Null Hypothesis(Ho)

b. Specify the Alternative Hypothesis(Ha)

c. Set the Significance Level (α)

d. Calculate the Test Statistic and Corresponding P-Value

e. Draw a Conclusion

When to reject or fail to reject the Null Hypothesis?

  • If p-value (p) ≤ significance level (α): reject the null hypothesis (Ho).
  • If p-value (p) > significance level (α): fail to reject the null hypothesis (Ho).
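The decision rule above is mechanical, so it fits in a few lines of Python. This is a minimal sketch. The function name `decide` and its return strings are hypothetical and chosen only for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule: reject Ho when p <= alpha (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject Ho"
    # A large p-value does not let us accept Ho or Ha; we simply lack evidence against Ho.
    return "fail to reject Ho"


print(decide(0.03))  # reject Ho
print(decide(0.20))  # fail to reject Ho
```

Note that the outcome is never "accept": the function can only return "reject Ho" or "fail to reject Ho".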

Note- "Fail to reject the null hypothesis" does not mean the null hypothesis is true. A hypothesis test does not determine which hypothesis is true, or even which is more likely; it only assesses whether the available evidence is strong enough to reject the null hypothesis.

Types of Statistical Tests-

T-test (also called Student's t-test)- A type of inferential statistic used to determine whether there is a significant difference between the means of two groups (or between a sample mean and a fixed value). In other words, a t-test compares two averages and tells you how different they are from each other, relative to the variability in the data. A t-test is used when the population standard deviation is unknown and has to be estimated from the sample.

  • A large t-score (in absolute value) suggests that the groups are different.
  • A small t-score suggests that the groups are similar.

Different types of T Test-

One Sample t-test- Used to compare a sample mean with a known population mean or some other meaningful, fixed value.

Independent Sample t-test- Used to compare the means of two independent groups.

Paired Sample t-test- Used to compare two means that are repeated measures on the same participants. For example, the scores may come from two different measures or from two points in time, such as before and after a treatment.
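The one-sample and paired variants can be sketched with Python's standard library. This is a minimal sketch that computes only the t statistic, not the p-value. The function names and the sample numbers are hypothetical. The paired test is implemented as a one-sample t-test on the per-participant differences against 0. In practice, `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` compute these statistics together with p-values.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for Ho: population mean == mu0, with sigma estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test of the per-participant differences against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0.0)


# Hypothetical data, for illustration only.
print(one_sample_t([2, 4, 6], mu0=2))                 # ≈ 1.732 (sqrt(3))
print(paired_t(before=[1, 2, 3], after=[3, 4, 7]))    # ≈ 4.0
```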

Test Statistics-

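$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, where

x1-bar & x2-bar are the sample means of the two samples.

s1 & s2 are the sample standard deviations of the two samples. The t-test uses these because the population standard deviation σ is unknown.

n1 & n2 are the sample sizes of the two samples.

This form (Welch's t-test) does not assume the two groups have equal variances; the classic Student's version pools the two sample variances instead.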
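The two-sample t statistic can be computed directly from this formula with the standard library. This is a minimal sketch: the function name `welch_t` and the data are hypothetical. It also returns the Welch-Satterthwaite degrees of freedom, which the formula above does not show but which is needed to look up a p-value. SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test and also reports the p-value.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Two-sample t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data, for illustration only.
t, df = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), round(df, 3))  # -3.674 4.0
```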

Z Test-

It is a statistical test in which the normal distribution is applied, and it is mainly used for problems involving large samples, n ≥ 30 (n = sample size). It is used to determine whether a sample mean differs from a known population mean, or whether two sample means differ from each other. Unlike the t-test, the z-test requires the population standard deviation σ to be known.

Different types of z-test-

One Sample z-test- Used when we want to know whether our sample comes from a particular population, i.e. whether the sample mean differs from a known population mean μ.

Two Sample z-test- Used to compare the means of two samples to determine whether they come from populations with the same mean.

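Hypotheses for the One Sample Z Test-

Null Hypothesis (Ho)- The mean of the population the sample is drawn from equals the known population mean μ.

Alternate Hypothesis (Ha)- The mean of the population the sample is drawn from is not equal to μ (two-tailed test).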

Test Statistics-

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, where

X-bar = Sample mean

μ = population mean

σ = population standard deviation

n = sample size
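Both z-test variants can be sketched in Python using `statistics.NormalDist` (Python 3.8+) for the standard normal CDF. This is a minimal sketch with hypothetical function names and made-up numbers. The one-sample statistic follows the formula above. The two-sample statistic uses the standard formula $(\bar{x}_1 - \bar{x}_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, which the post does not write out. Both return two-tailed p-values.

```python
import math
from statistics import NormalDist


def one_sample_z(x_bar, mu, sigma, n):
    """One-sample z statistic and two-tailed p-value; sigma is the known population SD."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails of N(0, 1)
    return z, p


def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2):
    """Two-sample z statistic for a difference in means with known population SDs."""
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (x1_bar - x2_bar) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical numbers, for illustration only.
z, p = one_sample_z(x_bar=105, mu=100, sigma=15, n=36)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```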

ANOVA -

ANOVA stands for "Analysis of Variance". It is a statistical technique used to check whether the means of two or more groups are significantly different from each other. ANOVA assesses the impact of one or more factors by comparing the means of different samples.

When we have only two samples, ANOVA and the equal-variance t-test give the same result: the F statistic equals the square of the t statistic. However, t-tests are not a reliable way to compare more than two samples. Running multiple t-tests inflates the overall Type I error rate. With $k$ independent tests, each at significance level $\alpha$, the chance of at least one false positive is $1 - (1 - \alpha)^k$, which is about 14% for $k = 3$ and $\alpha = 0.05$.
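The compounding effect can be made concrete with a few lines of Python. This is a minimal sketch that assumes the tests are independent. Pairwise t-tests share data, so in practice the formula is only an approximation. The function name is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false rejection across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 3 0.143
# 10 0.401
```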

1. One-way ANOVA- Used to compare the means of two or more groups defined by a single independent variable (factor).

2. MANOVA (Multivariate Analysis of Variance)- Tests the effect of one or more independent variables on two or more dependent variables at the same time. MANOVA can also detect differences in the correlation between dependent variables across the groups of the independent variables.

Hypotheses in ANOVA -

Null Hypothesis- All group (population) means are equal.

Alternate Hypothesis- At least one group mean is different from the others.
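One-way ANOVA can be summarised by its F statistic, which is the ratio of the between-group mean square to the within-group mean square. The following is a minimal standard-library sketch with a hypothetical function name and made-up data. It computes F and its two degrees of freedom but not the p-value. SciPy's `scipy.stats.f_oneway` returns both F and the p-value.

```python
import statistics


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [statistics.mean(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data, for illustration only.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

With only two groups, for example `[1, 2, 3]` and `[4, 5, 6]`, the same function gives F = 13.5. This is the square of the t statistic (about -3.674) for those two samples, which illustrates why ANOVA and the t-test agree in the two-sample case.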

Chi-Square Test-

The Chi-Square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the Chi-Square test of independence is that no relationship exists between the categorical variables in the population; they are independent.

Hypotheses in Chi-Square Test -

Null Hypothesis- The variables x and y are independent.

Alternate Hypothesis- The variables x and y are not independent.

Test Statistics-

$\chi^2 = \sum \frac{(O - E)^2}{E}$, where

χ² = Chi Square

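O = observed frequency (count) in each cell

E = expected frequency in each cell if Ho is true (for a contingency table, row total × column total / grand total)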
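For a test of independence on a contingency table, the expected counts come from the row and column totals. The statistic then has (rows - 1) × (columns - 1) degrees of freedom. The following is a minimal standard-library sketch with a hypothetical function name and a made-up 2 × 2 table. `scipy.stats.chi2_contingency` computes this statistic along with a p-value. However, it applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will only match this sketch with `correction=False`.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # E under Ho (independence)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of counts, for illustration only.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), df)  # 6.667 1
```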

Hypothesis testing problem Example-

In recent years, the mean age of all the college students in a city has been 23. A random sample of 42 students revealed a mean age of 23.8. Suppose their ages are normally distributed with a population standard deviation of σ = 2.4. Can we infer, at significance level α = 0.05, that the population mean has changed?

Here,

n = 42, x-bar = 23.8 , σ = 2.4, α = 0.05, μ = 23

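Since the population standard deviation (σ) is known, we use a z-test. Because the question asks whether the mean has changed in either direction, the test is two-tailed: Ho: μ = 23 and Ha: μ ≠ 23.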

Test Statistics-

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{23.8 - 23}{2.4 / \sqrt{42}} \approx 2.16$

P-value (two-tailed, z = 2.16) = 2 × P(Z > 2.16) ≈ 0.03

p-value(0.03) < α(0.05)

Hence, we reject the null hypothesis (Ho) and conclude, at the 5% significance level, that the mean age has changed.
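The arithmetic in this example can be checked with a short Python sketch using only the standard library (`statistics.NormalDist`, Python 3.8+). It treats the test as two-tailed, matching the question of whether the mean has "changed". The variable names are illustrative. The unrounded p-value is about 0.031, which is consistent with the rounded 0.03 above.

```python
import math
from statistics import NormalDist

# Values from the worked example.
n, x_bar, sigma, alpha, mu = 42, 23.8, 2.4, 0.05, 23

z = (x_bar - mu) / (sigma / math.sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed: Ha is mu != 23

print(round(z, 2), round(p_value, 3))  # 2.16 0.031
print("reject Ho" if p_value <= alpha else "fail to reject Ho")  # reject Ho
```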

Summary-

In this post we covered hypothesis testing, the null hypothesis, the alternate hypothesis, the steps in hypothesis testing, when to reject or fail to reject the null hypothesis, and several statistical tests, i.e. the t-test, z-test, ANOVA and chi-square test.

I hope this post helps you understand the concepts above. Hit the clap if you found it helpful, and follow for upcoming posts.
