ADVANCED HYPOTHESIS TESTING

TINU ROHITH D
Published in Analytics Vidhya
5 min read · May 7, 2020

Hey all, let's get hands-on with how to set up a hypothesis and when to apply advanced hypothesis testing. For an initial overview of hypothesis testing, kindly have a look at my previous article, linked below.

Yep! Let's take off.

CENTRAL LIMIT THEOREM:

It is applied in hypothesis testing to aid in calculating probability or chance. It states that “As the sample size grows sufficiently large, the sampling distribution of the sample mean will tend towards a normal distribution, even if the underlying population is not normal.”

Implications: As a rule of thumb, if the sample size is larger than 30 (n > 30), you can usually use a normal distribution as your test distribution for the sample mean.

Advanced hypothesis testing is applied when comparing multiple samples.
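To see the theorem in action, here is a minimal simulation sketch using only the Python standard library. The exponential population, the sample sizes and the helper names (`sample_means`, `skewness`, `clt_demo`) are assumptions chosen for illustration, not part of the original article. Skewness close to 0 suggests a symmetric, normal-looking distribution.

```python
import random
import statistics


def sample_means(n, trials, rng):
    """Return `trials` sample means, each from `n` draws of a skewed population."""
    # Assumed population: exponential with mean 1 (clearly not normal).
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


def skewness(values):
    """Population skewness; roughly 0 for a symmetric distribution."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


def clt_demo(sizes=(2, 30, 200), trials=2000, seed=0):
    """Map each sample size to (mean of sample means, skewness of sample means)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = {}
    for n in sizes:
        means = sample_means(n, trials, rng)
        results[n] = (statistics.fmean(means), skewness(means))
    return results


if __name__ == "__main__":
    for n, (avg, skew) in clt_demo().items():
        print(f"n={n:>3}  mean of sample means={avg:.3f}  skewness={skew:.3f}")
```

As n grows, the skewness of the sample means should shrink towards 0 while their average stays near the population mean of 1.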

ANOVA TEST (Analysis of Variance):

  • It uses variance to reach a conclusion about group means.
  • It determines the influence that independent features have on a dependent feature, much as in a regression study.

Note:

It is used when the dependent feature is continuous and the independent features are discrete (categorical).

The sources of variation calculated in ANOVA, each measured as a sum of squares, are:

  • Within group variance (SSW)
  • Between group variance (SSB)
  • Overall variance (SST)

Test statistic for ANOVA:

F-stat = MSB / MSW

where,

  • MSB = mean square between = SSB / degrees of freedom (B)
  • MSW = mean square within = SSW / degrees of freedom (W)
  • Degrees of freedom (B) = k - 1
  • Degrees of freedom (W) = n - k
  • k = number of groups, n = total number of observations

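Mathematical formula:

$$SSB = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$

$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

$$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = SSB + SSW$$

Where, $k$ = number of groups, $n_j$ = sample size from group j, $\bar{x}_j$ = sample mean from group j, $\bar{x}$ = grand mean (mean of all data values), $x_{ij}$ = ith measurement from group j.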

In ANOVA:

The null hypothesis is that all group means are equal.

The alternative hypothesis is that at least one pair of means is unequal.

Limitations: It tells you that at least one pair of means is unequal, but not which group means differ.
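The F-statistic described above can be computed by hand in a few lines. The sketch below uses only the Python standard library; the function name `one_way_anova` and the toy liquor prices are made up for illustration.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a list of groups."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]

    # Between-group variation: group size times squared distance to grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: squared distance of each value to its group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Overall variation, computed independently as a check on SSB + SSW.
    sst = sum((x - grand_mean) ** 2 for x in values)

    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {"SSB": ssb, "SSW": ssw, "SST": sst, "df": (k - 1, n - k), "F": msb / msw}


# Hypothetical prices for three liquor categories (illustrative numbers only).
prices = [
    [30, 32, 34],  # whisky
    [20, 22, 24],  # rum
    [25, 27, 29],  # vodka
]

if __name__ == "__main__":
    print(one_way_anova(prices))  # F = 18.75 for this toy data
```

The resulting F value is then compared against the F-distribution with (k - 1, n - k) degrees of freedom to decide whether to reject the null hypothesis.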

TYPES OF ANOVA:

  • ONE-WAY ANOVA: It is used to test whether there is any significant difference in the means of two or more groups (typically three or more). With a one-way ANOVA, you have one independent variable affecting a dependent variable.

Examples:

  1. From a sample of individuals' ages, we study the age of people across a feature having discrete groups such as newborn, young, adult and old.
  2. From a sample of liquor price lists obtained from various stores, we study the price of alcohol across a feature having discrete groups such as whisky, rum, vodka and beer.

  • TWO-WAY ANOVA: Here, you have two independent factors affecting a dependent feature.

In two-way ANOVA, three null hypotheses are tested, each with its own alternative:

  • H01: All the groups of independent factor one have an equal mean of the dependent feature
  • H02: All the groups of independent factor two have an equal mean of the dependent feature
  • H03: The factors are independent; there is no interaction between them

Examples:

  1. From a sample of individuals' BMI (body mass index), we study the BMI of people across weight and height factors, where the weight factor could be split into normal, overweight and obese, and the height factor could be split into dwarf, short and tall.
  2. To study the anxiety level of people across income and gender factors, where the income factor could be split into low, normal and high, and the gender factor could be split into male and female.
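The three hypotheses of a two-way ANOVA map onto three F-statistics: one per factor and one for the interaction. Below is a minimal sketch for the balanced case only (the same number of observations in every cell), using the standard library. The function name `two_way_anova` and the anxiety scores are invented for illustration; an unbalanced design would need a different approach.

```python
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way ANOVA. `cells` maps (level_a, level_b) -> list of values."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))  # replicates per cell
    if any(len(v) != r for v in cells.values()) or r < 2:
        raise ValueError("this sketch needs a balanced design with r >= 2")
    a, b = len(a_levels), len(b_levels)

    values = [x for v in cells.values() for x in v]
    grand = fmean(values)
    mean_a = {i: fmean([x for j in b_levels for x in cells[(i, j)]]) for i in a_levels}
    mean_b = {j: fmean([x for i in a_levels for x in cells[(i, j)]]) for j in b_levels}
    mean_cell = {key: fmean(v) for key, v in cells.items()}

    ss_a = r * b * sum((mean_a[i] - grand) ** 2 for i in a_levels)
    ss_b = r * a * sum((mean_b[j] - grand) ** 2 for j in b_levels)
    ss_ab = r * sum(
        (mean_cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in a_levels for j in b_levels
    )
    ss_e = sum((x - mean_cell[key]) ** 2 for key, v in cells.items() for x in v)
    ss_t = sum((x - grand) ** 2 for x in values)

    ms_e = ss_e / (a * b * (r - 1))  # error mean square
    return {
        "SS": {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_e, "total": ss_t},
        "F_A": (ss_a / (a - 1)) / ms_e,                # tests H01
        "F_B": (ss_b / (b - 1)) / ms_e,                # tests H02
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_e,  # tests H03 (interaction)
    }


# Hypothetical anxiety scores by (income, gender); illustrative numbers only.
anxiety = {
    ("low", "male"): [8, 9, 7],    ("low", "female"): [9, 10, 8],
    ("normal", "male"): [5, 6, 4], ("normal", "female"): [6, 5, 7],
    ("high", "male"): [3, 4, 2],   ("high", "female"): [4, 3, 5],
}

if __name__ == "__main__":
    print(two_way_anova(anxiety))
```

Each F value is compared against its own F-distribution; a large interaction F would suggest that the effect of one factor depends on the level of the other.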

CHI-SQUARE TEST:

  • It tests differences using frequencies.
  • It is used for multiple sample tests when dealing with count or categorical data.
  • Limitations: The dependent variable outcome must be a frequency count.
  • The chi-square distribution is asymmetric (right-skewed).

NOTE: The idea is to check the difference between what you see in your sample and what you expected to see, and then assess the chances of seeing that difference purely by chance.

A chi-square test uses these “observed” and “expected” frequencies to generate a conclusion about the statistical significance of the observed differences.

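Formula for chi-square:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency in category i.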

The shape of a chi-square distribution depends on its degrees of freedom; as the degrees of freedom increase, the distribution tends towards a normal distribution.
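The comparison of observed and expected frequencies can be sketched in plain Python. The helper name `chi_square_stat` and the die-roll counts are hypothetical; the statistic is the sum of squared differences between observed and expected counts, each divided by the expected count.

```python
def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die (faces 1 to 6).
observed_rolls = [8, 9, 12, 11, 6, 14]
expected_rolls = [10] * 6  # a fair die gives 60 / 6 = 10 per face

stat = chi_square_stat(observed_rolls, expected_rolls)
dof = len(observed_rolls) - 1  # categories minus one

if __name__ == "__main__":
    print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
```

For this toy data the statistic is 4.2, below the 5% critical value of about 11.07 for 5 degrees of freedom, so these counts would not give reason to doubt that the die is fair.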

Types of Chi-square Test:

  1. Association test: Used to determine whether there is any association between two variables. The null hypothesis is that the two variables are not associated (they are independent). The alternative hypothesis is that the two variables are associated (they are dependent). Example: To check whether the preference for a brand changes as age changes, that is, whether there is any association between age and brand preference.
  2. Goodness of fit: A very popular use of chi-square, used to determine whether the data follows a particular distribution or not (a non-parametric test). It checks for any difference between observed and expected values. Example: Are winnings directly proportional to the number of 6's obtained from a die rolled in a casino game?
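For the association test, the expected frequencies are not given in advance; they are derived from the row and column totals under the assumption of independence. A minimal sketch follows, where the function name `chi_square_independence` and the age-by-brand counts are invented for illustration.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Under independence: expected = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


# Hypothetical survey: rows are age groups, columns are preferred brand.
#                  brand A  brand B
age_by_brand = [
    [30, 10],  # under 30
    [20, 40],  # 30 and over
]

if __name__ == "__main__":
    stat, dof, expected = chi_square_independence(age_by_brand)
    print(f"chi-square = {stat:.3f}, dof = {dof}, expected = {expected}")
```

Here the statistic is about 16.67 with 1 degree of freedom, well above the 5% critical value of about 3.84, so for these made-up counts the null hypothesis of independence would be rejected.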

Note: For samples drawn from a normal population, the scaled sample variance, (n - 1)s²/σ², follows a chi-square distribution with n - 1 degrees of freedom, while the sample mean follows a normal distribution.

CONCLUSION:

Here, we went over the basics of how to set up a hypothesis and when to apply advanced hypothesis testing. Thank you for reading this article, and best wishes to you all. Appreciate it!
