ANOVA Mathematics Explained

Mehul Gupta
Data Science in your pocket
4 min read · Aug 10, 2019


Last time we had an elaborate discussion on Hypothesis Testing; this time we will be jumping on to ANOVA (Analysis Of Variance). But before understanding ANOVA, we must understand why we need it and how it is different from the Hypothesis Testing we saw earlier!!

Taking reference from my previous article, we considered a situation

In the above situation, let's add some more conditions:

Does the season (Summer, Winter, Rainy) influence weight?

Do the season & the financial status of a person influence the weight gain?

Now, such cases can't be handled using the Hypothesis Testing we discussed earlier:

In the 1st case, the independent variable has sub-categories (earlier we were considering Season without subdividing it into summer, winter & rainy)

In the 2nd case, we have got 2 independent variables (Season & Financial Status)

Here comes ANOVA to the rescue(Thank God😇😇)

The 1st case represents One Way ANOVA & the 2nd case represents Two Way ANOVA

In this article, we will be deep diving into One Way ANOVA and giving an intuitive explanation of Two Way ANOVA.

Let's point out the steps to follow:

  • Decide your Null & Alternative Hypothesis
  • Set significance level
  • Calculate the overall mean and the sub-category-wise means as well (will be covering shortly)
  • Calculate the Sum of Squares Within (SSW) & the Sum of Squares Between (SSB)
  • Calculate the degrees of freedom for SSW & SSB
  • Calculate the F-Statistic
  • Use the F-distribution table to decide on the Null Hypothesis

No worries, we will be discussing every term below😀

MANY NEW THINGS TO LOOK UP!!

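Again, we need a situation first to proceed.

Let us consider that we have the season-wise weights of 3 beings. We need to figure out whether the different seasons influenced weight or not. The values, as they appear in the SSW calculation below, are Winter: 30, 20, 10; Summer: 50, 30, 40; Rainy: 50, 60, 70.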

Null Hypothesis: Mean weight is the same for all seasons

Alternate Hypothesis: Mean weight is not the same for all seasons, i.e. at least one season's mean differs (the means of two seasons may still be equal, just not all three)

Also, set your significance level as alpha = 0.1 (your choice)

If you aren’t comfortable with the above concepts, check here

STEP 1 DONE!!

Now, some maths😊

Mean for entire data=360/9=40 (kindly calculate and cross-check)

Before moving on, do calculate the season-wise mean as well:

Winter_Mean=60/3=20

Summer_Mean=120/3=40

Rainy_Mean=180/3=60
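The means above can be cross-checked with a few lines of Python. This is only a minimal sketch: the `weights` dictionary and the variable names are my own, and the values are read off the terms of the SSW calculation further down, so the order of values within each season is an assumption.

```python
# Season-wise weights, reconstructed from the terms of the SSW calculation
# (names are illustrative, not from the original article).
weights = {
    "Winter": [30, 20, 10],
    "Summer": [50, 30, 40],
    "Rainy": [50, 60, 70],
}

# Overall mean across all 9 observations.
all_values = [w for values in weights.values() for w in values]
overall_mean = sum(all_values) / len(all_values)

# Mean of each season (sub-category).
group_means = {season: sum(values) / len(values) for season, values in weights.items()}

print("Overall mean:", overall_mean)
print("Season-wise means:", group_means)
```

The printed values should agree with the hand calculation: 40 overall, and 20, 40, 60 for Winter, Summer and Rainy.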

Moving Ahead…..

Calculating the Sum of Squares Within (SSW):

LOOK CAREFULLY……..

It is the summation of the squared differences between each value and the mean of its own group

SSW = (30-20)² + (20-20)² + (10-20)² + (50-40)² + (30-40)² + (40-40)² + (50-60)² + (60-60)² + (70-60)² = 600

Here, the first 3 terms are from Winter (hence the Winter mean, i.e. 20, has been subtracted); likewise the next 3 terms use the Summer mean and the last 3 use the Rainy mean
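The same SSW calculation can be sketched in Python. The data layout and names below are assumptions of mine (the values are taken from the SSW terms above); the loop simply mirrors the hand calculation, one season at a time.

```python
# Season-wise weights taken from the SSW terms above (illustrative names).
weights = {
    "Winter": [30, 20, 10],
    "Summer": [50, 30, 40],
    "Rainy": [50, 60, 70],
}

ssw = 0.0
for season, values in weights.items():
    group_mean = sum(values) / len(values)
    # Squared distance of every value from the mean of its own season.
    ssw += sum((v - group_mean) ** 2 for v in values)

print("SSW:", ssw)
```

Each season contributes 200 here, which adds up to the 600 obtained by hand.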

Now, calculate the Sum of Squares Between (SSB):

AGAIN PAY ATTENTION…….

It is the summation of number_of_elements_per_category x (group_mean - overall_mean)²

SSB = 3 x (20-40)² + 3 x (40-40)² + 3 x (60-40)² = 2400

Here, 3 = number of rows per season

The 1st term is 3 x (Winter_mean - Overall_mean)², and likewise the other two terms are for the Summer & Rainy seasons.
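SSB can be sketched the same way. The names below are again my own. As an extra sanity check that is not part of the steps in this article, the sketch also computes the total sum of squares around the overall mean (called `sst` here) and uses the standard identity SST = SSW + SSB to recover SSW.

```python
# Season-wise weights taken from the SSW terms (illustrative names).
weights = {
    "Winter": [30, 20, 10],
    "Summer": [50, 30, 40],
    "Rainy": [50, 60, 70],
}

all_values = [w for values in weights.values() for w in values]
overall_mean = sum(all_values) / len(all_values)

# Group size x squared distance of the group mean from the overall mean.
ssb = sum(
    len(values) * (sum(values) / len(values) - overall_mean) ** 2
    for values in weights.values()
)

# Sanity check (an addition, not one of the article's steps):
# total variation around the overall mean should split into SSW + SSB.
sst = sum((v - overall_mean) ** 2 for v in all_values)
ssw_from_identity = sst - ssb

print("SSB:", ssb)
print("SST:", sst, "-> SSW implied by SST - SSB:", ssw_from_identity)
```

If the implied SSW matches the 600 computed directly, both sums of squares are consistent with each other.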

Now calculating the Degrees of Freedom (DOF) for SSW & SSB:

DOF for SSB = M - 1 = 3 - 1 = 2

DOF for SSW = M x (N - 1) = 3 x (3 - 1) = 6

where,

M = No. of sub-categories (3 in our case)

N = No. of values per category (3 in our case)
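A small sketch of the degrees-of-freedom step, with names of my own choosing. It counts DOF for SSW as (total observations - M), which is the same as M x (N - 1) when every category has N values, as in our data; that generalisation is my assumption and not something this article covers.

```python
# Season-wise weights taken from the SSW terms (illustrative names).
weights = {
    "Winter": [30, 20, 10],
    "Summer": [50, 30, 40],
    "Rainy": [50, 60, 70],
}

m = len(weights)                                        # number of sub-categories
total_obs = sum(len(values) for values in weights.values())

dof_ssb = m - 1            # between-group degrees of freedom
dof_ssw = total_obs - m    # within-group; equals M x (N - 1) for equal group sizes

print("DOF for SSB:", dof_ssb)
print("DOF for SSW:", dof_ssw)
```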

Now, Calculating F-Stat:

F-Stat=(SSB/DOF_SSB)/(SSW/DOF_SSW)=(2400/2)/(600/6)=12
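Putting the pieces together, the whole F-Stat calculation can be sketched as one small function. `one_way_anova_f` is a hypothetical helper name of mine, not a library function, and the data is the same set of weights used in the SSW terms.

```python
def one_way_anova_f(groups):
    """Return (F-Stat, DOF_SSB, DOF_SSW) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    overall_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of values around their own group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    # Variation of group means around the overall mean, weighted by group size.
    ssb = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, group_means))

    dof_ssb = len(groups) - 1
    dof_ssw = len(all_values) - len(groups)

    f_stat = (ssb / dof_ssb) / (ssw / dof_ssw)
    return f_stat, dof_ssb, dof_ssw


winter, summer, rainy = [30, 20, 10], [50, 30, 40], [50, 60, 70]
f_stat, dof_ssb, dof_ssw = one_way_anova_f([winter, summer, rainy])
print("F-Stat:", f_stat, "with DOF", (dof_ssb, dof_ssw))
```

If SciPy is installed, `scipy.stats.f_oneway(winter, summer, rainy)` should report the same statistic, which is a handy cross-check.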

We are left with the last step: using the F-distribution table

The F-distribution table is actually a set of tables, one for each significance level.

Now follow the below steps to correctly read the F-distribution table:

  • Find the table corresponding to the significance level set (0.1 in our case)
  • The column DOF represents DOF_SSB & the row DOF represents DOF_SSW
  • Read the value at the intersection of DOF_SSB & DOF_SSW (in our case 2nd column, 6th row), which is 3.46. This is the critical value
  • As the F-Stat calculated by us (12) is greater than the critical value read from the F-distribution table (3.46), hence:

The null hypothesis can be rejected.
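The table lookup can also be reproduced in code. The sketch below is pure Python and leans on a shortcut that only holds when DOF_SSB is 2 (as in our example): in that case the upper-tail probability of the F-distribution has the closed form P(F > x) = (1 + 2x / DOF_SSW) ^ (-DOF_SSW / 2). The function names are hypothetical; for any other DOF_SSB you would use a statistics library instead, for example `scipy.stats.f.ppf(1 - alpha, dfn, dfd)`.

```python
def f_critical_dfn2(alpha, dfd):
    """Critical value of F(2, dfd) at level alpha (numerator DOF must be 2)."""
    # Solve (1 + 2x/dfd) ** (-dfd/2) = alpha for x.
    return (dfd / 2) * (alpha ** (-2 / dfd) - 1)


def f_pvalue_dfn2(f_stat, dfd):
    """Upper-tail probability P(F > f_stat) for F(2, dfd)."""
    return (1 + 2 * f_stat / dfd) ** (-dfd / 2)


alpha = 0.1      # significance level chosen earlier
f_stat = 12.0    # F-Stat calculated above
dof_ssw = 6      # DOF_SSB is 2, which is what the closed form assumes

f_crit = f_critical_dfn2(alpha, dof_ssw)
p_value = f_pvalue_dfn2(f_stat, dof_ssw)
reject_null = f_stat > f_crit

print("Critical value:", round(f_crit, 2))
print("p-value:", round(p_value, 4))
print("Reject null hypothesis:", reject_null)
```

The critical value rounds to the 3.46 read from the table, and comparing the p-value with alpha leads to the same decision as comparing the F-Stat with the critical value.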

FINALLY DONE!!! PHEW

Now a few words on Two Way ANOVA as well:

In two-way ANOVA, we have two independent variables (season & financial status in our case) & we need to figure out their impact on the dependent variable (weight). In such a case, we need to define 3 null hypotheses & 3 alternative hypotheses (this number increases when the number of independent variables increases) to check:

  • Whether season influences weight?
  • Whether financial status influences weight?
  • Whether season & financial status interact, i.e. whether the effect of season on weight depends on financial status? (Do state your null & alternate hypotheses accordingly, considering the problem posed)

Done for the day!!
