Statistics: What is the ANOVA test?

Brain_Boost
Jan 17, 2024

Introduction: What is it?

ANOVA stands for Analysis of Variance, and it is used to compare means. We generally use a t-test to find the difference between two means, but when we are comparing more than two groups we use ANOVA instead, because running a separate t-test for every pair of groups would inflate the chance of a false positive. ANOVA calculates the variation within each of the groups and compares it to the variation between the group means.

Source: https://www.youtube.com/watch?v=WcmzS3nEUqo
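In symbols, these are the standard textbook formulas. Take k groups, where group i has n_i observations x_ij with group mean x̄_i, x̄ is the grand mean of all the data, and N is the total number of observations. The between-group and within-group sums of squares, and the F statistic built from them, are:

```latex
SS_B = \sum_{i=1}^{k} n_i \, (\bar{x}_i - \bar{x})^2
\qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
\qquad
F = \frac{SS_B / (k - 1)}{SS_W / (N - k)}
```

A large F means the group means are spread out a lot compared with the scatter inside each group.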

The F statistic, which is the ratio of the between-group variance to the within-group variance, is then calculated and compared to the F-distribution with k - 1 and N - k degrees of freedom (k groups, N observations in total). From that we obtain the p-value.
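To make the ratio concrete, here is a minimal Python sketch that computes the F statistic by hand. It divides each sum of squares by its degrees of freedom (k - 1 and N - k), takes the ratio, and reads the p-value from the upper tail of the F-distribution. It assumes NumPy and SciPy are installed. The helper name `one_way_anova` and the three small groups are hypothetical, chosen only for illustration. The last line cross-checks the result against SciPy's built-in `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Hypothetical helper: return (F, p) for a one-way ANOVA computed by hand."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations N
    grand_mean = np.concatenate(groups).mean()

    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each observation sits from its own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)        # mean square between, df = k - 1
    ms_within = ss_within / (n_total - k)    # mean square within, df = N - k
    f_stat = ms_between / ms_within

    # p-value: probability of an F at least this large if all population means are equal
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)
    return f_stat, p_value


# Made-up example data: three groups of three observations
a = [4.0, 5.0, 6.0]
b = [6.0, 7.0, 8.0]
c = [9.0, 10.0, 11.0]

f_stat, p_value = one_way_anova(a, b, c)
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")

# Cross-check against SciPy's built-in one-way ANOVA
print(stats.f_oneway(a, b, c))
```

For these numbers the arithmetic is easy to follow by hand: SS_B = 38 on 2 degrees of freedom and SS_W = 6 on 6 degrees of freedom, so F = 19 / 1 = 19.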

An Example

To understand this topic better, let's look at an example. Say we have data on how much money people make, grouped by their qualifications. We would expect people with higher qualifications to make more money, so let's see how that data looks in a box-and-whisker plot.

Source: https://www.youtube.com/watch?v=WcmzS3nEUqo

In the sample, people with degrees tend to earn more money than those without degrees. The other group means look similar to each other, and there is considerable overlap between all four groups. Now we want to find out whether these differences in the sample means reflect a difference in the population means. Do we have a statistically significant result, or could this have occurred through sampling variation?

Our null hypothesis is that the population means of the four groups are equal. The alternative hypothesis is that not all of the means are the same: at least one is different.
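Written out in symbols, with μ_i standing for the population mean income of qualification group i (i = 1, ..., 4), the two hypotheses are:

```latex
H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4
\qquad
H_1 : \mu_i \neq \mu_j \text{ for at least one pair } i \neq j
```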

We use one-way (single-factor) analysis of variance to find the F statistic and the associated p-value. The F value rounds to 15.5 and the p-value is 2.17E-09, which means 2.17 × 10^-9, or 0.00000000217. Because the p-value is far below the usual 0.05 significance level, we reject the null hypothesis. The data give strong evidence that at least one group mean differs, as discussed in my previous article.
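A one-way ANOVA like this can be run in Python with SciPy's `scipy.stats.f_oneway`. The sketch below uses made-up income figures for four qualification groups. The group names and numbers are hypothetical, not the data behind the article's figures, so its F and p-values will not match 15.5 and 2.17E-09. The last lines show how to read the article's p-value from scientific notation.

```python
from scipy import stats

# Hypothetical annual incomes (in thousands) for four qualification groups
no_qualification = [28, 31, 25, 30, 27, 33, 29]
school = [32, 35, 30, 36, 33, 31, 34]
vocational = [33, 36, 31, 35, 34, 37, 32]
degree = [45, 52, 48, 55, 47, 50, 53]

# One-way ANOVA: returns the F statistic and its p-value
result = stats.f_oneway(no_qualification, school, vocational, degree)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.3g}")

alpha = 0.05  # significance level
if result.pvalue < alpha:
    print("Reject H0: at least one group mean differs")
else:
    print("Fail to reject H0")

# Reading the article's p-value: 2.17E-09 means 2.17 * 10**-9
p_article = 2.17e-9
print(f"{p_article:.11f}")
```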

We would then use a post hoc test, such as Tukey's test (Tukey's HSD), to determine which pairs of means differ significantly.

Source: https://www.youtube.com/watch?v=WcmzS3nEUqo

This table shows that every pair of means differs significantly except for the school and vocational qualification groups. For those two groups there is no evidence of a difference in means. When a p-value is less than the specified level of significance (0.05), we declare the result statistically significant.
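SciPy includes Tukey's HSD as `scipy.stats.tukey_hsd`, available in SciPy 1.8 and later. It compares every pair of group means while keeping the overall false-positive rate across all the comparisons at the chosen level. The sketch below runs it on hypothetical income data, not the article's data, and prints each pair's p-value against the 0.05 level.

```python
from scipy import stats

# Hypothetical annual incomes (in thousands); made-up numbers for illustration
groups = {
    "none": [28, 31, 25, 30, 27, 33, 29],
    "school": [32, 35, 30, 36, 33, 31, 34],
    "vocational": [33, 36, 31, 35, 34, 37, 32],
    "degree": [45, 52, 48, 55, 47, 50, 53],
}
names = list(groups)

# Tukey's HSD on all pairs of groups (requires SciPy >= 1.8)
res = stats.tukey_hsd(*groups.values())

alpha = 0.05
# res.pvalue is a k x k matrix; entry [i, j] compares group i with group j
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        p = res.pvalue[i, j]
        verdict = "significant" if p < alpha else "not significant"
        print(f"{names[i]:>10} vs {names[j]:<10} p = {p:.4f} ({verdict})")
```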

Assumptions of ANOVA

There are some assumptions underlying the ANOVA test:

  1. The samples must be independent.
  2. The data in each group is well modeled by a normal distribution.
  3. The variances of the different groups are the same (homogeneity of variance).
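One common way to check assumptions 2 and 3 in Python is to run SciPy's Shapiro-Wilk test (`scipy.stats.shapiro`) on each group and Levene's test (`scipy.stats.levene`) across the groups. Independence (assumption 1) depends on how the data were collected, so it is not checked here. The groups below are made-up numbers, and the third is deliberately much more spread out than the others. Strictly speaking, normality is about the residuals, so testing each group separately is an approximation. With small samples these tests also have little power, so plotting the data is still worthwhile.

```python
from scipy import stats

# Hypothetical groups: g3 is deliberately far more spread out than g1 and g2
g1 = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
g2 = [11.0, 10.8, 11.3, 11.1, 10.9, 11.2, 11.0, 10.7]
g3 = [5.0, 15.0, 8.0, 14.0, 6.5, 13.5, 9.0, 12.0]

# Assumption 2: Shapiro-Wilk normality test, run on each group separately
for name, g in [("g1", g1), ("g2", g2), ("g3", g3)]:
    print(name, "Shapiro-Wilk p =", round(stats.shapiro(g).pvalue, 4))

# Assumption 3: Levene's test for equal variances across all groups
lev = stats.levene(g1, g2, g3)
print("Levene p =", round(lev.pvalue, 4))  # small p suggests unequal variances
```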

If the normality or equal-variance assumptions are violated, we would need to use a non-parametric alternative: the Kruskal-Wallis test.

The sample we used does show some violation: the variances are very different, and so are the group sizes, which makes the Kruskal-Wallis test the wiser choice.

The Kruskal-Wallis test also produces a p-value less than 0.001, so our conclusion stays the same as before.
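SciPy provides the Kruskal-Wallis H-test as `scipy.stats.kruskal`. Because it compares ranks rather than raw values, it does not assume normally distributed data. The sketch below runs it on made-up income data with unequal group sizes and spreads. This is illustrative data, not the article's sample.

```python
from scipy import stats

# Hypothetical incomes (in thousands) with unequal group sizes and spreads
no_qualification = [18, 22, 25, 20, 30, 19]
school = [24, 27, 26, 31, 23, 29, 28, 25]
vocational = [26, 29, 24, 33, 27]
degree = [40, 65, 48, 90, 55, 42, 75, 60, 52]

# Kruskal-Wallis H-test: a rank-based alternative to one-way ANOVA
h_stat, p_value = stats.kruskal(no_qualification, school, vocational, degree)
print(f"H = {h_stat:.2f}, p = {p_value:.3g}")

if p_value < 0.05:
    print("At least one group differs from the others")
```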
