Inferential Statistics 101 — part 9

Shweta Doshi
Published in GreyAtom
10 min read · May 16, 2018

ANOVA (ANalysis Of VAriance)

Joey: Chandler, why do we use Analysis of Variance?

Chandler: It is a statistical test used to determine whether the means of several groups are equal. Although it is called Analysis of Variance, it is not used for analysing variances; it is used to analyse the equality of means. The name is still justified, because it uses variances as the measure for judging whether the means are equal. Just like hypothesis testing, ANOVA has a set of procedures to follow, and the first step is the same as in hypothesis testing: formulating the null and the alternative hypothesis.

The null and the alternative hypotheses are:

H0: μ1 = μ2 = … = μk

Ha: at least one μi differs from the rest

Where,

μk = mean of the kth group

k = number of treatments

Each observation can be written as:

yij = μ + τi + ϵij,  i = 1, …, k;  j = 1, …, ni

Where, yij is the jth observation in the ith group, μ is a common parameter or the common effect for all treatments, τi is the effect of the ith treatment on the response, and ϵij is the error term, which follows a normal distribution with mean 0 and a constant variance (σ2) across all the treatments. The treatment effect τi and the error term ϵij are our sources of variation.

So, essentially, we will be checking whether all of our treatment effects are zero or not, i.e. H0: τ1 = τ2 = … = τk = 0.

The test statistic for ANOVA is the F-statistic, and it is given by

F = MST / MSE = [SST / (k − 1)] / [SSE / (n − k)]

It follows an F distribution with k − 1 numerator degrees of freedom and n − k denominator degrees of freedom. Let me give you some intuition about the test statistic before proceeding further.

The sum of squares represents a measure of variation or deviation from the mean; in simple terms, it measures the dispersion of the data. The total sum of squares is made up of the sum of squares due to the treatments and the sum of squares due to the error term. It is given by

TSS = SST + SSE

As mentioned earlier, the variation or dispersion in the data (TSS) can happen due to two reasons.

  • SST represents the sum of squares due to the treatments. For example, assume that we have a piece of land on which we grow paddy. We divide the land into three pieces and use three different fertilizers, one for each piece, so here the treatment is the fertilizer we use. Due to our treatment, the yield produced by the three pieces of land in different months may or may not be the same, and this variation between the pieces is the SST, the variation due to treatments.
  • At the same time, it is perfectly possible that the yield produced within the same piece of land differs from month to month. This could be due to other factors, like the amount of sunlight each piece of land receives, the amount of water each piece gets, etc. This variation is the SSE, which represents the variation due to error.

We can convert a sum of squares into a mean sum of squares by dividing it by its degrees of freedom. MSE (Mean Square due to Error) is an estimate of σ2. MST (Mean Square due to Treatment) is also an estimate of σ2, provided all the treatment means are equal. This means that if there is a significant difference among the treatment means, MST will be large compared to MSE, which leads to a large F-statistic. If the F-statistic is very large, we can conclude that at least one of the treatment means is different from the others.
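To make this concrete, here is a minimal sketch in Python (with made-up numbers for three hypothetical treatments) showing how SST, SSE, TSS and the F-statistic fit together:

```python
import numpy as np

# Three hypothetical treatment groups (values made up for illustration).
groups = [
    np.array([12.0, 15.0, 14.0, 11.0]),
    np.array([18.0, 20.0, 17.0, 19.0]),
    np.array([13.0, 14.0, 16.0, 15.0]),
]

k = len(groups)                      # number of treatments
n = sum(len(g) for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# SST: variation of the group means around the grand mean
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SSE: variation of observations around their own group mean
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
tss = sst + sse                      # TSS = SST + SSE

mst = sst / (k - 1)                  # mean square due to treatment
mse = sse / (n - k)                  # mean square due to error
f_stat = mst / mse
print(f"SST={sst:.2f}, SSE={sse:.2f}, TSS={tss:.2f}, F={f_stat:.2f}")
```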

The next step, as in hypothesis testing, is to calculate the p-value. The p-value is the probability of getting the observed value of the test statistic (the F-statistic), or a value with even greater evidence against the null hypothesis, given that the null hypothesis is true.

To calculate the p-value, we find the probability of observing the F-statistic value, or a value greater than it, under the theoretical F-distribution above. If the p-value is high, the F-statistic is small, which suggests that all treatment means are equal and come from the same distribution. If the p-value is low, the F-statistic is large, which indicates that at least one treatment mean is different from the others.
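As a sketch, this upper-tail probability can be computed from the F-distribution with scipy; the degrees of freedom below assume the four-city example that follows (k = 4 groups, n = 120 observations):

```python
from scipy import stats

# Upper-tail probability of the F distribution: P(F >= observed value).
# sf is the survival function, i.e. 1 - CDF.
f_observed = 14.826
p_value = stats.f.sf(f_observed, dfn=4 - 1, dfd=120 - 4)
print(f"p-value = {p_value:.2e}")
```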

Let me explain it through an example which we discussed in our last meeting.

Assume that we are interested in determining whether the average number of products purchased by people from Bangalore, Chennai, Coimbatore and Trichy is the same. This question can be answered using an ANOVA. In this case the treatment is the city, which has four levels.

So the null and the alternative hypotheses in our case are

H0: μBangalore = μChennai = μCoimbatore = μTrichy

Ha: at least one city's mean differs from the rest

So, in order to perform the analysis, let's sample 30 data points from each city.
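Here is a minimal sketch of this analysis in Python, using simulated purchase counts in place of the real data (the group means below are made up, with Chennai deliberately set apart so the sketch mirrors the result we discuss next):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated stand-in for the real data: 30 purchase counts per city.
bangalore  = rng.normal(loc=20, scale=4, size=30)
chennai    = rng.normal(loc=26, scale=4, size=30)
coimbatore = rng.normal(loc=20, scale=4, size=30)
trichy     = rng.normal(loc=20, scale=4, size=30)

# One-way ANOVA across the four city samples.
f_stat, p_value = stats.f_oneway(bangalore, chennai, coimbatore, trichy)
print(f"F-statistic = {f_stat:.3f}, p-value = {p_value:.2e}")
```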

F-statistic value = 14.826

p-value << 0.05

Since the p-value is well below the usual threshold of 0.05, we reject the null hypothesis; in other words, we reject the claim that the average number of products purchased by people of Bangalore, Chennai, Coimbatore and Trichy is the same.

Joey: So Chandler, if my understanding is correct, we use a t-test to compare the means of two groups and an ANOVA to compare the means of three or more groups.

Chandler: Yes Joey, you are absolutely right.

Joey: Then I have a question for you Chandler. Why don’t we use pairwise t-tests for comparing more than two means instead of conducting an ANOVA?

Chandler: I knew that's where you were heading, Joey. Actually, we can also use t-tests to compare more than two means, but t-tests inflate the Type I error when we do multiple comparisons on the same data. For example, if we want to compare four treatments and plan to use a t-test, we need to conduct six t-tests on the same dataset, i.e.

Test 1: Comparing groups 1 and 2

Test 2: Comparing groups 1 and 3

Test 3: Comparing groups 1 and 4

Test 4: Comparing groups 2 and 3

Test 5: Comparing groups 2 and 4

Test 6: Comparing groups 3 and 4

If you run a hypothesis test, there's a small chance that you'll get a bogus significant result due to Type I error (read the Type I and II errors blog to get more intuition about it). If you run thousands of tests, the number of false alarms increases dramatically. For example, say you run 10,000 separate hypothesis tests. If you use the standard alpha level of 5% (the probability of getting a false positive), you will get around 500 significant results, most of which will be false alarms. This inflation of false alarms when you run multiple hypothesis tests is called the multiple comparisons problem.
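A quick way to see this inflation is to compute the familywise error rate, 1 − (1 − α)^m, the probability of at least one false positive across m independent tests:

```python
# Familywise error rate for m independent tests at significance alpha.
alpha = 0.05

for m in (1, 6, 20, 100):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:>3} tests -> P(at least one false alarm) = {fwer:.3f}")
# Six tests (four groups compared pairwise) already gives about 0.265.
```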

Joey: ANOVA can only conclude whether all the treatment means are equal or not. It may be possible that only one treatment is different from the rest; in that case, ANOVA will still reject the null hypothesis. How do we identify which of the treatments are different?

Chandler: We need to conduct a post-hoc test after the ANOVA to answer your question. There are many types of post-hoc tests; I have listed some of them below.

  • Bonferroni Procedure
  • Duncan’s new multiple range test (MRT)
  • Dunn’s Multiple Comparison Test
  • Fisher’s Least Significant Difference (LSD)
  • Holm-Bonferroni Procedure
  • Newman-Keuls
  • Rodger’s Method
  • Scheffé’s Method
  • Tukey’s Test
  • Dunnett’s correction
  • Benjamini-Hochberg (BH) procedure

Post-hoc tests also involve multiple comparisons, which would inflate the Type I error, but all the tests mentioned above tackle this problem through a correction factor. Let us apply the Bonferroni procedure to our problem above and check which group has a different mean.

The Bonferroni procedure is a multiple-comparison post-hoc correction. As discussed earlier, in order to account for the multiple-comparison error we divide the significance level α by the number of comparisons n, so the new significance threshold becomes α/n.

In this procedure we conduct an independent two-sided t-test for each pair of treatments and compare each p-value with the new threshold α/n = 0.05/6 ≈ 0.0083.
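A minimal sketch of the procedure, again on simulated city samples (made-up data, not the post's actual dataset):

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated samples; Chennai's mean deliberately differs.
cities = {
    "Bangalore":  rng.normal(20, 4, 30),
    "Chennai":    rng.normal(26, 4, 30),
    "Coimbatore": rng.normal(20, 4, 30),
    "Trichy":     rng.normal(20, 4, 30),
}

n_comparisons = 6                      # 4 choose 2 pairs
threshold = 0.05 / n_comparisons       # Bonferroni-adjusted threshold
for (name_a, a), (name_b, b) in combinations(cities.items(), 2):
    t_stat, p = stats.ttest_ind(a, b)  # independent two-sided t-test
    verdict = "different" if p < threshold else "not different"
    print(f"{name_a} vs {name_b}: p = {p:.4f} -> {verdict}")
```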

From the table we can infer that Chennai has a different mean compared to the other groups.

Chandler: Joey, one more thing: whatever tool we use in statistics, it always comes with a standard set of assumptions. It is necessary to check the assumptions of each tool and their validity before using it. For example, the assumptions of ANOVA are

  • The dependent variable is normally distributed in each group that is being compared.
  • Population variance in each treatment is equal.
  • Independence of observations.
  • Treatment effects are additive.

Joey: What will be the consequence if any one of the above assumptions is not satisfied in our case?

Chandler: Okay, if our data do not satisfy the first assumption, the F-test is not applicable. If we apply the F-test to non-normal data, it will inflate the Type I error. So any conclusion we draw from the results of an ANOVA is valid only if our data satisfy the above assumptions.
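The first two assumptions can be checked with standard tests. Here is a sketch on hypothetical data, using the Shapiro-Wilk test for normality and Levene's test for equal variances:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical samples for three treatments (made-up data).
g1, g2, g3 = (rng.normal(20, 4, 30) for _ in range(3))

# Normality within each group: Shapiro-Wilk test
# (null hypothesis: the sample comes from a normal distribution).
for i, g in enumerate((g1, g2, g3), start=1):
    stat, p = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk p = {p:.3f}")

# Equality of variances across groups: Levene's test
# (null hypothesis: all groups have the same variance).
stat, p = stats.levene(g1, g2, g3)
print(f"Levene p = {p:.3f}")
```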

Joey: Chandler, what is the difference between regression and ANOVA?

Chandler: (Please read the regression blog to know more about regression analysis.) ANOVA is a special case of regression where the independent variables are on a nominal scale. Nominal variables are ones that have two or more levels with no intrinsic ordering among the levels. For example, gender is a nominal variable which takes three levels, namely Male, Female and Transgender, and if we take a closer look at this variable we find that there is no intrinsic ordering for these levels. The ANOVA equation is given by

yij = μ + τi + ϵij

The τi in the above equation plays the role of the independent variable, and yij is the dependent variable.
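This equivalence can be seen by fitting the same model as a regression. A sketch with simulated data (the column names and group means are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(42)
# Made-up long-format data: one nominal independent variable (city).
df = pd.DataFrame({
    "city": np.repeat(["Bangalore", "Chennai", "Coimbatore", "Trichy"], 30),
    "purchases": np.concatenate([
        rng.normal(20, 4, 30), rng.normal(26, 4, 30),
        rng.normal(20, 4, 30), rng.normal(20, 4, 30),
    ]),
})

# Fitting ANOVA as a regression: C(...) treats city as a nominal factor.
model = smf.ols("purchases ~ C(city)", data=df).fit()
print(anova_lm(model))   # same F-statistic as a one-way ANOVA
```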

Joey: Is there any other types of ANOVA?

Chandler: Yes Joey, depending on the number of independent variables, we can classify ANOVA into three different types.

  • One-way ANOVA: used when we have one independent variable, like the example above where the independent variable was the customer's city. To take another example, suppose a professor wants to know whether the performance of students differs based on their board of education in school (assume it has three categories: State Board, CBSE and Matriculation); then we use a one-way ANOVA.
  • Two-way ANOVA: used when we have two independent variables. For example, suppose the professor wants to know whether the performance of students varies with their gender (which has three categories: Male, Female and Transgender) and their board of education in school (assume it has three categories: State Board, CBSE and Matriculation). A sketch is given after this list.
  • Three-way ANOVA: used when we have three independent variables. For example, suppose the professor suspects that student performance depends on gender (three categories: Male, Female and Transgender), board of education in school (three categories: State Board, CBSE and Matriculation) and the place the students belong to (assume it has only two categories: Urban and Rural).

Cases with more than three independent variables are rarely used because they increase the complexity involved in the interpretation.
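Here is the promised two-way sketch for the professor example, using simulated scores (the data and any effects in them are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 120
# Made-up data: two nominal factors and a numeric outcome.
df = pd.DataFrame({
    "gender": rng.choice(["Male", "Female", "Transgender"], size=n),
    "board":  rng.choice(["State Board", "CBSE", "Matriculation"], size=n),
    "score":  rng.normal(70, 10, size=n),
})

# Two-way ANOVA with an interaction term between the two factors.
model = smf.ols("score ~ C(gender) * C(board)", data=df).fit()
print(anova_lm(model, typ=2))
```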

  • MANOVA is just an ANOVA with more than one dependent variable.

Example: The professor wants to know if there is any statistically significant difference in student performance and class participation based on the type of course (assume it has two categories: online and offline lectures). Here student performance and class participation are the dependent variables.

  • Repeated-measures ANOVA is the same as ANOVA except that the same participants or samples are used across all the treatments.

Example: If the professor hypothesizes that student performance scores may improve when his teaching assistant conducts a tutorial class, then in this case the same set of students is subjected to the two levels of the treatment.

  • Analysis of covariance (ANCOVA): Usually we can diminish the effect of confounding variables (or covariates) on the dependent variable by controlling the design of the experiment. If we fail to do that during the experiment, we can still diminish their effect through ANCOVA. ANCOVA is just an ANOVA with an additional strategy: it controls for the confounding variables during the analysis.

Example: The professor wants to know if there is any difference in student performance based on the students' board of education in school (assume it takes three categories: State Board, CBSE and Matriculation) when we control for gender (which takes three levels: Male, Female and Transgender).
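A sketch of this ANCOVA with simulated data (made up for illustration); adding the covariate to the model formula is what "controlling during analysis" amounts to:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 90
# Made-up data: board of education is the factor of interest and
# gender is the covariate we control for, as in the example above.
df = pd.DataFrame({
    "board":  rng.choice(["State Board", "CBSE", "Matriculation"], size=n),
    "gender": rng.choice(["Male", "Female", "Transgender"], size=n),
    "score":  rng.normal(70, 10, size=n),
})

# Including the covariate adjusts the comparison of board means
# for differences attributable to gender.
model = smf.ols("score ~ C(board) + C(gender)", data=df).fit()
print(anova_lm(model, typ=2))
```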

The author of this blog is Balaji P, who is pursuing a PhD in reinforcement learning at IIT Madras.

Quora- www.quora.com/profile/Balaji-Pitchai-Kannu
