Fixed Effect vs Random Effect ANOVAs (There’s a Difference)

Jacob Mazurkiewicz

Source: comparison of population means with ANOVA table (estamatica.net)

When we’re first introduced to the Analysis of Variance test in introductory statistics classes, we usually assume that the formulas given to us are the only ones we’ll ever have to use. Recently, I found out this is not the case.

The Analysis of Variance test, or ANOVA, generally tests the differences in means between varying categories. These categories can be levels of factors in a study, for instance, the dosage of a new pharmaceutical drug.

The F-statistic calculated from an ANOVA compares the between-group sum of squares to the within-group sum of squares, each divided by its degrees of freedom. If the between-group variability is large relative to the within-group variability, the resulting F-statistic will be large. A large F-statistic yields a significant result when evaluated with (number of categories − 1) numerator degrees of freedom and (number of observations − number of categories) denominator degrees of freedom.
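As a concrete illustration, here is a minimal Python sketch of a fixed effect one-way ANOVA; the three groups and their values are made up purely for demonstration. scipy's f_oneway returns the F-statistic and its p-value, and the degrees of freedom follow the counting rule above.

```python
# A minimal sketch of a fixed effect one-way ANOVA on made-up data.
import numpy as np
from scipy import stats

group_a = np.array([4.1, 4.5, 3.9, 4.3])
group_b = np.array([5.0, 5.2, 4.8, 5.1])
group_c = np.array([4.4, 4.6, 4.2, 4.5])

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)

k = 3                                     # number of categories (groups)
n_total = len(group_a) + len(group_b) + len(group_c)
df_num = k - 1                            # numerator degrees of freedom
df_den = n_total - k                      # denominator degrees of freedom

print(f"F({df_num}, {df_den}) = {f_stat:.2f}, p = {p_value:.4f}")
```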

The basic ANOVA test assumes that the factors being tested are fixed; that is, the levels of the factor included in the study are the only ones of interest. But the world of ANOVAs extends much deeper than this simple assumption.

Random Effects

Suppose we wanted to test whether mean acceleration differs among sports cars. This scenario calls for an ANOVA to determine whether the differences among the sample means we observe are statistically significant.

We decide to test the acceleration of 10 sports cars from 3 makers: Porsche, Mercedes, and Lexus. We then want to conduct the ANOVA to test whether there is a significant difference between the mean accelerations of sports cars.

Photo by Michael Jasmund on Unsplash

In a fixed effect model, our research question would be whether acceleration differed between these 3 brands.

In this case however, the factors are random, not fixed. This is because the research question equates to whether there is a difference in mean acceleration between sports cars in general. The three brands chosen (Porsche, Mercedes, Lexus) are merely random samples from a population of many brands of sports cars, and we want our findings to generalize outside of these three specific makes.

In cases like this, we consider our effects random instead of fixed.

Implications

You might be wondering why it matters whether the effects in an ANOVA are random or fixed. The answer lies in how confident the test can be when rejecting a null hypothesis and declaring differences significant.

If an ANOVA contains random effects, one must account for the sampling error introduced by studying only a small fraction of a larger population. For the car example, we must consider the possibility that the 3 makes we chose for the study happen to differ in acceleration by chance, when in reality the larger population of sports cars might not contain significant differences in acceleration.

Because we are no longer only considering the effects at hand, but the broader scope of possible effects, our calculations fundamentally change.

The significance of an effect in an ANOVA is tested with an F-statistic.
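In standard one-way ANOVA notation, writing k for the number of factor levels, N for the total number of observations, SSB and MSB for the between-group sum of squares and mean square, and SSW and MSE for the within-group sum of squares and mean square:

F = \frac{\mathrm{MSB}}{\mathrm{MSE}} = \frac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(N-k)}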

Here, the F-statistic is the ratio of the variability explained by the factor to the unexplained variance (error). The null hypothesis for a fixed effect ANOVA tests for equality among the group means.
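With \mu_1, \dots, \mu_k denoting the k group means:

H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad \text{vs.} \qquad H_a: \text{at least one } \mu_i \text{ differs}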

For a random effect ANOVA, the new null hypothesis tests for variability rather than mean equality.
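Writing \sigma_A^2 for the variance of the random factor effects (the variance between makes in our example), the hypotheses become:

H_0: \sigma_A^2 = 0 \qquad \text{vs.} \qquad H_a: \sigma_A^2 > 0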

Why do we need a different null hypothesis? Under a random effect model, we assume that the levels of the factor being studied (car makes in this example) are drawn from a normal distribution of possible levels with a mean effect of 0. Therefore, averaging across all possible levels, the effect becomes 0.
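In model form, a one-way random effects ANOVA can be written as:

Y_{ij} = \mu + A_i + \varepsilon_{ij}, \qquad A_i \sim N(0, \sigma_A^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)

where A_i is the random effect of the i-th level (the i-th make) and \varepsilon_{ij} is the residual error. Because the A_i are assumed to have mean 0, the factor's average effect is 0 by construction, which is why the hypotheses concern \sigma_A^2 rather than the individual means.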

With this in mind, the only way to test whether a given factor is significant is to assess its variability. In a random effects model, it is the variability among the different levels of the factor that is of interest, rather than their means. If there is little variation across levels, then there is no significant effect.

The F-statistic calculation then becomes:
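In the same notation as before, and writing n_0 for the average number of observations per level:

F = \frac{\mathrm{MSB}}{\mathrm{MSE}}, \qquad E[\mathrm{MSB}] = \sigma^2 + n_0\,\sigma_A^2, \qquad E[\mathrm{MSE}] = \sigma^2

Under the null hypothesis \sigma_A^2 = 0, both mean squares estimate \sigma^2 and the ratio hovers around 1; as the between-level variance grows, so does F.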

For both fixed and random effect models, the MSE in the denominator estimates the error variance σ². Now, however, the numerator estimates the error variance plus n₀ times the variance among levels, rather than the spread of a fixed set of level means, with n₀ being the average of the sample sizes taken from each level.

The fundamental difference lies in the expected mean square (EMS). Under a fixed effect ANOVA, the expected value of the numerator reflects the squared deviations of each factor-level mean from the overall mean, which captures the fixed factor effect. For the random effect ANOVA, instead of measuring differences among specific level means, we estimate the variance among the levels of the factor.
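To make the distinction concrete, here is a small Python sketch that simulates acceleration data for 3 makes drawn from a larger population of makes (all numbers are invented for illustration) and estimates the between-make variance by inverting the expected-mean-square relationship above.

```python
# A sketch of a one-way random effects analysis via the method of moments.
# The data are simulated and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_per_make = 10          # balanced design: 10 cars per make
sigma_A = 0.3            # true between-make standard deviation (invented)
sigma = 0.5              # true within-make (error) standard deviation (invented)

# Simulate acceleration times for 3 makes drawn from a larger population of makes.
make_effects = rng.normal(0.0, sigma_A, size=3)
data = [4.0 + effect + rng.normal(0.0, sigma, size=n_per_make)
        for effect in make_effects]

grand_mean = np.mean(np.concatenate(data))
group_means = np.array([g.mean() for g in data])

# Mean squares for the balanced one-way layout.
ms_between = n_per_make * np.sum((group_means - grand_mean) ** 2) / (len(data) - 1)
ms_within = np.mean([g.var(ddof=1) for g in data])

f_stat = ms_between / ms_within

# E[MS_between] = sigma^2 + n0 * sigma_A^2, so invert to estimate sigma_A^2.
sigma_A_sq_hat = max((ms_between - ms_within) / n_per_make, 0.0)

print(f"F = {f_stat:.2f}")
print(f"Estimated error variance:        {ms_within:.3f}")
print(f"Estimated between-make variance: {sigma_A_sq_hat:.3f}")
```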

If there is large variance in the factor, the F-statistic ratio will be large, and under the correct degrees of freedom we can potentially reject the null hypothesis and declare a significant factor effect.

Conclusion

As I pursue more complex studies in data science, I am constantly realizing that I did not know as much as I thought I did. This is becoming especially true for statistics, where there is so much more depth to the classic tests we are taught in introductory courses. This article is an attempt to explain a recent piece of information I learned through my studies.

This article highlighted the difference between two flavors of Analysis of Variance tests: fixed effect and random effect ANOVAs. The difference lies in whether the factors being studied are the only ones of interest, or whether they represent a sample of a larger population of interest.

Due to the assumptions of the resulting models, the calculations made to estimate the F-statistic numerator differ between the two, with fixed effect ANOVAs comparing factor means and random effect ANOVAs studying factor variance.

As I continue my academic and professional career, I will keep these findings in mind, and always question the validity of p-values, and whether the underlying tests that produced them were actually the appropriate ones.

If you have any comments, please let me know!

