Demystifying Statistical Analysis 4: The Factorial ANOVA Expressed in Linear Regression

YS Chng
4 min read · Sep 16, 2018

--

The one-way ANOVA was introduced in the previous part of this series, where I explained how the analysis can be done using linear regression and illustrated how polynomial relationships can be tested with the help of contrast codes. But datasets often contain more than one categorical predictor, and a factorial ANOVA is required for such situations. This post uses the example of the 2×2 study design to illustrate how a factorial ANOVA can be expressed in a linear regression, though as the number of variables and groups increases, the regression equation also becomes more complex.

For those who are unfamiliar with the factorial ANOVA and just want to know how it is usually conducted in SPSS, Laerd Statistics provides a comprehensive step-by-step guide. Otherwise, I will explain the test using the following regression equation, drawing on the textbook “Data Analysis: A Model Comparison Approach” by Charles M. Judd, Gary H. McClelland, and Carey S. Ryan:

Ŷi = b0 + b1X1i + b2X2i + b3X1iX2i

When dealing with more than one categorical predictor, a factorial ANOVA is required. An analysis that has 2 categorical predictors with 2 groups each is known as a 2×2 factorial design, which produces 4 different groups in total. For example, a comparison that crosses Gender (Male vs Female) with Age (Kids vs Adults) yields the 4 groups (Male, Kids), (Female, Kids), (Male, Adults) and (Female, Adults). With 4 groups, 3 additional parameters b1, b2 and b3 need to be estimated on top of the intercept b0. The new predictor X1iX2i represents the interaction between the predictors X1i and X2i, and a hypothesis test on the slope b3 reveals whether or not the interaction is statistically significant.
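To make the equation concrete, here is a minimal sketch in Python with NumPy. The scores and the 0/1 codes for the two predictors are purely illustrative, not from the article; the point is that the interaction predictor is just the product of the other two columns, and all four parameters are estimated in one least-squares fit.

```python
import numpy as np

# Hypothetical scores, two observations per cell of a 2x2 design.
x1 = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)  # e.g. Gender: Male=0, Female=1
x2 = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # e.g. Age: Kids=0, Adults=1
y = np.array([10, 11, 12, 13, 14, 15, 19, 21], dtype=float)

# Design matrix for Y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2:
# the interaction predictor is literally the product of the two columns.
X = np.column_stack([np.ones_like(y), x1, x2, x1 * x2])

# Estimate b0..b3 by ordinary least squares.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # b[3] is the estimated interaction slope
```

With a saturated model like this, the fitted values are simply the four cell means, whatever coding scheme is used; what the coding changes is how each slope is interpreted, as the next sections show.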

If dummy coding is used, Male may be coded as 0 and Female as 1 for Gender, while Kids may be coded as 0 and Adults as 1 for Age. However, a problem arises when we attempt to calculate the interaction codes. In regression, an interaction predictor is calculated by multiplying the values of the two predictors from which it is constructed. Based on dummy coding, the table below illustrates how the interaction codes are calculated:

                    X1 (Gender)   X2 (Age)   X1iX2i
(Male, Kids)             0            0         0
(Female, Kids)           1            0         0
(Male, Adults)           0            1         0
(Female, Adults)         1            1         1

Values of X1i, X2i and X1iX2i when dummy coding is used.

As can be seen, the multiplication of values from the two predictors results in (Male, Kids), (Female, Kids) and (Male, Adults) all having a value of 0, and (Female, Adults) having a value of 1. This makes the interaction column easy to misread: because only (Female, Adults) is coded 1, the slope of X1iX2i looks as if it simply compares that group against the reference group (Male, Kids). Just as importantly, with dummy coding the predictors are not independent of the interaction term, so the slopes b1 and b2 test simple effects (for example, the Gender difference among Kids only) rather than the overall main effects that a factorial ANOVA reports.
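The multiplication behind the table above can be reproduced in a few lines. This sketch uses the dummy codes from the text; the group labels are only for printing.

```python
import numpy as np

# One row per group; dummy codes as in the text (Male=0/Female=1, Kids=0/Adults=1).
groups = ["(Male, Kids)", "(Female, Kids)", "(Male, Adults)", "(Female, Adults)"]
gender = np.array([0, 1, 0, 1])
age = np.array([0, 0, 1, 1])

# The interaction code is the element-wise product of the two dummy columns.
interaction = gender * age
for g, v1, v2, v3 in zip(groups, gender, age, interaction):
    print(f"{g:18} X1={v1} X2={v2} X1*X2={v3}")
```

Running this shows the interaction column is 0 for every group except (Female, Adults), matching the table.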

Compare this with the use of contrast coding, where Male may be coded as -1 and Female may be coded as 1 for Gender, while Kids may be coded as -1 and Adults may be coded as 1 for Age:

                    X1 (Gender)   X2 (Age)   X1iX2i
(Male, Kids)            -1           -1         1
(Female, Kids)           1           -1        -1
(Male, Adults)          -1            1        -1
(Female, Adults)         1            1         1

Values of X1i, X2i and X1iX2i when contrast coding is used.

With this set of contrast codes, the three predictor columns are mutually orthogonal, and the slope of X1iX2i tests the hypothesis of whether there is a difference in the differences, that is, whether the Gender difference among Adults is the same as the Gender difference among Kids. In other words, it tests whether an interaction effect exists between the two predictors.
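This can be checked numerically. The sketch below uses hypothetical cell means (the numbers are mine, not from the article) and solves for the four parameters exactly; with these ±1 codes and balanced groups, b3 comes out as one quarter of the difference in differences, so testing b3 against zero is testing the interaction.

```python
import numpy as np

# Hypothetical cell means (illustrative): (Male, Kids)=10, (Female, Kids)=12,
# (Male, Adults)=14, (Female, Adults)=20.
y = np.array([10.0, 12.0, 14.0, 20.0])
x1 = np.array([-1.0, 1.0, -1.0, 1.0])  # Gender: Male=-1, Female=1
x2 = np.array([-1.0, -1.0, 1.0, 1.0])  # Age: Kids=-1, Adults=1

# Saturated model: 4 cell means, 4 parameters, so we can solve exactly.
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.solve(X, y)

# The "difference in differences": (FA - MA) - (FK - MK)
diff_in_diffs = (20 - 14) - (12 - 10)
print(b0, b3, diff_in_diffs / 4)
```

Note that b0 equals the grand mean of the four cells here, another convenient property of this coding.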

Clearly, the calculation of interaction terms can become quite complicated as the number of predictors and the number of categories in each predictor increase. It is also easy to make mistakes when ensuring that the contrast codes used are orthogonal. The results are equivalent to those of a univariate analysis in a statistical package such as SPSS, which simplifies the whole process to a few clicks.
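One way to guard against such coding mistakes is to verify orthogonality directly: in a balanced design, two contrast-code columns are orthogonal when their dot product is zero. A minimal check for the 2×2 codes used above:

```python
import numpy as np

# Contrast-code columns for the balanced 2x2 design.
x1 = np.array([-1, 1, -1, 1])   # Gender
x2 = np.array([-1, -1, 1, 1])   # Age
x3 = x1 * x2                    # interaction code

# Check every pair of columns; a non-zero dot product flags a coding error.
codes = {"X1": x1, "X2": x2, "X1*X2": x3}
names = list(codes)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        dot = int(codes[a] @ codes[b])
        print(f"{a} . {b} = {dot}")
```

For larger designs the same double loop scales to any number of contrast columns, which is exactly where hand-checking becomes error-prone.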

The one-way ANOVA and factorial ANOVA cover the foundation of most group comparison analyses. Other types of analyses are often variants of these simpler ones, but the principles remain the same. In the next post of this series, I will attempt to show how the ANCOVA (a slightly more complicated test) is actually just a variant of the factorial ANOVA, and is not so intimidating once you understand how it works.


YS Chng

A curious learner sharing knowledge on science, social science and data science. (learncuriously.wordpress.com)