Demystifying Statistical Analysis 5: The ANCOVA Expressed in Linear Regression

YS Chng
Sep 22, 2018

The previous part of this series showed how the factorial ANOVA can be expressed as a linear regression when an analysis requires more than one categorical predictor. That post introduced the concept of interaction effects and explained why contrast coding is better than dummy coding when the analysis is conducted as a regression model. But interaction effects do not only exist between categorical predictors. Sometimes an analysis contains a mix of categorical and continuous predictors, and in such situations an ANCOVA is required.

The term “ANCOVA” sounds big and scary, and gets thrown around loosely whenever there is a covariate in the analysis. After all, ANCOVA does stand for Analysis of Covariance. But what exactly is a covariate? If we refer back to the row headers of the cheat sheet introduced at the start of the series, you will notice that the analyses for independent variables do not differ between predictors and covariates. So why do we use different terms for the same thing? The main reason for calling an independent variable a predictor rather than a covariate is that the predictor is usually the star of the analysis, the main variable of interest. A covariate, on the other hand, is usually a variable that is not of direct interest but has been recognised as something that could affect the outcome. To account for it, the covariate is added into the analysis to see whether its inclusion changes the results.

However, the process is no different from analysing multiple predictors. Since multiple continuous predictors are analysed through a multiple linear regression, while multiple categorical predictors are analysed through a factorial ANOVA, the term ANCOVA is mostly used to describe an analysis with a mixture of categorical and continuous predictors.
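To make the idea of “accounting for” a covariate concrete, here is a minimal Python sketch on made-up data. The ages, outcomes and the small `ols` helper (which solves the least-squares normal equations) are all hypothetical. The groups are coded -1 and 1, and the outcome depends only on Age, but the two groups happen to differ in age. For simplicity this sketch leaves out the interaction term.

```python
def ols(X, y):
    """Least-squares coefficients for design matrix X (a list of rows) via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: Male = -1, Female = 1; the females in this sample are older
sex = [-1, -1, -1, -1, 1, 1, 1, 1]
age = [20, 22, 24, 26, 30, 32, 34, 36]
y = [10 + 2 * a for a in age]  # outcome driven by Age alone

b_without = ols([[1, s] for s in sex], y)                # Y ~ sex
b_with = ols([[1, s, a] for s, a in zip(sex, age)], y)   # Y ~ sex + Age
print("group coefficient without Age:", round(b_without[1], 6))
print("group coefficient with Age:   ", round(b_with[1], 6))
```

Without the covariate the group coefficient is 10 (half of the 20-point gap between the group means, because of the -1/1 coding); once Age is included it drops to zero, up to floating-point error, showing that the apparent group difference was carried entirely by Age.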

Now that we’re clear about the concept of ANCOVA, those who just want to know how it is usually conducted in SPSS may refer to Laerd Statistics for a comprehensive step-by-step guide. Otherwise, I will explain the test using the following regression equation, based on the textbook “Data Analysis: A Model Comparison Approach” by Carey Ryan, Charles M. Judd, and Gary H. McClelland:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i}$$
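As a rough check of what the four terms do, the Python sketch below builds the columns of this equation for a handful of made-up observations, generates the outcome from chosen coefficients with no noise, and refits them by ordinary least squares. The data, the `true_b` values and the small `ols` helper (which solves the normal equations by Gauss-Jordan elimination) are hypothetical; in practice a statistics package would do the fitting.

```python
def ols(X, y):
    """Least-squares coefficients for design matrix X (a list of rows) via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical generating coefficients b0, b1, b2, b3
true_b = [50.0, 3.0, 0.5, 0.2]
sex = [-1, -1, -1, 1, 1, 1]      # X1: Male = -1, Female = 1
age = [20, 30, 40, 25, 35, 45]   # X2: Age
# Each row holds the intercept column, X1, X2 and the product X1*X2
X = [[1.0, x1, x2, x1 * x2] for x1, x2 in zip(sex, age)]
y = [sum(bj * v for bj, v in zip(true_b, row)) for row in X]

b = ols(X, y)
print([round(v, 4) for v in b])
```

Because the outcome was generated without noise, the fitted coefficients should match `true_b` up to floating-point error.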

The ANCOVA becomes a lot easier to understand and less intimidating when we look at it as a linear regression and are comfortable constructing interaction terms. As you can see, the regression equation can look quite similar to that of a factorial ANOVA, depending on the number of categorical predictors and the number of groups in each.

For example, in a comparison between “Male vs Female”, coded as -1 and 1 respectively in predictor X1i, a continuous predictor X2i of “Age” may be added, along with its interaction with “Male vs Female” in predictor X1iX2i. Just like in the independent t-test, a hypothesis test on b1 still indicates whether or not the difference between “Male vs Female” is statistically significant, but the coefficient no longer reflects the difference between the raw group means. With the interaction in the model, b1 compares the groups at Age = 0 (with -1/1 coding, it is half the gap between the two groups’ predicted values there), so centring Age on its mean turns it into a comparison at the average age. A hypothesis test on b2 reveals whether or not “Age” is a good predictor of Ŷi, just like in linear regressions with continuous predictors; with contrast coding, b2 is the Age slope averaged across the two groups. Finally, a hypothesis test on b3 reveals whether or not there is an interaction effect between “Male vs Female” and “Age”, that is, whether the slope of Age differs between the two groups.

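Just as with the interaction predictor in the factorial ANOVA, X1iX2i is calculated by multiplying the coded value of the group predictor (-1 for Male, 1 for Female) by the corresponding value of “Age”. Similarly, contrast coding works better than dummy coding here: with dummy coding (0 and 1), b2 would describe the Age slope of the reference group only, whereas with contrast coding it describes the slope averaged across both groups.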

Example values of X1i, X2i and X1iX2i when contrast coding is used.
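To see the difference between the two coding schemes, the sketch below builds the X1, X2 and X1X2 columns for the same made-up data twice, once with contrast coding (Male = -1, Female = 1) and once with dummy coding (Male = 0 as the reference, Female = 1), and fits both. The data and the `ols` helper are hypothetical, as before.

```python
def ols(X, y):
    """Least-squares coefficients for design matrix X (a list of rows) via the normal equations."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: group labels, ages and outcomes
is_male = [True, True, True, True, False, False, False, False]
age = [21, 28, 35, 44, 23, 30, 39, 48]
y = [60, 63, 70, 74, 55, 64, 71, 83]

def fit(codes):
    """Fit Y = b0 + b1*X1 + b2*X2 + b3*X1*X2, where X1 holds the given group codes."""
    rows = [[1, c, a, c * a] for c, a in zip(codes, age)]  # X1X2 = coded group * Age
    return ols(rows, y)

contrast = fit([-1 if m else 1 for m in is_male])  # Male = -1, Female = 1
dummy = fit([0 if m else 1 for m in is_male])      # Male = 0 (reference), Female = 1

print("b2 with contrast coding (average Age slope):", round(contrast[2], 4))
print("b2 with dummy coding (Male Age slope only): ", round(dummy[2], 4))
print("b3 contrast vs dummy:", round(contrast[3], 4), round(dummy[3], 4))
```

The dummy-coded b3 comes out at twice the contrast-coded value, because the two codes are one unit apart instead of two; the test on b3 itself is the same under either scheme, so the practical difference lies in what b1 and b2 mean.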

Like the factorial ANOVA, the ANCOVA can get quite complicated as the number of predictors and the number of categories in each predictor increase. Fortunately, the General Linear Model procedures in most statistical packages accept continuous covariates alongside categorical factors, so an ANCOVA can conveniently be run there as well as through a hand-coded regression.
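Whichever route is taken, the test for a single term can be phrased as a comparison between two models, in the spirit of the model comparison textbook cited above: a compact model without the term and an augmented model with it. The sketch below computes the proportional reduction in error (PRE) and the F statistic for the interaction b3 on made-up data. The `ols` and `sse` helpers are hypothetical, and turning F into a p-value would need an F-distribution function (for example `scipy.stats.f.sf`), left out here to keep the example dependency-free.

```python
def ols(X, y):
    """Least-squares coefficients for design matrix X (a list of rows) via the normal equations."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def sse(X, y):
    """Sum of squared errors of the least-squares fit of y on X."""
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))

# Hypothetical data: Male = -1, Female = 1
sex = [-1, -1, -1, -1, 1, 1, 1, 1]
age = [21, 28, 35, 44, 23, 30, 39, 48]
y = [60, 63, 70, 74, 55, 64, 71, 83]
n = len(y)

compact = [[1, s, a] for s, a in zip(sex, age)]           # b0 + b1*X1 + b2*X2
augmented = [[1, s, a, s * a] for s, a in zip(sex, age)]  # ... + b3*X1*X2
pc, pa = len(compact[0]), len(augmented[0])               # number of parameters in each model
sse_c, sse_a = sse(compact, y), sse(augmented, y)

pre = (sse_c - sse_a) / sse_c                              # proportional reduction in error
f_stat = ((sse_c - sse_a) / (pa - pc)) / (sse_a / (n - pa))
print(f"PRE = {pre:.3f}, F({pa - pc}, {n - pa}) = {f_stat:.3f}")
```

The same F can be written in terms of PRE as (PRE / (PA - PC)) / ((1 - PRE) / (n - PA)), which is the form the model comparison approach emphasises.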

* * * * * * * * * *

The intention of the past few posts in this series has been to draw the connection between the various statistical analyses. The independent t-test is a simplified version of the one-way ANOVA that only involves the comparison of two groups, while the one-way ANOVA is used when there are more than two groups; the factorial ANOVA is a more complicated version of the one-way ANOVA, where there is more than one categorical predictor and interaction effects need to be considered; lastly, the ANCOVA is a variant of any of these group comparison analyses, with the addition of one or more continuous predictors and their interactions with the categorical predictors. By showing how all these different statistical tests are related to each other, I hope that I have been able to demystify the use of statistics and help make data analysis more approachable.

YS Chng

A curious learner sharing knowledge on science, social science and data science. (learncuriously.wordpress.com)