The paired t-test and linear mixed models

Sep 8, 2021

There are many methods for the analysis of data with multiple levels. However, there is still significant confusion about how these methods are related and how they are different (cf. Bell et al. 2019).

This is supported by my observation that even for the simplest case of multilevel data, i.e., paired data, very little has been written about how the classical paired t-test is related to linear models (with or without random effects).

Even though several sources state that the paired t-test is equivalent to a linear mixed model with random intercepts, they lack a proper proof. Moreover, there seems to be a common misconception that you really need random effects for this equivalence and that a simpler fixed effects model is not sufficient.

In this article, we will show that the paired t-test is in fact equivalent to both a linear mixed model with random intercepts and a simpler linear fixed effects model with varying intercepts (“equivalent” in terms of yielding the same test statistic to test for the treatment effect). This not only means that you do not have to estimate a complex random effects model to emulate a paired t-test, but also that there are special situations in which a linear model with fixed effects yields the same results as a linear model with random effects! I hope that this insight and my detailed derivations can help practitioners to better understand how the different methods are related.

The Setup

Paired data typically arises when we measure a certain feature of a person (e.g., blood values) before and after a treatment. This is multilevel data, as the individual observations before and after the treatment (at level 1) belong to the same person (at level 2). This is also called longitudinal data, where the same individuals (or other units) are measured on a number of occasions.

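In the following, we will consider n subjects, and denote the measured value of subject i before the treatment as

$$y_{i1}, \qquad i = 1, \dots, n.$$

Similarly, we denote the measured value of subject i after the treatment as

$$y_{i2}, \qquad i = 1, \dots, n.$$

The paired t-test now calculates for each subject i the difference

$$d_i = y_{i2} - y_{i1}.$$

With the definition of

$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2 \tag{1}$$

we can write the test statistic of the paired t-test as

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}. \tag{2}$$

If it holds that

$$d_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$$

with unknown μ and σ², then under the null hypothesis μ = 0 the t-stat (2) follows a t-distribution with n-1 degrees of freedom (as you can read in any good statistics book).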
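The paired t statistic can be computed directly from the per-subject differences. The following is a minimal sketch, assuming NumPy is available; the function name `paired_t_statistic` and the toy data are made up for illustration.

```python
import numpy as np

def paired_t_statistic(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    d = after - before            # per-subject differences
    n = d.size
    d_bar = d.mean()              # mean difference
    s_d = d.std(ddof=1)           # sample standard deviation of the differences
    return d_bar / (s_d / np.sqrt(n)), n - 1

# Hypothetical toy data for four subjects
before = [1.0, 2.0, 3.0, 4.0]
after = [2.0, 4.0, 5.0, 5.0]
t_stat, df = paired_t_statistic(before, after)
print(t_stat, df)
```

Here the differences are 1, 2, 2 and 1, so the mean difference is 1.5 and the statistic has 3 degrees of freedom.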

We now go on to show that a linear fixed effects model with varying intercepts and a linear mixed model with random intercepts yield the same test statistic to test the treatment effect.

Fixed Effects Model with Varying Intercepts

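With a linear model, we try to model the situation of the paired t-test as follows:

$$y_{ij} = \beta_1 x_{ij} + u_i + \varepsilon_{ij}, \qquad i = 1, \dots, n, \quad j = 1, 2,$$

where y_{ij} is the measurement of subject i at occasion j (j = 1 before and j = 2 after the treatment) and x_{ij} is the treatment indicator with x_{i1} = 0 and x_{i2} = 1.

With this model, we are interested in the treatment effect β_1. The hierarchical aspect of the data is modelled by the fixed effects (or dummy variables) u_i. This means that each subject i can have their own individual intercept, i.e., their individual base level. Thus, we call this model a fixed effects model with varying intercepts. In matrix notation, the model (without error terms) looks like this:

$$\underbrace{\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{n1} \\ y_{n2} \end{pmatrix}}_{y} = \underbrace{\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix}}_{X} \underbrace{\begin{pmatrix} \beta_1 \\ u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}}_{\beta}$$

We are now interested in estimating the treatment effect

$$\hat{\beta}_1$$

and its standard error

$$\operatorname{se}(\hat{\beta}_1)$$

in order to compute the t statistic to test

$$H_0: \beta_1 = 0.$$

Our goal is to show that this test is exactly the paired t-test.

As can be read in any good statistics book, it holds that

$$\hat{\beta} = (X^T X)^{-1} X^T y.$$

Thus, we start by computing

$$X^T X = \begin{pmatrix} n & \mathbf{1}_n^T \\ \mathbf{1}_n & 2 I_n \end{pmatrix},$$

where 1_n is the vector of n ones and I_n is the n × n identity matrix.

With a little bit of effort, we can also compute the inverse

$$(X^T X)^{-1} = \begin{pmatrix} \frac{2}{n} & -\frac{1}{n} \mathbf{1}_n^T \\ -\frac{1}{n} \mathbf{1}_n & \frac{1}{2} I_n + \frac{1}{2n} \mathbf{1}_n \mathbf{1}_n^T \end{pmatrix}.$$

If you have your doubts, feel free to verify this result by multiplying both matrices.

We can now calculate

$$X^T y = \begin{pmatrix} \sum_{i=1}^{n} y_{i2} \\ y_{11} + y_{12} \\ \vdots \\ y_{n1} + y_{n2} \end{pmatrix}.$$

With the help of this result, we can compute

$$\hat{\beta}_1 = \frac{2}{n} \sum_{i=1}^{n} y_{i2} - \frac{1}{n} \sum_{i=1}^{n} (y_{i1} + y_{i2}) = \frac{1}{n} \sum_{i=1}^{n} (y_{i2} - y_{i1}).$$

Using (1), i.e., the mean of the differences d_i = y_{i2} - y_{i1}, we can see that this means that

$$\hat{\beta}_1 = \bar{d}.$$

So if we can show that

$$\operatorname{se}(\hat{\beta}_1) = \frac{s_d}{\sqrt{n}},$$

where s_d is the sample standard deviation of the differences d_i, this would mean that the t statistic to test

$$H_0: \beta_1 = 0$$

is identical to the test statistic (2) of the paired t-test!

To calculate the standard error, we use the fact that

$$\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1},$$

where σ² is typically estimated via

$$\hat{\sigma}^2 = \frac{1}{2n - p} \sum_{i=1}^{n} \sum_{j=1}^{2} \left(y_{ij} - \hat{y}_{ij}\right)^2.$$

Here, 2n is the number of observations and p is the number of estimated parameters, in our case p=n+1, so that 2n-p = n-1.

Thus, we get

$$\operatorname{se}(\hat{\beta}_1)^2 = \hat{\sigma}^2 \left[(X^T X)^{-1}\right]_{11} = \frac{2 \hat{\sigma}^2}{n}.$$

Hence, it is left for us to calculate

$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} \sum_{j=1}^{2} \left(y_{ij} - \hat{y}_{ij}\right)^2.$$

We can compute

$$\hat{u}_i = \frac{y_{i1} + y_{i2}}{2} - \frac{\bar{d}}{2}, \qquad \hat{y}_{i1} = \hat{u}_i, \qquad \hat{y}_{i2} = \hat{u}_i + \hat{\beta}_1$$

and have to simplify

$$\sum_{i=1}^{n} \left[ \left(y_{i1} - \hat{u}_i\right)^2 + \left(y_{i2} - \hat{u}_i - \hat{\beta}_1\right)^2 \right].$$

In the following, we analyse each part separately:

$$y_{i1} - \hat{u}_i = \frac{y_{i1} - y_{i2}}{2} + \frac{\bar{d}}{2} = -\frac{d_i - \bar{d}}{2}.$$

Analogously, we can show that

$$y_{i2} - \hat{u}_i - \hat{\beta}_1 = \frac{y_{i2} - y_{i1}}{2} - \frac{\bar{d}}{2} = \frac{d_i - \bar{d}}{2}.$$

Hence, it follows that

$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(d_i - \bar{d})^2}{2} = \frac{s_d^2}{2}$$

and consequently

$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\frac{2}{n} \cdot \frac{s_d^2}{2}} = \frac{s_d}{\sqrt{n}}.$$

But this means that the test statistic to test

$$H_0: \beta_1 = 0$$

is

$$t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)} = \frac{\bar{d}}{s_d / \sqrt{n}}, \tag{3}$$

which is exactly the same as the t statistic (2) for the paired t-test.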

Furthermore, (3) follows a t-distribution with 2n-p = n-1 degrees of freedom, as there are 2n observations and p = n+1 estimated parameters (under the classical assumption of normally distributed error terms). This means that the paired t-test is equivalent to the fixed effects model with varying intercepts (in terms of yielding the same test statistic to test for the treatment effect)!
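The equivalence can also be checked numerically. The sketch below fits the fixed effects model with one dummy column per subject by ordinary least squares and compares the resulting t statistic with the paired t statistic. It assumes NumPy; the simulated data, the seed and all variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed
n = 8                                    # number of subjects
subject_level = rng.normal(10.0, 3.0, n)
before = subject_level + rng.normal(0.0, 1.0, n)
after = subject_level + 0.5 + rng.normal(0.0, 1.0, n)

# Stack the observations: all "before" rows first, then all "after" rows
y = np.concatenate([before, after])
treatment = np.concatenate([np.zeros(n), np.ones(n)])
dummies = np.vstack([np.eye(n), np.eye(n)])    # one intercept column per subject
X = np.column_stack([treatment, dummies])      # shape (2n, n + 1)

# Ordinary least squares fit and the usual standard error of the treatment effect
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
dof = X.shape[0] - X.shape[1]                  # 2n - (n + 1) = n - 1
sigma2_hat = residuals @ residuals / dof
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
t_fixed = beta_hat[0] / np.sqrt(cov_beta[0, 0])

# Paired t statistic computed from the differences
d = after - before
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))

print(t_fixed, t_paired)
```

Up to floating point error, both printed values should agree, and the residual degrees of freedom equal n-1.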

Linear Mixed Model with Random Intercepts

In the next part, we try to model the situation of the paired t-test with a random effects model. If you are not familiar with linear mixed models with random effects, I recommend that you read up on this topic before continuing with this article.

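We can formulate a linear mixed model with random intercepts for our situation as follows:

$$y_{ij} = \alpha_0 + \alpha_1 x_{ij} + u_i + \varepsilon_{ij}, \qquad i = 1, \dots, n, \quad j = 1, 2,$$

where y_{ij} is the measurement of subject i at occasion j (j = 1 before and j = 2 after the treatment) and x_{ij} is the treatment indicator with x_{i1} = 0 and x_{i2} = 1.

Even though this looks quite similar to the fixed effects model examined above, it is inherently different, as we assume that

$$u_i \overset{\text{iid}}{\sim} N(0, \sigma_u^2).$$

We can rewrite the model such that we get the following formulation for each individual subject i:

$$y_i = X_i \alpha + Z_i u_i + \varepsilon_i, \qquad y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix}, \quad X_i = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad \alpha = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}, \quad Z_i = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

As for the model with fixed effects, we assume

$$\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2).$$

As in the last section, we are now interested in estimating the treatment effect

$$\hat{\alpha}_1$$

and its standard error

$$\operatorname{se}(\hat{\alpha}_1).$$

Given estimates

$$\hat{\sigma}^2, \quad \hat{\sigma}_u^2$$

we can estimate α as

$$\hat{\alpha} = \left( \sum_{i=1}^{n} X_i^T \hat{V}_i^{-1} X_i \right)^{-1} \sum_{i=1}^{n} X_i^T \hat{V}_i^{-1} y_i, \tag{4}$$

where

$$\hat{V}_i = \hat{\sigma}^2 I_2 + \hat{\sigma}_u^2 Z_i Z_i^T = \begin{pmatrix} \hat{\sigma}^2 + \hat{\sigma}_u^2 & \hat{\sigma}_u^2 \\ \hat{\sigma}_u^2 & \hat{\sigma}^2 + \hat{\sigma}_u^2 \end{pmatrix}$$

(cf. Laird and Ware 1982)

To get an estimate of α_1, we start by computing

$$\hat{V}_i^{-1} = \frac{1}{\hat{\sigma}^2 (\hat{\sigma}^2 + 2 \hat{\sigma}_u^2)} \begin{pmatrix} \hat{\sigma}^2 + \hat{\sigma}_u^2 & -\hat{\sigma}_u^2 \\ -\hat{\sigma}_u^2 & \hat{\sigma}^2 + \hat{\sigma}_u^2 \end{pmatrix}.$$

With this, we can calculate

$$\left( \sum_{i=1}^{n} X_i^T \hat{V}_i^{-1} X_i \right)^{-1} = \frac{1}{n} \begin{pmatrix} \hat{\sigma}^2 + \hat{\sigma}_u^2 & -\hat{\sigma}^2 \\ -\hat{\sigma}^2 & 2 \hat{\sigma}^2 \end{pmatrix}. \tag{5}$$

Further, we get

$$\sum_{i=1}^{n} X_i^T \hat{V}_i^{-1} y_i = \frac{1}{\hat{\sigma}^2 (\hat{\sigma}^2 + 2 \hat{\sigma}_u^2)} \begin{pmatrix} \hat{\sigma}^2 \sum_{i=1}^{n} (y_{i1} + y_{i2}) \\ \sum_{i=1}^{n} \left[ (\hat{\sigma}^2 + \hat{\sigma}_u^2) y_{i2} - \hat{\sigma}_u^2 y_{i1} \right] \end{pmatrix}. \tag{6}$$

Putting together (4), (5) and (6), we obtain

$$\hat{\alpha}_1 = \frac{1}{n} \sum_{i=1}^{n} (y_{i2} - y_{i1}).$$

This means that as in the case of the paired t-test and the fixed effects model, the treatment effect is calculated as

$$\hat{\alpha}_1 = \bar{d},$$

the mean of the differences d_i = y_{i2} - y_{i1}.

To get a grip on the standard error, we use the fact that

$$\operatorname{Var}(\hat{\alpha}) = \left( \sum_{i=1}^{n} X_i^T \hat{V}_i^{-1} X_i \right)^{-1}$$

(cf. Laird and Ware 1982)

Using (5), this means that

$$\operatorname{se}(\hat{\alpha}_1)^2 = \frac{2 \hat{\sigma}^2}{n}.$$

This is the same expression that we obtained for the fixed effects model above. This means that if the estimated standard deviation of the residuals is the same in both the fixed effects model and the random effects model, the test statistic

$$t = \frac{\hat{\alpha}_1}{\operatorname{se}(\hat{\alpha}_1)} \tag{7}$$

would be equivalent to (2) and (3).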
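The two facts used above, namely that the estimated treatment effect equals the mean difference and that its variance is 2σ²/n whatever the variance of the random intercepts, can be checked with a few lines of linear algebra. This is a minimal sketch assuming NumPy; the helper `gls_random_intercept`, the data and the plugged-in variance values are hypothetical, and the variance components are simply assumed rather than estimated with REML.

```python
import numpy as np

def gls_random_intercept(before, after, sigma2, sigma2_u):
    """GLS estimate of (alpha_0, alpha_1) and its covariance for given variances."""
    y_pairs = np.column_stack([before, after]).astype(float)   # shape (n, 2)
    n = y_pairs.shape[0]
    X_i = np.array([[1.0, 0.0], [1.0, 1.0]])                   # identical for every subject
    V_i = sigma2 * np.eye(2) + sigma2_u * np.ones((2, 2))      # covariance of one pair
    V_inv = np.linalg.inv(V_i)
    cov = np.linalg.inv(n * X_i.T @ V_inv @ X_i)               # inverse of the summed X' V^-1 X
    rhs = X_i.T @ V_inv @ y_pairs.sum(axis=0)                  # summed X' V^-1 y
    return cov @ rhs, cov

# Hypothetical measurements for five subjects
before = np.array([5.1, 6.3, 4.8, 7.0, 5.9])
after = np.array([5.9, 6.1, 5.6, 8.2, 6.4])
n = before.size
d_bar = (after - before).mean()

sigma2 = 0.7                                   # assumed residual variance
results = []
for sigma2_u in (0.0, 0.5, 4.0):               # assumed random intercept variances
    alpha_hat, cov = gls_random_intercept(before, after, sigma2, sigma2_u)
    results.append((sigma2_u, alpha_hat[1], cov[1, 1]))
    print(sigma2_u, alpha_hat[1], cov[1, 1])
```

For every choice of the random intercept variance, the second column should equal the mean difference and the third column should equal 2σ²/n.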

Unfortunately, it is hard to show that the estimated standard deviation of the residuals is the same in both the fixed effects model and the random effects model. This is because there is generally no closed form solution for the estimated standard deviation of the residuals in a linear mixed model. Instead, the estimate is determined iteratively with REML (Restricted Maximum Likelihood). However, one can convince oneself of the equality of both estimates by running simulations.
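One way to run such a check without a mixed model library is to evaluate the restricted log-likelihood directly. The sketch below assumes NumPy and simulated data with clearly positive between-subject variance. As a candidate solution it uses the residual variance estimate of the fixed effects model together with an ANOVA-type estimate of the random intercept variance, and it then checks that no point on a grid around this candidate has a higher restricted log-likelihood. The function `restricted_loglik`, the grid and the data are illustrative assumptions, not the procedure of any particular package.

```python
import numpy as np

def restricted_loglik(y, X, Z, sigma2, sigma2_u):
    """Restricted log-likelihood of a random intercept model (up to a constant)."""
    V = sigma2 * np.eye(len(y)) + sigma2_u * Z @ Z.T
    V_inv = np.linalg.inv(V)
    XtVX = X.T @ V_inv @ X
    alpha_hat = np.linalg.solve(XtVX, X.T @ V_inv @ y)   # GLS estimate for these variances
    r = y - X @ alpha_hat
    return -0.5 * (np.linalg.slogdet(V)[1] + np.linalg.slogdet(XtVX)[1] + r @ V_inv @ r)

rng = np.random.default_rng(42)                # arbitrary seed
n = 10
subject_level = rng.normal(0.0, 3.0, n)
before = 10.0 + subject_level + rng.normal(0.0, 1.0, n)
after = 10.5 + subject_level + rng.normal(0.0, 1.0, n)

y = np.concatenate([before, after])
X = np.column_stack([np.ones(2 * n), np.concatenate([np.zeros(n), np.ones(n)])])
Z = np.vstack([np.eye(n), np.eye(n)])          # subject membership indicators

# Candidate estimates built from differences and sums of each pair
d = after - before
s = after + before
sigma2_cand = d.var(ddof=1) / 2.0                         # fixed effects residual variance
sigma2_u_cand = (s.var(ddof=1) - d.var(ddof=1)) / 4.0     # assumed to be positive here

at_candidate = restricted_loglik(y, X, Z, sigma2_cand, sigma2_u_cand)
factors = np.linspace(0.5, 1.5, 21)
grid_best = max(
    restricted_loglik(y, X, Z, a * sigma2_cand, b * sigma2_u_cand)
    for a in factors
    for b in factors
)
print(at_candidate, grid_best)
```

If the candidate maximises the restricted likelihood, the REML estimate of the residual variance coincides with the fixed effects estimate, which is what the equality of the test statistics requires. The check only covers the case in which the candidate for the random intercept variance is positive.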

This means that (2), (3) and (7) actually have the same value. Consequently, the paired t-test is equivalent to both a linear fixed effects model with varying intercepts (see above) and a linear mixed model with random intercepts (“equivalent” in terms of yielding the same test statistic to test for the treatment effect).

However, note that for linear mixed models it is generally not clear whether the null distribution of a test statistic such as (7) is t-distributed for any choice of degrees of freedom. This is also the reason why, for example, the lme4 package in R does not report p-values.

Nevertheless, in our case where we have a balanced design and assume normally distributed responses, (7) follows a t-distribution and looking at the experimental design tells us to use n-1 degrees of freedom.

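Additionally, there is another “problem” with linear mixed models. Especially with small datasets, one can run into the problem of a singular fit, i.e.,

$$\hat{\sigma}_u^2 = 0.$$

This happens when the estimate of the between-subject variance ends up on the boundary of the parameter space, for example because the before and after measurements are not positively correlated in the sample, or because the numerical and iterative estimation process with REML, which is far less stable than estimation in “normal” linear models (with fixed effects only), converges there. Such a singular fit leads to results corresponding to an ordinary (unpaired) t-test (as opposed to a paired t-test).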
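To see why a singular fit corresponds to an ordinary t-test, note that with a random intercept variance of zero the model reduces to a plain regression with a common intercept and a treatment indicator. The following sketch, assuming NumPy and made-up data, compares the t statistic of that regression with the classical two-sample t statistic with pooled variance.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed
n = 8
before = rng.normal(10.0, 1.0, n)
after = rng.normal(10.5, 1.0, n)

# Regression with a common intercept and a treatment indicator (no subject terms)
y = np.concatenate([before, after])
treatment = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones(2 * n), treatment])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (2 * n - 2)
se_treatment = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
t_regression = beta_hat[1] / se_treatment

# Ordinary two-sample t statistic with pooled variance (equal group sizes)
pooled_var = (before.var(ddof=1) + after.var(ddof=1)) / 2.0
t_two_sample = (after.mean() - before.mean()) / np.sqrt(pooled_var * 2.0 / n)

# Paired t statistic for comparison
d = after - before
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))

print(t_regression, t_two_sample, t_paired)
```

The first two values should agree, while the paired statistic generally differs because it uses the variability of the differences instead of the pooled variability of the two groups.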

Conclusion

In this article, we have seen that the paired t-test is equivalent to both a linear mixed model with random intercepts and a linear fixed effects model with varying intercepts. As linear mixed models with random effects are more complex to understand and estimate than linear models with fixed effects only, I would recommend using a linear model with fixed effects to emulate and interpret a paired t-test.

Additionally, it is interesting to see how both methods (fixed effects and random effects model) yield the same result! Of course, this is only the case because of the very special experimental setup with balanced data and no missing values. If the data is not balanced, the models with fixed effects and with random effects can yield different results with regard to the treatment effect!
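A tiny example illustrates the unbalanced case. In the sketch below, the third subject has no measurement after the treatment. The fixed effects model is fitted by ordinary least squares, while the random intercept model is evaluated by generalized least squares with variance components that are simply assumed (both set to 1) instead of being estimated with REML. NumPy is assumed, and the data are invented for illustration.

```python
import numpy as np

# Hypothetical unbalanced data; columns: subject id, treatment (0 = before, 1 = after), response
data = np.array([
    [0, 0, 1.0],
    [0, 1, 3.0],
    [1, 0, 2.0],
    [1, 1, 3.0],
    [2, 0, 10.0],   # the third subject was only measured before the treatment
])
subject = data[:, 0].astype(int)
treatment = data[:, 1]
y = data[:, 2]
n_subjects = subject.max() + 1

# Fixed effects model: treatment column plus one dummy column per subject
dummies = np.eye(n_subjects)[subject]
X_fixed = np.column_stack([treatment, dummies])
beta_fixed, *_ = np.linalg.lstsq(X_fixed, y, rcond=None)

# Random intercept model: GLS with assumed variance components
sigma2, sigma2_u = 1.0, 1.0
X_mixed = np.column_stack([np.ones_like(y), treatment])
V = sigma2 * np.eye(len(y)) + sigma2_u * dummies @ dummies.T
V_inv = np.linalg.inv(V)
alpha_mixed = np.linalg.solve(X_mixed.T @ V_inv @ X_mixed, X_mixed.T @ V_inv @ y)

print(beta_fixed[0], alpha_mixed[1])   # 1.5 versus roughly 0.083
```

The fixed effects estimate only uses the two complete pairs, whose mean difference is 1.5. The random intercept estimate also uses the unpaired observation, which raises the estimated baseline and shrinks the estimated treatment effect to 1/12 for these assumed variances.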

References

Bell, A., Fairbrother, M. & Jones, K. (2019). Fixed and random effects models: making an informed choice. Qual Quant 53, 1051–1074. https://doi.org/10.1007/s11135-018-0802-x

Laird, N. & Ware, J. (1982). Random-Effects Models for Longitudinal Data. Biometrics 38(4), 963–974. https://doi.org/10.2307/2529876