Introduction to Mixed Models in R

Making variance work for you.

Dr. Marc Jacobs
8 min read · Oct 6, 2021

The following post is a ‘simple’ introduction to Mixed Models in R using a dataset of the BW development of piglets. Each piglet was measured at four time-points — 0, 7, 14, and 21.

Mixed Models extend traditional regression models by modeling both the explained and the unexplained (random) parts of the data through variance components. As such, they are splendid tools for dealing with correlated and unbalanced data. A traditional outlet for Mixed Models is datasets containing repeated measurements, often referred to as longitudinal data when time is the key reason for the repetition.

That is because there is variance both within subjects (correlated observations) and between subjects. The level at which observations take place is key in these models. In this example, the level of interest is the piglet, which contributes multiple observations. Such datasets are labeled ‘nested’.

In this example, I will use the above described longitudinal dataset to build several models:

  1. Unconditional means model — includes only an intercept (the overall mean).
  2. Unconditional growth model, or random-intercept model. The model allows piglet-specific intercepts, but only a single global slope coefficient.
  3. Conditional growth, or random-intercept-random-slope model. Each piglet is modeled both as a separate entity and as a member of the larger dataset. Hence, for each piglet a random intercept and a random slope are estimated, alongside the global intercept and global slope.

The ability of a Mixed Model to determine level-specific (here, animal-specific) coefficients while also drawing on the global part of the model leads to shrinkage. This means that predictions coming from a Mixed Model are pulled toward the grand mean, which safeguards against overfitting.

Let’s get started.

Environment is cleared and several packages are loaded. The backbone of a Mixed Model in R is the lme4 package.
Data is loaded and transformed into long format. This means that multiple rows contain data belonging to the same ID. Mixed Models need this long-format in order to model longitudinal data.
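The original code screenshots are gone, so here is a minimal sketch of the setup and the wide-to-long reshape. The file name and the wide column names (BW_0 … BW_21) are assumptions for illustration:

```r
# Sketch, assuming a wide CSV with columns animal_number, treatm,
# and one BW column per day: BW_0, BW_7, BW_14, BW_21 (hypothetical names).
rm(list = ls())   # clear the environment

library(lme4)     # the backbone for mixed models
library(tidyr)
library(dplyr)

data_wide <- read.csv("piglets.csv")  # hypothetical file name

# One row per animal-by-time combination: the long format lme4 needs
data_long <- data_wide %>%
  pivot_longer(cols = starts_with("BW_"),
               names_to = "time", names_prefix = "BW_",
               values_to = "BW") %>%
  mutate(time = as.numeric(time))
```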
The data clearly shows that we have four observations per animal. For simplicity's sake, we focus only on treatment as a covariate.
Code to summarize data across time, and across treatment and time. In addition, we already take a peek at the overall correlation and covariance of the time-points. Remember, observations in time are nested, especially when they take place within the same individual. This will affect standard-error estimates.
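The summaries and the time-point correlation/covariance could be produced along these lines (a sketch, reusing the assumed data_long and column names from above):

```r
library(dplyr)
library(tidyr)

# Mean, SD and n per treatment and time-point
data_long %>%
  group_by(treatm, time) %>%
  summarise(mean = mean(BW), sd = sd(BW), n = n(), .groups = "drop")

# Correlation and covariance of time-points need one column per time-point
bw_wide <- data_long %>%
  pivot_wider(id_cols = animal_number, names_from = time, values_from = BW)
cor(bw_wide[, -1])  # correlation matrix across the four time-points
cov(bw_wide[, -1])  # covariance matrix
```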
Enough variance, per treatment and per timepoint.
Clear correlation across time.
Code to plot data across time to get some idea what it is that we are dealing with.
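A spaghetti plot of the trajectories, with treatment means overlaid, could look like this (a sketch under the same assumed column names):

```r
library(ggplot2)

ggplot(data_long,
       aes(x = time, y = BW, group = animal_number,
           colour = factor(treatm))) +
  geom_line(alpha = 0.4) +                        # one line per piglet
  stat_summary(aes(group = factor(treatm)),
               fun = mean, geom = "line",
               linewidth = 1.2) +                 # mean per treatment
  labs(x = "Day", y = "BW (kg)", colour = "Treatment") +
  theme_minimal()
```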
Graphs show a clear difference between treatments, in mean and variance, across time.

The following code is a bit of a side-step, introducing longitudinal K-means clustering to assess whether the data can be partitioned into distinct clusters. The code below runs this exercise, looking for 2 to 5 clusters. It is essentially an exploratory tool, but the results do hint that the data is not distinct enough to support five treatments, perhaps not even four. In addition, it is clear that the clusters differ in their intercepts and not so much in their slopes. Hence, we have a hint that a random-intercept model will do.
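The original clustering code is not shown; as a base-R stand-in for longitudinal K-means, ordinary k-means on the wide trajectory matrix (one row per piglet, one column per time-point) gives the same flavour. This sketch assumes the bw_wide object built earlier:

```r
# Stand-in for longitudinal K-means: k-means on the raw trajectories
set.seed(42)
traj <- as.matrix(bw_wide[, -1])   # drop the id column

fits <- lapply(2:5, function(k) kmeans(traj, centers = k, nstart = 25))
sapply(fits, function(f) f$betweenss / f$totss)  # crude fit index per k

# Cluster-mean trajectories for the 2-cluster solution
matplot(t(fits[[1]]$centers), type = "l", lty = 1,
        xlab = "Measurement occasion", ylab = "Cluster mean BW")
```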

Plots show that partitioning of the data in 2 to 3 factors makes sense, and that there are clearly differences in intercepts and not so much in slope.
A simple linear regression model to see if time, treatment, and time*treatment make sense.
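A sketch of that regression, using the assumed data_long from above:

```r
# time * treatm expands to both main effects plus their interaction
model.lm <- lm(BW ~ time * factor(treatm), data = data_long)
summary(model.lm)
anova(model.lm)
plot(model.lm, which = 1)  # residuals vs fitted
```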
Residuals look good. Time makes sense. Time by treatment does not make sense. The K-means clustering already hinted at that: time and treatment have main effects, not an interaction effect.

Off to our first model, the unconditional means model. It is actually quite simple: a single intercept plus a random effect, (1|animal). The code (1|animal) tells the model to estimate a separate intercept for each animal. Hence, a random-intercept model.

Although we will include the random intercept, we will refrain from any other predictor except the grand mean in order to estimate the intraclass correlation coefficient (ICC). The ICC is a simple ratio: between-cluster variance divided by total variance. In effect, the ICC of an unconditional means model is a screening tool to see whether a Mixed Model makes sense.
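The model and the ICC shown below could be obtained like this (a sketch; the icc() output in this post matches the performance package):

```r
library(lme4)
library(performance)  # icc()

# Unconditional means model: grand mean + a random intercept per animal
model.i <- lmer(BW ~ 1 + (1 | animal_number), data = data_long)
summary(model.i)
icc(model.i)  # between-animal variance / total variance
```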

The grand-mean (intercept) is not a great predictor for the data.

The output above and below shows that 6.4% of the total variance can be attributed to variation between animals. This is the ICC. That is not a lot and, to be honest, does not really foreshadow a whole lot of success when applying a Mixed Model. But let's try anyhow.

> icc(model.i)
# Intraclass Correlation Coefficient
Adjusted ICC: 0.064
Conditional ICC: 0.064

The following model is an unconditional growth model. There is an intercept, a fixed effect of time, and a random intercept.
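A sketch of that model, together with the marginal and conditional predictions used in the plots that follow:

```r
library(lme4)

# Unconditional growth model: fixed effect of time + random intercept
model.l <- lmer(BW ~ time + (1 | animal_number), data = data_long)
summary(model.l)

# Marginal prediction (fixed effects only) vs conditional prediction
# (fixed effects + the animal-specific random intercepts)
data_long$pred_marginal    <- predict(model.l, re.form = NA)
data_long$pred_conditional <- predict(model.l)
```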

The output shows much less residual variance in this model by including the fixed effect of time.
The plots show the marginal prediction (averaged across all data points), the conditional prediction (per animal based on fixed and random intercepts), and the calibration of the model.

Next up is an unconditional polynomial growth model. The same model as the last one, except now we include a quadratic effect of time.
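The quadratic extension is one extra term (a sketch; I(time^2) matches the formula shown in the anova output later on):

```r
library(lme4)

# Same as model.l, plus a quadratic effect of time
model.q <- lmer(BW ~ time + I(time^2) + (1 | animal_number), data = data_long)
summary(model.q)
```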

Adding a quadratic effect does not really add anything to the model. To the right, you can see the intercept-only, linear, and quadratic models.
Code to assess which model is best — intercept, linear, quadratic.
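The comparison table below can be reproduced along these lines (a sketch; the evidence ratio exp(ΔAICc/2) matches the eratio column in the output):

```r
library(AICcmodavg)

# Assumes model.i, model.l and model.q fitted as sketched above
cands  <- list(model.i, model.l, model.q)
myaicc <- aictab(cand.set = cands, modnames = c("I", "L", "Q"))

# Evidence ratio of the best model against each competitor
myaicc$eratio <- exp(0.5 * myaicc$Delta_AICc)
myaicc
```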

The results favor the quadratic model, but not by overwhelming force. Also, if you look at the plots above, I would not be so tempted to use a quadratic model. Especially, since treatment was not even included yet.

  Model K      AICc Delta_AICc AICcWt Cum.Wt        eratio
3     Q 5  932.5382     0.0000      1      1  1.000000e+00
2     L 4  960.9260    28.3878      0      1  1.459925e+06
1     I 3 1445.8183   513.2801      0      1 2.866517e+111

The code below checks, at the animal level, which model is preferred. Overwhelmingly, it is the quadratic model, which can be explained by the absence of more important variables.
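The per-animal comparison could be built like this (a sketch; the column names follow the results output below):

```r
# Fit a linear and a quadratic curve per animal and compare their R-squared
results <- do.call(rbind, lapply(
  split(data_long, data_long$animal_number),
  function(d) {
    r_lin  <- summary(lm(BW ~ time, data = d))$r.squared
    r_quad <- summary(lm(BW ~ time + I(time^2), data = d))$r.squared
    data.frame(Animal   = d$animal_number[1],
               Rsq_lin  = r_lin,
               Rsq_quad = r_quad,
               Diff     = r_lin - r_quad,
               Bestfit  = ifelse(r_quad > r_lin, "Rsq_quad", "Rsq_lin"))
  }))
head(results)
```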

> head(results)
  Animal   Rsq_lin  Rsq_quad         Diff  Bestfit
1  58833 0.9401456 0.9818319 -0.041686360 Rsq_quad
2  58838 0.9933775 0.9988962 -0.005518764 Rsq_quad
3  58842 0.9441755 0.9562784 -0.012102874 Rsq_quad
4  58844 0.9948570 0.9996546 -0.004797544 Rsq_quad
5  58855 0.9592769 0.9644571 -0.005180252 Rsq_quad
6  58857 0.9952800 0.9954673 -0.000187301 Rsq_quad

Let's move to the conditional growth model: random-intercept-random-slope. We have now also included treatment. By the way, (time|animal_number) is best read as (1 + time | animal_number); the additional annotation is simply not needed. Hence, (time|animal_number) is the code necessary to request a random-intercept-random-slope model.
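A sketch of that model, matching the formula printed in the anova output further down:

```r
library(lme4)

# Random intercept AND random slope of time per animal, treatment as fixed effect
model.q.t.r <- lmer(BW ~ time + I(time^2) + factor(treatm) +
                      (time | animal_number),
                    data = data_long)
summary(model.q.t.r)
```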

To the left, if you squint, you can somewhat see a random slope, but it is surely not as clear as the random intercept. To the right is the calibration plot. Predictions drift further off as weight increases, which makes sense: the larger the mean, the more variance you will find, especially in growth models.

The code below fits many additional models, varying both the fixed and random effects. We then compare them all to select a winner.
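The candidate set could be assembled like this (a sketch; model names mirror the table below, and the earlier models are assumed to exist):

```r
library(lme4)
library(AICcmodavg)

# Additional candidates with treatment main effects and interactions
model.l.t   <- lmer(BW ~ time + factor(treatm) + (time | animal_number),
                    data = data_long)
model.l.x.t <- lmer(BW ~ time * factor(treatm) + (time | animal_number),
                    data = data_long)
model.q.x.t <- lmer(BW ~ (time + I(time^2)) * factor(treatm) +
                      (time | animal_number),
                    data = data_long)

cands  <- list(model.i, model.l, model.q, model.l.t,
               model.q.t.r, model.l.x.t, model.q.x.t)
myaicc <- aictab(cand.set = cands,
                 modnames = c("I", "L", "Q", "LT", "QT", "LxT", "QxT"))
myaicc$eratio <- exp(0.5 * myaicc$Delta_AICc)  # evidence ratio vs the best
```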

> data.frame(Model = myaicc[, 1], round(myaicc[, 2:7], 4))
  Model  K      AICc Delta_AICc AICcWt Cum.Wt        eratio
5    QT 11  845.2731     0.0000 0.9815 0.9815  1.000000e+00
7   QxT 19  853.2190     7.9459 0.0185 1.0000  5.314210e+01
4    LT 10  904.5280    59.2549 0.0000 1.0000  7.362806e+12
6   LxT 14  905.3122    60.0391 0.0000 1.0000  1.089747e+13
3     Q  5  932.5382    87.2652 0.0000 1.0000  8.900052e+18
2     L  4  960.9260   115.6530 0.0000 1.0000  1.299341e+25
1     I  3 1445.8183   600.5453 0.0000 1.0000 2.551215e+130

The variance attributed to the random intercept and the random slope at the animal level is clear to see; their correlation is not very high. Below are the fixed effects and their correlations. Treatment does not seem to have any additional explanatory power.
Code to extract random and fixed effects from the best model, as well as the variance-covariance matrix. Last piece of code looks a bit deeper in the q.t.r model.
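These extractors are part of lme4 itself (a sketch, assuming the fitted model.q.t.r from above):

```r
library(lme4)

ranef(model.q.t.r)    # animal-specific intercept and slope deviations
fixef(model.q.t.r)    # global fixed effects
VarCorr(model.q.t.r)  # variance components of the random effects
vcov(model.q.t.r)     # variance-covariance matrix of the fixed effects
```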

Below you can see that we asked the model to specify animal-specific intercept and slope effects, followed by the fixed effects. This means that the formula to predict the BW of animal 58833 at time-point 21 under treatment 2 goes as follows:

Intercept: 7.6448 + (−0.8422) = 6.8026

Time: (0.1603 + 0.0208) × 21 = 3.8036

Time²: 0.00564 × 21² = 2.4883

Treatment 2: −0.0279

Prediction: 6.8026 + 3.8036 + 2.4883 − 0.0279 ≈ 13.07
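The same calculation in R, hard-coding the estimates printed in the output below:

```r
# Fixed effects (from fixef output) and animal 58833's deviations (from ranef)
b <- c(int = 7.644750106, time = 0.160296158,
       time2 = 0.005642406, treatm2 = -0.027922237)
u <- c(int = -0.84215052, time = 0.020826408)

t_ <- 21
pred <- (b["int"] + u["int"]) +          # animal-specific intercept
        (b["time"] + u["time"]) * t_ +   # animal-specific slope
        b["time2"] * t_^2 +              # global quadratic term
        b["treatm2"]                     # treatment 2 contrast

round(unname(pred), 2)  # 13.07
```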

> head(ranef(model.q.t.r))
$animal_number
      (Intercept)         time
58833 -0.84215052  0.020826408
58838  0.14472475  0.116202118
58842  0.72890016 -0.032279547
58844 -0.27168740 -0.040408184
58855  1.99977032  0.042297651
58857  0.36337948 -0.030983612
58859  0.57012609  0.046758630
58861 -1.16268046 -0.025436646
58877 -0.62441343 -0.189737239
58879  0.63046959  0.036521097
58894 -1.88534651 -0.102504893
58895  1.30106656 -0.024664555
58905 -0.60900735  0.022391597
58919 -1.30348657 -0.031411944
58938  1.22407165  0.071694004
58939 -0.86975057 -0.111685843

> head(fixef(model.q.t.r))
 (Intercept)         time    I(time^2)      treatm2      treatm3      treatm4
 7.644750106  0.160296158  0.005642406 -0.027922237 -0.079978283  0.081867461

Code to check via ANOVA and permuted Likelihood Ratio Testing if adding treatment makes the model better.
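That comparison could be run like this (a sketch; PBmodcomp is from the pbkrtest package, and the nested model without treatment is assumed):

```r
library(lme4)
library(pbkrtest)

# Nested model without treatment
model.q.r <- lmer(BW ~ time + I(time^2) + (time | animal_number),
                  data = data_long)

# Likelihood-ratio test; anova() refits both models under ML automatically
anova(model.q.t.r, model.q.r)

# Parametric-bootstrap version of the same comparison
cl <- parallel::makeCluster(2)  # optional cluster to speed up the simulation
PBmodcomp(model.q.t.r, model.q.r, nsim = 500, cl = cl)
parallel::stopCluster(cl)
```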

The output below hints that the more complex model does not add relevant information. This would mean that the entire variation can be explained via time and the two random components.

> anova(model.q.t.r, model.q.r)  # treatment does not provide a better model
Data: data_long
Models:
model.q.r:   BW ~ time + I(time^2) + (time | animal_number)
model.q.t.r: BW ~ time + I(time^2) + treatm + (time | animal_number)
            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
model.q.r      7 836.49 862.34 -411.24   822.49
model.q.t.r   11 844.35 884.98 -411.17   822.35 0.1414  4     0.9976

> PBmodcomp(model.q.t.r, model.q.r, nsim = 500, cl = cl)
Bootstrap test; time: 17.55 sec; samples: 500; extremes: 497;
large : BW ~ time + I(time^2) + treatm + (time | animal_number)
BW ~ time + I(time^2) + (time | animal_number)
         stat df p.value
LRT    0.1414  4  0.9976
PBtest 0.1414     0.9940

Below you can see the results of code assessing whether the Mixed Model meets the assumption of independent and identically distributed residuals. It is clear that adding random intercepts makes sense, but that this is not the case for the random slopes. Also, the effect of treatment is almost absent. Last but not least, a nice table to impress friends and colleagues, showing off the hard work you put into your Mixed Model.
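The diagnostics and the summary table are not shown in code; one common route (an assumption, not necessarily what was used here) is via the performance and sjPlot packages:

```r
library(performance)
library(sjPlot)

# Residual, normality and random-effect diagnostics in one panel
check_model(model.q.r)

# Publication-ready side-by-side table of the candidate models
tab_model(model.i, model.q, model.q.r)
```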


Dr. Marc Jacobs

Scientist. Builder of models, and enthusiast of statistics, research, epidemiology, probability, and simulations for 10+ years.