Introduction to Mixed Models in R
Making variance work for you.
The following post is a ‘simple’ introduction to Mixed Models in R using a dataset of the body-weight (BW) development of piglets. Each piglet was measured at four time points: 0, 7, 14, and 21 days.
Mixed Models extend traditional regression models by modeling both the explained and the unexplained random parts of a model through variance components. As such, they are splendid tools for dealing with correlated and unbalanced data. A traditional outlet for Mixed Models is datasets containing repeated measurements, often referred to as longitudinal data when time is the key reason for the repetition.
That is because there is variance both within subjects (correlated observations) and between subjects. The level at which observations take place is key in these models. In this example, the level of interest is the piglet, each of which contains multiple observations. Such datasets are labeled ‘nested’.
In this example, I will use the above described longitudinal dataset to build several models:
- Unconditional means model: only the grand mean as a fixed effect, plus a random intercept per piglet.
- Unconditional growth model, or random-intercept model: adds time as a fixed effect. The model allows piglet-specific intercepts, but only a single global slope coefficient.
- Conditional growth model, or random-intercept-random-slope model: each piglet is modeled both as a separate entity and as a member of the larger dataset. Hence, for each piglet a random intercept and a random slope are estimated, in addition to the global intercept and the global slope.
The ability of a Mixed Model to determine level-specific (here, animal-specific) coefficients while also looking at the global part of the model leads to shrinkage. This means that predictions coming from a Mixed Model are pulled toward the grand mean, which safeguards against overfitting.
Let’s get started.
The following code is a bit of a side-step, introducing longitudinal k-means clustering to assess whether the data can be partitioned into distinct clusters. The code below performs this exercise, looking for 2 to 5 clusters. It is essentially an exploratory tool, but the results do hint that the data are not distinct enough to support five treatment clusters, perhaps not even four. In addition, it is clear that the clusters differ in their intercepts and not so much in their slopes. Hence, we have a hint that a random-intercept model will do.
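A minimal sketch of such a clustering run with the kml package, assuming the data are also available in wide format (one row per piglet, one BW column per time point; the column and object names here are hypothetical):

```r
# Longitudinal k-means clustering with kml (exploratory only).
# Assumes data_wide holds one row per piglet with BW at each time point.
library(kml)

cld <- clusterLongData(traj  = as.matrix(data_wide[, c("BW0", "BW7", "BW14", "BW21")]),
                       idAll = data_wide$animal_number,
                       time  = c(0, 7, 14, 21))

# Try 2 to 5 clusters, re-running each several times to avoid local optima.
kml(cld, nbClusters = 2:5, nbRedrawing = 20)

# Inspect, e.g., the four-cluster solution (plotting interface may differ by version).
plot(cld, 4)
```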
On to our first model, the unconditional means model. It is actually quite simple: it has a single fixed intercept and a random effect, (1|animal). The code (1|animal) tells the model that we want to estimate a separate intercept for each animal. Hence, a random-intercept model.
Although we include the random intercept, we refrain from adding any predictor beyond the grand mean, in order to estimate the intraclass correlation coefficient (ICC). The ICC is the simple ratio of between-animal variance to total variance. In effect, the ICC of an unconditional means model is a screening tool to see whether a Mixed Model makes sense at all.
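A sketch of this model, assuming a long-format data frame data_long with columns BW and animal_number, fit with lme4 and screened with the icc() function from the performance package:

```r
# Unconditional means model: grand mean plus a random intercept per animal.
library(lme4)
library(performance)

model.i <- lmer(BW ~ 1 + (1 | animal_number), data = data_long, REML = FALSE)

# ICC = between-animal variance / total variance.
icc(model.i)
```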
The output above and below shows that 6.4% of the total variance in the dataset can be attributed to variation between animals. This is the ICC. That is not a lot and, to be honest, does not foreshadow a whole lot of success when applying a Mixed Model. But let's try anyhow.
> icc(model.i)
# Intraclass Correlation Coefficient

     Adjusted ICC: 0.064
  Conditional ICC: 0.064
The following model is an unconditional growth model. There is an intercept, a fixed effect of time, and a random intercept.
Next up is an unconditional polynomial growth model. The same model as the last one, except now we include a quadratic effect of time.
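The two growth models can be sketched as follows, under the same assumptions as before (data_long with columns BW, time, animal_number):

```r
# Unconditional growth models: a fixed effect of time on top of the random
# intercept — first linear, then with an added quadratic term.
model.l <- lmer(BW ~ time + (1 | animal_number),
                data = data_long, REML = FALSE)
model.q <- lmer(BW ~ time + I(time^2) + (1 | animal_number),
                data = data_long, REML = FALSE)

# Likelihood-ratio test of the quadratic term.
anova(model.l, model.q)
```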
The results favor the quadratic model, but not by an overwhelming margin. Also, looking at the plots above, I would not be too tempted to use a quadratic model, especially since treatment has not even been included yet.
Model K AICc Delta_AICc AICcWt Cum.Wt eratio
3 Q 5 932.5382 0.0000 1 1 1.000000e+00
2 L 4 960.9260 28.3878 0 1 1.459925e+06
1 I 3 1445.8183 513.2801 0 1 2.866517e+111
The code below checks, at the animal level, which model is preferred. Overwhelmingly it is the quadratic model, which can be explained by the absence of more important variables.
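A sketch of this per-animal comparison: for each piglet separately, fit a linear and a quadratic regression of BW on time and compare the R² values (the helper name rsq_by_animal is mine, not from the original post):

```r
# For each animal, fit linear and quadratic models and compare R-squared.
# Expects a long-format data frame with columns animal_number, time, BW.
rsq_by_animal <- function(df) {
  do.call(rbind, lapply(split(df, df$animal_number), function(d) {
    fit_lin  <- lm(BW ~ time, data = d)
    fit_quad <- lm(BW ~ time + I(time^2), data = d)
    r_lin  <- summary(fit_lin)$r.squared
    r_quad <- summary(fit_quad)$r.squared
    data.frame(Animal   = d$animal_number[1],
               Rsq_lin  = r_lin,
               Rsq_quad = r_quad,
               Diff     = r_lin - r_quad,   # negative when quadratic fits better
               Bestfit  = ifelse(r_quad > r_lin, "Rsq_quad", "Rsq_lin"))
  }))
}

# results <- rsq_by_animal(data_long); head(results)
```

Note that the quadratic model nests the linear one, so its R² can never be lower; the interesting question is by how much it improves, which is what the Diff column shows.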
> head(results)
Animal Rsq_lin Rsq_quad Diff Bestfit
1 58833 0.9401456 0.9818319 -0.041686360 Rsq_quad
2 58838 0.9933775 0.9988962 -0.005518764 Rsq_quad
3 58842 0.9441755 0.9562784 -0.012102874 Rsq_quad
4 58844 0.9948570 0.9996546 -0.004797544 Rsq_quad
5 58855 0.9592769 0.9644571 -0.005180252 Rsq_quad
6 58857 0.9952800 0.9954673 -0.000187301 Rsq_quad
Let's move to the conditional growth model: random intercept, random slope. Also, we have now included treatment. By the way, (time|animal_number) is best read as (1+time | animal_number), but the additional annotation is not needed. Hence, (time|animal_number) is all the code necessary to request a random-intercept-random-slope model.
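A sketch of this model, assuming treatm is a factor in data_long:

```r
# Conditional growth model: random intercept and random slope for time,
# plus treatment as a fixed effect.
model.q.t.r <- lmer(BW ~ time + I(time^2) + treatm + (time | animal_number),
                    data = data_long, REML = FALSE)
summary(model.q.t.r)
```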
The code below builds many additional models, varying both the fixed and the random effects. All of them are then compared to select a winner.
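A sketch of such a comparison with the AICcmodavg package, assuming the seven candidate models (intercept-only, linear, quadratic, and the treatment variants) have already been fit with REML = FALSE; the evidence-ratio column is added by hand:

```r
# AICc-based model selection across the candidate set.
library(AICcmodavg)

models <- list(model.i, model.l, model.q,
               model.l.t, model.q.t, model.l.x.t, model.q.x.t)
modnames <- c("I", "L", "Q", "LT", "QT", "LxT", "QxT")

myaicc <- as.data.frame(aictab(models, modnames = modnames))

# Evidence ratio relative to the best model.
myaicc$eratio <- max(myaicc$AICcWt) / myaicc$AICcWt

data.frame(Model = myaicc[, 1], round(myaicc[, 2:7], 4))
```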
> data.frame(Model=myaicc[,1],round(myaicc[, 2:7],4))
Model K AICc Delta_AICc AICcWt Cum.Wt eratio
5 QT 11 845.2731 0.0000 0.9815 0.9815 1.000000e+00
7 QxT 19 853.2190 7.9459 0.0185 1.0000 5.314210e+01
4 LT 10 904.5280 59.2549 0.0000 1.0000 7.362806e+12
6 LxT 14 905.3122 60.0391 0.0000 1.0000 1.089747e+13
3 Q 5 932.5382 87.2652 0.0000 1.0000 8.900052e+18
2 L 4 960.9260 115.6530 0.0000 1.0000 1.299341e+25
1 I 3 1445.8183 600.5453 0.0000 1.0000 2.551215e+130
Below you can see that we asked the model to estimate animal-specific intercept and slope effects, followed by the fixed effects. This means that the formula to predict the BW of animal 58833 at time point 21 under treatment 2 goes as follows:
Intercept → 7.644 - 0.842 = 6.802
Time → (0.160 + 0.021) × 21 = 0.181 × 21 = 3.801 → 10.603
Time² → 0.0056 × 21 × 21 = 2.470 → 13.072
Treatment → -0.0279 → 13.045
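The same calculation in R, using the full-precision coefficients from the fixef() and ranef() output (the rounded values used in the text give ≈ 13.045; full precision gives slightly more):

```r
# Worked prediction for animal 58833, treatment 2, at time = 21 days.
fix_int   <-  7.644750106; ran_int  <- -0.84215052   # (Intercept)
fix_time  <-  0.160296158; ran_time <-  0.020826408  # time
fix_time2 <-  0.005642406                            # I(time^2)
treatm2   <- -0.027922237                            # treatm2

t <- 21
bw_hat <- (fix_int + ran_int) +
          (fix_time + ran_time) * t +
          fix_time2 * t^2 +
          treatm2

round(bw_hat, 3)  # 13.067
```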
> head(ranef(model.q.t.r))
$animal_number
(Intercept) time
58833 -0.84215052 0.020826408
58838 0.14472475 0.116202118
58842 0.72890016 -0.032279547
58844 -0.27168740 -0.040408184
58855 1.99977032 0.042297651
58857 0.36337948 -0.030983612
58859 0.57012609 0.046758630
58861 -1.16268046 -0.025436646
58877 -0.62441343 -0.189737239
58879 0.63046959 0.036521097
58894 -1.88534651 -0.102504893
58895 1.30106656 -0.024664555
58905 -0.60900735 0.022391597
58919 -1.30348657 -0.031411944
58938 1.22407165 0.071694004
58939 -0.86975057 -0.111685843

> head(fixef(model.q.t.r))
(Intercept)        time    I(time^2)      treatm2      treatm3      treatm4
7.644750106  0.160296158  0.005642406 -0.027922237 -0.079978283  0.081867461
The output below hints that the more complex model does not add relevant information. This would mean that the variation is adequately explained by time and the two random components; treatment adds nothing.
> anova(model.q.t.r, model.q.r) # treatment does not provide a better model
Data: data_long
Models:
model.q.r: BW ~ time + I(time^2) + (time | animal_number)
model.q.t.r: BW ~ time + I(time^2) + treatm + (time | animal_number)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
model.q.r 7 836.49 862.34 -411.24 822.49
model.q.t.r 11 844.35 884.98 -411.17 822.35 0.1414 4     0.9976

> PBmodcomp(model.q.t.r, model.q.r, nsim=500, cl=cl)
Bootstrap test; time: 17.55 sec; samples: 500; extremes: 497;
large : BW ~ time + I(time^2) + treatm + (time | animal_number)
BW ~ time + I(time^2) + (time | animal_number)
stat df p.value
LRT 0.1414 4 0.9976
PBtest 0.1414 0.9940
Below you can see the results of code assessing whether the Mixed Model meets the assumption of independent and identically distributed (i.i.d.) residuals. It is clear that adding random intercepts makes sense, but that this is not the case for the random slopes. Also, the effect of treatment is almost absent. Last but not least, there is a nice table to impress friends and colleagues, showing off the hard work you put into your Mixed Model.
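A sketch of such diagnostics and the summary table, assuming the final model object from above; tab_model() is from the sjPlot package and renders an HTML table of the fixed and random effects:

```r
# Residual diagnostics for the final model.
plot(model.q.t.r)                 # fitted values vs. residuals
qqnorm(resid(model.q.t.r))        # normality of residuals
qqline(resid(model.q.t.r))

# A publication-style table of the model.
library(sjPlot)
tab_model(model.q.t.r)
```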