Performing Multivariate Mixed Modeling
Before going further you should have prior knowledge of Linear Mixed Models (LMMs), which you can find in my previous blog.
Anyway, let’s have a quick introduction to mixed models. Linear mixed effects models are increasingly common in the analysis of biological data. They offer a flexible approach to modelling a broad range of data types; ecological data are often complex and require complex model structures. These models include a combination of fixed and random effects as predictor variables.
These models are powerful. Software advances have made them accessible to non-experts, and they have become relatively straightforward to fit in widely available statistical packages such as R. Here I am going to focus on the implementation of LMMs using R.
There is no dedicated package for performing multivariate LMMs in R yet, so let’s see how it’s done. I have created a sample data set for this study.
Note: The purpose of this blog is to show how to use various data analysis commands. It does not cover all aspects of the research process that researchers are expected to carry out. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.
Let’s load the data and have a look at the summary and structure of the data.
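Something along these lines will do (the file name data.csv is only a placeholder for wherever your own data lives):

```r
# read the sample data (file name is a placeholder)
data <- read.csv("data.csv")

# quick look at the summary statistics and the structure
summary(data)
str(data)
```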
Here I have rate as my dependent variable; the other variables are independent. We will be working with the variables ln and exp, predicted from rate, all nested within id. You can see that my data is in wide format.
I will be using the reshape2 package to reshape my data into long format. The result is a new data set called data1.
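Roughly like this, assuming the wide data frame is called data; I name the stacked outcome column rate here so that, as described below, each id ends up with one rate value per outcome (adjust id.vars and the names to match your own data):

```r
library(reshape2)

# stack ln and exp into a single outcome column; the 'variable'
# column records which of the two outcomes each row belongs to
data1 <- melt(data, id.vars = "id",
              measure.vars = c("ln", "exp"),
              variable.name = "variable",
              value.name = "rate")
```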
Now, let’s see how our data looks after melting.
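A quick look at the first rows and the structure of data1:

```r
# first rows and structure of the long-format data
head(data1)
str(data1)
```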
Now, our next step plays an important role in performing multivariate analysis. To perform a multivariate LMM we need to create dummy variables for our categorical variable, which is the variable column in our data, as shown below.
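One simple way to do this with the melted data from above:

```r
# 0/1 indicators for which outcome a row belongs to
data1$D1 <- ifelse(data1$variable == "ln", 1, 0)
data1$D2 <- ifelse(data1$variable == "exp", 1, 0)
```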
In this code D1 and D2 are the dummy variables for ln and exp.
For ln the value of D1 is 1 and 0 otherwise; similarly, for exp the value of D2 is 1 and 0 otherwise. D1 and D2 are 0/1 dummy variables coding whether the outcome is ln or exp. id is now repeated many times, and rate is repeated twice, once for each outcome variable.
To fit the model to our data, I will use the nlme package to build the LMM. In my previous blog I used the lme4 package to build LMMs. lme4 is a newer package appropriate for many types of random coefficient models; however, it does not handle special residual covariance structures.
Note: The syntax for writing formulas is a little different between the two packages.
Let’s fit the model using the nlme package.
Note: We specify special control arguments because the model did not converge within the default number of iterations, so they were increased.
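Here is a sketch of how such a stacked bivariate model can be written with lme(); the object name fit, the particular control settings and the varIdent() weights term (one way to give each outcome its own residual variance) are my own choices:

```r
library(nlme)

# separate fixed intercepts for ln and exp (no overall intercept),
# random intercepts for both outcomes within id, and a separate
# residual variance for each outcome via varIdent()
fit <- lme(rate ~ 0 + D1 + D2,
           random = ~ 0 + D1 + D2 | id,
           weights = varIdent(form = ~ 1 | variable),
           data = data1,
           control = lmeControl(maxIter = 200, msMaxIter = 200,
                                niterEM = 50, msMaxEval = 400))
```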
Let’s look at some diagnostic plots. We will look at the residuals, both overall and broken down by outcome variable, using the lattice package. First, we will plot QQ-plots of the residuals separated by variable for a better understanding.
I will also add a simple reference line to these QQ-plots.
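With lattice this can be done roughly as follows, where fit is the model object from the sketch above:

```r
library(lattice)

# QQ-plot of the residuals, one panel per outcome,
# with a straight reference line added in each panel
qqmath(~ resid(fit) | data1$variable,
       xlab = "Theoretical quantiles", ylab = "Residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)  # reference line
         panel.qqmath(x, ...)      # the QQ points
       })
```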
From the QQ-plots of the residuals, it is pretty clear that they are not normally distributed and we might be concerned about that assumption. For the sake of demonstration, however, we will continue. Next, let’s look at the residuals plotted against the fitted values to check for things like heteroscedasticity.
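Using the plot method for lme objects (again with fit as the model object):

```r
# residuals against fitted values, overall
plot(fit, resid(., type = "p") ~ fitted(.), abline = 0)
```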
Let’s see the spread of both variables.
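One way to condition the same plot on the outcome variable with lattice:

```r
library(lattice)

# residuals against fitted values, one panel per outcome,
# sharing the same axis scales across panels
xyplot(resid(fit) ~ fitted(fit) | data1$variable,
       xlab = "Fitted values", ylab = "Residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(h = 0, lty = 2)
       })
```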
Separating by variable makes it look a bit better, though it is still far from ideal. We can see that the spread of predicted values is very different for these two variables, which is not surprising since they are not on the same scale. On a shared scale it is difficult to see the real spread for ln, so we might examine the plot just for that variable.
It does not seem too badly spread. Our plots all look good enough, so we need not make any changes. Let’s check the summary of our model.
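A quick way to look at ln on its own is to subset the residuals and fitted values to the ln rows (this assumes the rows of data1 are in the same order as the values returned by resid() and fitted()):

```r
library(lattice)

# keep only the rows belonging to the ln outcome
ln_rows <- data1$variable == "ln"

xyplot(resid(fit)[ln_rows] ~ fitted(fit)[ln_rows],
       xlab = "Fitted values (ln)", ylab = "Residuals (ln)")
```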
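As before, fit stands for the model object from the earlier sketch:

```r
# fixed effects, random effects and the residual structure in one place
summary(fit)
```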
It contains a lot of information. If we look at our model’s summary we will find that it has both fixed effects and random effects. The residual standard deviation, which is the square root of the residual variance, is for the ln variable. To get the value for the exp variable, multiply the residual standard deviation by 6.74.
These types of models can become complex and sometimes run into convergence problems. It is easy to misspecify the random effects and covariance structure.
The nlme package has a function called getVarCov() which returns the variance-covariance matrix of the random effects.
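For the model sketched above the call looks like this (the actual numbers will of course depend on your data):

```r
library(nlme)

# variance-covariance matrix of the random effects
# (one row/column for each random term, here D1 and D2)
getVarCov(fit, type = "random.effects")
```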
A variance-covariance matrix is a square matrix that contains the variances and covariances associated with several variables. The diagonal elements of the matrix contain the variances of the variables and the off-diagonal elements contain the covariances between all possible pairs of variables. The variance measures how much the data are scattered about the mean.
We can also extract the estimated variances, standard deviations and correlations of the random-effects terms in a linear mixed-effects model.
This approach may run into convergence problems and can become slow. It is not a very common technique, so you may not find much literature on it.
Conclusion: These models are quite complex. They allow us to look at relationships between variables in an overarching way and to quantify those relationships. In this blog I have performed a bivariate mixed model; in the same way we can perform mixed models with more outcome variables. Now we are able to use multiple variables to forecast possible outcomes.
There is a lot to explore in mixed models. To learn more amazing things about mixed models, stay connected to my series of mixed model blogs.
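nlme’s VarCorr() reports these quantities for an lme fit:

```r
# variances, standard deviations and correlation of the random effects,
# together with the residual standard deviation
VarCorr(fit)
```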