Introduction to Mixed Models
A mixed model (more precisely, a mixed error-component model) is a statistical model containing both fixed effects and random effects. It is an extension of simple linear models, and such models are useful in a wide variety of disciplines in the physical, biological, and social sciences. Mixed models are a powerful regression tool when your data contain both global and group-level trends.
They are particularly useful in settings where repeated measurements are made on the same statistical units (longitudinal study), or where measurements are made on clusters of related statistical units.
In fields such as ecology and biology, data are often complex and messy, and sometimes bimodal. We may have different grouping factors like population, species, site, or gender. Sample sizes might leave something to be desired too, especially if we are trying to fit complicated models with many parameters.
This is why mixed models were developed: to deal with such messy data and to allow us to use all our data, even when we have low sample sizes, structured data, and many covariates to fit.
A mixed model can be represented as:
Y = Fixed Effect + Random Effect + Error
What are fixed effects?
A fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities. It is assumed that the observations are independent.
E.g., gender is a fixed-effect variable: the values male/female are independent of one another (mutually exclusive) and do not change over time.
What are random effects?
A random effects model is a statistical model where the model parameters are random variables. It is assumed that some type of relationship exists between some observations.
E.g., the cost of a new car varies depending on the year it was purchased; cars bought in the same year are related observations.
Advantages of Mixed Models
- It allows random effects alongside fixed effects.
- It does a better job of handling missing data.
- It allows measurements to be made repeatedly over time.
- It can work with other types of dependent variables: categorical, continuous, ordinal, discrete counts, etc.
- It works for correlated-data regression models, including repeated-measures, longitudinal, time-series, clustered, and other related designs.
This article walks through an example using politeness data to introduce this concept. I will be using R throughout this post.
You can download the data by hand from:
http://www.bodowinter.com/tutorial/politeness_data.csv
Performing Mixed Model in R
There are two main packages in R for performing mixed models:
- lme4
- nlme
Data Analysis
I will be using the lme4 package in this post to perform the mixed model. Let's start by loading the data. There are missing values in the data, so I drop those rows before modelling.
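As a sketch, the loading and cleaning step might look like this (assuming the CSV from the URL above; the exact column with missing values may differ):

```r
# Read the politeness data directly from the tutorial URL.
politeness <- read.csv("http://www.bodowinter.com/tutorial/politeness_data.csv")

# Drop any rows containing missing values.
politeness <- politeness[complete.cases(politeness), ]

str(politeness)  # inspect the cleaned data
```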
lme4 provides the lmer() function, which is the mixed-model equivalent of lm(); this is the function we will use to construct mixed models.
The difference in politeness level is represented in the column called “attitude”. In that column, “pol” stands for polite and “inf” for informal. Sex is represented as “F” and “M” in the column “gender”. The dependent measure is “frequency”, which is the voice pitch measured in Hertz (Hz). The interesting random effects for us are in the column “subject” and “scenario”.
Let’s look at the relationship between politeness and pitch by means of a boxplot:
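One way to draw this boxplot in base R, using the column names described above (the colors and labels are my own choices):

```r
# Pitch by politeness condition, split by gender.
boxplot(frequency ~ attitude * gender, data = politeness,
        col = c("white", "lightgray"),
        xlab = "attitude x gender", ylab = "Pitch (Hz)")
```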
In both cases, the median line (the black line in the center of each boxplot) is lower for the polite than for the informal condition. However, there may be a bit more overlap between the two politeness categories for males than for females.
Let's start with building our first mixed model.
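A sketch of this first model, using the columns described above, with random intercepts for subject and scenario:

```r
library(lme4)

# frequency ~ attitude: fixed effect of politeness on pitch.
# (1 | subject), (1 | scenario): random intercepts for each grouping factor.
politeness.model <- lmer(frequency ~ attitude +
                           (1 | subject) + (1 | scenario),
                         data = politeness)
```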
So here, frequency is our dependent variable, attitude is our fixed effect, and subject and scenario are our random effects. What does (1 | subject) signify? It specifies a random intercept with a fixed mean for each subject.
Whatever is on the right side of the | operator is a factor, referred to as the “grouping factor” for the term. Random effects (factors) can be crossed or nested; it depends on the relationship between the variables.
There are various styles of writing random-effect terms in the model formula. You can add a random intercept with an a priori mean, random slopes together with an intercept, and so on. The following are common syntaxes for writing random effects with lmer().
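A non-exhaustive summary of the lme4 random-effect syntax, where g, g1, g2 stand for grouping factors and x for a covariate:

```r
# (1 | g)               random intercept with fixed mean, one per level of g
# (0 + x | g)           random slope of x within g, no random intercept
# (1 + x | g)           correlated random intercept and slope of x
# (1 | g) + (0 + x | g) uncorrelated random intercept and slope
# (1 | g1/g2)           random intercepts for g2 nested within g1
# (1 | g1) + (1 | g2)   crossed random intercepts for g1 and g2
```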
Now, let’s check the summary of this model.
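Assuming the model object built above, the summary is obtained in the usual way:

```r
# Prints the random-effect variances/standard deviations
# and the fixed-effect estimates with standard errors and t-values.
summary(politeness.model)
```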
In a mixed model you get both the fixed effects and the random effects in the model summary. Let's first have a look at the standard deviations reported for the random effects.
This is a measure of how much variability in the dependent measure is due to scenarios and subjects (our two random effects). You can see that scenario has much less variability than subject. Based on our boxplots above, where we saw more differences between subjects than between scenarios, this was expected. This is followed by “Residual”, which stands for the variability that is not due to either scenario or subject. This is our “ε”, the “random” deviations from the predicted values that are not due to subjects and scenarios.
Let's check the fixed effects in the model summary. Here “attitudepol” is the slope for the categorical effect of politeness. The estimate of -19.695 means that pitch is lower in polite speech than in informal speech, by about 20 Hz. There is also a standard error associated with this slope, and a t-value, which is simply the estimate divided by its standard error.
As always, it’s good practice to have a look at the plots to check our assumptions:
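A residuals-versus-fitted plot is a common check for homoscedasticity; a sketch assuming the model object from above:

```r
plot(fitted(politeness.model), residuals(politeness.model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)  # residuals should scatter evenly around zero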
Also, check QQPlot:
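The QQ plot checks whether the residuals are approximately normal (again assuming the model object from above):

```r
qqnorm(residuals(politeness.model))  # points should fall close to the line
qqline(residuals(politeness.model))
```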
If you want a nicely formatted table for an lme4 model, I'd recommend having a look at the stargazer package. It has nice annotation and there are lots of resources.
Let’s calculate the RMSE value of our model.
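One simple way to compute an in-sample RMSE from the model residuals (a sketch, assuming the model object from above):

```r
# Root mean squared error of the fitted model on the training data.
rmse <- sqrt(mean(residuals(politeness.model)^2))
rmse
```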
Conclusion
Mixed effects models can be a bit tricky and often there isn’t much consensus on the best way to tackle something within them. The coding bit is actually the (relatively) easy part here.