Comparing intraclass correlation to Pearson correlation coefficient (with R)
Intraclass correlation (ICC) and the Pearson correlation coefficient (Pearson’s r) are both methods for quantifying the degree of relationship between different groups of measurements in a dataset. Pearson’s r and each of the ICC models have different use cases and can give very different results in certain situations. Using a relationship model that does not fit the situation can lead to misleading results.
In this article we compare the results of the three most common ICC models, the one-way ICC(1), the two-way consistency ICC(C,1) and the two-way agreement ICC(A,1) (single-rater forms, in McGraw and Wong’s notation), to Pearson’s r on three easily understandable datasets.
Dataset 1
Judge1 Judge2
[1,] 2 4
[2,] 4 6
[3,] 6 8
In dataset 1 we have three value pairs that represent three ratings by two judges. The two judges’ ratings
have a linear and additive relationship with each other: adding 2 to each of Judge1’s ratings gives exactly the Judge2 ratings.
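The snippets below assume the icc() function comes from the irr package and that the ratings are stored in a 3 x 2 matrix; a minimal setup sketch:

# Assumed setup: the irr package provides icc(); data1 holds the ratings
# from the table above, one row per rated subject and one column per judge.
library(irr)

data1 <- matrix(c(2, 4,
                  4, 6,
                  6, 8),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("Judge1", "Judge2")))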
Next we compare the results of the different ICC models and Pearson’s r for the first dataset.
> # Pearson's r
> cor(data1[,1], data1[,2])
[1] 1
> # ICC(C,1): two-way model, consistency
> icc(data1, model = "twoway", type = "consistency")$value
[1] 1
> # ICC(A,1): two-way model, agreement
> icc(data1, model = "twoway", type = "agreement")$value
[1] 0.6666667
> # ICC(1): one-way model
> icc(data1, model = "oneway")$value
[1] 0.6
Pearson’s r measures the linear correlation between two variables. Dataset 1 has a perfect linear relationship between the ratings of the two judges, so Pearson’s r gives a correlation of 1, which indicates a perfect positive correlation.
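As a quick sanity check, Pearson’s r can also be computed by hand as the covariance of the two rating vectors divided by the product of their standard deviations:

# Pearson's r by hand for dataset 1: covariance over the product of standard deviations.
j1 <- data1[, 1]
j2 <- data1[, 2]
cov(j1, j2) / (sd(j1) * sd(j2))   # 1, the same value cor(j1, j2) returns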
ICC(C,1) also gives a score of 1. This is because the consistency model accepts an additive relationship between judges as a perfect relationship.
ICC(1) and ICC(A,1), on the other hand, give values between 0 and 1, which indicates some level of positive correlation between the judges, but not a perfect one. Both of these models require absolute agreement between the judges to reach a perfect score, meaning the judges’ ratings would need to be equal.
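To see where the agreement value comes from, here is a sketch of the single-rater agreement ICC computed directly from its ANOVA mean squares (the ICC(A,1) formula of McGraw and Wong); the constant +2 offset shows up as between-judge variance in the denominator:

# ICC(A,1) for dataset 1 from ANOVA mean squares (n = 3 subjects, k = 2 judges).
n <- 3; k <- 2
MSR <- k * var(rowMeans(data1))       # between-subject mean square = 8
MSC <- n * var(colMeans(data1))       # between-judge mean square   = 6 (the +2 offset)
SST <- sum((data1 - mean(data1))^2)   # total sum of squares        = 22
MSE <- (SST - (n - 1) * MSR - (k - 1) * MSC) / ((n - 1) * (k - 1))  # residual = 0
(MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)           # 0.6666667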
Dataset 2
Judge1 Judge2
[1,] 0 4
[2,] 5 5
[3,] 10 6
In dataset 2 the values of the two judges’ ratings again have a linear relationship with each other, but this time they do not have an additive relationship.
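As with dataset 1, the code below assumes these ratings are stored in a 3 x 2 matrix called data2:

# data2: the ratings from the table above, one column per judge.
data2 <- matrix(c(0, 4,
                  5, 5,
                  10, 6),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("Judge1", "Judge2")))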
Here we compare what the different ICC models and Pearson’s r give for the second dataset.
> # Pearson's r
> cor(data2[,1], data2[,2])
[1] 1
> # ICC(C,1): two-way model, consistency
> icc(data2, model = "twoway", type = "consistency")$value
[1] 0.3846154
> # ICC(A,1): two-way model, agreement
> icc(data2, model = "twoway", type = "agreement")$value
[1] 0.483871
> # ICC(1): one-way model
> icc(data2, model = "oneway")$value
[1] 0.5428571
As the two variables again have a perfect linear relationship, Pearson’s r gives a perfect positive correlation of 1.
This time the dataset does not have an additive relationship, so ICC(C,1) does not give a perfect positive relationship. In fact, it gives the lowest correlation score of all the methods here.
ICC(1) and ICC(A,1) again give values between 0 and 1, much as they did for the first dataset.
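For intuition, the same mean-square sketch as above shows why the consistency value drops: the linear-but-not-additive relation leaves a non-zero residual mean square.

# ICC(C,1) for dataset 2 from ANOVA mean squares.
n <- 3; k <- 2
MSR <- k * var(rowMeans(data2))       # between-subject mean square = 18
MSC <- n * var(colMeans(data2))       # between-judge mean square   = 0 (judge means are equal)
SST <- sum((data2 - mean(data2))^2)   # total sum of squares        = 52
MSE <- (SST - (n - 1) * MSR - (k - 1) * MSC) / ((n - 1) * (k - 1))  # residual = 8
(MSR - MSE) / (MSR + (k - 1) * MSE)                                 # 0.3846154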
Dataset 3
Judge1 Judge2
[1,] 4 4
[2,] 5 5
[3,] 6 6
In dataset 3 the two judges’ ratings are equal.
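Assuming the same matrix layout once more, data3 holds these identical ratings:

# data3: identical ratings from both judges.
data3 <- matrix(c(4, 4,
                  5, 5,
                  6, 6),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("Judge1", "Judge2")))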
> # Pearson's r
> cor(data3[,1], data3[,2])
[1] 1
> # ICC(C,1): two-way model, consistency
> icc(data3, model = "twoway", type = "consistency")$value
[1] 1
> # ICC(A,1): two-way model, agreement
> icc(data3, model = "twoway", type = "agreement")$value
[1] 1
> # ICC(1): one-way model
> icc(data3, model = "oneway")$value
[1] 1
The equal ratings between the judges lead to a perfect positive relationship under all of the models.
Thank you for reading!
References:
- Shrout, P. and Fleiss, J., 1979. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), pp.420–428.
- McGraw, K. and Wong, S., 1996. Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), pp.30–46.