Association vs. agreement for method comparison: which one should be considered in clinical studies?
Method comparison is a very common objective in clinical studies: the goal is to assess equivalency, i.e., to determine whether methods A and B are statistically comparable. Is there an obvious statistical method to do so? Unfortunately, the short answer is no. The longer one is that, although there is no consensus in the literature, I will try to show that we already have all the information required to handle such an objective.
Here at Ad Scientiam, we develop, validate and implement digital biomarkers that continuously measure the evolution of a disease in real life. Accordingly, these digital biomarkers have to be compared with existing tools used in routine care, and therefore the statistical comparison of two measurement methods (e.g., assays, procedures, measurement systems) is a recurring question.
This article first presents the most common approach used in the literature so far, even though it is not appropriate in the vast majority of those articles. Then I will walk you through a more appropriate approach for comparison studies and explain why, as biostatisticians, we should spread the word.
The ugly duckling technique: correlation
At first glance, the correlation coefficient seems to be the best statistical method to assess equivalency between two methods, and it is indeed the most common approach.
As a reminder, the Pearson (or product-moment) correlation shows whether, and how strongly, pairs of variables are related. The correlation coefficient is computed as the ratio of the covariance of the two variables to the product of their standard deviations. The closer the coefficient is to +1 or -1, the stronger the linear relationship between the two variables.
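As a quick illustration of that definition, here is a minimal R sketch (the vectors x and y below are made-up toy values, not data from any study) that computes the coefficient from the covariance and standard deviations and checks it against R's built-in cor():

# Toy illustration data (hypothetical values, only to demonstrate the formula)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
# Pearson coefficient = covariance / product of the standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y, method = "pearson")
all.equal(r_manual, r_builtin)  # TRUE: both computations give the same coefficient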
The null hypothesis of the correlation test (and of the slope in the linear regression that usually accompanies it) is that there is no linear relationship between the two variables. Rejecting this hypothesis is easy, even with a minimal trend, so analyses can incorrectly conclude that two measurement methods are interchangeable when they are merely related.
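To make this pitfall concrete, here is a hedged sketch on simulated (assumed) data: the second method is deliberately made very noisy, so the correlation is weak, yet the test still returns a p-value far below 0.05 simply because the sample is large.

# Simulated illustration: weak correlation, yet a "significant" test result
set.seed(123)
n <- 500
truth    <- rnorm(n)                    # underlying quantity being measured
method_a <- truth + rnorm(n, sd = 1)    # method A with moderate noise
method_b <- truth + rnorm(n, sd = 3)    # method B deliberately very noisy
cor.test(method_a, method_b)            # r roughly 0.2, but p-value well below 0.05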
If you feel that playing with dummy data would help you understand correlation, I invite you to try this interactive tool: https://rpsychologist.com/correlation/.
What about agreement?
However, in our comparison, we want to demonstrate the way in which two measurement methods agree. Agreement refers to the degree of concordance between two methods of measurement. Statistical methods for testing agreement are used to decide whether one technique for measuring a variable can substitute for another.
Two methods designed to measure the same variable will show a strong correlation whenever the samples are chosen so that the quantity being measured varies widely. In a method comparison study, this means the samples should cover a wide range of values; a high correlation may therefore simply be a sign that a widespread sample was chosen. Correlation quantifies the degree to which two variables are related, but a high correlation does not automatically imply that there is good agreement between the two methods.
Pearson correlation coefficients therefore have to be interpreted cautiously: a measure of association between two methods is not a measure of agreement.
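A quick simulation (illustrative, assumed values only) makes the point concrete: the same measurement error yields a much higher correlation when the sampled range is wide, while the spread of the differences, and hence the agreement, stays essentially the same.

# Two hypothetical methods measuring the same truth with identical error (sd = 5)
set.seed(42)
measure <- function(truth, sd_err = 5) truth + rnorm(length(truth), sd = sd_err)
narrow_truth <- runif(200, 45, 55)   # samples drawn from a narrow range
wide_truth   <- runif(200, 10, 90)   # samples drawn from a wide range
a_narrow <- measure(narrow_truth); b_narrow <- measure(narrow_truth)
a_wide   <- measure(wide_truth);   b_wide   <- measure(wide_truth)
cor(a_narrow, b_narrow)      # modest correlation on the narrow range
cor(a_wide, b_wide)          # much stronger correlation on the wide range
sd(a_narrow - b_narrow)      # spread of the differences...
sd(a_wide - b_wide)          # ...is about the same in both cases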
Right, but is there another solution now?!
Don’t panic! It is generally accepted that the Bland and Altman approach is the most appropriate technique for method comparison studies. This approach is simple and can be used alongside the correlation approach to avoid misinterpretation of the correlation coefficient.
The most appropriate technique: the Bland-Altman approach
Bland and Altman’s (B&A) approach is a difference plot which examines the differences between the two methods and studies the agreement in terms of Limits of Agreement (LoA). The B&A plot represents each difference between two paired measurements against the average of those measurements. The aim is to evaluate the agreement between the two methods, so the behavior of the differences between one measurement and the other is studied statistically. The hypothesis is that the measurements obtained by one method or the other give similar results within an acceptable range, i.e., the so-called limits of agreement; ideally, all the differences would be equal to zero. Note that if one of the two methods is a reference (gold standard), the difference plot can represent each difference between the paired methods against the reference method’s measurement.
This approach allows:
- to evaluate the agreement between the two measurements
- to statistically study the behaviors of the differences between one measurement and the other (bias: do the differences differ systematically from zero?)
- to assess errors (how much do the differences vary?)
The LoA include both systematic error (bias) and random error (precision), and provide a useful measure for comparing the likely differences between individual results measured by the two methods. The limits of agreement can be derived parametrically under the assumption that the differences are normally distributed (mean difference +/- 1.96 SD), or from non-parametric percentiles when that assumption does not hold. A clinically acceptable LoA can also be defined a priori if relevant.
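As a minimal sketch of the parametric version (the paired vectors m1 and m2 below are hypothetical, just to show the arithmetic), the bias and limits of agreement come down to a few lines of R:

# Hypothetical paired measurements from two methods
m1 <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.8, 10.4, 12.5)
m2 <- c(10.8, 12.1, 10.1, 12.9, 11.6, 12.2, 11.0, 13.3)
d    <- m1 - m2                           # paired differences
bias <- mean(d)                           # systematic error (bias)
loa  <- bias + c(-1.96, 1.96) * sd(d)     # parametric limits of agreement
c(bias = bias, lower_LoA = loa[1], upper_LoA = loa[2])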
So, let’s take a closer look!
Comparison between correlation and the Bland-Altman approach
The same data set is used for the correlation plot (figure 1) and the Bland-Altman plot (figure 2).
In figure 1, the regression line has a slope of 1.06 (1.02 to 1.09) and an intercept of 7.08 (-0.30 to 19.84). The correlation coefficient between the two methods is r = 0.996 (0.991 to 0.998), P < 0.001, indicating a strong correlation.
However, as shown in figure 2, a bias of -27.2 units is found. This bias is represented by the gap between the X axis, which corresponds to a zero difference, and the horizontal line drawn at -27.2 units.
This example shows that correlation can quantify the degree of relation between two variables, but a high correlation does not automatically imply that there is good agreement between the two methods.
Let’s conclude…
Correlation between two methods:
- is misleading and does not measure agreement
- should not be used to assess method comparability
The evaluation of two measurement methods depends on the goal of the study:
- to study the significance of differences: focus on the differences
- to study the equivalence or the comparability: focus on the agreement
Note that the tendency to imitate what we see in other published papers, together with the fact that many papers use the same incorrect association methods, sometimes leads to low-quality statistical methodology being perpetuated by replication. Statisticians should be aware of this problem and use their influence to convince their uninitiated colleagues. At Ad Scientiam, it is important to apply relevant statistical methods to make sure that our digital biomarkers are valid and bring concrete added value compared with existing tools. This is innovation without compromise!
And now let’s practice with some R code!
Here, an example of a high correlation (0.91) with poor agreement (a bias of 24 units, where a bias of 0 units would be the reference for good agreement) is shown with an R script.
# Do not forget to install the required R packages before loading them, if necessary

# Install packages
install.packages("ggplot2")  # graphs
install.packages("blandr")   # Bland-Altman
install.packages("ggpubr")   # arrange plots

# Load packages
library(ggplot2)
library(blandr)
library(ggpubr)

# Data
data = data.frame(value_2 = c(44, 45, 41, 53, 44, 44, 50, 45, 60, 57, 47, 52, 54, 62),
                  value_1 = c(22, 22.5, 20.5, 26.5, 20, 23, 25, 22.5, 30, 28, 23, 26, 32, 30))

# Correlation plot
my_cor_Plot = ggplot(data, aes(x = value_2, y = value_1)) +
  geom_point() +
  theme_bw() +
  geom_smooth(method = lm, se = FALSE) +
  ggtitle("Pearson Correlation")

# Correlation test
my_cor_Test = cor.test(x = data$value_2, y = data$value_1, method = "pearson")

# Bland-Altman plot
my_BA_Plot = blandr.draw(data$value_2, data$value_1) +
  geom_point()

# Bland-Altman statistics (bias, LoA limits, CIs, means, differences and other estimates)
my_BA_Stat = blandr.statistics(data$value_2, data$value_1)
my_BA_Bias = my_BA_Stat$bias

# Arrange plots
my_plot = ggarrange(my_cor_Plot, my_BA_Plot, nrow = 1, ncol = 2)

# Print
print(my_plot)
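If you want to check the numbers quoted above, the correlation estimate and the Bland-Altman bias can be pulled directly from the objects created by the script; they should be close to the values mentioned earlier (r of about 0.91 and a bias of about 24 units):

# Inspect the numerical results produced by the script above
print(my_cor_Test$estimate)  # Pearson r, expected to be around 0.91
print(my_BA_Bias)            # mean difference (bias), around 24 units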
Thank you for reading my article!