Confounding and Effect Modification: What Is the Difference Between Them?

YS Koh
5 min read · Aug 12, 2020

You might have heard the terms confounding and effect modification in your statistics class but still could not decipher what your lecturer was saying (perhaps he or she became so excited about the concepts that the explanation turned into gibberish), or how to tell them apart. In this article, I will use simple examples to explain what these terms mean and how we deal with them to examine the true association between a predictor of interest and an outcome, using R as the statistical software where applicable.

Confounders

Confounders are factors that distort the association between a predictor (independent variable) and an outcome (dependent variable). A confounder is associated with the predictor, is itself a cause of the outcome, and does not lie on the causal pathway between them. For example, a study may find that coffee drinkers have a higher chance of getting lung cancer. Does this mean that we should stop drinking coffee to reduce our chance of getting lung cancer?

If you take a closer look at the study, you may realize that coffee drinkers tend to be smokers as well. It is well established that smoking increases the chance of having lung cancer (do read Doll's paper, which established smoking as a predictor of lung cancer). In this case, smoking is a confounder that makes coffee drinking appear to be associated with lung cancer. If we remove this confounder, coffee drinking may no longer be associated with lung cancer (and we can drink coffee in peace again). This relationship is explained in the diagram below:
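To see how a confounder alone can create an association, here is a small arithmetic sketch in R with made-up counts (hypothetical, chosen only for illustration). Within smokers and within non-smokers, coffee drinking has no association with lung cancer (an odds ratio of 1 in each group), but because most coffee drinkers in this made-up sample are smokers, pooling the two groups produces a crude odds ratio of about 3.1. The helper `or_2x2()` is a hypothetical name, not a function from any package.

```r
# Odds ratio from a 2 x 2 table, using the usual epidemiology layout:
# a = coffee & cancer,    b = coffee & no cancer,
# c = no coffee & cancer, d = no coffee & no cancer
or_2x2 <- function(a, b, c, d) (a * d) / (b * c)

# Hypothetical counts: 1,000 coffee drinkers and 1,000 non-drinkers.
# Smokers (lung cancer risk 10%): 800 coffee drinkers, 200 non-drinkers
or_smokers <- or_2x2(80, 720, 20, 180)
# Non-smokers (lung cancer risk 1%): 200 coffee drinkers, 800 non-drinkers
or_nonsmokers <- or_2x2(2, 198, 8, 792)
# Crude (pooled) table, ignoring smoking
or_crude <- or_2x2(80 + 2, 720 + 198, 20 + 8, 180 + 792)

# Each stratum-specific OR is 1, yet the crude OR is about 3.1
print(round(c(smokers = or_smokers, non_smokers = or_nonsmokers, crude = or_crude), 2))
```

The spurious crude association comes entirely from smoking being unevenly distributed between coffee drinkers and non-drinkers.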

Figure 1: A diagram explaining how smoking (the confounder) makes the association between coffee drinking and lung cancer spurious.

So, what can we do to “remove” a confounder?

Statistically, we can adjust for confounders by including them as covariates in a multiple regression model. For instance, suppose we fit a logistic regression model (do read my Medium article on how to interpret logistic regression) to determine the association between coffee drinking and lung cancer.

Model 1: Coffee drinking -> Lung Cancer

Suppose the odds ratio is 1.01 (with no coffee drinking as the reference group) and the p-value is 0.04. This means that, compared with those who do not drink coffee, the odds of getting lung cancer for those who drink coffee are 1% higher, and the association is statistically significant.
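The "1% higher" reading comes straight from the odds ratio: an odds ratio (OR) changes the odds by (OR - 1) × 100 percent. A minimal sketch, where `pct_change_in_odds()` is a hypothetical helper name:

```r
# Percentage change in the odds implied by an odds ratio
pct_change_in_odds <- function(or) (or - 1) * 100

pct_change_in_odds(1.01)  # 1, i.e. the odds are 1% higher
```

Keep in mind that this is a change in odds, not in risk (probability); the two are close only when the outcome is rare.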

Next, we fit another logistic regression model, this time adjusted for smoking.

Model 2: Coffee drinking + Smoking -> Lung Cancer

After adjustment, suppose the odds ratio for coffee drinking is 0.98 (with no coffee drinking as the reference group) and the p-value is 0.1. This means that, compared with those who do not drink coffee, the odds of getting lung cancer for those who drink coffee are 2% lower, and the association is no longer statistically significant.
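Here is a sketch of Model 1 and Model 2 in R on simulated data. Everything in it is an assumption for illustration: the data frame `data_lung`, the 0/1 columns `lung_cancer`, `coffee_drinking` and `smoking`, and the simulation itself, in which smoking drives both coffee drinking and lung cancer while coffee has no effect at all. The crude odds ratio from Model 1 should come out well above 1, and adjusting for smoking in Model 2 should pull it back towards 1.

```r
# Hypothetical simulation: smoking causes lung cancer and is more common
# among coffee drinkers; coffee itself has NO effect on lung cancer.
set.seed(2020)
n <- 20000
smoking <- rbinom(n, 1, 0.3)
coffee_drinking <- rbinom(n, 1, ifelse(smoking == 1, 0.7, 0.3))
lung_cancer <- rbinom(n, 1, plogis(-4 + 2 * smoking))  # no coffee term
data_lung <- data.frame(lung_cancer, coffee_drinking, smoking)

# Model 1: coffee drinking -> lung cancer (crude)
model1 <- glm(lung_cancer ~ coffee_drinking, family = binomial, data = data_lung)
# Model 2: coffee drinking + smoking -> lung cancer (adjusted for smoking)
model2 <- glm(lung_cancer ~ coffee_drinking + smoking, family = binomial, data = data_lung)

# Odds ratio for coffee drinking = exp(coefficient)
or_crude <- unname(exp(coef(model1)["coffee_drinking"]))
or_adjusted <- unname(exp(coef(model2)["coffee_drinking"]))
print(round(c(crude = or_crude, adjusted = or_adjusted), 3))

# Wald 95% confidence intervals on the odds ratio scale for Model 2
print(exp(confint.default(model2)))
```

`summary(model2)` shows the p-value for each term. The exact estimates depend on the random seed.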

Another method that can account for confounding is stratification (analysing smokers and non-smokers separately, which I will explain in the later section on effect modification). However, I prefer regression adjustment for confounders, as it is more direct and intuitive.
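For stratification with a binary confounder, base R's `mantelhaen.test()` combines the stratum-specific 2 × 2 tables into a single Mantel-Haenszel common odds ratio. A sketch using hypothetical counts (made up for illustration, not real data) in which coffee has no effect within either smoking stratum:

```r
# 2 x 2 x 2 table: coffee (yes/no) x lung cancer (yes/no) x smoking stratum.
# array() fills column by column within each stratum.
x <- array(c(80, 20, 720, 180,   # smokers
             2, 8, 198, 792),    # non-smokers
           dim = c(2, 2, 2),
           dimnames = list(coffee = c("yes", "no"),
                           lung_cancer = c("yes", "no"),
                           smoking = c("smoker", "non-smoker")))

# Crude OR from the pooled table, ignoring smoking
pooled <- apply(x, c(1, 2), sum)
or_crude <- (pooled[1, 1] * pooled[2, 2]) / (pooled[1, 2] * pooled[2, 1])

# Mantel-Haenszel OR, adjusted for smoking by stratification
mh <- mantelhaen.test(x)
or_mh <- as.numeric(mh$estimate)
print(c(crude = or_crude, mantel_haenszel = or_mh))
```

The crude odds ratio is about 3.1, while the smoking-stratified Mantel-Haenszel estimate is 1, matching the absence of any coffee effect within each stratum.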

Effect modification

Effect modification means that the association between a predictor and an outcome differs depending on the level of a third variable. Using a similar example to the one above, the association between coffee drinking and lung cancer may be present in males but not in females. In this case, gender is an effect modifier, and this difference should be reported (for example, as separate estimates for males and females) rather than simply adjusted for in a regression model.

Figure 2: Explaining effect modification. It could be that the association between coffee drinking and lung cancer is found in males but not females

So, what can we do to determine effect modification?

I would recommend two ways to do so. One way is to add an interaction term to the regression model.

Model 3: Coffee drinking + Gender + Coffee drinking * Gender -> Lung Cancer

In R, it can be implemented with the following code:

model <- glm(lung_cancer ~ gender + coffee_drinking + gender:coffee_drinking, family = binomial, data = data_lung)
# equivalent shorthand: lung_cancer ~ gender * coffee_drinking
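As a runnable sketch, the model above can be fitted to simulated data. The simulation is an assumption for illustration only: `gender` is a factor with "female" as the reference level, `coffee_drinking` and `lung_cancer` are 0/1 columns, and coffee raises the log-odds of lung cancer in females but not in males. The stratum-specific odds ratios then come from the coefficients: exp(coffee) for females and exp(coffee + interaction) for males.

```r
# Hypothetical simulation: coffee affects lung cancer in females only
set.seed(42)
n <- 20000
gender <- factor(sample(c("female", "male"), n, replace = TRUE),
                 levels = c("female", "male"))  # female = reference level
coffee_drinking <- rbinom(n, 1, 0.4)
male <- as.numeric(gender == "male")
lp <- -3 + 0.3 * male + 1.2 * coffee_drinking * (1 - male)  # log-odds
lung_cancer <- rbinom(n, 1, plogis(lp))
data_lung <- data.frame(lung_cancer, gender, coffee_drinking)

model <- glm(lung_cancer ~ gender + coffee_drinking + gender:coffee_drinking,
             family = binomial, data = data_lung)
b <- coef(model)

# Stratum-specific odds ratios for coffee drinking
or_female <- unname(exp(b["coffee_drinking"]))
or_male <- unname(exp(b["coffee_drinking"] + b["gendermale:coffee_drinking"]))
# Wald p-value for the interaction term
p_interaction <- summary(model)$coefficients["gendermale:coffee_drinking", "Pr(>|z|)"]
print(c(or_female = or_female, or_male = or_male, p_interaction = p_interaction))
```

A small p-value for the interaction term is the statistical evidence that gender modifies the coffee-lung cancer association.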

The results are shown in the screenshots below:

Figure 3: Result of the logistic regression with the interaction term gender * coffee drinking
Figure 4: Odds ratios and 95% CIs for the logistic regression

From here, we can conclude that the interaction term is significant (p = 0.0228) and that gender is an effect modifier (the association between coffee drinking and lung cancer differs between males and females).

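To obtain the respective odds ratios for males and females:

Female: Since female is the reference group, the odds ratio for coffee drinkers vs non-coffee drinkers is simply the exponentiated coefficient for coffee drinking, exp(1.4469) = 4.25.

Male: We need to add the interaction coefficient to the coffee drinking coefficient: exp(1.5115 + 1.4469 - 1.9169)/exp(1.5115) = exp(1.4469 - 1.9169) = 0.625. The 1.5115 term appears in both the numerator and the denominator and cancels, so the odds ratio for males is exp(coffee coefficient + interaction coefficient).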
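The same arithmetic in R, using the coefficients quoted above from the article's hypothetical output. Labelling 1.5115 as the male main effect is an assumption; it cancels either way.

```r
# Coefficients quoted from the (hypothetical) regression output
b_male <- 1.5115          # assumed: gender (male) main effect; cancels below
b_coffee <- 1.4469        # coffee drinking effect in females (reference group)
b_interaction <- -1.9169  # gender x coffee drinking interaction

or_female <- exp(b_coffee)
# Formula as written in the text
or_male <- exp(b_male + b_coffee + b_interaction) / exp(b_male)
# Simplified form: the b_male term cancels
or_male_simplified <- exp(b_coffee + b_interaction)

print(round(c(female = or_female, male = or_male, male_simplified = or_male_simplified), 3))
```

Both forms of the male calculation reproduce the 0.625 quoted above, and exp(1.4469) reproduces the female value of 4.25.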

Another way is to stratify the analysis, that is, to fit separate logistic regression models for males and females.

For males:

Figure 5: Logistic regression for males
Figure 6: Odds Ratio and 95% CI for males

For females:

Figure 7: Logistic regression for females
Figure 8: Odds Ratio and 95% CI for females

Stratification also shows that gender is an effect modifier, with males having an odds ratio of 0.625 and females an odds ratio of 4.25. The association between coffee drinking and lung cancer is significant for females but not for males. These odds ratios are the same as those obtained above from the model with the interaction term.
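A sketch of the stratified analysis in R, again on hypothetical simulated data (the simulation and column names are assumptions for illustration). When gender and coffee drinking are the only terms, the interaction model is equivalent to fitting a separate intercept and coffee slope for each gender, so the stratified odds ratios coincide with those derived from the interaction model, up to numerical convergence.

```r
# Hypothetical simulation: coffee affects lung cancer in females only
set.seed(42)
n <- 20000
gender <- factor(sample(c("female", "male"), n, replace = TRUE),
                 levels = c("female", "male"))
coffee_drinking <- rbinom(n, 1, 0.4)
male <- as.numeric(gender == "male")
lung_cancer <- rbinom(n, 1, plogis(-3 + 0.3 * male + 1.2 * coffee_drinking * (1 - male)))
data_lung <- data.frame(lung_cancer, gender, coffee_drinking)

# Stratified analysis: one logistic regression per gender
fit_male <- glm(lung_cancer ~ coffee_drinking, family = binomial,
                data = subset(data_lung, gender == "male"))
fit_female <- glm(lung_cancer ~ coffee_drinking, family = binomial,
                  data = subset(data_lung, gender == "female"))
or_male_strat <- unname(exp(coef(fit_male)["coffee_drinking"]))
or_female_strat <- unname(exp(coef(fit_female)["coffee_drinking"]))

# Interaction model on the full data, for comparison
fit_int <- glm(lung_cancer ~ gender * coffee_drinking, family = binomial, data = data_lung)
b <- coef(fit_int)
or_male_int <- unname(exp(b["coffee_drinking"] + b["gendermale:coffee_drinking"]))
or_female_int <- unname(exp(b["coffee_drinking"]))

print(rbind(stratified = c(male = or_male_strat, female = or_female_strat),
            interaction = c(male = or_male_int, female = or_female_int)))
```

This equivalence holds only because no other covariates are in the models; once covariates are added and assumed to act the same in both genders, the two approaches can give slightly different answers.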

In short, confounders distort the association between the predictor and the outcome, while effect modifiers change the strength (or direction) of that association across levels of a third variable. One should adjust for confounders, but report the different effects seen across levels of an effect modifier.

(The above results are hypothetical for the purpose of explaining the concepts. They do not represent the actual research results in the literature.)


I am interested in using R programming for the field of epidemiology and biostatistics.