Propensity Score Matching for Causal Inference: Creating Data Visualizations to Assess Covariate Balance in R

10 min readMay 20, 2024

by Sharon H. Green, D-Lab Data Science Fellow

Propensity score matching is used to estimate causal effects using observational data. It is a statistical technique that aims to emulate a randomized experiment. In an experiment, study participants may be randomly assigned to one of two groups: the “treated,” who receive a drug, procedure, or some other intervention, and the “controls,” who receive no intervention (or a different intervention). If treated and control individuals are similar to each other, except for the treatment, differences in their outcomes can be attributed to the treatment.

However, researchers often cannot assign individuals to treated and control groups. For example, scientists may want to know how lead exposure affects children’s health, but it would be highly unethical to randomly assign children to receive lead exposure. We may be able to observe children, some of whom have higher lead exposure than others, but the children with higher exposures may be different from the children with lower exposures. For example, children from poorer families are more likely to live in older houses and be exposed to lead-based paint while children from wealthier families are more likely to live in newer houses without lead-based paint and are more likely to be able to afford lead abatement. Thus, socioeconomic status may be a confounding variable, because it may be related to both children’s lead exposure and health, complicating inference. In these cases, we can try to emulate a randomized experiment with propensity score matching, which attempts to balance the treated and control groups with respect to measured non-treatment differences.

This blog post builds on another post, Introduction to Propensity Score Matching, written by D-Lab Data Science Fellow Alex Ramiller. Here, I provide a tutorial for implementing propensity score matching and creating data visualizations to assess covariate balance–that is, visually assessing whether the matched individuals really are balanced with respect to measured covariates. When implementing propensity score matching, you match individuals so that the “treatment” and “control” groups are balanced on their covariates. In other words, you want there to be approximately equal distributions of covariates in both groups. This is what will allow us to attribute differences in outcomes to a treatment.

After implementing propensity score matching, it can be difficult to assess if the matching yielded adequate balance between the two groups. Data visualizations make it easier to assess the quality of the matches before estimating causal effects. These visualizations are powerful tools that can be used to determine whether there are approximately equal distributions of the covariates in the “treatment” and “control” groups and to see whether our results pass the statistical tests that determine balance (e.g., absolute standardized mean differences, KS statistics). The R code can be found in this GitHub repository.

In this blog post, we will address the same research question that Alex Ramiller examined in his blog post: Did participation in the National Supported Work (NSW) demonstration program result in higher incomes? NSW is a program that was implemented in the 1970s to help individuals enter the labor market. In this case, individuals who participated in the NSW formed the “treatment” group, and individuals who did not formed the control group.

Load libraries and explore the lalonde dataset

First, we load two R packages: MatchIt and cobalt. MatchIt makes it easy to select matched samples on covariates and propensity scores (as well as other matching procedures) [1]. Cobalt integrates with MatchIt and is used to create figures [2].

We use the lalonde dataset to address this research question [3]. The variable “treat” (below) reflects whether individuals participated in the NSW program or not (1 = yes/treated, 0 = no/control). The dataset also contains age (years), education (years), race (Black, Hispanic, or White), marital status (married or other), and whether they earned a high school degree (1 = yes, 0 = no) as covariates. The last three variables reflect real earnings in 1974, 1975, and 1978, respectively.

# Load the necessary libraries and inspect the data
library(MatchIt)
library(cobalt)

head(lalonde)

Data wrangling and preprocessing

After loading packages and examining the data, we recode the responses so that the figures will be easier to understand later.

# Recode values
lalonde$race <- recode(lalonde$race, 
                       black = "Black",
                       hispan = "Hispanic",
                       white = "White")
table(lalonde$race)

lalonde$married <- recode(lalonde$married, 
                          "1" = "Married",
                          "0" = "Other")
table(lalonde$married)

var_names <- data.frame(old = c("age", "educ", "race_White", "race_Black",
                                "race_Hispanic", "married_Other", "nodegree",
                                "re74", "re75"),
                        new = c("Age (Years)", "Education (Years)", "Race (White)",
                                "Race (Black)","Race (Hispanic)", "Not Married", 
                                "No Degree","Earnings (1974)", "Earnings (1975)"))

Implement matching

We use the matchit() command to create a pre-matching object and a propensity score matching object. We create a pre-matching object to view the initial imbalance in the data. Comparing the outputs from the pre- and post-matching commands should reveal better covariate balance in the post-matching object.

As seen in the code below, matchit() takes four arguments: 1) a formula that estimates the propensity score by relating the treatment to the covariates, 2) the dataset to be used, 3) the method specifying the type of matching, and 4) the distance, which refers to the measure used to determine how well participants match.

The matchit() command can implement several different kinds of matching. The code below specifies method = “full”; this implements optimal full matching. Other options include:

method = “nearest” for nearest neighbor matching,
method = “optimal” for optimal pair matching, and
method = “exact” for exact matching.

It is also easy to implement ratio matching. Although the most common form of matching is 1:1, meaning pairing one control and one treated unit, it is also possible to pair several controls to each treated unit. To specify the number of controls to be paired with each treated unit, specify ratio = k, where k is the number of control units.

Additionally, the command can also specify matching with or without replacement. Matching with replacement (specified as replace = TRUE) means that the same control can be paired with multiple treatment units while matching without replacement (which is the default setting) means that each control can only be paired with one treatment unit.

# Create a pre-matching object
no_match <- matchit(treat ~ age + educ + race + married +
                 nodegree + re74 + re75, data = lalonde,
               method = NULL, distance = "glm")
no_match

# Create a propensity score matching object 
ps_match <- matchit(treat ~ age + educ + race + married +
                 nodegree + re74 + re75, data = lalonde,
               method = "full", distance = "glm", link = "probit")

ps_match

The output from calling the first matchit object shows no matching was performed, prints the number of observations (614), and lists the covariates. By contrast, the second output shows that we implemented optimal full matching, that 614 treated observations were matched with 614 control observations, and lists the same covariates. (In some cases, not all units match and the matched sample is smaller than the original sample.)

You can also call the summary() command on each matchit object to view the summary of balance. The output of this command displays means (for treated and controls), standardized mean difference, variance ratio, eCDF means, and eCDF max for each covariate. While this information is useful, it is easier to understand by viewing visualizations.

Visualizing covariate balance

Next, we create two types of data visualizations to assess covariate balance: covariate distribution plots and balance (Love) plots.

Covariate distribution plots

Since the goal of matching is to achieve approximately equal distributions of covariates in the two groups — the “treated” and the “control” — we create plots to visualize covariate distributions. If the matching was successful, we should see similar covariate distributions in the treated and control groups. Furthermore, the distributions of the treatment and control groups should be more similar to each other in the matched sample than in the unmatched sample.

The code below creates two graphs for each covariate, reflecting the distributional balance in the unmatched and matched samples. We use the Berkeley colors (Berkeley Blue and California Gold) to differentiate the control and treatment groups, respectively. The Hex colors (#003262 and #FDB515) are specified under scale_fill_discrete().

# Create covariate distribution plots

# Age plot
plot.age = bal.plot(ps_match, "age", which = "both",
                    sample.names = c("Unmatched", "Matched")) +
  scale_fill_discrete(labels = c("Control","Treatment"),
                      type = c("#003262","#FDB515")) +
  labs(title = "Age", x = "Age (Years)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12))

# Education plot
plot.educ = bal.plot(ps_match, "educ", which = "both",
                     sample.names = c("Unmatched", "Matched")) +
  scale_fill_discrete(labels = c("Control","Treatment"),
                      type = c("#003262","#FDB515")) +
  labs(title = "Education", x = "Education (Years)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12))

# Race plot
plot.race = bal.plot(ps_match, "race", which = "both",
                     sample.names = c("Unmatched", "Matched")) +
  scale_fill_discrete(labels = c("Control","Treatment"),
                      type = c("#003262","#FDB515")) +
  labs(title = "Race/Ethnicity", x = "Race/Ethnicity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12))

# Marital status plot
plot.maritalstatus = bal.plot(ps_match, "married", which = "both",
                              sample.names = c("Unmatched", "Matched")) +
  scale_fill_discrete(labels = c("Control","Treatment"),
                      type = c("#003262","#FDB515")) +
  labs(title = "Marital Status", x = "Marital Status") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12))

# Assemble distribution plots into one graphic
distributiongraphs <- (plot.age / plot.educ ) | (plot.race / plot.maritalstatus)

distributiongraphs <- distributiongraphs +
  plot_annotation(title = "Distribution Balance in Unmatched and Matched Samples",
                  theme = theme(plot.title = element_text(size = 16, hjust = 0.5),
                                plot.title.position = "plot"))

distributiongraphs

The visualizations demonstrate that the match was successful because the treated and control groups in the matched samples appear to be similar (and are more similar to each other than in the unmatched samples). For example, in the race/ethnicity graphs of the unmatched sample, there is a large proportion of people who identify as Black in the treatment group (yellow) and a small proportion of people who identify as Black in the control group (blue). By contrast, in the matched sample, there are approximately equal proportions of people who identify as Black.

Balance (Love) plots

The balance (Love) plots below display the absolute standardized mean differences (left) and the Kolmogorov-Smirnov test (KS) statistics (right) for each covariate in the unmatched and matched samples. The absolute standardized mean differences and KS statistics are often used to assess balance. The former reflects the difference in the means of each covariate between the two groups and is standardized so that it is on the same scale for all of the covariates. The latter is often recommended as a supplement to the absolute mean differences to assess balance. Lower values suggest the distributions are more tightly matched.

The dotted lines on the graphs (0.1 for the absolute standardized mean differences and 0.05 for KS statistics) are thresholds in the literature that are often used to indicate good balance. As you can see by comparing the unmatched (yellow) and matched (blue) values, the matched sample has a better covariate balance than the unmatched sample. The graphs below show that all of the covariates had absolute standardized mean differences that were below the threshold and that most of the covariates had KS statistics that were below the threshold (although three did not pass the KS threshold test).

After matching and assessing balance, the sample can be used to estimate causal effects.

# Create balance (Love) plots to assess quality of matches
loveplot.md <- love.plot(ps_match,
                   var.names = var_names,
                drop.distance = TRUE,
                var.order = "unadjusted",
                abs = TRUE,  
                line = TRUE,
                stars = "raw",
                shapes = c("circle filled", "circle"),
                thresholds = (m = .1),
                stats = "mean.diffs",
                sample.names = c("Unmatched", "Matched"),
                title = NULL,
                colors = c("#FDB515", "#003262")
)

loveplot.ks <- love.plot(ps_match,
                    var.names = var_names,
                drop.distance = TRUE,
                var.order = "unadjusted",
                abs = TRUE,  
                line = TRUE,
                stars = "raw",
                shapes = c("circle filled", "circle"),
                thresholds = (ks = .05),
                stats = "ks.statistics",
                sample.names = c("Unmatched", "Matched"),
                title = NULL,
                colors = c("#FDB515", "#003262")
)  

# Assemble balance plots into one graphic
loveplot.md = loveplot.md + theme(legend.position = "bottom")

loveplot.ks = loveplot.ks + theme(legend.position = "none")

loveplots = loveplot.md + loveplot.ks +
  plot_annotation(title = "Covariate Balance in Matched and Unmatched Sample",
                  theme = theme(plot.title = element_text(size = 16, hjust = 0.5),
                             plot.title.position = "plot"))
loveplots

Conclusion

In this tutorial, we implemented propensity score matching to determine whether the NSW program resulted in higher incomes. Then, we made several different data visualizations to assess the quality of the matching. These graphs show that our analysis yielded good matches — as indicated by the distributions in the matched samples (compared to the unmatched samples) and by the covariate absolute standardized mean differences and KS statistics. These tools can be applied to better understand many causal relationships.

For more information on matching methods, see Elizabeth Stuart’s review article [4]. Noah Greifer published several useful tutorials for using MatchIt. These include tutorials for getting started [5], assessing balance [6], estimating effects after matching [7], matching methods [8], and matching with sampling weights [9].