Propensity score matching in Stata

7 min readJul 3, 2022

Estimating average treatment effects using propensity score matching

What is propensity score matching?

At its most basic, propensity score matching is a technique that balances the distribution of baseline covariates between groups so that the independent effect of an intervention can be measured.

This can be important when comparing the effectiveness of two treatments outside of a randomised controlled trial, such as when analysing routine administrative health data.

Here, the probability of a patient receiving one treatment rather than another will not be random, but instead determined by careful clinical consideration and patient choice. These selection effects lead to an imbalance in patient characteristics between groups, and thereby the risk of poor health outcomes (confounding by indication).

To mitigate this confounding and allow for unbiased comparisons of an intervention treatment versus a control treatment, propensity score matching can be used to pair patients from each group who shared a similar probability of assignment to the intervention, conditional on the vector of observed covariates — their so-called propensity score.

Any subsequent difference in health outcomes observed between the two treatment groups are thus assumed to be a result of the treatment.

But how do we go about doing this in Stata, and what process should we follow? Let’s work through an example.

What are the basic steps?

First, as an overview, below are the key steps to follow when matching patients by their propensity scores:

Collect and prepare the data.
Estimate the propensity scores.
Match patients using the estimated scores.
Evaluate the baseline covariates for balance across groups.

Collect and prepare the data

Run the following command in Stata to load in a sample dataset that will use as the basis of our worked example:

use http://ssc.wisc.edu/sscc/pubs/files/psm.dta

You will see that the file contains four variables:

A binary indicator of treatment and non-treatment (t),
Two continuous covariates (x1,x2); and,
And a continuous outcome (y).

Can’t we just report a crude estimate of the treatment effect?

If patients had been assigned to treatment or non-treatment at random, we could run a simple t-test to compare the mean value of y in each group, as follows:

ttest y, by(t)

Crude difference in patient outcomes by treatment status

This indicates that our outcome was 2.31 units higher in our treated group than observed among our untreated patients.

However, as treatment assignment in non-interventional settings is typically non-random, such an approach may may be biased by confounding factors.

We can quickly test this by regressing our two covariates (x1, x2) on our treatment variable (t), as shown below. Doing so we see that the probability of treatment is positively correlated with our covariates. A similar picture is found if a linear regression is used to regress our covariates on the outcome variable.

Association between treatment assignment and our two baseline covariates

It is clear that the relationship between our treatment and outcome is being confounded by x1 and x2, each having the effect in this case of increasing the observed treatment effect.

Let’s do some propensity score matching instead

To get around this confounding, we can carry out the same estimation as with ttest by using teffects. The basic syntax of the teffects command when used for propensity score matching is:

teffects psmatch (outcome) (treatment covariates)

This command is helpful as it undertakes the propensity score matching (psmatch) and calculation of our treatment effects (teffects) all in one go.

Importantly, psmatch matches the nearest neighbour by default, as illustrated in the figure below. This is equal to matching each patient in the treatment group with a patient from the untreated group who exhibits the closest matching propensity score. Furthermore, propensity scores are estimated using a logit function. In both cases, these defaults can be altered.

What results do we get?

In our test dataset the basic command would be written as follows, yielding the output also shown below:

teffects psmatch (y) (t x1 x2)

The average treatment effect with propensity score matching using a default logit link function

This output now reports the average treatment effect after accounting for imbalance in the distribution of baseline characteristics between the treated and untreated groups. Following this adjustment, we get a treatment effect that is attenuated as expected versus the crude t-test shown earlier, at 1.02 and 2.31 respectively.

How were these treatment effects calculated?

Without getting too stuck into the underlying mathematics, let’s take a quick look at how this new estimate was obtained by focusing on one matched pair of patients.

If we run our commend again but include gen (match)option, we get a new column that reports the patient who which each individual was matched. Follow this up with some prediction options and we can also observe how the treatment effect was calculated:

teffects psmatch (y) (t x1 x2)predict ps0 ps1, ps // The propensity scores for each group.predict y0 y1, po // The potential outcome estimated for each group.predict te // The treatment effect estimated for each group.

Propensity score matching and outcome estimations for patients 1 and 467

Patient 1 in our dataset was categorised as untreated and matched to patient 467, so the results of these two patients are shown above for the sake of our worked example.

The first thing you will notice is that their propensity scores (ps0, ps1) are very similar to one another, which is as expected. Our second set of predictions gives us the matched outcome value for each patient. As patient 1 it is a control observation, y0 is the observed outcome value (-1.79) and the expected outcome if patient 1 were treated (y1) becomes the observed y value of observation 467 (2.23). The treatment effect for patient 1 is then calculated as y1-y0. In turn, the average treatment effect becomes nothing more than the sum of these individual treatment effects, which can also be calculated manually following prediction as:

summarise te

Time to evaluate the baseline covariates for balance across groups

Now that we’ve matched our treated patients, baseline covariates should now be similarly distributed between the treated and untreated groups.

The best way of assessing this is by looking at the standardised mean differences between group means across all baseline covariates. This is recommended as a measure of balance as, being independent of the unit of measurement, it allows comparison between variables that have different units of measurement.

Although easy to calculate manually, we can output all of these using a single line of code:

tebalance summarize

This reports the standardised mean differences before and after our propensity score matching. Ideally, following matching, standardized differences should be close to zero and variance ratios close to one. The output above appears to indicate that x2 may not be well balanced by our model as presently specified, suggesting that a richer model should be tested, such as by adding new putative confounders and exploring interaction terms.

If inclined, you can graphically present the density function of our problematic variable before and after matching as follows:

tebalance density x2

Would matching to more than one patient help?

In the our worked example, each patient was matched to their nearest neighbour, which is the default behaviour for teffects psmatch.

However, we can request that the command instead match each patient to multiple individuals by specifying the nneighbor(#)option:

teffects psmatch (y) (t x1 x2), nneighbor(3)

The result we obtain from this procedure is as follows:

The conditional average treatment effect when matching on three nearest neighbours

We can see that, when matching on the three nearest neighbours, our average treatment effect rises from 1.02 to 1.06 and our standard error falls from 0.12 to 0.09. Rather than a result of improved balance, this shift occurs because matching on more distant neighbours uses more data per treated patient, reducing the variance of our estimator but at a cost of an increase in bias by matching on patients whose propensity scores are less closely aligned:

We can constrain this risk of bias by setting a calliper value. This has the effect of only permitting a match if the absolute difference in propensity scores falls below a specified threshold:

teffects psmatch (y) (t x1 x2), nneighbor(2) caliper(0.03)

Can I play around with the link function?

Finally, if you’d prefer that propensity scores be estimated using a probit rather than a default logit function, we can apply this change as an option when specifying our treatment and covariates. Make your own decision as to which prediction mechanism best fits your use case.

teffects psmatch (y) (t x1 x2, probit)