# Demystifying the inverse probability weighting method

## A fairly simple and intuitive method for identifying causal effects

Inverse Probability Weighting (IPW) is a popular quasi-experimental statistical method for estimating causal effects under the assumption of conditional independence. The method is easy to understand and easy to implement. In this post, I explain it with a minimal amount of R code.

# What is wrong with the Ordinary Least Squares (OLS)?

Consider a case in which we are interested in the effect of a binary treatment (labelled W) on a continuous outcome (labelled Y). Using data, we can estimate the Average Treatment Effect (ATE), which is the average effect of W on Y over all observation units (both treated and untreated).
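For readers who prefer notation, the ATE can be written in the potential-outcomes framework. The symbols $Y_i(1)$, $Y_i(0)$, and $X_i$ are introduced here only for illustration: $Y_i(1)$ and $Y_i(0)$ are the outcomes unit $i$ would have with and without the treatment, and $X_i$ collects its covariates.

$$
\tau_{ATE} = E\left[\,Y_i(1) - Y_i(0)\,\right], \qquad \left(Y_i(1),\, Y_i(0)\right) \perp W_i \mid X_i
$$

The second expression is the conditional independence assumption: once the covariates are taken into account, treatment status carries no further information about the potential outcomes.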

OLS has been widely used for estimating the ATE. So, what is wrong with it? The problem is that its results can be very sensitive to the specification of the functional form. In particular, the estimates can be badly biased if the model is misspecified. The root of this sensitivity is that OLS actually estimates a variance-weighted average treatment effect (Abadie and Cattaneo 2018). The variance here is the variance of the observation attributes (measured by covariates), and it captures the differences in the distribution of covariates between treated and untreated units. In other words, OLS places more weight on units that are dissimilar in their attributes. This can be problematic because identification then has to rely on extrapolation, which induces extrapolation bias.

# The merit of inverse probability weighting

IPW is a method that helps avoid extrapolation. Simply put, unlike OLS, IPW places more weight on observations that are similar to each other in their covariates, which improves covariate balance. This is the reverse of what OLS does. In practice, IPW can be implemented in two steps:

At step 1, one estimates a logit model to obtain the probability (labelled P) of being treated.

At step 2, one uses Weighted Least Squares (WLS) to estimate the effect of W on Y. The weight is the inverse of the estimated probability. Specifically, the weight is 1/P for treated units and 1/(1-P) for untreated units.
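The two steps can be summarized in a pair of formulas. The notation is an assumption made for illustration: $\hat{P}_i$ is the estimated probability from step 1, $w_i$ is the resulting weight, $X_i$ are the control variables, and $\hat{\tau}$ is the coefficient on the treatment.

$$
w_i = \frac{W_i}{\hat{P}_i} + \frac{1 - W_i}{1 - \hat{P}_i}
$$

$$
(\hat{\alpha}, \hat{\tau}, \hat{\beta}) = \arg\min_{\alpha,\, \tau,\, \beta} \sum_{i} w_i \left( Y_i - \alpha - \tau W_i - X_i'\beta \right)^2
$$

For a treated unit ($W_i = 1$) the weight reduces to $1/\hat{P}_i$, and for an untreated unit ($W_i = 0$) it reduces to $1/(1 - \hat{P}_i)$, matching the description in step 2.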

Suppose there are two treated units, A and B, whose estimated probabilities of being treated are 0.5 and 0.8, respectively. The weights of A and B are thus 2 and 1.25. A is given more weight than B in IPW, while in OLS they receive equal weights. Why should B be given a smaller weight? Because, relative to others, B happens to be so “keen” to be treated, and there is probably a reason behind this. Whatever the reason is, B is more different from the untreated units than other treated units (e.g., A) are. Thus, we should place less trust in B when analyzing the data, shouldn’t we? By the same logic, we should place less trust in the untreated units that are very resistant to being treated, so the untreated units with a higher estimated probability of being treated receive higher weights. In the end, our model is estimated mainly from units that are more similar (and thus more comparable) to each other. “Extracting” data on similar observation units mimics a natural experiment.
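The arithmetic for A and B can be reproduced in a few lines of R. The untreated units C and D, and their probabilities, are hypothetical and added only to show the other half of the weighting rule.

```r
# Estimated probabilities of treatment for the two treated units in the text
p_treated <- c(A = 0.5, B = 0.8)

# Treated units get weight 1 / P
w_treated <- 1 / p_treated
print(w_treated)    # A = 2, B = 1.25

# Hypothetical untreated units C and D (not from the text): weight 1 / (1 - P)
p_untreated <- c(C = 0.2, D = 0.5)
w_untreated <- 1 / (1 - p_untreated)
print(w_untreated)  # C = 1.25, D = 2
```

Among the untreated, D has the higher probability of being treated and therefore receives the larger weight, which is the mirror image of what happens to A and B.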

# Code

Now let’s move to an empirical example with some R code to show how IPW actually works. In this example, I use the dataset of LaLonde, who was interested in the effect of participating in a job training program (treat, a binary variable) on real earnings (re78, a continuous variable). Details about the data can be found in LaLonde (1986).

First, we run an OLS regression. Note that we include several controls (age, educ, black, hisp, married, nodegr, re74, re75) in the model. Running the code, one finds that the coefficient of the treat variable is 1676.343. This suggests that real earnings increase by 1676.343 when one participates in the job training program (treat).

```r
library(Matching) # library that provides the LaLonde dataset
library(dplyr)    # library for processing data
data(lalonde)     # load the dataset

# Run OLS regression
olsreg <- lm(re78 ~ treat + age + educ + black + hisp + married + nodegr + re74 + re75,
             data = lalonde)
summary(olsreg) # Output omitted
```

Second, we perform the estimation using IPW. We run a logit model to estimate the probability of participating in the program, using the code below:

```r
# Logit model
psreg <- glm(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
             data = lalonde, family = binomial(link = 'logit'))
```

Next, we obtain the estimated probabilities and construct the weights from them. We then estimate a WLS regression, and the coefficient of the treat variable is 1637.154. This is slightly lower than the OLS estimate. The difference between the OLS and IPW estimates is minor because program participation was randomly assigned, so the participants and non-participants are in fact not very different from each other. If they were, we would likely see a larger difference in the results.

```r
lalonde <- lalonde %>%
  mutate(prob = predict(psreg, type = 'response')) %>%
  mutate(invwt = treat/prob + (1-treat)/(1-prob)) # Make weights

# Weighted least squares estimation
ipwreg <- lm(re78 ~ treat + age + educ + black + hisp + married + nodegr + re74 + re75,
             data = lalonde, weights = invwt)
summary(ipwreg) # Output omitted
```
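To see what happens when treated and untreated units really do differ, the same two steps can be tried on simulated data where the true effect is known. This is only a sketch under stated assumptions: the data-generating process, the variable names (x, w, y), and the true effect of 2 are all made up for illustration and have nothing to do with the LaLonde data.

```r
set.seed(42)
n <- 20000

# Simulated data (hypothetical): x affects both treatment w and outcome y
x <- rnorm(n)
w <- rbinom(n, size = 1, prob = plogis(0.8 * x))  # treatment depends on x
y <- 2 * w + 3 * x + rnorm(n)                     # true effect of w is 2

# Naive comparison of group means ignores the confounder x
naive <- mean(y[w == 1]) - mean(y[w == 0])

# Step 1: logit model for the probability of being treated
ps <- predict(glm(w ~ x, family = binomial(link = "logit")), type = "response")

# Step 2: weighted least squares with inverse probability weights
invwt <- w / ps + (1 - w) / (1 - ps)
ipw <- unname(coef(lm(y ~ w, weights = invwt))["w"])

print(c(naive = naive, ipw = ipw))
```

Because treatment is not randomly assigned here, the naive difference in means should land well above 2, while the weighted estimate should be close to it.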

# Ending words

In this post, I provide a brief explanation of the Inverse Probability Weighting (IPW) method for estimating the average treatment effect. Some R code is included to show how easily the method can be implemented. I also argue that OLS might suffer from extrapolation bias, and that IPW serves as an alternative that corrects for it.

In some cases, however, the IPW method can also yield biased results. Sometimes the covariate balance even gets worse after weighting (so one might want to check the covariate balance after weighting). This may happen because the model for estimating the probability of being treated is misspecified. The statistical literature has come up with several remedies for this issue, and a popular one is the doubly robust estimator. We will cover this in the next post.
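One way to check covariate balance after weighting is to compare the standardized mean difference (SMD) of each covariate between the two groups, before and after applying the weights. The sketch below uses simulated data and a helper function smd() that is written here for illustration; it is not part of any package, and scaling by the pooled unweighted standard deviation is just one common convention.

```r
set.seed(1)
n <- 10000

# Simulated data (hypothetical): older units are more likely to be treated
age <- rnorm(n, mean = 30, sd = 8)
treat <- rbinom(n, size = 1, prob = plogis(-3 + 0.1 * age))

# Estimated probability of treatment and inverse probability weights
prob <- predict(glm(treat ~ age, family = binomial(link = "logit")), type = "response")
invwt <- treat / prob + (1 - treat) / (1 - prob)

# Standardized mean difference of a covariate between the two groups,
# optionally weighted, scaled by the pooled unweighted standard deviation
smd <- function(x, treat, wt = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], wt[treat == 1])
  m0 <- weighted.mean(x[treat == 0], wt[treat == 0])
  s <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / s
}

smd_before <- smd(age, treat)
smd_after <- smd(age, treat, invwt)
print(c(before = smd_before, after = smd_after))
```

If the probability model is reasonable, the SMD after weighting should be much closer to zero than before. A value that moves away from zero is a sign that the model for the probability of being treated needs another look.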

# References:

Abadie, Alberto, and Matias D. Cattaneo. 2018. “Econometric Methods for Program Evaluation.” Annual Review of Economics 10 (1): 465–503.

LaLonde, Robert. 1986. “Evaluating the Econometric Evaluations of Training Programs.” American Economic Review 76: 604–620.

## The Startup

Written by

## Bowen Chen, PhD

Applied Economist & Data Scientist
