The Great Regression — with Python: Difference-in-Differences Regressions

A simple applied approach to experiments

Dr. Patrick Thiel
7 min read · Jun 30, 2024

Motivation

In the last article, I discussed how to implement a simple model to judge the (linear) relationship between two or more variables. In this article, I will walk you through a simple yet powerful tool for judging the difference between groups.

In an ideal world, we would conduct a random experiment in which individuals are assigned to groups by chance. Such a design is typically called a “randomized controlled trial” (RCT), and it has two major advantages:

  1. RCTs eliminate the selection bias that occurs when individuals self-select into the control or treatment group. When selection bias is present, the estimated group difference is distorted and cannot be interpreted unambiguously as the true treatment effect. Since group allocation is random in an RCT, this problem is eliminated.
  2. RCTs also address omitted variable bias (OVB), which arises when an important factor is left out of the model. Omitting variables that are related to our variables of interest means the effect cannot be accurately estimated and its interpretation is weakened.

Although RCTs are the gold standard, they have a major drawback: they are not always feasible in the real world. Imagine that you are interested in the effects of the minimum wage. Ideally, you would like to compare two individuals with the same skills, background, education, age, etc., where one receives the minimum wage and the other does not. However, we cannot simply give one person the minimum wage and withhold it from the other, because that would discriminate against one group.

Researchers with a background in economics address this limitation by resorting to quasi-experimental designs using a difference-in-differences (DiD) framework. This tool is also often used for (causal) impact evaluations.

In a nutshell, difference-in-differences allows us to calculate the difference between groups before and after an event (or shock). An important assumption for this to work is that the event must be exogenous, i.e. it must not be anticipated by the individuals. Otherwise, the treatment and control groups react to the event before it even happens, the difference between the groups can no longer be attributed to the event, and the experiment fails.

Another important assumption is parallel trends, i.e. both groups must follow the same trend before the shock. Otherwise, the groups already differ before the event, and the calculated DiD estimate can neither be attributed to the event nor interpreted (causally). The recent study by Rambachan and Roth (2023) explains this in more detail and describes strategies for dealing with situations where the assumption does not hold.

Calculation

The idea of DiD can be summarized in three steps, illustrated in the figure below:

  • Calculate the difference between the two groups before the event (ΔA)
  • Calculate the difference between the two groups after the event (ΔB), assuming that the treatment group (because it is affected by the event) has now changed its path
  • Calculate the difference between the two differences, which reflects the DiD as the effect of the event on the treatment group.
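
In compact form, using ΔA and ΔB from the steps above:

$$\text{DiD} = \Delta B - \Delta A = (\bar{Y}_{\text{treat, post}} - \bar{Y}_{\text{control, post}}) - (\bar{Y}_{\text{treat, pre}} - \bar{Y}_{\text{control, pre}})$$

where the Ȳ terms are the group averages of the outcome before (pre) and after (post) the event.
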
Graphical representation of the DiD idea. Source: Own figure.

More formally, the idea can also be expressed as a (stylized) regression equation:

$$Y_{it} = \beta_0 + \beta_1 \cdot \text{Treated}_i + \beta_2 \cdot \text{Post}_t + \delta \cdot (\text{Treated}_i \times \text{Post}_t) + \varepsilon_{it}$$

where Y_it is the outcome for individual i at time t, Treated_i equals one for the treatment group and zero for the control group, Post_t equals one for periods after the event, and δ, the coefficient on the interaction term, is the DiD estimate.

The DiD estimate can be derived by differencing this equation across time (after vs. before) and across groups (treatment vs. control); the components common to both differences cancel out, leaving the DiD estimate (δ), as the following decomposition shows.

Group                    Before (Post = 0)    After (Post = 1)       Difference
Control (Treated = 0)    β₀                   β₀ + β₂                β₂
Treatment (Treated = 1)  β₀ + β₁              β₀ + β₁ + β₂ + δ       β₂ + δ
DiD (treatment − control)                                            δ

Decomposition of DiD estimate. Source: Own table.

Application

To show you the above intuition in action, I use data from Huntington-Klein’s book “The Effect,” which provides data on organ donor registration in the US. The setting is as follows: California changes its organ donor policy, moving from an opt-in approach to a choice strategy in 2011 (Q3 to be exact). This setting lends itself to a DiD framework where we want to evaluate the impact of the policy change on California’s donor rates relative to the other states that did not change their strategy.
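
Before jumping in, here is a minimal setup sketch, assuming the organ_donations dataset from the causaldata Python package with columns State, Quarter (e.g. “Q32011”), and Rate; the exact quarter labels are an assumption, so verify them against your copy of the data.

# Minimal setup sketch (assumptions: causaldata's organ_donations dataset and
# its quarter labels; verify both against your data)
from causaldata import organ_donations

organ_data = organ_donations.load_pandas().data

# treated_time: 1 for quarters from Q3 2011 onward, 0 before
post_quarters = ["Q32011", "Q42011", "Q12012", "Q22012"]  # assumed labels
organ_data["treated_time"] = organ_data["Quarter"].isin(post_quarters).astype(int)

# treated_california: 1 for California, 0 for the remaining states
organ_data["treated_california"] = (organ_data["State"] == "California").astype(int)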

DiD manually calculated

Let’s start by manually calculating the DiD without any regressions. This yields an unconditional difference-in-differences, since we are not taking into account any covariates that might affect the estimate. It is always a good check on the regression results, as it gives you an idea of what to expect.

As described above, we first calculate the average donor rates for the treatment group (California) and the control group (remaining states) before and after the policy change (Q3 2011).

# Calculate the average donor rate for all combinations of treatment time and region
mean_values_groups = (
    organ_data
    .groupby(["treated_time", "treated_california"])["Rate"]
    .mean()
    .reset_index()
)

This results in the following output table, where treated_time equals one for periods after 2011 Q3 and zero before, and treated_california equals one for California and zero for the remaining states. We can see that the remaining states have, on average, higher donor rates than California, regardless of time.

treated_time    treated_california    Rate
0               0                     0.445
0               1                     0.271
1               0                     0.459
1               1                     0.263

Averages before and after the policy change for California and the other states. Source: Own table.

Now we can calculate the difference for the control group after vs. before the policy change (0.459 − 0.445 = 0.014) and the difference for the treatment group after vs. before the policy change (0.263 − 0.271 = −0.008). Finally, we take the difference of these differences (0.014 − (−0.008) = 0.022). Thus, the unconditional DiD is 0.022, or 2.2 percentage points, which means that the donor rate in California increased by 2.2 percentage points relative to all other states after the policy change.

#--------------------------------------------------
# Difference-in-Differences by "hand"
# NOTE: within each group, this is the difference of the post-treatment vs.
# pre-treatment average
def group_difference(group: int):
    """Return the post-minus-pre difference in average rates for one group."""
    post = mean_values_groups[
        (mean_values_groups["treated_california"] == group)
        & (mean_values_groups["treated_time"] == 1)
    ]["Rate"].values
    pre = mean_values_groups[
        (mean_values_groups["treated_california"] == group)
        & (mean_values_groups["treated_time"] == 0)
    ]["Rate"].values
    return post - pre

# Calculate the difference for the control group across time
diff_control_group = group_difference(group=0)

# Calculate the difference for the treatment group across time
diff_treatment_group = group_difference(group=1)

# Calculate the difference-in-differences
# NOTE: This can be labeled the unconditional difference-in-differences since
# we are not controlling for confounding factors.
diff_overall = diff_treatment_group - diff_control_group
print(f"Unconditional difference-in-differences: {diff_overall[0]:.3f}")
# expected output: Unconditional difference-in-differences: 0.022

Since we calculated the averages for each group and time period, we can generate a figure similar to the stylized figure above for the organ donor application. The dashed lines represent the period averages for each group. The difference between the dashed lines within each group represents the group-specific change, and the difference in these group-specific changes reflects the DiD we just calculated manually.

Trends over time for treatment and control group. Source: Own figure.

DiD automatically calculated

Of course, Python’s statsmodels library can also compute the DiD estimate automatically.

#--------------------------------------------------
# Difference-in-Differences by modelling
import statsmodels.api as sm

# add the interaction term to the data
organ_data["interaction_time_california"] = (
    organ_data["treated_time"] * organ_data["treated_california"]
)

# define dependent and independent variables
y = organ_data["Rate"]
x = organ_data[["treated_california", "treated_time", "interaction_time_california"]]

# add a constant (intercept) to the linear model
x = sm.add_constant(x)

# fit the model with heteroskedasticity-robust (HC3) standard errors
did_model = sm.OLS(y, x).fit(cov_type="HC3")

# print results
print(did_model.summary())

Using the OLS function of statsmodels, we obtain an interaction coefficient, i.e. the DiD estimate, of 0.022 (2.2 percentage points), which is identical to our unconditional estimate (since there are no additional control variables in the model for now).

Regression output using statsmodels. Source: Own table.
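
As an aside, the same model can be specified more compactly with statsmodels’ formula interface, where the * operator automatically includes both main effects and their interaction, so the interaction column does not have to be constructed by hand:

import statsmodels.formula.api as smf

# equivalent DiD specification via the formula interface
did_model_formula = smf.ols(
    "Rate ~ treated_california * treated_time",
    data=organ_data
).fit(cov_type="HC3")

print(did_model_formula.summary())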

Note that the DiD estimate is not statistically significantly different from zero: the standard error is quite large and the confidence interval crosses zero. However, we ran a simplified model containing only the group indicator, the timing indicator, and their interaction, with no other controls. If, for example, we add fixed effects for state and quarter as additional controls for unobserved heterogeneity, the effect becomes statistically significantly different from zero. I’ll show you how to do this in another post.

Summary

In this post, you learned about the commonly used difference-in-differences methodology, its general idea, its components, and how to compute it in Python (manually or automatically).

Thank you for reading!

The code files can be found here. For those interested, you can also find an R-script in the folder generating similar outputs.

Check out my GitHub for other projects. You can also read the story on my website. Feel free to reach out on LinkedIn.

Consider following for more interesting stories. Thanks!
