Identifying Causal Effects with Experiments
“No causation without manipulation.” –Paul Holland
At GitHub, we frequently use experimental research designs to study the effects of new features and models. This post describes when and how we use experiments, and why they are such a powerful tool.
tl;dr: experimental designs are the most efficient means of establishing causal relationships, but causal inference is hard!
In this post, we’ll use the Golden Ticket experiment as our running example of a randomized controlled experiment. In the Golden Ticket experiment, we enrolled approximately 20,000 users to study the effect that giving people free private repositories had on their GitHub activity.
We hope you’ll walk away learning more about the following things that you can apply to your own research programs:
- When to use experiments.
- How to assign subjects to treatment versus control groups.
- Why causal inference is hard, and how random assignment to treatment solves the problem.
What is an experiment?
An experiment is a specialized research design where researchers study a relationship by directly manipulating subjects’ values on the independent variable and measuring subsequent changes in the values of the dependent variable.
In more formal language we say that the researcher controls the assignment to treatment, where the treatment is a particular level of the independent variable. (Experimentation in the social sciences was largely modeled on clinical drug trials; hence, all kinds of IVs are commonly referred to as “treatments.”)
- Dependent variable (DV): the key outcome of interest.
- Independent variables (IVs): the other factors thought to affect the dependent variable.
Variables can be any characteristic that is measurable, either directly or indirectly: features of the physical world, human behaviors, beliefs, socio-political traits, etc. The goal of a controlled experiment is to collect data that either supports or refutes the hypothesized relationship.
In the Golden Ticket experiment, our dependent variable was activity on GitHub, and the independent variable was access to private repositories.
Controlled experiments always contain at least two groups:
- The treatment group(s) receive some version of the treatment. At GitHub, the treatment is usually a feature change, such as free repositories, modified organization permissions, or revised navigation UI.
- The control group does not receive the treatment; they continue to receive whatever their typical GitHub experience would be in the absence of the experiment. The only difference is that we track their activity. This group serves as a baseline against which we can measure differences in behavior among the treatment group.
In the Golden Ticket experiment, there were three treatment groups, which received coupons via email for 1, 3, or 5 free private repositories for life. The control group did not receive any coupons.
We assign subjects to treatment or control randomly. We do this in order to ensure that there is no systematic relationship between whether a user receives the treatment and any other characteristic that might affect their outcomes. Randomization is critical for drawing conclusions about treatment effects.
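As a rough illustration of random assignment, here is a minimal sketch in Python. It is not the code used for the Golden Ticket experiment: the arm names, the `assign_arms` helper, the fixed seed, and the equal group sizes are all assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical arm names: one control group plus three coupon sizes.
ARMS = ["control", "1_repo", "3_repos", "5_repos"]

def assign_arms(user_ids, arms=ARMS, seed=0):
    """Randomly assign each user to exactly one arm, with near-equal group sizes."""
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)       # random order, unrelated to any user characteristic
    # Deal users out to the arms round-robin, like dealing from a shuffled deck.
    return {uid: arms[i % len(arms)] for i, uid in enumerate(shuffled)}

assignment = assign_arms(range(20_000))
print(Counter(assignment.values()))
```

Because the shuffle ignores everything about a user, no user characteristic can systematically predict which arm they land in.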
Why use an experiment?
We use experiments when we want to identify the causal effect of a treatment. Our goal is to learn what outcomes are attributable to something that we did, and what we can expect if we continue or expand it (e.g., “genpop” an experimental feature, meaning release it to the general population of users).
In the Golden Ticket, we wanted to know what the effect of getting private repos would be on our users’ behavior. Specifically we wanted to know:
Does having private repositories “cause” people to become more active users of GitHub?
Let’s look at this more concretely with regard to a specific user.
If Ms. Monalisa Octocat, currently owner of an inactive GitHub account with a great username (@octocat), were given private repositories, would she be more likely to become active on GitHub?
This is a straightforward question, but the answer is not. Causality is surprisingly hard to identify from non-experimental data. Without a solid experiment design in place that has a protected control group, we can’t separate the effect of the treatment from underlying differences between the groups assigned to treatment and control. A randomized, controlled experiment gives us a way to separate these things, so we can identify the effect of the treatment alone.
Randomization Eliminates Selection Bias
Selection bias is caused by a correlation between potential outcomes and treatment status. In our example, the outcome variable, activity on GitHub, is likely to be correlated with the choice to purchase private repositories. However, how strong the correlation is, and what direction it moves in (do more active GitHub users want private repositories more or less than those who push code less frequently?) are unknown, so quantifying this bias is difficult.
We can solve this problem by forcing the correlation between treatment status and potential outcomes to be zero. The easiest way to do this is to assign the treatment status randomly. If treatment assignment is uncorrelated with potential outcomes, the selection bias is equal to 0, and the simple difference in average outcomes between the treatment and control groups is an *unbiased* estimate of the average treatment effect on the treated (ATT), which under random assignment equals the average treatment effect (ATE).
This has the added benefit of making the math required for analysis very simple. If the randomization is successful, the treatment and control groups will be equal, on average, on all characteristics, measured or not, except the treatment. This allows us to estimate the ATE by simply taking the difference between the average outcome in the treatment group and the average outcome in the control group.
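To make the difference-in-means estimate concrete, here is a small simulation sketch. Everything in it is invented for illustration: the `simulate` function, the true effect of 2.0, the activity distribution, and the assumption that more active users opt in when treatment is self-selected. In reality the direction of that selection is unknown.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed effect of treatment on activity, for illustration only

def simulate(n=20_000, randomized=True, seed=0):
    """Return the difference in mean outcomes between treated and untreated users."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10, 3)  # activity the user would have without treatment
        if randomized:
            gets_treatment = rng.random() < 0.5  # coin flip, unrelated to baseline
        else:
            gets_treatment = baseline > 10       # assumed: more active users opt in
        outcome = baseline + (TRUE_EFFECT if gets_treatment else 0.0)
        (treated if gets_treatment else control).append(outcome)
    return mean(treated) - mean(control)

print(f"randomized:    {simulate(randomized=True):.2f}")
print(f"self-selected: {simulate(randomized=False):.2f}")
```

Under these assumptions, the randomized estimate should land close to the true effect, while the self-selected estimate is inflated because it mixes the treatment effect with the pre-existing gap between users who opt in and users who do not.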
Causal Inference is Hard
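We can think of the causal effect of the treatment as the difference between what happens when the treatment is applied and the counterfactual: what would have happened had the treatment not been applied. For any one user we can only ever observe one of the two, so the individual effect can never be measured directly.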
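One common way to write this down uses potential-outcomes notation. The symbols are assumptions introduced here, not taken from the text above: $Y_i(1)$ and $Y_i(0)$ are user $i$'s outcomes with and without the treatment, $D_i$ is 1 if the user was treated and 0 otherwise, and $Y_i$ is the outcome we actually observe. The difference in group means that we can compute from data then splits into two pieces:

```latex
\begin{align*}
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference in means}}
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATT}} \\
  &\quad + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The selection bias term is the gap in untreated outcomes between those who ended up treated and those who did not. When $D_i$ is assigned at random, it is independent of $Y_i(0)$ and $Y_i(1)$, so that term is zero and the ATT equals the ATE, $E[Y_i(1) - Y_i(0)]$.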
Randomized, controlled experiments allow us to estimate average treatment effects simply and efficiently in ways that are not possible with observational data.
To recap with our three learning points:
1. When we use experiments:
We use experiments when we want to be able to cleanly identify the causal effect of some treatment.
2. How we assign subjects to treatment versus control groups:
We assign subjects to treatment and control groups randomly, in order to break any correlation between outcomes and treatment status that would occur if subjects self-selected into treatment groups.
3. Why causal inference is hard, and how random assignment to treatment solves the problem:
The world, even GitHub’s small part of it, is a messy place, and determining whether something we changed is the cause of a change in something else is very difficult. Random assignment creates treatment and control groups that are evenly balanced on all of the things we can’t control, so that we can attribute any differences between the two groups to the one thing that differs between them: whether they received the treatment or not.
This article was written as a partnership. Research is better when you get to discover new things about the world with someone who has a different perspective — someone who is willing to challenge you. I’m grateful to the people I worked with at GitHub. We were a small-but-mighty team.