Causal Inference — A Brief Introduction

Andy Mandrell
Published in Analytics Vidhya · Jun 12, 2020

Motivation

Suppose we are given data from a local research laboratory about the success rates of two treatments on patients who exercise and don’t exercise.

From this data, we can make the following observations:

  • Within both subgroups {Exercise, No Exercise}, Treatment 1 has a higher success rate.
  • Treatment 2 has a higher success rate across the overall (total) population of patients.

We might ask why Treatment 2 is more effective across the overall population, but less effective for both subgroups. It is difficult to determine from this data which treatment is actually better. In order to determine this, we need to think more deeply about how the data was generated, what potential confounders exist, and the experimental context. We need to understand the causal story behind the data.
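This reversal, where the aggregated data and every subgroup point in opposite directions, is an instance of Simpson’s paradox. The sketch below uses hypothetical counts (not the laboratory’s actual figures), chosen only so that the two observations above hold: Treatment 1 wins within each subgroup, yet Treatment 2 wins overall.

```python
import pandas as pd

# Hypothetical counts, chosen only to reproduce the pattern described above.
df = pd.DataFrame({
    "treatment": ["Treatment 1", "Treatment 1", "Treatment 2", "Treatment 2"],
    "exercises": ["Yes", "No", "Yes", "No"],
    "successes": [81, 192, 234, 55],
    "patients":  [87, 263, 270, 80],
})

# Success rate within each (treatment, exercise) subgroup: Treatment 1 is higher in both
df["rate"] = df["successes"] / df["patients"]
print(df)

# Success rate across the total population: Treatment 2 comes out ahead
overall = df.groupby("treatment")[["successes", "patients"]].sum()
overall["rate"] = overall["successes"] / overall["patients"]
print(overall)
```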

One potential causal story could be that Treatment 1 is cost-effective and less invasive (e.g., a dietary supplement), so it is used more often for patients who exercise. Medical practitioners therefore reserve Treatment 2, which is very successful but much more costly and invasive (e.g., surgery), for the most difficult cases. Although this causal story explains the data well, the data itself does not tell us about the causal effect of the treatment. More information is needed to establish which causal story is the right one.

In the story above, we passively observed a doctor giving a treatment; we do not know whether researchers instructed doctors to administer the treatments at random. If we observe that a patient received Treatment 1, we can infer that the patient probably exercises. Because we did not intervene or take any action (e.g., randomizing who receives which treatment), it is much harder to talk about causality than about correlation. If, on the other hand, we instruct doctors to randomize who gets which treatment (i.e., perform an intervention), we would have adequate information to perform causal inference.

What is causal inference?

Causal inference is a conceptual and technical framework for understanding the effects of hypothetical actions or interventions. Example causal questions include:

  • What causes stress?
  • Does exercising prevent obesity?
  • Do algorithms increase fairness in the criminal justice system?
  • If I hadn’t driven today, would I have avoided getting a speeding ticket?

In an experiment, we perform an action and make an active assignment. The effects of an action are generally not given by conditional probabilities: conditioning on something (passive observation) is not the same as performing an action in the real world. We call the effect of an action its causal effect; passively observing something is not sufficient to talk about causality.

Importance of interventions

Interventions enable us to differentiate among the different causal structures (see the section below) that are compatible with an observation. If we manipulate an event A (perform an action) and event B does not change, then A cannot be a cause of B. Likewise, if manipulating A causes a change in B, then A is a cause of B, although there may be other causes of B as well (e.g., confounders). For example, a ‘do-intervention’ fixes a variable to a chosen value in order to probe the causal relationship between that variable and other variables. Using the motivating example above, say that we are looking at the probability:

P(Y=y | X=x) where Y = {Success, Failure}, X = {Treatment 1, Treatment 2}

If we perform a do-intervention and fix X to be Treatment 1, we observe the probability:

P(Y=y | do(X=Treatment 1))

In general, these two probabilities are not equal. The discrepancy tells us that variables other than X (confounders) influence the outcome Y, so the observed association does not reflect the causal effect of X alone. When the two probabilities do agree, the conditional probability we observe can be interpreted as the causal effect of X on Y.
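To see this gap concretely, here is a small simulation sketch. The data-generating process below is assumed purely for illustration (it is not the laboratory’s data): a confounder Z drives both the treatment X and the outcome Y, so the observational probability P(Y=1 | X=1) and the interventional probability P(Y=1 | do(X=1)) come apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical data-generating process: Z (exercises?) influences both
# which treatment is chosen (X) and the chance of success (Y).
z = rng.random(n) < 0.5                       # Z ~ Bernoulli(0.5)
x = rng.random(n) < np.where(z, 0.8, 0.2)     # P(X=1 | Z) depends on Z
y = rng.random(n) < 0.3 + 0.2 * x + 0.4 * z   # Y depends on both X and Z

# Observational quantity: P(Y=1 | X=1), estimated by filtering on X=1
p_obs = y[x].mean()

# Interventional quantity: P(Y=1 | do(X=1)), simulated by forcing X=1
# for everyone while leaving Z untouched
y_do = rng.random(n) < 0.3 + 0.2 * 1 + 0.4 * z
p_do = y_do.mean()

print(p_obs, p_do)   # roughly 0.82 vs 0.70: the confounder Z creates the gap
```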

Structural Causal Models

We represent causal stories formally using structural causal models. These causal models represent assumptions that we make about the world; they are not given to us, nor is there any way to get them out of the data alone (having domain knowledge is beneficial here). In the motivating example above, we could construct our causal model by talking to medical practitioners about their decision-making process when assigning treatments to patients. Here is one such simplified causal model of our previous example, with edges Exercises? → Treatment, Exercises? → Success, and Treatment → Success:

In this causal model, the variable ‘Exercises?’ confounds the relationship between ‘Treatment’ and ‘Success.’ If a doctor knows that a patient exercises, they are likely to select the treatment with the highest success rate for such patients. Similarly, whether or not a patient exercises also directly influences their chance of success. We will next dive into how to build structural causal models and which variables we should control for when performing causal inference.
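As a sketch, the graph above can be written down in code; networkx is used here only as one convenient way to represent a directed acyclic graph, and the edge list simply encodes the story described above.

```python
import networkx as nx

# A sketch of the example's causal graph (structure taken from the story above).
g = nx.DiGraph()
g.add_edges_from([
    ("Exercises?", "Treatment"),  # exercise habits influence which treatment is chosen
    ("Exercises?", "Success"),    # ...and also influence the chance of success
    ("Treatment", "Success"),     # the treatment itself affects success
])

print(list(nx.topological_sort(g)))  # ['Exercises?', 'Treatment', 'Success']
```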

We first construct a causal model by including all variables that may share a dependency relationship in the experiment. One caveat here is that even if we include every observed variable in the study, there may be unobserved variables that we are not aware of or cannot measure. This is common, and our best bet is to carefully design a model that is compatible with our assumptions. Once we have a causal model, which is a representation of a causal story, we can use it to decide which variables are mediators, which are confounders, which are colliders, and thus which variables we should control for. There are three important variable roles we should understand when constructing a causal model:

  1. Confounder variables

In this model, Z is a confounding variable. This structure is called a fork (X ← Z → Y), where Z is a common cause of both X and Y. Confounding leads to a disagreement between conditional probabilities (observation) and do-interventions (actions), as previously noted:

P(Y=y | X=x) ≠ P(Y=y | do(X=x))

  2. Mediating variables

In this model, Z is called a mediator variable. This structure is called a chain (X → Z → Y): X affects Y through Z, so Z contributes to the total causal effect of X on Y.

  3. Collider variables

In this model, Z is called a collider variable. Z is a common effect of X and Y (X → Z ← Y), so it does not confound them; if this is the only structure connecting X and Y, we can replace ‘do-interventions’ with plain conditional probabilities, provided we do not condition on Z.

So, which variables should we control for in an experiment?

We should try to control for confounding variables, but not for mediating variables or collider variables. Controlling for confounding variables allows us to recover the true causal effect of X on Y. Controlling for mediator variables removes part of the true effect of X on Y. Controlling for a collider variable can create a spurious association, often an anti-correlation, between X and Y even when they are truly independent in the population; this phenomenon is known as Berkson’s paradox, or collider bias.
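A quick simulation sketch of collider bias (the variables and thresholds here are purely illustrative): X and Y are generated independently, Z is a common effect of both, and conditioning on Z manufactures a negative association out of nothing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# X and Y are independent in the population; Z is a collider (common effect).
x = rng.normal(size=n)
y = rng.normal(size=n)
z = (x + y + rng.normal(size=n)) > 1.0   # e.g. "was admitted to the study"

print(np.corrcoef(x, y)[0, 1])        # ~0.0: no association in the full population
print(np.corrcoef(x[z], y[z])[0, 1])  # clearly negative once we condition on Z
```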

Estimating causal effects

The next step is to estimate, from the data, the causal effect of an action or intervention. In this example, we will work with a do-intervention. Recall that a do-intervention fixes a variable to a constant value in order to determine the causal effect of that variable on other variables; the quantity of interest is:

P(Y=y | do(X=x))

We cannot compute this probability directly from observational data. Luckily, the Adjustment Formula allows us to rewrite it in terms of conditional probabilities:

P(Y=y | do(X=x)) = Σ_z P(Y=y | X=x, Z=z) P(Z=z), where the sum is over all values z of Z

This gives us one way of estimating the causal effect of a do-intervention in terms of conditional probabilities: we estimate the relationship between X and Y separately in every partition of the population defined by a condition Z = z, then average those estimates weighted by P(Z=z).

It is important to note that we can control for a set of variables with this formula, not just one variable (i.e., Z could comprise the set of confounders {A, B, C}).
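Here is a minimal sketch of the Adjustment Formula on simulated data, reusing the hypothetical data-generating process from the earlier simulation (so the numbers are illustrative, not the laboratory’s): stratify on Z, estimate P(Y | X, Z) within each stratum, and reweight by P(Z).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500_000

# Same hypothetical confounded process as in the earlier sketch.
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)
y = rng.random(n) < 0.3 + 0.2 * x + 0.4 * z
df = pd.DataFrame({"z": z, "x": x, "y": y})

# Naive conditional probability P(Y=1 | X=1): biased by the confounder Z
naive = df.loc[df.x, "y"].mean()

# Adjustment Formula: sum over z of P(Y=1 | X=1, Z=z) * P(Z=z)
adjusted = sum(
    df.loc[df.x & (df.z == zv), "y"].mean() * (df.z == zv).mean()
    for zv in (False, True)
)

print(naive, adjusted)  # ~0.82 (naive) vs ~0.70 (adjusted, the interventional value)
```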

How do we determine what set of variables to control for? Recall that we do not want to control for mediator or collider variables. It might seem easy to determine this set from a simple model such as the example above, but real-life causal models may be incredibly complex (a combinatorial explosion of nodes and paths in the graph). Fortunately, the Backdoor Criterion allows us to determine a minimal set of nodes to control for such that we eliminate confounding bias. The Adjustment Formula is only reliable in the absence of unobserved confounding: there may be variables that confound X and Y that we cannot measure or do not know of. Even with this limitation, with careful assumptions and a plausible causal model design, the Adjustment Formula can still provide powerful insight.

The Adjustment Formula works well when our variable(s) Z take a small number of discrete values. When Z is continuous, or has too many values to sum over, Inverse Propensity Score Weighting (IPSW) can be used instead.
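As a preview, here is a minimal IPSW sketch on the same kind of simulated data (scikit-learn’s LogisticRegression stands in for whatever propensity model one would actually choose; all numbers are illustrative): fit a propensity model P(X=1 | Z), then reweight outcomes by the inverse of the estimated propensities.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 200_000

# Same style of hypothetical confounded process as above.
z = (rng.random(n) < 0.5).astype(int)
x = (rng.random(n) < np.where(z, 0.8, 0.2)).astype(int)
y = (rng.random(n) < 0.3 + 0.2 * x + 0.4 * z).astype(int)
df = pd.DataFrame({"z": z, "x": x, "y": y})

# 1. Fit a propensity model e(z) = P(X=1 | Z=z); Z could be many covariates.
e = LogisticRegression().fit(df[["z"]], df["x"]).predict_proba(df[["z"]])[:, 1]

# 2. Reweight outcomes by the inverse of the propensity (normalized weights).
w1 = df["x"] / e
w0 = (1 - df["x"]) / (1 - e)
p_do_1 = np.sum(w1 * df["y"]) / np.sum(w1)  # estimate of P(Y=1 | do(X=1)), ~0.70
p_do_0 = np.sum(w0 * df["y"]) / np.sum(w0)  # estimate of P(Y=1 | do(X=0)), ~0.50

print(p_do_1 - p_do_0)  # estimated average causal effect of the treatment, ~0.20
```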

Spoiler alert:

One way we can avoid the headaches of designing causal models and dealing with unobserved variables is by performing a randomized controlled trial (RCT). RCTs have two major advantages over adjusting for confounders in observational data: they eliminate confounding bias (including bias from confounders we cannot observe) and they enable researchers to quantify their uncertainty. For these reasons, RCTs are preferred to observational studies for determining causality. Unfortunately, there are many cases in which an RCT would be too expensive, impossible, or harmful.

Significance

Causal inference is not a magic solution; it does not by itself make it easy to ask the right questions or take the correct actions to determine causality. But it can be used as a guide in the design of new studies. It can help us choose which variables to include, which to exclude, and which to control for. We saw that structural causal models can serve as a mechanism to incorporate scientific domain knowledge and exchange plausible assumptions for plausible conclusions. Most importantly, it is necessary for scientists to deeply understand causal inference techniques and concepts in order to avoid inappropriate conclusions or recommendations that could be harmful to society.

In the next article, we will dive into the application and use of causal inference on real-world data. In particular, we will see how to apply Inverse Propensity Score Weighting to estimate the causal effect of a treatment.
