Causal Inference with Python — Causal Graphs

8 min readFeb 24, 2023

--

Causal graph

A causal graph is a pictorial way to show the causal relationships among variables. It is a directed acyclic graph that represents how the world works. It helps us identify which variable(s) to control for confounding. In later posts, we will cover how to control for these variables.

Funding is the confounder that impacts both the treatment (tablet) and the outcome (score)

What can we learn from the graph above?

The funding of the schools causes the schools to give tablets to students. The greater the funding, the greater the chance of giving tablets to students ⇔ (inverse is true) The greater the chance of giving tablets to students, the greater the funding.
The funding of the schools causes a higher test score by providing students with a better environment. The greater the funding, the higher the test score ⇔ (inverse is true) The higher the test score, the greater the funding.
Giving tablets to students causes a higher test score. The greater the chance of giving tablets to students, the higher the test score ⇔ (inverse is true) The higher the test score, the greater the chance of giving tablets to students.

Causal graph using Python

# create a directed graph
G = nx.DiGraph()

# add edges from a list of tuples
G.add_edges_from([('funding', 'tablet'),\
                  ('funding', 'score'),\
                  ('tablet', 'score')])

# fix the positions of all three nodes
fixed_positions = {'funding':(1,2),\
                   'tablet':(0,0), \
                   'score':(2,0)}

# draw the network
nx.draw_networkx(G,\
                 with_labels = True,\
                 node_color = 'white',\
                 node_size = 1000,\
                 pos = fixed_positions)

Here is the causal graph generated by the above code.

Simple paths

Source: https://www.startribune.com/review-in-praise-of-paths-by-torbjorn-ekelund-translated-from-the-norwegian-by-becky-l-crook/570114552/

Chain: information flows from A to C through B.

Left: a chain; middle: a chain with B blocked; right: a chain with a concrete example

The left chain tells us that A and C are not independent because C is affected by A through B.

In the middle chain, by blocking B, that is to make B the same for all individuals, A and C are now independent.

A concrete example would be knowing statistics improves one’s problem solving skills which can help them become a data scientist. Notice that knowing statistics and becoming a data scientist are not independent. The reason being knowing statistics will increase one’s chance of becoming a data scientist and vice versa. Once we block problem solving, that is to make problem solving skills the same for everyone, knowing one’s statistics skills no longer provides additional information about whether an individual will become a data scientist or not. Statistics and data scientist are now independent with blocking.

Fork: information flows from C into both A and B.

Upper left: a fork; upper right: a fork with C blocked; bottom: a fork with a concrete example

The upper left fork tells us that A and B are not independent because C affects both A and B.

In the upper right fork, by blocking C, that is to make C the same for all individuals, A and B are now independent.

A concrete example would be knowing math improves one’s statistics and computer science skills. That is, better math → better statistics and better math → better computer science. Without knowing one’s statistics skills, we can infer their computer science skills with the help of their math skills. Once we block math, that is to make math skills the same for everyone, we can no longer infer one’s statistics skills from their computer science skills and vice versa. By blocking, statistics and computer science are now independent.

Inverted fork: information flows from A and B and collide at C (a collider).

Upper left: an inverted fork; upper right: an inverted fork with C blocked; bottom: a fork with a concrete example

The upper left inverted fork tells us that A and B are independent and each affects C.

In the upper right inverted fork, by blocking C, that is to make C the same for all individuals, A and B are no longer independent.

A concrete example would be both statistics and product sense help people become a data scientist. Statistics and product sense are independent as they require different ways of thinking and different training. Once we block data scientist, that is to make it known whether or not someone is a data scientist, we will have extra information to infer their statistics skills from their product sense and vice versa. To put it more concretely, we know person A is a data scientist. If we know person A is not good at statistics, we can infer that he/she is probably good at product sense. Similarly, if we know person A is not good at product sense, we can infer that he/she is probably great at statistics to be hired. By blocking, statistics and product sense are now dependent.

More complicated paths

Source: https://www.innertrends.com/blog/how-to-analyse-customer-paths

Front door path is like swimming with the current. We follow the direction of the arrows on the graph to go from the treatment to the outcome. Since the front door paths capture the treatment effect (exactly what we want), we would not control for any variables on the front door path.

Backdoor path is like swimming against the current. We move against the direction of the arrows on the graph to go from the treatment to the outcome. It’s like a sneaky way to get from the treatment to the outcome that we would like to block to ensure the treatment effect is not muddled.

Backdoor path criterion blocks all backdoor paths from the treatment to the outcome AND does not include any descendants of the treatment.

Source: https://www.pinterest.com/pin/144467100537677073/

Example #1

There is one back door path:

A ← B → C →Y

The back door path is not blocked by a collider, so we need to control for these variables to block the back door path:

{B}
{C}
{B, C}

Example #2

There is one back door path:

A ← B → C ← D → Y

The back door path is already blocked by a collider at C, so we don’t need to do anything. Optionally we can control for these variables to further block the back door path:

{B}
{D}
{B, C}
{C, D}
{B, C, D}

Example #3

There are two back door paths:

A ← D ← B → Y
A ← C → D ← B → Y

For the first back door path, there is no collider. We can control for these variables to block this back door path:

{B}
{D}
{B, D}

For the second back door path, it is already blocked by a collider at D, so we don’t need to do anything. Optionally we can control for these variables to further block the back door path:

{B}
{C}
{B, D}
{C, D}
{B, C, D}

Taking the union of the above two sets of variables gives us:

{B}
̶{̶C̶}̶ controlling for C would leave the A ← D ← B → Y back door path open
̶{̶D̶}̶ controlling for D would create a path between C and B and this creates a new back door path
{B, D}
{C, D}
{B, C, D}

After some cleanup, these are the variables we need to control for to block both back door paths:

{B}
{B, D}
{C, D}
{B, C, D}

When the whole causal graph is not available

No problem, we can use the disjunctive cause criterion, a method to choose variables to control for. The criterion goes like this: control for all observed causes of the treatment or the outcome or both.

If there is a set of variables that satisfy the back door path criterion, then the variables selected based on the disjunctive cause criterion will be sufficient to control for confounding.