Representing Causal Knowledge and Making Interventions

Ally Salim · Published in Inspired Ideas · May 11, 2021

In this post we will go through one of my favorite techniques for building intelligent systems — causal thinking and modeling. Thinking in terms of cause and effect comes in very handy when trying to model the world and tease out answers from the model itself. Perhaps more importantly, it allows me to communicate seamlessly with domain experts and the rest of our team.

I work on the Elsa Health team where we build decision support technology for healthcare providers. Causal relationships are everywhere, and in healthcare it is crucial to be clear and upfront about how factors/variables are related and connected to each other in a very simple and intuitive way.

This article is written as introductory material for those looking for alternatives to data-hungry machine learning algorithms and curve fitting in general. I don’t expect to take a deep dive into any specific concepts, but references will be provided for those interested.

TLDR;

How we use prior knowledge to build causal models:

  1. Explore existing knowledge through reviewing literature, experiments, documentation, interviews and any other knowledge bases we can think of.
  2. Synthesize all our findings into a drawing that shows how the ideas from our research are connected to each other. For this we use a DAG, and can sometimes end up with multiple DAGs when the information we find conflicts.
  3. Review the DAG with domain experts, iterate, review, iterate, review, iterate, …
  4. Write software to represent our DAG and allow us to easily make simulations and interventions.
  5. Simulate data from our causal model, review simulation results, iterate, review, iterate, review, iterate, ….

A model is a simplification of reality.

Setting the Scene

With the recent (amazing) democratization of Machine Learning technologies and resources, the barrier to entry is lower than ever before.

Typical Machine Learning techniques require a lot of (relevant) data to train in order to make good predictions. This is rather intuitive because just like you and I need to see/try something a few times before we can do it exceptionally well, the machine also needs a few attempts to really get the hang of it.

The (overly simplified) cycle goes something like this:

  1. Collect and clean your data
  2. Throw the data at some algorithms, or maybe throw some algorithms at the data instead
  3. Tune your algorithms by adjusting their parameters and hyper-parameters
  4. Pick the (often black box) model that performs the best, and use that one

Nothing about this is objectively wrong. However, it is very easy to fall into a loop where this is the only strategy we reach for when faced with a new problem, without considering the other options that exist.

With that said, I have seen many applications of machine learning that I think could be better approached with a cause-and-effect mindset, especially in fields like healthcare, insurance, and epidemiology where causality must be explicitly demonstrated.

An often-pointed-out deficiency of statistical/machine learning is its inability to account for the causal relationships inside and outside of the data. These causal relationships are what generated the data we observe, and in MANY cases these relationships, which we can use as prior knowledge, are well known, documented, and even proven several times over.

The case for prior knowledge

TLDR; We try to use existing knowledge as much as possible to identify the causal relationships between factors (P(Y | Q, do(X = x))), which we think is more beneficial in our use cases than learning just the correlations from often scarce data (P(Y | Q, X) and P(Q, X, Y)).

With the exception of a few techniques, such as Bayesian updating and, more recently, transfer learning, many learning algorithms start from scratch every time they are trained. This means that when I train a logistic regressor or neural network on my collected data, it assumes we know nothing about the problem space and attempts to learn only from the data.

Simple Scenario: Take for example building a model to classify whether a patient in a specific area has malaria or not. To do this we would collect data on patients with and without malaria, and learn either the probability of malaria given signs, symptoms, demographics, and sex, denoted as P(malaria | signs, symptoms, demographics, sex), or the probability of seeing a patient with these signs, symptoms, demographics, sex, and malaria together, denoted as P(signs, symptoms, demographics, sex, malaria), assuming we collected the data:

Sample data collected on malaria patient presentation

After training our model on this data we can end up with a decent predictor that has learnt the relationship between our independent variables and our dependent variable.
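
To make that concrete, here is a minimal sketch of the conventional approach in Julia, assuming a hypothetical patients.csv containing the signs, symptoms, and demographics above; the column names and the use of GLM.jl are purely illustrative:

    using CSV, DataFrames, GLM

    # Load the (hypothetical) collected patient data
    patients = CSV.read("patients.csv", DataFrame)

    # Learn P(malaria | fever, headache, age, sex) purely from the data
    model = glm(@formula(malaria ~ fever + headache + age + sex),
                patients, Binomial(), LogitLink())

    # Predicted probability of malaria for each row of the data
    predictions = predict(model, patients)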

Alternatively, we could do the following:

  1. Spend a few hours researching and reading through literature like [1], [2], [3], etc
  2. Contact a few doctors/pediatricians/researchers and interview them on what they learnt in school as well as what they see on the ground.
  3. Construct a simple causal model that represents the variables/factors you discovered and start from there. An example of a simple model can be seen in the Representing Knowledge as graphs section below.

A few of the advantages of this second approach are:

  1. We are able to quickly learn new things.
  2. We are not crippled by unavailability of quality data to train our algorithms.
  3. We can avoid biases in the data; in our case, that bias was healthcare providers making incorrect diagnoses due to the unavailability of testing and investigation tools.
  4. The model is quite transparent and clear to read and understand for all the stakeholders, future researchers, and regulatory bodies — this is a “white box model”, meaning I can see how it comes up with its decisions/recommendations.
  5. Probably most importantly, we are able to incorporate causal information very explicitly and say things like “Malaria causes fever 95% of the time for children under 5.”

The case against prior knowledge

There are a few potential downsides to incorporating prior knowledge, which could be subjective, into your new model that will interact with the world. Two of them are:

  1. Bias — While bias is plentiful in data, and explicitly drawing your causal model or DAG will clearly expose many of the biases you hold, some bias and opinion can still make its way into the final model, depending on the creator of the model and the reviewers they have for the models they construct.
  2. Incorrect Knowledge — Many things are still unknown, and sometimes even experts have incorrect information/knowledge. While this is becoming rarer, it still happens and is something to be careful of.

Note: A counter-argument to “experts can be wrong too” when using their knowledge and research to build your models is that it is often better to make the same call an expert in the field would have made, even if that call is later proved wrong.

Representing Knowledge as graphs

When visualizing our causal model, it’s easy and intuitive to draw it out as a Directed Acyclic Graph, better known as a DAG for short.

A DAG is a type of graph with nodes (called vertices) and arrows between the nodes (called edges or arcs) which point in the direction of causality/influence. A simple example is shown below, where we represent the graph G with two vertices a and b and one edge from a pointing at b, symbolizing that a has an effect on b. This can be read as: changing a results in a change in b, but changing b does not result in a change in a.

Causal relationship between a and b
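
As a quick sketch of what this looks like in code, the graph G can be written down with the LightGraphs.jl package (not otherwise used in this post, just for illustration); the single directed edge is what encodes that a affects b and not the other way around:

    using LightGraphs

    g = SimpleDiGraph(2)   # two vertices: 1 = a, 2 = b
    add_edge!(g, 1, 2)     # one directed edge a -> b

    has_edge(g, 1, 2)      # true:  a -> b is part of the graph
    has_edge(g, 2, 1)      # false: b -> a is not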

A key thing to note here is that statistical/machine learning methods only capture the correlation between the two nodes in the graph, meaning that to them there is no difference between “a causes b” and “b causes a”.

To put this in real terms, a model that learns correlations will assume that smoking causes lung cancer as much as lung cancer causes smoking. The arrow is very important!

Below is a simplified example of a DAG constructed from research and conversations with a few clinicians:

Simple Malaria Causal Structure

As you can see from the DAG above, it’s clear which factors affect which other factors. From our experience interacting with field experts and governments, this level of transparency is crucial to acceptance because it lends itself very well to auditing and criticism.
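
One simple way to write that structure down in code is as a plain list of cause => effect edges; the edges below are a simplified, illustrative version of the DAG, not definitive clinical ground truth:

    # Each Pair reads cause => effect (illustrative edges only)
    malaria_dag = [
        :season       => :malaria,
        :mosquito_net => :malaria,
        :malaria      => :fever,
        :malaria      => :headache,
        :fever        => :intermittent_fever,
    ]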

A motivational example often brought up by Judea Pearl, co-author of “The Book of Why”:

As the weather changes before a storm, we can see the readings of our barometer change, but this does not mean that if I go outside and manipulate the barometer, I will affect the weather and potentially reduce the risk of a storm.

This is a powerful idea in a clinical setting: if I somehow increase the temperature of a child such that they report a fever, it does not affect the probability of them having an infection.

While causal structures allow us to express causality, the do-calculus (also known as Pearl’s calculus) is the means by which we interact with and manipulate our model. Using it, we can ask what-if questions.

From the DAG of malaria above, we can ask interesting questions such as:

  1. What will the malaria rates look like if we secure funding to provide 80% of the population in a given area with mosquito nets — P(Malaria | do(give 80% of population mosquito nets))
  2. What percent of Malaria patients with a fever have an intermittent fever?

To put it in more abstract and general terms, in a graph G with vertices a, b, and c connected such that a -> b -> c, we can "intervene" and ask questions such as the following (see the sketch after this list):

  1. What will c be if I manipulate b to be true, denoted as P(c | do(b = true))
  2. What will b be if I manipulate c to be false, denoted as P(b | do(c = false)). Spoiler alert: nothing will happen to b, because our causal assumptions dictate that the direction of influence is not c -> b. This is an example of where purely statistical models would get tripped up, because they are based on correlations.
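
A minimal plain-Julia sketch of this chain (with made-up probabilities) shows the asymmetry: intervening on b clearly shifts c, while intervening on c leaves b untouched because b is still generated from a alone:

    using Statistics

    sample_a()  = rand() < 0.5
    sample_b(a) = rand() < (a ? 0.8 : 0.2)
    sample_c(b) = rand() < (b ? 0.7 : 0.1)

    # P(c | do(b = true)) vs P(c | do(b = false)): c responds to the intervention
    mean(sample_c(true)  for _ in 1:10_000)   # ≈ 0.7
    mean(sample_c(false) for _ in 1:10_000)   # ≈ 0.1

    # P(b | do(c = false)): forcing c does nothing to b, which still comes from a
    mean(sample_b(sample_a()) for _ in 1:10_000)  # ≈ 0.5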

For the best explanation I have encountered of causal inference and the do-calculus, the article https://www.inference.vc/untitled/ is a must-read.

How we would implement this

Let’s create a simple model in the Julia programming language using the Omega package, which makes it really easy and intuitive to do probabilistic programming and causal modeling. For more information on Omega.jl, I highly recommend the package’s introduction video.

For this demo to work I am assuming you have installed Julia on your computer and have added the Omega.jl and the UnicodePlots packages for causal modeling and plotting respectively.

For this example we will just create the model for season, use of mosquito net, fever, headache, intermittent fever, and malaria:

Importing libraries and defining first 2 variables that don’t depend on anything else
Defining the malaria and fever nodes and their dependencies
Define headache and intermittent_fever and their dependencies
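
The actual definitions use Omega’s random variables (the snippets referenced above); as a rough plain-Julia sketch of the same generative structure, with all probabilities invented purely for illustration, the model might look something like this:

    # Root variables (no parents); probabilities are illustrative
    sample_season()       = rand() < 0.5     # true = rainy season
    sample_mosquito_net() = rand() < 0.6     # true = sleeps under a net

    # Malaria depends on season and mosquito-net use
    function sample_malaria(season, net)
        p = season ? 0.4 : 0.15
        p = net ? p * 0.3 : p
        rand() < p
    end

    # Symptoms depend on malaria status; intermittent fever depends on fever
    sample_fever(malaria)            = rand() < (malaria ? 0.9 : 0.15)
    sample_headache(malaria)         = rand() < (malaria ? 0.7 : 0.2)
    sample_intermittent_fever(fever) = fever && rand() < 0.6

    # One full draw from the causal model; `net` lets us force mosquito-net use
    function sample_patient(; net = nothing)
        season  = sample_season()
        usenet  = isnothing(net) ? sample_mosquito_net() : net
        malaria = sample_malaria(season, usenet)
        fever   = sample_fever(malaria)
        (; season, usenet, malaria, fever,
           headache = sample_headache(malaria),
           intermittent_fever = sample_intermittent_fever(fever))
    end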

That is all we need to define the causal model in Julia. To try it out, we can do a few simulations of the data:

5 Samples from the model

Which should print out the following output:

Output of simulation on terminal
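
The real code runs the interventions through Omega’s intervention machinery; as a plain-Julia sketch of the same experiment, reusing the sample_patient function from the sketch above, it could look like this:

    using Statistics

    n = 10_000

    # P(malaria | do(mosquito_net = true)) vs P(malaria | do(mosquito_net = false))
    malaria_with_nets    = mean(sample_patient(net = true).malaria  for _ in 1:n)
    malaria_without_nets = mean(sample_patient(net = false).malaria for _ in 1:n)

    println("Malaria rate when everyone uses a net: ", malaria_with_nets)
    println("Malaria rate when nobody uses a net:   ", malaria_without_nets)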

With code like the above, you run 10,000 simulations of each case and can compare the resulting malaria rates under the two interventions.

Hopefully it’s clear that our causal model makes sense: forcing people (in the simulation space) to not use mosquito nets resulted in more malaria-positive simulations, while forcing mosquito-net usage resulted in fewer malaria-positive patients.

From here, we can use the causal model as a generative model to help us classify patients suffering from different conditions, by comparing the patient we are interested in classifying to samples generated under each condition.

I plan on writing more on how we use causal models to solve our own problems in the near future!

Note: Many of the examples and scenarios above have been simplified in the attempt to make things intuitive for students and the curious alike.

To learn more about our work, follow us on Twitter: @3210jr, @elsahealth!

References
