Megapost about causality: the summary of “The Book of Why” by Pearl and Mackenzie and more ideas

Statistics, causality, causal modelling, mediation analysis, counterfactuals, causal representation learning, ethical AI, philosophy of research, the power of abstractions, and systems thinking

The structure of the post

I start with a summary of the book’s ideas and arguments, with subsections about causal modelling, interventions, mediation analysis, counterfactuals, causality and big data, causality and ethical AI, and my takeaways from the book. At the end of the post, I present some extra ideas about causality that are neither covered nor mentioned in The Book of Why and make some connections between the study of causality and other fields.

Table of Contents

· The summary of the ideas and arguments of The Book of Why
· Causal modelling
A causal diagram
Causal models are subjective
A Structural Causal Model
· Interventions: the |do(X) operator and the causal effect
Causality is not reducible to probabilities
Causal modelling first, experiment (or study) and data mining second
To find the causal effect between two variables, use controlling judiciously, informed by the causal model
There can be a correlation without a causation
The way the data is collected can introduce bias
Back-door criterion
In research, make assumptions liberally and then discard implausible ones
Researchers can find causal mechanisms in data from observational studies as well as from controlled experiments
Front-door adjustment
· Mediation analysis
Baron & Kenny’s approach to mediation analysis
Pearl’s measures for mediation analysis: natural direct and indirect effects
Detailed analysis of Pearl’s arguments against Baron & Kenny’s approach
Inconsistency of Kenny’s propositions
Two goals of mediation analysis
Compute natural direct effect to answer questions and improve decisions
Could mediation analysis help refine humans’ intuitive causal models of phenomena?
· Counterfactuals
Interventional queries are about aggregates and distributions, counterfactual queries are about specific instances and samples in the distributions
Notation for counterfactual queries
“The Ladder of Causation”
Humans are the only animals capable of counterfactual reasoning
Human intuition is “organised around causal relationships”?
· Causality and big data
Causal relationships could be induced from the data
The role of big data in causal modelling and inference
· Causality and ethical AI
The role of causal reasoning in understanding free will, consciousness, and agency
Asimov’s laws of robotics would not work because lists of rules never work in AI
Do empathy and fairness follow self-awareness?
Pearl’s views on the risks of AI
· Conclusions and takeaways
Abstractions have power
Make a causal model of an important system you are dealing with
The criteria of sound research in medical or social sciences
· More ideas about causality and connections to other fields
Causal models are stored in human minds in reference frames
Interventions are possible because causal mechanisms are assumed to be independent
The Independent Causal Mechanisms Principle is just an assumption that human brains evolved to use
How does causal modelling relate to the study of complex systems?
Causality and systems thinking
· References

The summary of the ideas and arguments of The Book of Why

Causal modelling

A causal diagram

A causal diagram (also called a graphical causal model) depicts events, objects, features, characteristics, and treatments (all of which are generally called variables in statistics; many synonyms are also used, but I’ll stick to the term “variable” in this post) happening in (appearing in, characterising, applying to) some situation (environment, system). Variables are connected with causal links (arrows), each meaning that one variable listens to (is caused by) another:

Figure 1. A simple example of a graphical causal diagram with “hidden factor” as a confounder between “smoking” and “lung cancer”. © Sanna Tyrväinen

Causal models are subjective

Since the extraction of variables (objects, events, features, etc.) from the background is an operation performed subjectively by an intelligence (an intelligent agent), causal diagrams are necessarily subjective, too.

A Structural Causal Model

A structural causal model (SCM) is a set of equations that corresponds to some causal diagram, where every variable is equated to (i. e., modelled as) a function of its direct causes and an extra random variable for modelling factors that are absent from the diagram, either because they are unknown, intentionally omitted, or unobservable. Each function has its “own” extra random variable, and these random variables are independent of each other. For example, the set of equations corresponding to Figure 1 may look like this:
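
The equations appear as an image in the original post; a plausible reconstruction for the diagram of Figure 1 (with my variable names: H for the hidden factor, X for smoking, Y for lung cancer) is:

```latex
\begin{aligned}
H &= f_H(U_H) \\
X &= f_X(H,\, U_X) \\
Y &= f_Y(X,\, H,\, U_Y)
\end{aligned}
```

where the “extra” random variables U_H, U_X, and U_Y are jointly independent.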

Interventions: the |do(X) operator and the causal effect

One of the two main types of questions that researchers could ask about the causal model of some situation is the so-called interventional query. For example:

  • “How will the probability of having a heart attack in the next five years change in so-and-so group of people if we prescribe them so-and-so medicine to take every day?”
  • “How much will the sales of a product change if we discount it by 20%?”

Causality is not reducible to probabilities

Causality could be “defined” via the condition P(Y|do(X)) != P(Y), but this definition is a tautology: the |do(X) operator itself performs “surgery” on the causal model (removing all arrows pointing into X), so it already presupposes the causal diagram rather than deriving causality from probabilities.
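
To make this concrete, here is a minimal simulation (my own illustration, not from the book) in which P(Y|X) differs from P(Y) purely because of a confounder, while P(Y|do(X)) = P(Y), i. e. there is a correlation but no causation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hidden factor H confounds X (smoking) and Y (lung cancer);
# deliberately, there is NO causal arrow X -> Y in this toy model.
h = rng.random(n) < 0.5
x = rng.random(n) < np.where(h, 0.8, 0.2)  # H makes X more likely
y = rng.random(n) < np.where(h, 0.4, 0.1)  # H alone drives Y

print("P(Y)           =", y.mean())     # ~0.25
print("P(Y | X=1)     =", y[x].mean())  # ~0.34: correlation via H

# do(X=1) deletes the H -> X arrow; since Y's mechanism doesn't
# mention X, the intervention leaves the distribution of Y untouched:
print("P(Y | do(X=1)) =", y.mean())     # ~0.25: no causal effect
```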

Causal modelling first, experiment (or study) and data mining second

Pearl describes how researchers should extract knowledge from data: they should start with creating and analysing a graphical causal model before manipulating the study (or experimental) data. Moreover, researchers should ideally analyse the causal model before even conducting an experiment or an observational study, because the causal model could help them design a better study (or experiment), reveal that their research question is unanswerable no matter how much data they collect, or show that they don’t need any additional data at all, because the question can be answered by “transporting” (translating) information from a combination of earlier studies or experiments, none of which was done in exactly the same context as the one of current interest.

To find the causal effect between two variables, use controlling judiciously, informed by the causal model

The correlation between the treatment (program) and the effect may vary greatly, or even disappear and then reappear again, depending on the set of variables controlled for.

The causal model of the Monty Hall game. Image by the author.

There can be a correlation without a causation

The Monty Hall paradox is a counterexample to Reichenbach’s Common Cause Principle: “If two variables X and Y are statistically dependent, then there exists a variable Z that causally influences both and explains all the dependence in the sense of making them independent when conditioned on Z.” In the Monty Hall game, the door originally chosen by the player and the location of the car become statistically dependent once we condition on the door opened by the host (a collider that both variables point into), yet they have no common cause.
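
A quick simulation of this (my own illustration): the player’s pick and the car’s location are marginally independent, but conditioning on the door opened by the host (the collider) makes them dependent:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

car = rng.integers(0, 3, n)    # door hiding the car
pick = rng.integers(0, 3, n)   # player's original choice, independent of car
coin = rng.integers(0, 2, n)   # host's tie-breaker when two doors are openable

# The host opens a door that is neither the player's pick nor the car:
# the host's door is a collider, "listening to" both `car` and `pick`.
host = np.empty(n, dtype=int)
for i in range(n):
    options = [d for d in range(3) if d != car[i] and d != pick[i]]
    host[i] = options[coin[i] % len(options)]

# Marginally, the pick tells us nothing about the car:
for p in (0, 1):
    print(f"P(car=0 | pick={p}) =", (car[pick == p] == 0).mean())    # both ~1/3

# Conditioned on the collider, a dependence appears out of nowhere:
opened2 = host == 2
for p in (0, 1):
    sel = opened2 & (pick == p)
    print(f"P(car=0 | pick={p}, host=2) =", (car[sel] == 0).mean())  # ~1/3 vs ~2/3
```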

A chart of a spurious correlation. © Tyler Vigen

The way the data is collected can introduce bias

Aaron Roth and Michael Kearns write in The Ethical Algorithm that researchers shouldn’t dig into the data too much before forming a hypothesis because they can p-hack without intending to. The mechanism of this bias could be the same as in the Monty Hall paradox: if the way the data is collected (i. e., the design of the study) is affected by both the cause and the effect variables, then an additional correlation path opens between these variables.

Back-door criterion

Researchers should look at the causal diagram to determine whether there is a set of variables for which the observational or experimental data could be controlled to obtain the causal effect between the treatment and the effect (in other words, whether it is possible to compute the causal effect at all). The blocking variables (i. e., the variables for which researchers should control their data) should block all so-called back-door paths (a detailed explanation of what this means is beyond the scope of this review) between variables X and Y while not blocking any front-door paths (i. e., the paths consisting only of forward causal arrows) between X and Y. A set of such variables is said to satisfy the back-door criterion (a computational sketch of the resulting adjustment follows the list below). When the required variables are unobserved or unobservable, researchers can:

  • Find ways to obtain the data on these variable(s), which can range from conducting an additional survey among the patients to developing new scientific knowledge or technology for measuring or estimating the variables that were unobservable before, for example, activity of neurons in the brains of people in casual settings for neuroscience and psychology research.
  • Make simplifying assumptions (at the risk of being wrong): for example, that an effect of one variable on another is small and can be neglected. If researchers choose this path, ideally, they should supplement their results with a sensitivity analysis (more on this below).
  • Use front-door adjustment instead, if applicable (more on it below).
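
For concreteness, here is a minimal sketch (mine, not from the book) of what the adjustment looks like when a single binary variable Z does satisfy the back-door criterion; it implements the standard adjustment formula P(Y|do(X=x)) = Σ_z P(Y|X=x, Z=z)·P(Z=z):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Z satisfies the back-door criterion for the effect of X on Y.
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.7, 0.3)
y = rng.random(n) < 0.1 + 0.2 * x + 0.3 * z   # here X genuinely affects Y

def p_y_do_x(x_val: bool) -> float:
    """Back-door adjustment: sum over z of P(Y | X=x, Z=z) * P(Z=z)."""
    return sum(y[(x == x_val) & (z == z_val)].mean() * (z == z_val).mean()
               for z_val in (False, True))

print("naive    P(Y | X=1)     =", y[x].mean())    # ~0.51, biased upward by Z
print("adjusted P(Y | do(X=1)) =", p_y_do_x(True)) # ~0.45 = 0.3 + 0.3 * P(Z=1)
```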

In research, make assumptions liberally and then discard implausible ones

Researchers can find causal mechanisms in data from observational studies as well as from controlled experiments

Pearl suggests researchers should be bolder in their claims, which is related to the previous idea of “bold hypothesis”:

Front-door adjustment

Pearl presents a method called front-door adjustment for estimating the causal effect of one variable on another when these variables have unknown or unobserved confounders, which makes it impossible to estimate the causal effect in the “traditional way”, by controlling for the confounders. The method is applicable only in special situations (there should be an unconfounded, observable mediator variable on the causal path between the treatment and the effect variables), but the point is that it is impossible to demonstrate why front-door adjustment is a valid method without resorting to the notions of causality and causal mediation.
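
For reference, the front-door adjustment formula (standard in the causal inference literature; M denotes the mediator) is:

```latex
P(Y \mid do(X = x)) = \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid m, x')\, P(x')
```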

Mediation analysis

Only after a researcher uses the notions of cause and effect can they ask whether a treatment causes an effect directly or indirectly, that is, mediated through the effect of the treatment variable on variables other than the effect variable. This, in turn, helps the researcher refine their causal model, beginning a virtuous loop of discovery of causal relationships. Pearl posits that these relationships constitute the bulk of all scientific knowledge, as noted in the section “Causal models are stored in human minds in reference frames” below.

Baron & Kenny’s approach to mediation analysis

Baron and Kenny developed a famous procedure for mediation analysis using multiple regression over variables X (cause), Y (effect), and the mediating variable M (the procedure can be straightforwardly generalised to situations with multiple mediators). Baron & Kenny essentially defined the regression of Y on X controlled for M as the direct effect, and posited that total effect = direct effect + indirect effect. Note the absence of the word “causal” in these labels. Pearl writes that Baron & Kenny’s method estimates noncausal mediation (The Book of Why, p. 325), despite David Kenny himself stating the exact opposite on his website.
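
As a reminder of the mechanics of the procedure, here is a minimal sketch with simulated linear data (my own illustration of the regression steps; it is not a claim that they estimate causal quantities):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Simulated linear data: X -> M -> Y plus a direct X -> Y path.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)           # mediator
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

def ols(design, target):
    """Least-squares coefficients of `target` on the columns of `design`."""
    return np.linalg.lstsq(design, target, rcond=None)[0]

total = ols(np.column_stack([np.ones(n), x]), y)[1]      # Y ~ X: total (~0.5)
direct = ols(np.column_stack([np.ones(n), x, m]), y)[1]  # Y ~ X + M: "direct" (~0.3)

print("total    =", total)
print("direct   =", direct)
print("indirect = total - direct =", total - direct)     # ~0.2 = 0.5 * 0.4
```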

Pearl’s measures for mediation analysis: natural direct and indirect effects

For mediation analysis, Pearl suggested using notions of natural direct effect (NDE) and natural indirect effect (NIE) which are defined as follows:
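
The book presents these definitions with pictures and examples; in the standard counterfactual notation (where M_x is the value the mediator would take under X = x, and Y_{x,m} is the outcome under X = x and M = m), they read:

```latex
\begin{aligned}
\mathrm{NDE} &= \mathbb{E}\left[ Y_{1,\,M_0} \right] - \mathbb{E}\left[ Y_{0,\,M_0} \right] \\
\mathrm{NIE} &= \mathbb{E}\left[ Y_{0,\,M_1} \right] - \mathbb{E}\left[ Y_{0,\,M_0} \right]
\end{aligned}
```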

Detailed analysis of Pearl’s arguments against Baron & Kenny’s approach

Pearl uses various vague and metaphorical phrases to express his opinion of Baron & Kenny’s approach. I italicised these characteristic phrases in the quotes below.

Inconsistency of Kenny’s propositions

In fairness, here I play the role of Kenny’s advocate because in [4] his propositions are sometimes inconsistent with each other. For example, he writes that their (with Reuben Baron) procedure gives the “steps in establishing mediation”, which is hard to interpret in any other way than that these steps can be used to determine whether there is mediation (they cannot, as noted above). But just one line earlier, he writes: “We note that these steps are at best a starting point in a mediational analysis.”

Two goals of mediation analysis

To better understand the difference between Baron & Kenny’s and Pearl’s methods for mediation analysis, I think it’s essential to start with the purpose: why do researchers perform mediation analysis in the first place?

Compute natural direct effect to answer questions and improve decisions

For the first goal, Pearl gives examples of concrete, practical counterfactual questions. In American courts, discrimination is taken to mean a natural direct effect:

Could mediation analysis help refine humans’ intuitive causal models of phenomena?

Apart from the “computational” (instrumental) purpose described above, mediation analysis is also supposed to have another function: help communicate “computed” decisions to humans, as part of the explainable AI program.

Counterfactuals

The second main type of causal inference question is the counterfactual query (the first type was the interventional query). For example:

  • “Given that the patient has had a heart attack, what is the probability that they wouldn’t have had it if they had started taking so-and-so medicine one year ago?”
  • “Given that the customer didn’t buy the product, what is the probability that they would have bought it if they had been offered a 20% discount?”

Interventional queries are about aggregates and distributions, counterfactual queries are about specific instances and samples in the distributions

The terms “interventional query” and “counterfactual query” might suggest that the difference between them is that interventional questions are concerned with the future, while counterfactual questions are concerned with the past, even if that “past” is in some imaginary world, which itself might lie in the “future” (counter-fact, i. e. something against what has happened in fact). This reading is wrong and caused me a lot of confusion while I was reading the book. The actual difference is the one in this section’s title: interventional queries are about aggregates and distributions, while counterfactual queries are about specific instances (units) and samples within those distributions.

Notation for counterfactual queries

In the book, Pearl introduces a separate notation for counterfactual queries but doesn’t follow it strictly; sometimes he just uses do-notation to represent counterfactual queries (as do the authors of [2]). In do-notation, counterfactual queries can be written as P(Y|u, do(X = x)), where u abbreviates all the specific values u_x, u_y, etc. assumed by the random variables U_X, U_Y, etc. for the concrete unit in question.
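
Evaluating such a query on a structural causal model follows the three-step procedure from Pearl’s technical writing, summarised here for reference:

```latex
\begin{aligned}
&\text{1. Abduction: update } P(U) \text{ to } P(U \mid E = e) \text{ using the observed evidence } e. \\
&\text{2. Action: perform the surgery, replacing the equation for } X \text{ with } X = x. \\
&\text{3. Prediction: compute } P(Y) \text{ in the modified model under } P(U \mid E = e).
\end{aligned}
```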

“The Ladder of Causation”

Pearl organises simple statistical queries, P(Y|X), interventional queries, P(Y|do(X)), and counterfactual queries, P(Y|u, do(X = x)) into what he calls “The Ladder of Causation”.

Humans are the only animals capable of counterfactual reasoning

Pearl also attaches to “The Ladder of Causation” some anthropological and biological propositions, such as that among animals, only humans are capable of counterfactual reasoning.

Human intuition is “organised around causal relationships”?

The knowledge conveyed in a causal diagram is typically much more robust than that encoded in a probability distribution. This is the reason, Pearl conjectures, that human intuition is organised around causal, not statistical, relations.

Causality and big data

Causal relationships could be induced from the data

In the section “Causal modelling first, experiment (or study) and data mining second” above, I’ve relayed Pearl’s idea that data mining, statistics, and even deep learning alone (without an assumed causal model) are insufficient to estimate the causal effect between a pair of variables.

The role of big data in causal modelling and inference

Pearl suggests the following ways to leverage big data for causal modelling and inference:

  • With big data, researchers can search for interesting patterns of association and pose more precise interpretive questions.
  • The sheer quantity of data samples helps to overcome the curse of dimensionality in computing certain causal queries. In fields where there is a big variance in symptoms and histories (such as personalised medicine), units (e. g., patients) could first be clustered based on some similarity metric, and interventional queries then answered from the data samples in the cluster: consider the P(stroke|do(age, weight, genetic disposition, accompanying illnesses)) query mentioned in the section “Interventional queries are about aggregates and distributions, counterfactual queries are about specific instances and samples in the distributions” above. A sketch of this clustering idea follows the list.
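
Here is a minimal sketch of that clustering idea (entirely my own construction; the feature names and the similarity metric are hypothetical, and conditioning within a cluster approximates the do() query only because treatment is assigned independently of the features in this toy setup):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Hypothetical patient records: [age, weight, genetic_score] (standardised),
# a randomly assigned treatment, and an outcome (stroke within N years).
features = rng.normal(size=(n, 3))
treated = rng.random(n) < 0.5            # randomised => conditioning ~ do()
stroke = rng.random(n) < 0.05 + 0.03 * (treated & (features[:, 0] > 0))

def stroke_risk_if_treated(patient, k=500):
    """P(stroke | treatment) estimated within the k most similar patients."""
    dist = np.linalg.norm(features - patient, axis=1)  # the similarity metric
    cluster = np.argsort(dist)[:k]                     # the patient's "cluster"
    sel = cluster[treated[cluster]]                    # treated members only
    return stroke[sel].mean()

print(stroke_risk_if_treated(np.array([ 1.0, 0.0, 0.0])))  # older: ~0.08
print(stroke_risk_if_treated(np.array([-1.0, 0.0, 0.0])))  # younger: ~0.05
```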

Causality and ethical AI

The role of causal reasoning in understanding free will, consciousness, and agency

In contrast to his bold anthropological claims, Pearl makes very cautious statements about the role of causal reasoning in the “holy grail topics of artificial intelligence”, namely free will, consciousness, agency, responsibility, ethics, etc. I think these are all uncontroversial statements:

Asimov’s laws of robotics would not work because lists of rules never work in AI

Do empathy and fairness follow self-awareness?

Pearl writes:

Pearl’s views on the risks of AI

Conclusions and takeaways

I was looking for a book that would introduce me to practical applications of statistics and yet was not very technical, i. e. would not submerge me in the subtleties of specific probability distributions and statistical procedures. The Book of Why is just such a book.

Abstractions have power

In The Book of Why, Pearl gives several examples of situations when researchers couldn’t effectively answer certain questions because they didn’t use the language of causality, thinking of causality as a mirage that is reducible to probabilities:

Make a causal model of an important system you are dealing with

Examples of such systems are your own motivation (mood, health, quality of sleep, etc.), an organisation you are leading, or a system you are building.

The Amazon Flywheel. © Jeff Bezos
The causal diagram of motivation. Image by the author.

The criteria of sound research in medical or social sciences

Pearl teaches that any research in epidemiology, sociology, psychology, macroeconomics, and other medical and social sciences that backs up its “findings” with statistics but doesn’t make clear the causal assumptions used and doesn’t perform the corresponding sensitivity tests regarding these assumptions should be taken with a big grain of salt.

More ideas about causality and connections to other fields

Causal models are stored in human minds in reference frames

The idea that jumps out at me from The Book of Why is that causal models are stored in human brains as reference frames (my speculation: mental reference frames of concepts have at most seven features), a concept introduced by Jeff Hawkins in A Thousand Brains.

Headache causes. Image by the author.
© Sanna Tyrväinen

Interventions are possible because causal mechanisms are assumed to be independent

When intervening in the causal diagram and the structural causal model of a system (situation, environment, etc.), why is one allowed to change just one or several functions (such as f_X) while leaving all other causal relationships and the functions for the other variables intact? If one intervenes on a variable, for example, making patients take some medicine, what guarantees that only the values of the variables causally dependent on the “Treatment” variable change, and not also the relationships between the other variables?
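
This modularity is exactly what makes graph “surgery” well defined: do(X = x) replaces only the mechanism f_X and, by assumption, leaves every other mechanism intact. A minimal sketch (my own, with hypothetical mechanisms):

```python
import random

# Each variable has its own mechanism and its own independent noise source.
def f_h():     return random.random() < 0.5                  # hidden factor
def f_x(h):    return random.random() < (0.8 if h else 0.2)  # X listens to H
def f_y(x, h): return random.random() < 0.1 + 0.2 * x + 0.3 * h

def sample(do_x=None):
    """Draw one unit from the SCM; do_x, if given, overrides f_x only."""
    h = f_h()
    x = f_x(h) if do_x is None else do_x   # surgery: cut the H -> X arrow
    y = f_y(x, h)                          # f_h and f_y are left intact
    return x, y

obs = [sample() for _ in range(100_000)]
itv = [sample(do_x=True) for _ in range(100_000)]
print("P(Y | X=1)     =", sum(y for x, y in obs if x) / sum(x for x, y in obs))  # ~0.54
print("P(Y | do(X=1)) =", sum(y for _, y in itv) / len(itv))                     # ~0.45
```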

The Independent Causal Mechanisms Principle is just an assumption that human brains evolved to use

It’s important to remember that the Independent Causal Mechanisms Principle is merely a constraint on what counts as a causal model, not a universal principle of nature. Therefore, the principle shouldn’t be mistaken for a physical (or metaphysical) feature of causality. However, I wouldn’t be surprised if there was a way to derive from first principles (physics, entropy) that some decomposition into independent causal mechanisms must be possible in most situations in some precise mathematical sense. Perhaps this has even been done already, because there seem to be quite a lot of papers connecting causality and entropy in various ways.

How does causal modelling relate to the study of complex systems?

When a causal diagram contains loops that include the treatment and the effect variables, or, worse, when “everything depends on everything”, researchers should conclude that the system they are dealing with is complex (rather than merely complicated), unless they are willing to sever all these loops by assuming that some causal links are weak and ignoring them (as noted above in the section “Back-door criterion”). None of the methods of causal inference described by Pearl (back-door adjustment, front-door adjustment, counterfactual queries, mediation analysis) can yield provably correct results in such situations. These methods could still be useful, though, especially for sensitivity analysis (as described above), that is, for estimating how complex the system really is.

Causality and systems thinking

There are several fairly obvious relationships between the practice of causal modelling and systems thinking.

References

[1] M. Eichler. Causal inference with multiple time series: principles and problems (2013)


Writings about software, systems, reliability, and data engineering, software operations, peopleware, philosophy, etc. by Roman Leventov. Originally published at https://engineeringideas.substack.com/
