How “Rich” is Your Question?

Counteractual
8 min read · Aug 18, 2020


Science is about answering questions, but not all questions are created equal. Some are straightforward, like “Who wrote Hamlet?” Others may not have a correct answer at all, like “What is the meaning of life?” Science focuses on answering testable questions, but what exactly is being tested? Generally speaking, a scientist gathers information from a data source and infers a conclusion from the patterns found in that data. But what kind of information is convincing enough to answer the question? All too often, scientists irresponsibly use surface-level data to answer a deeper question, as in this hilarious example:

[Image: spurious correlation between per-capita cheese consumption and deaths by becoming tangled in bedsheets. Source: www.tylervigen.com]

In this case, one might ask, “Does eating cheese cause more people to die via bedsheets?” Despite the strong correlation, it would be absurd to claim that cheese is ruining society by invoking the wrath of bedsheets. Examples like this are why the phrase “correlation does not imply causation” is preached in statistics courses. When a false conclusion like this is drawn (usually in less extreme form), it typically isn’t due to malicious intent; it’s human nature to see a pattern and think, “Something must be up here.” We want to dive a little deeper into this idea that some questions are “deeper” than others, and some types of data are “richer” than others. We want to show why knowledge of causality is vital to conducting proper science, and we aim to do this by explaining Judea Pearl’s causal hierarchy (Pearl, 2009).

Layer 1

Pearl’s causal hierarchy refers to three nested layers of knowledge, with each successive layer representing a more general understanding of the world. Layer 1 represents how a human views the world by seeing, or observing with their own senses.

Layer 1 is capable of answering questions that ask “What is?” or “How does seeing one thing change my belief about something else?” Perhaps you believe you should bring an umbrella when you see that the sidewalk is wet, because most of the time when the ground is wet, it’s raining. This is a sensible response made using layer 1 information, as you were able to draw a connection between wet concrete and rain. In this regard, layer 1 information can be very useful. But would it still make sense to say that wet concrete causes rain? Fortunately, you don’t need to know whether the relationship is causal to make your decision to bring the umbrella, but layer 1 information is not enough to answer a causal question like this.

When scientists claim that two variables are correlated, they are talking about a layer 1 pattern. Maybe you hear on the news that people who eat more broccoli end up with higher salaries. Does this mean you should stuff your face with broccoli in hopes of getting a raise? Even if you conduct a rigorous scientific study and find statistically significant evidence that people with higher broccoli consumption have higher salaries, you cannot know whether the correlation is due to broccoli raising salaries, higher salaries leading to more broccoli, or some third confounding variable causing both. And even if you knew which of these held, it still wouldn’t be clear how strong the causal effect is. We need a richer type of information to figure this out.
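A quick way to feel this limitation is to simulate a toy world where the answer is known by construction. In the sketch below (the trait name and every probability are our own invention), a hidden “health-consciousness” trait makes people both more likely to eat broccoli and more likely to earn a high salary, while broccoli itself has no effect on salary at all:

```python
import random

# Toy sketch (names and probabilities invented): a hidden trait raises both
# broccoli eating and salary. Broccoli itself has NO effect on salary here.
random.seed(1)
data = []
for _ in range(10000):
    conscious = random.random() < 0.5
    broccoli = random.random() < (0.8 if conscious else 0.2)
    high_salary = random.random() < (0.7 if conscious else 0.3)  # no broccoli term
    data.append((broccoli, high_salary))

def p_high_salary_given(broccoli_value):
    """Observed (layer 1) frequency of a high salary, conditioned on broccoli."""
    rows = [salary for broc, salary in data if broc == broccoli_value]
    return sum(rows) / len(rows)

# Broccoli eaters really do earn more in the data (seeing), even though
# eating more broccoli would change nothing (doing).
print(p_high_salary_given(True), p_high_salary_given(False))
```

The observed gap is real and would pass a significance test, yet it says nothing about what would happen if someone started eating broccoli. That is exactly the gap between layer 1 and layer 2.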

Layer 2

Layer 2 represents how a human interacts with the world by doing, or intervening in their environment. Layer 2 is capable of answering questions that ask “What if?” or “How does changing something affect something else?” This includes the causal questions we asked earlier that layer 1 was too limited to answer. For instance, maybe you are genuinely wondering, “Does wet concrete cause rain?” Obviously not, but how could you verify it yourself? You could take a bucket of water, splash it on the sidewalk, and see if it starts raining. You’ll quickly find that there is no causal effect of wet sidewalks on the weather. This is, indeed, a causal conclusion, and the fundamental difference is that you took matters into your own hands to answer the question. You stepped in and threw that bucket of water yourself. You didn’t just sit back and log in a notebook when the sidewalk was wet and when it was raining.

In practice, we can try to answer layer 2 questions through experimental studies. If you’re keen on finding out whether broccoli causes higher salaries, you can perform an experiment. Find some random participants, force-feed half of them broccoli, and have the other half abstain from eating broccoli entirely. Check back in a few years to see if the broccoli subjects end up with higher salaries than the non-broccoli subjects. If done properly, then you would certainly be the expert on the question “Does broccoli cause higher salaries?”
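Here is what that experiment buys you, in a small simulated randomized trial (again, the trait and all probabilities are invented for illustration). A hidden trait still drives salary and broccoli still has no effect, but now we flip a coin to decide who eats broccoli, so the hidden trait can no longer decide it:

```python
import random

# Toy sketch of a randomized experiment (all numbers invented): a hidden
# trait drives salary, broccoli has no effect, and WE assign broccoli by
# coin flip, cutting any link between the trait and the treatment.
random.seed(2)
treated, control = [], []
for _ in range(10000):
    conscious = random.random() < 0.5
    assigned_broccoli = random.random() < 0.5  # randomized by the experimenter
    high_salary = random.random() < (0.7 if conscious else 0.3)  # no broccoli term
    (treated if assigned_broccoli else control).append(high_salary)

def rate(group):
    return sum(group) / len(group)

# The two groups now have roughly equal salary rates: randomization exposes
# the absence of a causal effect that observation alone could not.
print(rate(treated), rate(control))
```

Compared with the observational version, the association vanishes once broccoli is assigned rather than chosen. This is the essence of why randomized experiments answer layer 2 questions.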

Unfortunately, not every experiment is possible due to financial, legal, ethical, or physical reasons, which is why not every scientific study is an experiment. We won’t be able to perform an experiment to find out whether clear weather raises crime rates since we can’t change the weather. That doesn’t mean we aren’t interested in this question. In this sense, it becomes clear why studying causal inference is important. Not only does it allow us to differentiate between the types of questions we study, it also helps us find workaround solutions to questions that can’t be answered directly.

With all of this said, it’s evident that layer 2 contains answers to a much richer set of questions than layer 1, including answers to all of the questions that layer 1 can answer. This is due to the technicality that having no intervention is also a type of intervention. It may seem that by fully grasping layer 2, we can answer every testable question of interest, which would be a great feat for all of science. However, a deeper dive into Pearl’s causal hierarchy reveals a surprising result: there are interesting questions that even layer 2 cannot answer.

Layer 3

Layer 3 represents how a human reflects on certain actions by imagining situations other than the one that actually occurred. Layer 3 is capable of answering questions that ask “Why?” or “What if I had done something else?” While layer 1 focuses on observational studies and layer 2 focuses on interventional studies, layer 3 focuses on counterfactual studies. Readers of the blog may recognize this term from the previous post, but reading it isn’t necessary to understand this one.

Continuing the same example, let’s say, for some miraculous reason, you find that eating more broccoli does indeed result in a higher salary. Perhaps broccoli has some health benefits that turn you into a smarter employee. However, even if you ate a lot of broccoli and had a higher salary, there could still be other reasons for your higher salary, most of which are not related to the broccoli. Then you might be interested in answering the question “What would my salary be like if I hadn’t eaten broccoli?” This is a question that layer 2 cannot answer. The difference is that by eating broccoli, you’ve committed to a world where you’ve eaten the broccoli. You’ll never be able to witness the world in which you didn’t eat the broccoli. Knowing that broccoli affects salary might lead you to think that you would have had a lower salary otherwise, but you can’t know without seeing that other world.

Unfortunately, most questions from layer 3 can’t be answered through scientific studies since we can’t repeat the past and try something else. Nonetheless, many of the questions we seek to answer involve counterfactual thinking. If Alice originally had COVID-19 and was cured of the disease, why was she cured? Suppose we knew she took a drug that is proven to have a causal effect on the disease. However, the drug affects everyone differently — it helps some people and hurts others. Then, did Alice recover because of the drug, despite taking the drug, or regardless of taking the drug? We may not be able to find out because we can’t see what would have happened if Alice had not taken the drug.
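Alice’s situation can be made concrete with a tiny structural model of our own devising. We assume each patient has one of four latent response types (a standard device in the counterfactual literature), and then apply the “abduction” step of counterfactual reasoning: keep only the types consistent with what we actually observed, and ask what each of those types would have done in the other world:

```python
# Toy structural causal model for the drug example (our own sketch).
# Assumed latent response types:
#   "helped": recovers only WITH the drug
#   "hurt":   recovers only WITHOUT the drug
#   "always": recovers either way
#   "never":  recovers neither way
RECOVERY = {
    "helped": {True: True, False: False},
    "hurt":   {True: False, False: True},
    "always": {True: True, False: True},
    "never":  {True: False, False: False},
}

def consistent_types(took_drug, recovered):
    """Abduction step: which latent types match what we actually observed?"""
    return [u for u, outcome in RECOVERY.items() if outcome[took_drug] == recovered]

# Alice took the drug and recovered. Would she have recovered without it?
types = consistent_types(took_drug=True, recovered=True)
would_have_recovered = {RECOVERY[u][False] for u in types}
print(types, would_have_recovered)  # ['helped', 'always']: both outcomes possible
```

Observing that Alice took the drug and recovered leaves two types standing, and they disagree about the drug-free world: one would have recovered anyway, the other would not. The observed facts alone cannot settle the counterfactual, which is precisely the layer 3 difficulty.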

Counterfactual information is important since it can be used to assign blame. If a bridge falls apart seemingly out of nowhere, whose fault is it? Did the construction workers make a mistake? Was the architect who designed the blueprints incompetent? Did the mayor fail to provide enough funding for the project? It may be impossible to find out since we can’t reverse time and change what happened. However, if we could, we could pinpoint exactly what was responsible for the collapse of the bridge.

These examples show that layer 3 contains answers to a much richer set of questions than even layer 2. This includes all of the questions that layer 2 can answer, due to the technicality that if you can imagine any alternate scenario, you can obviously also imagine the one that actually occurred. Hopefully, these examples also illustrate the importance of finding ways to answer questions from layer 3. These questions are not limited to science; they occur in everyday life.

Even if some questions can’t currently be answered, Pearl’s causal hierarchy can help you understand the depth of your question. When you ask a question, whether it’s about broccoli or something more serious, take a step back and think about the details. Which layer of information do you need to answer it? Are you asking about an everyday pattern you observed? Are you wondering about the consequences of an action? Are you imagining a hypothetical situation that didn’t occur? Depending on the type of question, you’ll know whether you can answer it right away or whether you’ll need to dig deeper for more compelling evidence.

Many of the posts on this blog will refer back to Pearl’s causal hierarchy. If you would like more details on the topic, we once again recommend reading The Book of Why by Pearl & Mackenzie. For a more technical explanation, see the formal survey of Pearl’s causal hierarchy by Bareinboim et al. (2020).

Citations:
Bareinboim, E., Correa, J. D., Ibeling, D., & Icard, T. (2020). On Pearl’s Hierarchy and the Foundations of Causal Inference. Retrieved from https://causalai.net/r60.pdf.

Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press, New York.

Pearl, J., & Mackenzie, D. (2018). The Book of Why. Basic Books, New York.

Vigen, T. (n.d.). [Spurious correlation image between cheese consumption and deaths by tangling of bedsheets]. Retrieved August 16, 2020, from https://tylervigen.com/spurious-correlations


Counteractual

Blog about causality, authored and maintained by Kevin Xia and Hannah Ho.