Causality in Environmental Epidemiology
Environmental epidemiology is the study of the distribution of diseases whose causes are found in our environment. By environment, we mean anything that exists “outside” the human body and is influenced by us humans. For example, consider how, thousands of years ago, when humans took up agriculture, we set fire to forests and cleared land for crops to feed our families and our society. This produced fire, heat, and dust, and it changed the environment. In turn, those changes affected our health.
The question we are asking here is this: “How do we know that in health, X is a cause of Y?” Going back to the instance of forest fires, land clearing, agriculture, and human health issues, say Asthma, we may ask, “How do we know that the Asthma people suffered from was caused by the forest fires?” The answer to this and any other question on cause and effect in health depends on how we frame the question.
In real life, we only ever see “associations” or “correlations”; we never directly observe that X is a cause of Y. We have to infer it, and there are different ways to do so. In this article, we will use three of them and point to a fourth. First, we will use “criteria” to test that an association is indeed a “real association”, and then, starting from a real association, we will argue that its nature is one of cause and effect. Second, we will argue that in health there is never only one cause of one effect; causes always go together. We call this the “component cause model”, in which multiple causes combine to produce an effect. Third, we will use graph theory, in the form of directed acyclic graphs, to study cause and effect. The fourth way, the logic of counterfactual theories of causation, is left for a future article. The last two approaches, directed acyclic graphs and counterfactual theories, are intuitive.
Is this association for real?
We begin with the premise that we never actually observe a cause and effect relationship; we have to infer it from an association. We will examine and build our story around the case of exposure to Environmental Tobacco Smoke and the development of Asthma in children. We will refer to a paper by Gold (2000) that discussed the various pieces of evidence and mechanisms [1]. Here is a diagram from the paper that shows the various mechanisms by which exposure to environmental tobacco smoke can result in childhood asthma.
You can see that the model is quite complicated and it involves genetic factors and environmental factors taken together, but in our case we will focus only on environmental factors and we will ask if environmental factors can be causal? How do we know? The first step is to frame a theory and from the theory we will derive the hypotheses. Our theory is this: Exposure to Environmental Tobacco Smoke can cause Asthma in children.
With this theory, we frame the following hypotheses:
H1: Children who are exposed to environmental tobacco smoke compared with those who are not exposed to environmental tobacco smoke will have a higher risk of asthma
H0: Children who are exposed to environmental tobacco smoke will have about the same risk of asthma as those children who are not exposed to environmental tobacco smoke.
The first statement is derived from our theory that exposure to environmental tobacco smoke causes asthma in children. The second hypothesis is called the null hypothesis because it holds that, exposure or no exposure, the risk of asthma remains the same. It can be written as follows:
Asthma risk with exposure = Asthma risk with no exposure
Hence,
Asthma risk with exposure - Asthma risk with no exposure = 0
With this statement in place, data are collected and tested to see whether the null hypothesis (H0) can be rejected. Before a study begins, we lay out the possible outcomes of the test as follows:
| Decision          | H0 true              | H0 false             |
|-------------------|----------------------|----------------------|
| Reject H0         | Type I error (alpha) | Correct (power)      |
| Fail to reject H0 | Correct              | Type II error (beta) |
What’s going on in the above table is this:
- We first set up the null hypothesis (see above)
- Then we plan a test based on the data we will obtain from children in the real world (in this case)
- Then we set up two scenarios, as shown in the above table. In the first scenario, we obtain sufficient data to reject the null hypothesis. If we do that, there are two possibilities: (a) we are falsely rejecting the null hypothesis (this is where we may go wrong, and it is called an alpha error or Type I error), or (b) we are correctly rejecting the null hypothesis (this is the power of our study). Similarly, in the second scenario, we are unable to reject the null hypothesis, and likewise we may be right or we may be wrong. If we fail to reject the null hypothesis when it is in fact false, we commit a beta error or Type II error. Researchers typically set the alpha error at 5% probability (hence alpha = 0.05) and the beta error at either 20% (beta = 0.20) or 10% (beta = 0.10).
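To make the roles of alpha and beta concrete, here is a minimal Python sketch of how they feed into the number of children a study needs. It uses the common normal-approximation formula for comparing two proportions, which is an assumption on our part and not the method of any study cited here. The risks of 10% and 20% are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_unexposed, p_exposed, alpha=0.05, beta=0.20):
    """Approximate number of children needed in each group.

    Normal-approximation formula for comparing two independent
    proportions with a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cut-off tied to the Type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # cut-off tied to the Type II error
    variance = p_unexposed * (1 - p_unexposed) + p_exposed * (1 - p_exposed)
    effect = p_exposed - p_unexposed
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical risks: asthma in 10% of unexposed and 20% of exposed children.
n_beta_20 = sample_size_per_group(0.10, 0.20, alpha=0.05, beta=0.20)
n_beta_10 = sample_size_per_group(0.10, 0.20, alpha=0.05, beta=0.10)
print("children per group with beta = 0.20:", n_beta_20)
print("children per group with beta = 0.10:", n_beta_10)
```

Tightening the beta error from 0.20 to 0.10 raises the required number of children, which is why the choice is made before the study begins.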
Then, once our study is complete and we want to analyse the data, we follow the same logic. Under the assumption that the null hypothesis is true (that is, “no effect” is the truth), we estimate the probability of finding a result at least as extreme as the one we have found. If that probability is very low (say less than 5%, hence p < 0.05), then we conclude that the null hypothesis does not stand and we take the data as support for our alternative hypothesis, derived from our theory. This is the theory behind the p-values you see in papers.
Let us illustrate this with one of the several studies in the paper. We will call it the Evans study, conducted in 1987 by David Evans et al. [2]. This is what they did in the study:
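The logic of the p-value can be sketched in a few lines of Python. This is an illustration under stated assumptions: the counts are hypothetical and the test is a simple two-proportion z-test, chosen here for brevity rather than taken from any of the papers discussed.

```python
from math import erfc, sqrt


def two_proportion_p_value(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk difference and two-sided p-value under H0: both risks are equal."""
    risk_exp = cases_exp / n_exp
    risk_unexp = cases_unexp / n_unexp
    # Under H0 both groups share one risk, so we pool them to get the standard error.
    pooled = (cases_exp + cases_unexp) / (n_exp + n_unexp)
    se = sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_unexp))
    z = (risk_exp - risk_unexp) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return risk_exp - risk_unexp, p_value


# Hypothetical counts: 40 of 200 exposed and 20 of 200 unexposed children have asthma.
risk_difference, p_value = two_proportion_p_value(40, 200, 20, 200)
print(f"risk difference = {risk_difference:.2f}, p-value = {p_value:.4f}")

# When the two risks are identical the data are fully compatible with H0.
null_difference, null_p = two_proportion_p_value(20, 200, 20, 200)
print(f"risk difference = {null_difference:.2f}, p-value = {null_p:.4f}")
```

A small p-value says the observed difference would be surprising if the null hypothesis were true; it does not by itself say anything about bias or confounding.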
- Evans studied 191 children in New York City where the children were either referred to the Emergency Rooms with Asthma symptoms or were not, and the authors also obtained data on their exposure to passive smoking in homes. Then the authors conducted regression modelling on the risk of Asthma related emergency room attendance. They wanted to find out if passive smoking was associated with Asthma related ER visits and if so, what was the magnitude and if this association was a statistically significant association. Here is their main table where they have reported their findings:
As you can see in this table, passive smoking had a beta coefficient of 1.34, or rather +1.34. This means that a child who was exposed to passive smoking in the household had, on average, 1.34 more visits to the ER due to Asthma than a child who was not. Written another way, for every 10 children exposed to passive smoking in the household, expect about 13 more visits to the ER due to Asthma. Was that statistically significant? The answer is yes, because we see a p-value of 0.008. What this means is that, if there were truly no difference in Asthma related ER visits between households with and without passive smoking, the probability of seeing an association at least this strong would be as low as 0.008, or one in 125. Chance alone is therefore an unlikely explanation.
Could it be due to bias?
In order to establish that an association is real, we must be able to (a) rule out the play of chance (as we have done in the case of passive smoking or ETS and Asthma above), (b) eliminate all sources of bias in the observations and study, and (c) control for confounding variables. So let’s take a look at biases.
Biases, in epidemiology, refer to systematic errors that occur in the observation or analysis of the results. Let’s take the example of another study, by Parvin Mirmiran and colleagues in Iran, who wanted to study whether nitrates and nitrites in food were risk factors for non-alcoholic fatty liver disease. Fatty liver disease usually occurs among those who consume alcohol, but in this condition the cause is not alcohol consumption. The condition occurs in about one in four people worldwide, and the authors considered that nitrate containing food might be protective against non-alcoholic fatty liver disease compared with food low in nitrate. This was in part because the food items high in nitrate were mostly vegetables and fruits, so people who consumed more fruits and vegetables, the authors argued, were less likely to suffer from non-alcoholic fatty liver disease.
In order to study this relationship, the authors conducted a case control study, that is, they had dieticians who asked 225 people with non-alcoholic fatty liver disease and 450 people without this condition about their diet and physical activity using a food frequency questionnaire that had 87 items in it. The respondents were identified and they provided answers to these questions. The authors also estimated from the responses from the participants their approximate nitrate content of the food and on this basis, they conducted their study on the association between nitrate content of dietary items and risk of non-alcoholic fatty liver disease in their participants. And indeed, in their study, they found that individuals who consumed high nitrate containing food items were less likely to have non-alcoholic fatty liver disease or they were at lower risk compared to people who did not consume such food items.
What do you think are the problems with this approach? First of all, the authors had no way to know whether the answers people gave on the food frequency questionnaire reflected what they had actually eaten over the years. Secondly, suppose the participants knew whether or not they had non-alcoholic fatty liver disease, and knew that the investigators were trying to find out whether fruits and vegetables might protect against it. It is not inconceivable that the responses would then differ between the two groups: those who had the condition might be tempted to understate their fruit and vegetable consumption, and those in the control arm might be tempted to overstate theirs. Thirdly, as the investigators themselves were not “blinded” to who was a case and who was a control, they might be tempted to estimate nitrate intake differently for people who did not report high fruit and vegetable consumption than for the others.
All these are examples of different types of biases. Participants may be tempted to under-report or over-report their exposures depending on what they believe caused their condition. This is referred to as reporting bias or response bias. Likewise, the investigators themselves may record exposures differentially based on what they believe to be the risk factors for the outcome. The point to note, though, is that you cannot control for biases once the data are obtained. You have to eliminate the possible sources of bias at the design stage. This is why experimental study designs such as randomised controlled trials are so powerful, particularly when they are properly blinded. In the absence of experimental evidence, randomisation, or blinding, there is always a possibility of response or selection bias, and you need to be mindful of it.
There is still a third possibility that we must consider, referred to in the epidemiological literature as a “confounding” variable. Later in this article we will look at a graphical way of dealing with confounding variables, but for now, confounding variables are those variables that are (a) associated with the exposure, (b) associated with the outcome, and (c) NOT on the causal pathway between the exposure and the outcome. An example is in order. Suppose you want to find out the association between passive smoking and asthma in children. We may consider low socioeconomic condition as a possible confounding variable. See Figure 1 for an explanation.
What’s going on in the above figure is this:
- Children who grow up in households that have low socioeconomic status are likely to be living with parents who smoke cigarettes or at least in many developing countries are likely to live in households that have very poor quality of fuels for their cooking, so these children are more than likely to experience passive smoking
- It is also true that children who live in poor living conditions and poor households are more likely to suffer from asthma and indeed end up being seen in ERs.
- But note also that we cannot say that passive smoking leads to poverty or poor household in some way. Therefore although we can speak of poverty as an alternative explanation, we cannot argue that it is so because poverty is a result of someone being exposed to passive smoke in the household. This aspect is important.
- So when we link passive smoking with Asthma in children, these factors need to be kept in mind: when we see that passive smoking leads to increased asthma related ER visits, we may actually be seeing, at least in part, the effects of poverty on these children. Unless this is “adjusted” for, we are not likely to see the true effect. Therefore, if you read a paper where the authors suggest a cause and effect association between passive smoking and childhood asthma, check whether the authors have sought alternative explanations, one of which is the presence or possibility of confounding variables.
Confounding variables are always chosen on the basis of your supposition or theoretical understanding. It is common to evaluate “potential confounding” variables, as they are often called, statistically, to check whether they are associated with both the exposure and the outcome variables. A statistical association is not strictly required, but there should at least be enough ‘substantive’ reason to argue that a variable might be a confounding variable, and therefore these variables are tested.
There are several ways to deal with confounding variables. One is restriction, that is, removing the variation in the confounding variable from the scope of the study. For example, for passive smoking and asthma, it might be possible to confine the analysis ONLY to very poor households. Alternatively, one can stratify the sample: analyse the subjects separately at different levels of socioeconomic status and then pool the stratum-specific results statistically. Yet another way is to randomly allocate the participants to one or the other comparison group. However, this is only possible in an experimental study design such as a randomised controlled trial. Where none of these approaches will work, we need to conduct multivariable analysis. Indeed, this was done in the study that David Evans and colleagues conducted in New York City based hospitals, where they had children from different parts of the city, statistically adjusted for several variables, and conducted a regression analysis.
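Stratification and pooling can be illustrated with a short Python sketch. The counts below are hypothetical and were constructed so that socioeconomic status (SES) fully explains the crude association; the Mantel-Haenszel estimator is one standard way to pool strata and is our choice for this sketch, not something reported in the studies above.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(strata):
    """Pool stratum-specific risk ratios using Mantel-Haenszel weights."""
    numerator = 0.0
    denominator = 0.0
    for cases_exp, n_exp, cases_unexp, n_unexp in strata:
        total = n_exp + n_unexp
        numerator += cases_exp * n_unexp / total
        denominator += cases_unexp * n_exp / total
    return numerator / denominator


# Hypothetical counts per stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "low SES": (160, 800, 40, 200),   # asthma risk 20% whether exposed or not
    "high SES": (10, 200, 40, 800),   # asthma risk 5% whether exposed or not
}

# Ignoring SES means adding the strata together before comparing risks.
crude_rr = risk_ratio(160 + 10, 800 + 200, 40 + 40, 200 + 800)
adjusted_rr = mantel_haenszel_rr(strata.values())

print(f"crude risk ratio    = {crude_rr:.3f}")
print(f"adjusted risk ratio = {adjusted_rr:.3f}")
```

In this constructed example the crude risk ratio is well above 1 even though exposure makes no difference within either stratum, which is the signature of confounding.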
So in summary, once you have eliminated the play of chance by selecting a certain number of participants based on your alpha and beta errors, the frequency of occurrence of the exposure variable, and the effect size that you consider important for your study; once you have eliminated practically every source of bias (such as by enforcing blinding, training the investigators impartially, or adopting an experimental or prospective study design); and once you have adjusted or controlled for all possible confounding variables, you can state that the relationship or association you have observed in a study is for real.
But is that relationship causal?
Even if the relationship is statistically sound and free of bias and confounding; in other words, even after we are reasonably sure that the association could not have arisen by chance, that it could not have arisen because the investigators or participants were biased, and that the investigators took care to control for confounding variables, we are still not sure whether the association qualifies as a causal relationship. Whether a relationship is causal or not is, in the end, a matter of judgement. How do we make that judgement?
That answer is not easy, but in 1965, following a long trail of scientists and philosophers who came before him, Sir Austin Bradford Hill arrived at a set of what he called “viewpoints”, which latter-day epidemiologists refer to as “criteria”, or rather “Hill’s Criteria” [3].
In 1965, when Sir Austin Bradford Hill delivered his President's Address to the Section of Occupational Medicine of the Royal Society of Medicine, he was already well known for his work with Sir Richard Doll on the risk of lung cancer from cigarette smoking. In his address, Hill identified nine conditions, which he termed “viewpoints”: strength of association, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy.
As you can see, Sir Austin Bradford Hill discussed nine of what he called viewpoints, which we now call criteria. Of all these, he put great emphasis on “Strength of Association”, for a reason. He gave the example of cigarette smoking and lung cancer, work he did with Sir Richard Doll, where the risk ratio was about 9. What this means is that, among smokers, smoking alone would account for (9 - 1)/9 = 8/9 = 89% of the lung cancer cases, so it would be really hard to find another explanation that could trump it. Another example was Percival Pott’s discovery of the risk of scrotal cancer among chimney sweeps, which had a risk ratio of 20, which again means that 19/20 = 95% of the cancer incidence among the sweeps would be explained by their exposure to chimney soot particles. We will later see that we can find exceptions to almost all of these so-called criteria, so none of them is sacrosanct, but it is usual practice to discuss these features when we discuss epidemiological studies.
So statistical significance, ruling out bias, controlling confounding, and finding many of the nine criteria satisfied together give us good grounds for inferring cause and effect. But the story is still far from clear, because most of the time there is not a single cause for a single effect; causes interact with each other. This is where the idea of multiple component causes comes into the picture.
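The arithmetic behind the 8/9 and 19/20 figures is the attributable fraction among the exposed, (RR - 1) / RR. A small Python sketch, using only the two risk ratios quoted above, makes the calculation explicit; the function name is ours and purely illustrative.

```python
def attributable_fraction_exposed(risk_ratio):
    """Share of cases among the exposed attributable to the exposure: (RR - 1) / RR."""
    if risk_ratio < 1:
        raise ValueError("this sketch only covers harmful exposures (RR >= 1)")
    return (risk_ratio - 1) / risk_ratio


examples = {
    "cigarette smoking and lung cancer": 9,
    "chimney soot and scrotal cancer": 20,
}
fractions = {name: attributable_fraction_exposed(rr) for name, rr in examples.items()}
for name, fraction in fractions.items():
    print(f"{name}: RR = {examples[name]}, attributable fraction = {fraction:.0%}")
```

The larger the risk ratio, the closer this fraction gets to 100%, which is one way to read Hill's emphasis on strength of association.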
Consider the image below.
In 2005, Ken Rothman and Sander Greenland wrote a paper in which they discussed the theories around causal thinking [4]. The paper is eminently readable, and in it they set out the concept that all causes are multifactorial. This means you cannot just have one cause and one effect. Sir Austin Bradford Hill discussed specificity, by which he meant that one would quite often find one cause for one disease, as is the case with infectious diseases, but in real life this is rarely so. Rather, the opposite holds, and you almost always find multiple causes. In this light, let us explain the above figure.
You can see that the above figure has three circles, and each circle is divided into several “arcs” bounded by lines, like slices of a pie. Focus on the circle on the left hand side. This pie and its slices depict one causal mechanism. What does it mean? Suppose a case of lung cancer is caused not by cigarette smoking alone, but by cigarette smoking together with exposure to Asbestos dust, nickel dust, or diesel fumes, all of which are involved in “causing” the case of lung cancer. We would not know this if we considered only one cause. This is why each circle has several “arcs”. Each “arc” or slice of the pie is referred to as a component cause, which the diagram labels “Single Component Cause”. If we compare the three circles, we see that among all the causal mechanisms for the same disease (or health related state), one cause appears every time, and that is the letter “A”. Because “A” is a component of every mechanism, it is termed the “necessary” cause. Each complete circle is referred to as a “sufficient” cause, because the individual factors combined close the circle.
What sense do we make of this structure? Consider the table below, reproduced from the same paper by Rothman and Greenland [4].
Shown above is a table taken from the paper, where they report results for head and neck cancer with two competing causes: smoking and alcohol consumption. If you look carefully, you will see that there are four combinations:
- Those who do not drink nor smoke (1 such person)
- Those who are non-smokers but they drink (3 such people)
- Those who smoke but do not drink (4 such people)
- Those who BOTH smoke and drink (12 such people)
Think about it. Twelve people with HNC (head and neck cancer) both drink and smoke (“double jeopardy”). Three cases occurred among people who drink but do not smoke, so we would expect about three of the 12 to have occurred even without smoking. Taking these three out leaves 9 of the 12 whose HNC could be attributed to smoking, or 9/12 = 75% of the cases of HNC among smokers and drinkers. If we extend the same logic to alcohol (read along the lower row), four cases occurred among people who smoke but do not drink; taking these four out leaves 8/12 = 67% of cases that could be attributed to drinking. You may wonder what’s going on here: 75% of the 12 cases are attributed to smoking and 67% to alcohol, so adding the two numbers gives 142% of the cases, and surely that’s not correct? The trick is that these people were BOTH drinkers and smokers, and the two risks compounded: a case that needed both smoking and alcohol to occur is counted once under smoking and once again under alcohol, so the attributable fractions do not have to add up to 100%. Attributable risks thus give us one more way to think about causal mechanisms.
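The same arithmetic can be written out as a short Python sketch, using only the four counts listed above. The variable names are ours and are there for illustration.

```python
# Head and neck cancer counts for the four combinations listed above.
cases = {
    ("non-smoker", "non-drinker"): 1,
    ("non-smoker", "drinker"): 3,
    ("smoker", "non-drinker"): 4,
    ("smoker", "drinker"): 12,
}

both = cases[("smoker", "drinker")]

# Cases among smoker-drinkers that would not be expected without smoking.
fraction_smoking = (both - cases[("non-smoker", "drinker")]) / both
# Cases among smoker-drinkers that would not be expected without drinking.
fraction_drinking = (both - cases[("smoker", "non-drinker")]) / both
total = fraction_smoking + fraction_drinking

print(f"attributable to smoking:  {fraction_smoking:.0%}")
print(f"attributable to drinking: {fraction_drinking:.0%}")
print(f"sum of the two fractions: {total:.0%}")
```

The sum exceeds 100% because cases that require both exposures are counted in both fractions, not because of an arithmetic mistake.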
Directed Acyclic Graphs
The third way to think about causal mechanisms and cause and effect in Environmental Epidemiology is to invoke a graphical way to think of cause and effect. Normally, when we think of cause and effect, we talk in terms of correlations rather than causation. We will never be able to “see” that X causes Y, but we will always be able to observe that there is an “association” or “correlation” between X and Y. If we were to think in terms of the language of graphs, consider the following:
The figure above shows the graphical way of representing correlation. Here we see that X is correlated with Y, and the arrow is a double headed arrow. What we mean by this is that you can either “travel” from X to Y or move from Y to X; it does not matter. You can ascribe a value to the arrow, say “a” (below), and that is the correlation coefficient between X and Y, see below:
But this correlation can arise because of any of the following four reasons:
Let’s think about it. There are four possibilities that explain the correlation between X and Y:
- Case A: where X is a cause of Y
- Case B: where Y is a cause of X
- Case C: where both X and Y are caused by a third variable Z and this is why we see that X and Y are correlated
- Case D: where both X and Y cause a third variable J, and we have selected or adjusted on J or its offspring K; this is why we get to see that X and Y are correlated.
This brings us to the concept of what are referred to as directed acyclic graphs or DAGs. When we talk of directed acyclic graphs, we are talking about nodes and edges. In the figure above, X and Y are each referred to as “nodes”. Nodes connect to each other using edges, so all these single headed arrows are “edges”. In the language of DAGs, we do not have curved arrows; we think in terms of single straight arrows that connect one node to another. A few rules to note here:
- An arrow can START in ANY DIRECTION from one node when it tries to reach another node
- Once an arrow starts in one direction (say moves from tail to head or head to tail of an arrow), it has to keep going in the same direction, it cannot reverse direction on the way
- If the arrow as it travels along the paths, meets with an opposite directed arrow, the arrow stops and we say that the path is closed
- Otherwise the path is open
If we follow these four rules in creating and working with arrows, a way of causal thinking opens up for us. In the figure below, we have a few representations of directed acyclic graphs depicting how secondhand smoking, or environmental tobacco smoke, can be related to childhood asthma. Let’s take a look. We have drawn these graphs using a software tool known as “Dagitty” (dagitty is free and open software); you can reproduce them here:
We open dagitty in the browser and draw the following graph. Note that in this case, we are modelling the exposure variable, that is, Environmental Tobacco Smoke (also referred to as “passive smoking”) and the outcome variable is referred to as “Childhood Asthma”. Now in the top panel, we have Poverty and two arrows that go from Poverty to ETS and Childhood Asthma.
Think of the top panel. If we were to follow the DAG path, we see that we can start from ETS, then move to Poverty, and then from Poverty to Childhood Asthma. This path is open as it is legitimate path. But what we want is that, we want only one path to be open and all other paths to be closed. Note again:
We want only one path to remain open and all other paths to remain closed. So if we want a causal path between ETS and Childhood Asthma, then we will need to close the other, pink coloured path ETS <- Poverty -> Childhood Asthma. As long as that path is open, we cannot establish a cause and effect association between ETS and Childhood Asthma. From here, we state:
- For a cause and effect association, we only need one path to remain open, a direct path between X and Y
- All other alternative paths must be closed: if a path is open, it must be closed off, and if a path is already closed, it must not be opened.
How do you close off a path? In Dagitty, you indicate that the variable that connects the alternative path (referred to as a “back door path”) is “adjusted for”. This term has a special significance. Note that in the top panel, from the perspective of the directed acyclic graph, “poverty” as a variable must be adjusted for to close the ETS-Poverty-Childhood Asthma path. In Epidemiology speak, we call this variable a “confounding variable”. You can construct many confounding variables connecting your main exposure variable of interest and outcome variable of interest. Note that for poverty to qualify as a confounding variable, three things must hold:
- Poverty is a direct cause of ETS (or poverty leads to ETS or passive smoking)
- Poverty is a direct cause of Childhood Asthma (or being poor or born into a poor household leads to having asthma)
- That ETS DOES NOT lead to Poverty, nor does Asthma lead to Poverty
These three features help us to understand that in this case poverty is a confounding variable which must be adjusted for. There are any number of ways to adjust for a confounding variable, including randomisation, stratified analysis, restriction, or statistical adjustment. Once you mark the variable as adjusted, Dagitty draws that path in “black” colour to indicate that the path is closed.
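The path rules can also be checked mechanically. The following Python sketch is our own minimal illustration and is not how Dagitty is implemented: it represents the graph as a set of (cause, effect) pairs and reports whether a given path is open, with and without adjusting for poverty.

```python
def descendants(node, edges):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def path_is_open(path, edges, adjusted=frozenset()):
    """Return True if the path (a list of nodes) is open given the adjusted set."""
    adjusted = set(adjusted)
    for left, middle, right in zip(path, path[1:], path[2:]):
        is_collider = (left, middle) in edges and (right, middle) in edges
        if is_collider:
            # Two arrowheads meet here: closed unless this node or a descendant is adjusted.
            if not ({middle} | descendants(middle, edges)) & adjusted:
                return False
        elif middle in adjusted:
            # Adjusting for a node that is not a collider closes the path.
            return False
    return True


# The top panel described above: Poverty causes both ETS and Childhood Asthma.
edges = {
    ("Poverty", "ETS"),
    ("Poverty", "Childhood Asthma"),
    ("ETS", "Childhood Asthma"),
}
back_door = ["ETS", "Poverty", "Childhood Asthma"]
direct = ["ETS", "Childhood Asthma"]

open_before = path_is_open(back_door, edges)
open_after = path_is_open(back_door, edges, adjusted={"Poverty"})
direct_open = path_is_open(direct, edges, adjusted={"Poverty"})

print("back door path open before adjustment:", open_before)
print("back door path open after adjusting for Poverty:", open_after)
print("direct path still open:", direct_open)
```

Adjusting for Poverty closes the back door path and leaves the direct path from ETS to Childhood Asthma as the only open one.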
Let’s take a look at another scenario.
What is happening in the above diagram?
Air Pollution is the exposure and Heart Disease is the outcome.
We also see that a direct causal path extends from Air Pollution to Heart Disease and this is the ONLY OPEN path that is lighted (in green colour)
We see a path that goes from Air Pollution to Cough (as air pollution does lead to cough) and another that goes from Cough to Chest Clinic Attendance. Note that the path from Air Pollution through Cough to Heart Disease is a closed path. It is closed because the arrow from Air Pollution meets an opposite arrow at “Cough”, the one coming from Heart Disease, so it cannot proceed any further. A node like this, where two arrowheads meet, is called a collider. For the same reason, no path that passes from Air Pollution through Cough and Chest Clinic Attendance can reach Heart Disease.
So this is a “naturally” closed path that you should not open. But can you?
Let’s see what happens when we adjust for the variable “Cough”, we see the following picture:
This is what happens when you “open” a path that should be otherwise closed. You see now you have opened up a “backdoor path”, that is the path Air Pollution -Cough-Heart Disease that was otherwise closed. With cough adjusted for, you cannot anymore speak about a cause and effect association between Air Pollution and Heart Disease because you do not know whether it is due to the new open path or whether there is a direct association.
You may wonder as to how might one open a closed path and what are the consequences? Think about it. Air Pollution leads to coughing, and heart disease leads to coughing as well. So if you, while conducting your study on the association between Air Pollution and Heart Disease risk were to select on coughing patients, you would be adjusting this node, and in the process you would open up a closed path. But what happens if you adjusted not with the offending resulting variable, but let’s say the downstream variable, say clinic attendance?
Now you have not one but two alternative open “backdoor” paths: Air Pollution - Cough - Chest Clinic Attendance and Heart Disease - Cough - Chest Clinic Attendance. You may wonder how this might happen. It happens if you “sample” or “select” all your participants from a chest clinic that treats coughing patients. This is an example of the “selection bias” that we alluded to above. For a good discussion of paths and directed acyclic graphs, see the paper that Sander Greenland wrote with Judea Pearl and Jamie Robins in 1999 [5], and if you are interested in DAGs in a clinical context, see Digitale [6].
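A small simulation shows what selecting on cough or on clinic attendance does. This Python sketch rests on assumptions we chose to isolate the effect: air pollution and heart disease are generated independently (no direct causal arrow at all), cough occurs whenever either is present, and 60% of coughing people attend the chest clinic. All numbers are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable


def risk_difference(people):
    """Heart disease risk among the polluted minus risk among the unpolluted."""
    exposed = [heart for pollution, heart, _, _ in people if pollution]
    unexposed = [heart for pollution, heart, _, _ in people if not pollution]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


people = []
for _ in range(20000):
    pollution = random.random() < 0.3
    heart = random.random() < 0.3              # independent of pollution by construction
    cough = pollution or heart                 # either cause is enough to produce cough
    clinic = cough and random.random() < 0.6   # only coughing people attend the clinic
    people.append((pollution, heart, cough, clinic))

in_everyone = risk_difference(people)
in_coughers = risk_difference([p for p in people if p[2]])
in_clinic = risk_difference([p for p in people if p[3]])

print(f"whole population:      {in_everyone:+.2f}")
print(f"coughing people only:  {in_coughers:+.2f}")
print(f"clinic attendees only: {in_clinic:+.2f}")
```

In the whole population the risk difference sits near zero, as it should. Restricting to coughing people or to clinic attendees manufactures a strong association that is an artefact of selection and says nothing about whether air pollution causes heart disease.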
Conclusion
So this provides you with a rough idea of how to examine causality in Environmental Health and Environmental Epidemiology from three different perspectives. We started with a discussion of what Environmental Epidemiology is and how we might think in terms of cause and effect in observational studies. Then we discussed the issues around chance, bias, and confounding variables, and we learned that if we are able to rule out chance, eliminate biases from our studies, and control for confounding variables, then we can assert that the associations we observe are real relationships. But finding that a relationship is statistically sound or robust does not tell us whether it is one of cause and effect. For that, we introduced the concept of causality from the perspective of criteria, and we learned about the principles that Sir Austin Bradford Hill provided. We call them Hill’s criteria, but they should rather be referred to as Hill’s viewpoints. Then we learned about Rothman’s Pie and the multiplicity of causes, that is, how many causes interact with each other to produce their effects. Finally, we learned about directed acyclic graphs, or DAGs, and how we might use them to think conceptually and plan our studies and analyses.
This barely scratches the surface of what is cause and effect in health and particularly in Environmental Epidemiology. In a future article, I will discuss and we will learn about counterfactual theory of causation and the role of target trials. But till then, enjoy and post your comments and questions here.
— — — —
List of References
1. Gold DR. Environmental tobacco smoke, indoor allergens, and childhood asthma. Environmental health perspectives. 2000 Aug;108(suppl 4):643–51.
2. Evans D, Levison MJ, Feldman CH, et al. The impact of passive smoking on emergency room visits of urban children with asthma. Am Rev Respir Dis. 1987;135(3):567–572. doi:10.1164/arrd.1987.135.3.567
3. Hill AB. The environment and disease: association or causation?. Journal of the Royal Society of Medicine. 2015 Jan;108(1):32–7.
4. Rothman KJ, Greenland S. Causation and causal inference in epidemiology. American journal of public health. 2005 Jul;95(S1):S144–50.
5. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.
6. Digitale JC, Martin JN, Glymour MM. Tutorial on directed acyclic graphs. Journal of Clinical Epidemiology. 2022 Feb 1;142:264–7.