Environmental Epidemiology Study Designs: an intuition based on the Golden Circle
… start with why
In the previous posts in this series, I wrote about how environmental epidemiology deals with cause and effect and with measuring exposure and outcome. This time we will discuss the third aspect of environmental epidemiology: study designs.
Many years ago, the author Simon Sinek wrote a book titled “Start with Why”, in which he argued that if you begin by asking yourself “why am I doing this?”, whether you are a business, an individual, or a leader, you will be better able to influence people to follow you and see your point of view. He argued that this approach works because the “why” engages people’s limbic system (the evolutionarily older part of the brain, which governs feelings and decision-making), whereas starting with “what is it that we are doing?” engages mainly the neocortex of your clients or listeners (the part responsible for rational, analytical thought) and is less likely to be effective. The sequence Sinek proposed, which he called the Golden Circle, puts “why” at the core, “how” in the middle, and “what” on the outside (see Figure 1).
The same principle applies in a different context: research should start with why, because the reason describes the motivation for the research. We will use this framework for epidemiological study designs as well, and we will ask three questions:
Why do we measure?
As we look at epidemiological study designs from the perspective of the motivation to measure, the need to measure disease frequencies (rates) and causal effects (ratios) stems from epidemiologists observing facts and the patterns in those facts. Take, for example, John Snow’s investigation of cholera. You can read the detailed description in his openly accessible book [1], but the part that interests us here is his observation of the facts, as follows (I have reproduced the table from his book):
You can see that this simple tabulation drew his attention to the problem, as he writes here:
You can now see how his investigation started by observing the rates of cholera in the various districts of London and, from there, narrowed down to the southern districts. This led him to draw his map.
So the study design you employ starts with observing the rates (prevalence and incidence) and identifying the patterns in them; then, depending on what you want to do, you build the design or set out the methods you would like to use. For example, if your aim is only to find out the rates, a careful review of the documentation or of secondary data may be enough to give you that information. On the other hand, once you move beyond the patterns and start developing hypotheses, it is time to think about what you would like to measure.
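To make “observing the rates and spotting the patterns” concrete, here is a minimal Python sketch. The district names and counts are hypothetical placeholders, not Snow’s figures; the point is only that converting raw counts to rates on a common base (here, per 10,000 people at risk) is what makes districts comparable and lets a pattern stand out.

```python
from dataclasses import dataclass


@dataclass
class District:
    name: str
    deaths: int      # events counted in the district (e.g. cholera deaths)
    population: int  # population at risk in the same district


def rate_per_10k(d: District) -> float:
    """Crude rate: events per 10,000 people at risk."""
    return d.deaths * 10_000 / d.population


def rank_by_rate(districts: list[District]) -> list[tuple[str, float]]:
    """Return (name, rate) pairs sorted from highest to lowest rate."""
    rates = [(d.name, round(rate_per_10k(d), 1)) for d in districts]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts, for illustration only (NOT Snow's data)
districts = [
    District("District A", deaths=120, population=40_000),
    District("District B", deaths=30, population=50_000),
    District("District C", deaths=90, population=15_000),
]

for name, rate in rank_by_rate(districts):
    print(f"{name}: {rate} per 10,000")
```

District C has fewer deaths than District A but twice the rate (60.0 vs 30.0 per 10,000), which is exactly the kind of pattern that raw counts hide.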
How to measure?
The next step is to set up a plan for measuring what we want to measure; this is where the nitty-gritty of study designs comes into play. In essence, the “how” is the junction between why we want to measure certain things and what we will get at the end of the project. As always, start with the “why”.
- Why are we conducting the research? Take the example of air pollution. If we only want to know how many people in a population exposed to air pollution have heart disease, then all we plan to measure is the prevalence of heart disease (“what” = prevalence of heart disease). The “how” is to count the number of people with heart disease in a defined baseline population.
- On the other hand, if the idea is to understand whether air pollution is a cause of heart disease, then our “what” is a measure of association (such as a risk ratio), and the “how” refers to a process that assesses the association between exposure to air pollution and heart disease.
- So the combination of the “why” and the “what” determines the “how”: the methods we use to obtain the “what” measure that the “why” of the problem calls for.
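The two “whats” in the list above differ in kind: a prevalence is a single proportion, whereas a measure of association compares two groups. Here is a minimal sketch with hypothetical counts from a cross-sectional survey, assuming we know each person’s exposure (living in a high or a low air-pollution area) and whether they have heart disease. The prevalence ratio is one simple measure of association; on its own it does not establish that air pollution causes heart disease.

```python
def prevalence(cases: int, population: int) -> float:
    """Proportion of a population that has the condition at a point in time."""
    return cases / population


def prevalence_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    """Prevalence among the exposed divided by prevalence among the unexposed."""
    return prevalence(cases_exp, n_exp) / prevalence(cases_unexp, n_unexp)


# Hypothetical survey counts (illustrative only)
high_cases, high_n = 90, 1_000  # residents of a high air-pollution area
low_cases, low_n = 60, 1_000    # residents of a low air-pollution area

# "What" for a descriptive why: overall prevalence of heart disease
overall = prevalence(high_cases + low_cases, high_n + low_n)
# "What" for a causal why: a measure of association between exposure and outcome
pr = prevalence_ratio(high_cases, high_n, low_cases, low_n)

print(f"Overall prevalence: {overall:.1%}")         # 7.5%
print(f"Prevalence ratio (high vs low): {pr:.2f}")  # 1.50
```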
What to measure?
This is the second question in our heuristic: what do we want to measure? Here you have the following choices. First, are you going to measure only the rates of disease (the prevalence and incidence of the health conditions)? Or are you going to measure the prevalence and incidence of both the exposure and the outcome? This is also the time to work out clearly which outcome you want to study. You need to define the health outcome precisely: specify what you will include in the definition of the outcome and what you will leave out.
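Since the choice between prevalence and incidence comes up at this step, here is a small sketch of how the two differ computationally, using a hypothetical five-person cohort. Prevalence counts existing cases at one point in time; the incidence rate counts new cases per unit of person-time among those at risk. The record layout (case at baseline, years followed, new case during follow-up) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    case_at_baseline: bool  # already had the condition at the start
    years_followed: float   # person-time at risk contributed during follow-up
    new_case: bool          # developed the condition during follow-up


def point_prevalence(people: list[Person]) -> float:
    """Existing cases at baseline divided by everyone observed at baseline."""
    return sum(p.case_at_baseline for p in people) / len(people)


def incidence_rate(people: list[Person], per: float = 1_000) -> float:
    """New cases per `per` person-years, among those free of the condition at baseline."""
    at_risk = [p for p in people if not p.case_at_baseline]
    new_cases = sum(p.new_case for p in at_risk)
    person_years = sum(p.years_followed for p in at_risk)
    return new_cases / person_years * per


# Hypothetical cohort (illustrative only)
cohort = [
    Person(True, 0.0, False),   # prevalent case: not at risk of becoming a new case
    Person(False, 5.0, False),
    Person(False, 2.5, True),   # became a case after 2.5 years, stops contributing time
    Person(False, 5.0, False),
    Person(False, 4.0, False),  # lost to follow-up after 4 years
]

print(point_prevalence(cohort))  # 0.2
print(incidence_rate(cohort))    # 1 new case / 16.5 person-years, about 60.6 per 1,000
```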
To see why we need precise measurements of the outcome, consider the paper by Cascio (2018), in which he summarised the health effects of exposure to wildfire smoke. When wildfires contaminate the air, the particulate matter that accumulates in it leads to both short-term and longer-term health effects in the people exposed. After summarising these health effects, Cascio leaves us with this observation:
So, while much has been learned over the last decade and will be briefly summarized here, much is still unknown and further research is needed to better define the short-term and long-term impacts of wildfire emissions on health while being mindful of the ecological benefits of wildland fire. Such knowledge is critically important for policy development and decision-making vis-à-vis fuel management that includes prescribed fire, smoke forecasting (Yao et al., 2013) [here he refers to a paper in which Yao et al. evaluated a wildfire smoke forecasting system to protect public health], and public health and clinical interventions intended to limit exposure to smoke and protect population health.
So, after first deciding why we need to measure anything, we need to identify what we want to measure. Here is a brief table that walks you through the why and the what of the measurement process, which leads us to the third consideration: how to measure what we want to measure, once we have answered why we want to measure it.
Different Study Designs
As we wrote in the previous paragraphs, it is the combination of why, how, and what that drives the choice of study design. We will now briefly describe the different study designs and elaborate on each one in a subsequent post.
Let’s start with the simplest situation: surveillance. Say our reason for conducting a study (the “why”) is to tabulate and tally infectious diseases in the population. What we want to produce at the end of this process are incidence and prevalence estimates of those diseases. We set up a precise definition of each disease we want to track and obtain data on cases from various sources in the community. In doing so, we have set up a surveillance system. Here’s an example of how epidemiologists in New Zealand conduct surveillance of influenza and respiratory illnesses:
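The core mechanics of a surveillance system, a precise case definition applied to incoming reports plus a running tally, can be sketched in a few lines. The case definition below (measured fever of at least 38 °C and cough, with onset within the last 10 days) is loosely modelled on common influenza-like illness definitions and is an assumption for illustration, not the definition used by any particular agency; the report fields and week labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Report:
    week: str              # reporting week label (hypothetical format)
    temp_c: float          # measured temperature in degrees Celsius
    cough: bool
    days_since_onset: int


def meets_case_definition(r: Report) -> bool:
    """Assumed ILI-style definition: fever >= 38 C and cough, onset within 10 days."""
    return r.temp_c >= 38.0 and r.cough and r.days_since_onset <= 10


def weekly_incidence(reports: list[Report], population: int, per: int = 100_000) -> dict[str, float]:
    """Count reports meeting the case definition, by week, per `per` population."""
    counts = Counter(r.week for r in reports if meets_case_definition(r))
    return {week: n * per / population for week, n in sorted(counts.items())}


# Hypothetical incoming reports (illustrative only)
reports = [
    Report("2024-W01", 38.5, True, 2),
    Report("2024-W01", 37.2, True, 3),   # no fever: excluded
    Report("2024-W01", 39.0, True, 12),  # onset too long ago: excluded
    Report("2024-W02", 38.1, True, 1),
    Report("2024-W02", 38.9, True, 4),
    Report("2024-W02", 38.0, False, 2),  # no cough: excluded
]

print(weekly_incidence(reports, population=50_000))
# {'2024-W01': 2.0, '2024-W02': 4.0}
```

Tightening or loosening `meets_case_definition` changes every number downstream, which is why the precise case definition comes first.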
Note that in conducting surveillance, we are only collecting cases and compiling case series, and that is all we are interested in. Our “why” was not about finding any cause-and-effect association. If a possible cause-and-effect link turns up, that is incidental, and testing it requires a separately planned study.
Now suppose our goal (the “why”) is to study whether air pollution leads to asthma. The “what” we want to produce is a relationship between exposure to air pollutants and the risk of asthma. The “how” in this case is to gather data on ambient air quality and asthma-related hospitalisations. For an example, check out this study:
https://www.sciencedirect.com/science/article/abs/pii/S0160412017304026
The authors state their “why” as follows:
We aimed to quantify relationships between tree and green space density and asthma-related hospitalisations, and explore how these varied with exposure to background air pollution concentrations
And here is the “what” they ended up producing:
Green space and gardens were associated with reductions in asthma hospitalisation when pollutant exposures were lower but had no significant association when pollutant exposures were higher. In contrast, tree density was associated with reduced asthma hospitalisation when pollutant exposures were higher but had no significant association when pollutant exposures were lower.
And this is how they forged the link between the two (the “how”):
Population standardised asthma hospitalisation rates (1997–2012) for 26,455 urban residential areas of England were merged with area-level data on vegetation and background air pollutant concentrations. We fitted negative binomial regression models using maximum likelihood estimation to obtain estimates of asthma-vegetation relationships at different levels of pollutant exposure.
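The authors fitted negative binomial regression models by maximum likelihood. As a much simpler stand-in (not their method, and not their data), here is a sketch of the underlying idea of an association that changes with pollution level: a pooled area-level rate ratio for high versus low green space, computed separately within low- and high-pollution strata. The area records and the high/low splits are hypothetical; the toy numbers are only chosen to mirror the direction of the quoted green-space finding.

```python
from dataclasses import dataclass


@dataclass
class Area:
    admissions: int       # asthma hospitalisations in the area
    population: int       # resident population
    high_green: bool      # above-median green space (hypothetical split)
    high_pollution: bool  # above-median background pollution (hypothetical split)


def pooled_rate(areas: list[Area]) -> float:
    """Total admissions divided by total population for a group of areas."""
    return sum(a.admissions for a in areas) / sum(a.population for a in areas)


def green_rate_ratio_by_stratum(areas: list[Area]) -> dict[str, float]:
    """Rate ratio (high vs low green space), computed separately in each pollution stratum."""
    result = {}
    for label, polluted in (("low pollution", False), ("high pollution", True)):
        stratum = [a for a in areas if a.high_pollution == polluted]
        green = [a for a in stratum if a.high_green]
        not_green = [a for a in stratum if not a.high_green]
        result[label] = pooled_rate(green) / pooled_rate(not_green)
    return result


# Hypothetical area records (NOT the study's data)
areas = [
    Area(15, 8_000, high_green=True, high_pollution=False),
    Area(25, 12_000, high_green=True, high_pollution=False),
    Area(30, 10_000, high_green=False, high_pollution=False),
    Area(30, 10_000, high_green=False, high_pollution=False),
    Area(20, 10_000, high_green=True, high_pollution=True),
    Area(30, 10_000, high_green=True, high_pollution=True),
    Area(25, 10_000, high_green=False, high_pollution=True),
    Area(25, 10_000, high_green=False, high_pollution=True),
]

ratios = green_rate_ratio_by_stratum(areas)
print({k: round(v, 2) for k, v in ratios.items()})
# {'low pollution': 0.67, 'high pollution': 1.0}
```

A regression model does the same job more flexibly: it handles continuous exposures, adjusts for other area characteristics, and (with a negative binomial family) copes with overdispersed counts.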
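Note that the authors did not collect individual-level data; the data were collected at an aggregated (area) level. This is what we call an “ecological” study. Its main limitation is that you cannot extrapolate findings from such studies to individuals (the so-called ecological fallacy): an association seen between areas need not hold for the individuals within them, and area-level comparisons are prone to confounding by the many variables that differ between areas.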
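A toy example shows why an area-level association may not carry over to individuals. In the hypothetical data below, areas with a larger share of exposed residents have higher case rates, yet within every area the exposed and unexposed residents have exactly the same risk. The area-level pattern is produced by baseline risk that differs between areas, i.e. confounding by area. All numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AreaCounts:
    n_exposed: int
    cases_exposed: int
    n_unexposed: int
    cases_unexposed: int


def ecological_view(areas: list[AreaCounts]) -> list[tuple[float, float]]:
    """What an ecological study sees: (proportion exposed, overall case rate) per area."""
    view = []
    for a in areas:
        n = a.n_exposed + a.n_unexposed
        view.append((a.n_exposed / n, (a.cases_exposed + a.cases_unexposed) / n))
    return view


def within_area_risk_ratios(areas: list[AreaCounts]) -> list[float]:
    """What individual-level data reveal: risk in exposed / risk in unexposed, per area."""
    return [
        (a.cases_exposed / a.n_exposed) / (a.cases_unexposed / a.n_unexposed)
        for a in areas
    ]


# Hypothetical areas: baseline risk rises from area to area (5%, 10%, 15%),
# and so does the share of exposed residents, but exposure itself changes nothing.
areas = [
    AreaCounts(n_exposed=100, cases_exposed=5, n_unexposed=900, cases_unexposed=45),
    AreaCounts(n_exposed=500, cases_exposed=50, n_unexposed=500, cases_unexposed=50),
    AreaCounts(n_exposed=900, cases_exposed=135, n_unexposed=100, cases_unexposed=15),
]

print(ecological_view(areas))          # [(0.1, 0.05), (0.5, 0.1), (0.9, 0.15)]
print(within_area_risk_ratios(areas))  # [1.0, 1.0, 1.0]
```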
What if we want individual-level data and studies of cause and effect? For that, we turn to cross-sectional surveys, case-control studies, and cohort studies, whose workings we will explain in the next post.
In summary…
This was an intuitive introduction to the principles of study design in epidemiology, particularly environmental epidemiology. Start with why: ask yourself, “Why am I doing this study? Why does X lead to Y?” That question will lead you to work out what you will produce at the end of the trail, and how you get there is your study design. We are leaving cross-sectional, case-control, and cohort studies till next time.
References
- Snow J. On the mode of communication of cholera. In: British Politics and the Environment in the Long Nineteenth Century. 2023 Sep 29 (pp. 149–154). Routledge.
- Cascio WE. Wildland fire smoke and human health. Sci Total Environ. 2018 May 15;624:586–595. doi: 10.1016/j.scitotenv.2017.12.086. Epub 2017 Dec 27. PMID: 29272827; PMCID: PMC6697173.