DESIGNING A STUDY IN ENVIRONMENTAL HEALTH
PROLOGUE
Dr Dey was examining his patient’s hand intently. Ram Das, an agricultural labourer, was his last referred patient in the clinic that day; the district medical officer had referred him with puzzling lesions on his hands (see Figure).
Like many others, this patient had been directed to him by the doctors of a village about sixty kilometres from Calcutta. The palms had dark raised lesions and the doctor was intrigued.
He concluded that the patient had been exposed to arsenic and was showing signs of arsenic toxicity, but he was puzzled about the source. “Where are you from?” he asked. “From Nishipur, doctor shaheb,” Ram Das told him.
Dr Dey knew about this village, where most people lived by agriculture. People like Ram Das were unlikely to work in copper smelters, an occupational source of arsenic, for there were none nearby. He might be drinking water contaminated with inorganic arsenic, but Dr Dey was not sure if or how that could be the source. It could also be due to pesticides, but he was not sure of that either. So he asked him, “What do you do for a living? From where do you get your drinking water?” This should have been obvious to him from the notes his assistants wrote, but he wanted to confirm. “I have a small plot of land that I farm, sir; and we have a shallow tube well that the community dug, and everyone drinks water from the same well,” Ram Das said.
This was puzzling. Dr Dey guessed that water from this tube well might have something to do with the skin disease that Ram presented with. But it might not be just one well. Ram was the fifth patient he had seen that week with these painful lesions, all from different areas of South Gangetic Bengal and all referred by doctors across the district. The doctors were puzzled too; they knew that the most likely diagnosis was exposure to arsenic from somewhere, but it was not at all clear where. At this point, he remembered to ask, “Who else do you know with these lesions? I mean, are there others with similar skin disease in your locality?” Ram told him that he knew at least ten other people with a very similar skin disease; some of them had had fingers and toes fall off, and he knew others with cancer.
Dr Dey ordered a test for arsenic in Ram’s urine sample and sent it to a laboratory in Calcutta. The results came back the following week and confirmed what he had suspected all along: Ram had very high levels of arsenic in his urine. He and his fellow doctors were puzzled about where it came from and what the story might be. Dr Dey called his friends at the Geological Survey of India and planned a study. Perhaps the experts at the Geological Survey would know what was going on. The well water needed to be checked.
Professor Smith was tired that evening after his long research tour overseas: first in India, where he had led an investigation of arsenic toxicity in West Bengal and the subsequent publication of its report, and then in Argentina, where he was also investigating arsenic toxicity. The phone rang. It was a call from the Times, from their New York office. They were wondering if the good professor could spare an hour for an interview on the arsenic toxicity problem in India, which had reached fearsome proportions; people in Utah and California were also scared that they might end up drinking arsenic-laced water from their groundwater sources.
What shall we consider while setting up an environmental health study?
The term “study design” refers to the layout and planning of a study. Environmental epidemiology, in the context of environmental health, refers to the study of the distribution and environmental determinants of diseases, in particular where the source of the exposure or of the health effects has an element of human activity. A carefully conducted epidemiological study is therefore important in exploring environmental health issues. Here, we review issues that determine the study designs, methods, and applications.
Who, What, When, Where, Why, How
Every health research question has “who”, “what”, “when”, “where”, “how”, and “why” elements. The “who” element refers to humans. Who are affected? What are their age, gender distribution, socioeconomic factors? Can a pattern be identified? “When” and “where” indicate time and spatial distribution of the diseases. “How” and “why” are questions about the mechanism of disease or disease causation.
In turn, these questions provide directions for health research. Some health studies are purely descriptive; others are analytical. For example, when the question is the extent of air pollution in the city of Christchurch, analysing air pollutant measurements collected across a range of monitoring stations would be sufficient [1]. On the other hand, if the question is whether air pollution is associated with death among the elderly, then another type of study is warranted. For example, Sadiva and colleagues used a time series analysis in Sao Paulo, Brazil to study the link between air quality and deaths in elderly individuals by combining data on deaths from the different wards of the city with air quality data from the monitoring stations [2].
Case Series
Epidemiological studies can be descriptive (case reports, case series) or analytical (some ecological studies, cross sectional surveys, case control studies, and cohort studies). Case studies and cross sectional surveys are well suited to describing health conditions, and their results help scientists generate hypotheses.
Disease surveillance, for example, is based on case series. Similarly, continuous monitoring of air quality at a place is an example of exposure surveillance. In New Zealand, the Institute of Environmental Science and Research (ESR) routinely conducts environmental and disease surveillance for a range of diseases in the country and posts the results in the public domain. In the United States, the Centers for Disease Control and Prevention (CDC) publishes the Morbidity and Mortality Weekly Report, which provides the results of disease surveillance.
Case series enable health researchers to frame an answerable question or rival hypotheses, which can then be investigated using analytical study designs. Hypotheses are derived from theories that explain phenomena. For example, in our story, Dr Dey was perplexed by the skin lesions he saw and he set up the hypotheses that his patients may have been drinking water that contained arsenic in high concentrations, or that they may have been exposed to high concentrations of arsenic from some other source, and that this arsenic would be metabolised in the body and appear in urine. Accordingly, he ordered urine tests for the presence of arsenic. When the reports turned out to be positive, he was certain that chronic exposure to arsenic was responsible for the cases. However, it would still be necessary to conduct epidemiological studies to establish that this was indeed the case in the population.
In addition, case series in environmental epidemiological studies can also be used to test hypotheses. Some case series methods, for instance the case crossover design and the self-controlled case series design, can be used to model single cases that occur over a time period. They have been used to examine the relationship between environmental variables such as extreme temperature and deaths or hospital admissions, and, as Heather Whitaker wrote, between the use of certain strains of MMR (measles, mumps, rubella) vaccine and the risk of aseptic meningitis [3, 4].
Ecological Study (Time series)
Ecological study designs are those where aggregated data are obtained for both exposure and outcomes, and these data are then analysed together to test the hypothesis that exposure and outcomes are related to each other. For example, in studies of air pollution and health effects, air quality data are routinely collected from different stations throughout a city. Hospital data on certain health outcomes (such as total deaths, admissions due to heart disease, or admissions due to asthma) are collected from the same city blocks, and the two sets of data are then analysed together. In Sadiva and Dockery’s study in Brazil, the authors obtained data on deaths from the Sao Paulo municipal authority and air quality data from 12 monitoring stations [5].
Sadiva and Dockery found in their study that for each 100 µg/m³ increment in PM10 levels, the risk of death in the elderly went up by about 8%. Does this automatically mean that on a bad air day, the chance that an individual elderly person would die rose by 8%? The answer is “no”, because there may be other factors common to both poor air quality (that is, high air pollutant concentration) on a particular day and risk of death on the same day. To construct a fictitious example, say the city had a fireworks display on a Sunday (a holiday), when the outpatients department, which refers patients for admission, was closed. On Monday, the air quality of the city would be bad because of the accumulation of particulate matter from the fireworks display. Monday being a working day, the outpatients department would also be open and would see a higher inflow of patients, and possibly higher admission and death rates, than over the weekend when it was closed. Therefore, it would be wrong to conclude from the general pattern (hospital admissions for heart disease rising after days of poor air quality) that poor air quality raises the risk for any particular individual, as if no other factor could explain the phenomenon. This error in judgment is referred to as the “Ecological Fallacy”: based on ecological study results, one cannot generalise to individuals.
Cross sectional survey
A cross sectional survey is designed to generate a snapshot of a health problem in a community. It is useful for some levels of hypothesis testing and also for estimating the prevalence of a health outcome. For example, Professor D N Guha Mazumder et al (1998) conducted a large cross sectional survey of 7683 people in the North 24-Parganas district in the West Bengal state of India to study the association between arsenic exposure and skin lesions [6]. While cross sectional surveys are useful study designs, they are not the best designs for understanding cause and effect linkages. The reasons are these:
1. Cross sectional surveys are open to recall bias from respondents.
2. In cause and effect assessments, causes should precede health effects. In cross sectional surveys, it is impossible to be sure whether the exposure actually preceded the health outcome, because information on exposure is collected at the same time as data on health outcomes.
Case Control Study
In a case control study, participants in the study are sampled on the basis of whether they have the disease in question. Those who have the disease are referred to as cases, and those who do not have the disease are referred to as controls. Both cases and controls are then assessed for the likelihood of their exposures. For example, Haque et al (2003) reported a case control study in the same population in which the cross sectional survey of arsenic toxicity was conducted. They studied 192 persons with skin lesions and 213 individuals without skin lesions, sampled their drinking water, and then studied the association between various levels of arsenic in drinking water and the risk of skin lesions [7].
In case control studies, the odds of exposure are compared for cases and controls. Therefore, the effect measure is referred to as the Odds Ratio. Refer to the following table, which presents a fictitious example of a case control study. In this case control study, 100 cases and 100 controls were asked about their exposure to “Exposure” and the investigators ended up with a table as follows:
Exposure status | Cases | Controls
Exposed | 70 | 30
Not exposed | 30 | 70
Total | 100 | 100
As can be seen from this table, 70 out of 100 cases, and 30 out of 100 controls tested positive for exposure. Thus, the odds of exposure among the cases was 70:30, and the odds of exposure among the controls was 30:70. Hence the Odds Ratio is 70 * 70 / (30 * 30) = 49/9, or approximately 5.4. If we were to replace the figures in the above table with A, B, C, and D and reconstitute the table, the table would show something like this:
Exposure status | Cases | Controls
Exposed | A | B
Not exposed | C | D
The Odds Ratio would then be estimated as OR = (A * D) / (B * C). This is also known as “cross product ratio” for finding out the odds ratio from one study or for one set of findings.
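The cross product calculation can be sketched in a few lines of Python. This is only an illustration: the function name `odds_ratio` and the cell labels (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls) are assumptions made for the example, and the counts are those of the fictitious study above.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross product odds ratio for a 2x2 case control table.

    Assumed cell labels (hypothetical, for illustration only):
    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("the odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Counts from the fictitious case control study in the text.
crude_or = odds_ratio(a=70, b=30, c=30, d=70)
print(round(crude_or, 2))  # 49/9, approximately 5.44
```

The guard against zero cells is a design choice for the sketch; in practice investigators often report that the odds ratio cannot be estimated, or apply a small-sample correction, when a cell is empty.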
In a case control study, it is possible to control for the effects of potential confounding variables. This can be achieved in three ways: matching, stratified analysis, and multivariate analysis.
Matching
Cases and controls can be matched on variables that are thought to be potential confounders. For example, Haque et al (2003) in their case control study matched their cases and controls on the basis of their ages (within five years) and gender.
Stratified Analysis
To illustrate this, two tables are set up as follows, one for men and one for women in the above fictitious study we used for the case control study illustration.
In this fictitious example, the Odds Ratio for men was 10.0, while that for women was 3.0. In both groups there was an association between case control status and exposure, and in the same direction, but the magnitudes were very different. When something like this happens, it indicates that the variable in question influences the association; thus, in this case control study, you may conclude that confounding by gender has occurred. The crude OR of 5.4 given above therefore describes the extent of the association in general, but it is not accurate because it does not adjust or control for the fact that the association differs between men and women. Hence, the adjustment is done as follows:
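Each stratum’s cross products are first divided by the total number of people in that stratum (50 + 40 + 20 + 10 = 120 men and 20 + 30 + 20 + 10 = 80 women) and then summed:
(50 * 40 / 120 + 20 * 30 / 80) / (20 * 10 / 120 + 20 * 10 / 80) = 24.17 / 4.17 = 5.8, or the pooled Odds Ratio was 5.8.
Note that this Odds Ratio is between 10 and 3 and is more than but not too far away from the crude OR of 5.4. As before, the algebraic equation for this situation is as follows:
The Pooled (Mantel-Haenszel) Odds Ratio = OR(mh) = (A1 * D1 / N1 + A2 * D2 / N2) / (B1 * C1 / N1 + B2 * C2 / N2), where N1 and N2 are the total numbers of individuals in the first and second strata.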
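The pooling step can be sketched as follows. This is an illustration under stated assumptions: the Mantel-Haenszel estimator divides each stratum’s cross products by that stratum’s total before summing, and for comparison the sketch also computes the plain ratio of summed cross products without those weights. The function names are hypothetical, and the split of the off-diagonal cells between exposed controls and unexposed cases is an assumption (only their products, 20 * 10, can be read from the figures in the text); neither pooled estimate depends on that split.

```python
from typing import Iterable, Tuple

# One stratum as (a, b, c, d):
# a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls.
Stratum = Tuple[float, float, float, float]


def stratum_or(stratum: Stratum) -> float:
    """Cross product odds ratio within a single stratum."""
    a, b, c, d = stratum
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """Pooled odds ratio, each cross product weighted by 1 / stratum total."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def unweighted_pooled_or(strata: Iterable[Stratum]) -> float:
    """Ratio of summed cross products, with no stratum-size weights."""
    strata = list(strata)
    return sum(a * d for a, _, _, d in strata) / sum(b * c for _, b, c, _ in strata)


# Assumed cell assignment for the fictitious strata (see the note above).
men: Stratum = (50, 10, 20, 40)
women: Stratum = (20, 20, 10, 30)

print(round(stratum_or(men), 1), round(stratum_or(women), 1))  # 10.0 3.0
print(round(mantel_haenszel_or([men, women]), 1))              # 5.8
print(round(unweighted_pooled_or([men, women]), 1))            # 6.5
```

Both pooled values lie between the two stratum-specific odds ratios; the weighted version gives more influence to the larger stratum.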
Multivariate Analysis
Logistic regression is the analysis of choice in case control studies. A detailed description of the theory and practice of logistic regression is beyond the scope of this section, so only a brief description of the principles is given. In logistic regression analysis, the logit function of an outcome is modelled on the explanatory variables. A logit is the logarithm of the odds of an event. For example, let’s say we found that out of 100 cases, 70 individuals had the exposure that we wanted to study. Expressed as a logit, this would be log(70/30). Usually, natural logarithms are used for this analysis. The logit is then regressed in a linear model on the exposure and confounding variables. The simplest model looks like so:
logit(Y) = alpha + beta * X
Where alpha is the intercept and beta is the regression coefficient for the exposure variable X. X can be a binary variable (taking the value of 0 or 1), a continuous variable, or an ordinal variable (more on the variables in the data analysis section).
Let’s say X is a binary variable with a value of 1 or 0, where 1 = exposure to the environmental agent, and 0 = non-exposure to the environmental agent. Then, according to this equation, when X is set to 1,
Logit Y for exposure = alpha + beta*(X = 1) = alpha + beta … (1)
When X is set to 0 (that is non-exposure), then,
Logit Y for non-exposure = alpha + beta*(X = 0) = alpha + 0 … (2)
If we deduct (2) from (1), we have,
Logit Y for exposure - Logit Y for non-exposure = beta … (3)
As a logit is a logarithm, and as subtracting one logarithm from another is the same as taking the logarithm of the ratio of the two quantities, equation (3) is actually
Log (Odds of Y for exposure / Odds of Y for non-exposure) = log(Odds Ratio) = beta … (4)
Therefore Odds Ratio = exponential (beta) … (5); that is, we raise the constant e (approximately 2.718) to the power beta.
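The relationship between beta and the Odds Ratio can be checked numerically. The sketch below assumes the fictitious case control counts used earlier (70 exposed and 30 unexposed cases, 30 exposed and 70 unexposed controls) and treats Y as case status. For a model with a single binary X, alpha and beta can be written directly from the observed odds, so no fitting routine is needed; the variable names are illustrative only.

```python
import math


def logit(p: float) -> float:
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Fictitious counts (assumed from the worked example in the text).
exposed_cases, exposed_controls = 70, 30
unexposed_cases, unexposed_controls = 30, 70

# Proportion of cases (Y = 1) within each exposure group.
p_exposed = exposed_cases / (exposed_cases + exposed_controls)
p_unexposed = unexposed_cases / (unexposed_cases + unexposed_controls)

# With one binary X: logit(Y) = alpha when X = 0, and alpha + beta when X = 1.
alpha = logit(p_unexposed)
beta = logit(p_exposed) - alpha

odds_ratio_from_beta = math.exp(beta)
print(round(beta, 3), round(odds_ratio_from_beta, 2))  # beta is log(49/9)
```

Exponentiating beta recovers the same value as the cross product ratio, 49/9 or about 5.4. With additional variables in the model the coefficients are estimated iteratively by statistical software rather than written down in closed form.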
Note that because this is a linear model, we can add many variables to it, giving a multivariate logistic regression. This will be explored in the data analysis section. For example, Haque et al (2003) conducted a multivariate logistic regression to test the association of arsenic exposure with case control status, the cases being those with skin lesions and the controls being those who did not have skin lesions.
Two known disadvantages of case control studies are that they are retrospective, and subject to recall bias. In a case control study, because individuals are sampled on the basis of their outcomes, exposure data are collected retrospectively; as a result, it cannot always be ensured that the exposure preceded the outcome. That said, case control study designs make it possible to study multiple exposure variables for the same outcome. They also make it possible to study rare diseases, as individuals are sampled on the basis of their outcomes. Rather than waiting for the outcomes to occur following a common set of exposures, one can start with an outcome, sample individuals on that basis, and study the occurrence of exposure among cases and controls.
Retrospective Cohort Study
In a cohort study, cohorts are assembled and then followed up in time. Cohorts are groups of similar individuals, in this case those who are and those who are not exposed to an exposure variable of interest. This can be done retrospectively using historical data as well as prospectively. When the cohorts are assembled in historic time and are also “followed up” in historic time (that is, in time that has preceded the time of inquiry or the conduct of the research), this type of cohort study is referred to as a “retrospective cohort study”. Retrospective cohort studies are frequently conducted in workplace settings and are particularly useful in occupational epidemiology. For example, in 1980, Bengt Sjogren and colleagues reported the results of a study on welders who welded stainless steel and were therefore exposed to hexavalent chromium. As hexavalent chromium was a known cancer causing agent in animal studies, Sjogren and colleagues wanted to study what would happen to workers who were occupationally exposed to high concentrations of chromium. For this, they obtained data on welders in Sweden who were exposed between 1950 and 1965 and followed their health records until 1977 [8]. They found that while the standardised mortality ratios for other cancers were similar for the welders and general members of the public, welders had a higher risk of lung cancer.
Prospective Cohort Study
In a prospective cohort study, cohorts of participants are assembled before the commencement of the study, on the basis of whether they are exposed or not exposed to the environmental factors of interest. At the beginning of the study, the members of the cohorts must also be free of the outcome of interest.
For instance, imagine that a cohort study is being conducted to test the theory that workplace noise leads to hypertension in employees. The hypothesis being tested is that exposure to noise on a particular factory shop floor leads to the development of hypertension after five years of working there, compared with non-exposure to noise while working in the same factory. To test this hypothesis, employees can be assembled into two cohorts, both free from hypertension to start with: the members of one cohort are exposed to constant ambient noise in their place of work, and the other cohort is selected from those in office desk jobs in the same factory who are removed from the shop floor noise. After this, the cohorts are periodically examined for signs of developing hypertension and are compared. The effect measure for a cohort study is the Relative Risk estimate, where the incidence rates of the disease are compared for the exposed and the non-exposed.
Cohort studies, specifically prospective cohort studies, are advantageous in the sense that a number of different diseases arising from the same source of exposure can be studied. For instance, exposure to noise as a stressor can lead to other stress related diseases such as diabetes, premature balding or depression. A second advantage of a prospective cohort study is the ability to nest case control studies within it, which is described next. But cohort studies are also expensive and time consuming.
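A Relative Risk estimate for the noise and hypertension example can be sketched as below. The counts are entirely hypothetical and are invented only to show the arithmetic; the function name `relative_risk` is likewise illustrative. The sketch compares the cumulative incidence of hypertension over the follow-up period in the two cohorts.

```python
def relative_risk(
    exposed_events: int,
    exposed_total: int,
    unexposed_events: int,
    unexposed_total: int,
) -> float:
    """Ratio of cumulative incidence in the exposed to that in the unexposed."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed


# Hypothetical five-year follow-up (numbers invented for illustration):
# 30 of 200 shop floor workers and 15 of 200 office workers develop hypertension.
rr = relative_risk(exposed_events=30, exposed_total=200,
                   unexposed_events=15, unexposed_total=200)
print(round(rr, 2))  # risk is twice as high in the exposed cohort
```

A value above 1 indicates a higher incidence among the exposed; a value of 1 indicates no difference between the cohorts.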
Nested Case Control Study
A nested case control study is “embedded” within a prospective cohort study. At the beginning of the main study, potentially useful exposure data, such as blood samples, are collected from every member of each cohort. Then, after a certain period of time, when enough individuals showing the specific health outcomes of interest have accumulated, a case control study is conducted using the exposure data collected at the beginning of the study. This study design overcomes the disadvantage of recall bias that can occur in a regular case control study.
Epilogue
This was a brief introduction to the main principles of study designs in environmental health. Epidemiological study designs are very important for establishing an association between an environmental agent and a health outcome.
In the arsenic toxicity studies, the initial observations led to the discovery that people who lived in the Gangetic delta, in both India and Bangladesh, were exposed to high arsenic concentrations in their drinking water. This drinking water came from shallow tube wells that were dug to obtain ground water for irrigation but were also used for drinking. Epidemiologists, geologists and environmental health experts went on to identify tens of millions of people who were exposed to high concentrations of inorganic arsenic in their drinking water.
References
- Wilson, J. G., Kingham, S., & Sturman, A. P. (2006). Intraurban variations of PM10 air pollution in Christchurch, New Zealand: Implications for epidemiological studies. Science of the Total Environment, 367(2–3), 559–572. doi:10.1016/j.scitotenv.2005.08.045
- Sadiva, P. S., & Dockery, D. D. (n.d.). Air pollution and mortality in elderly people: A time series study in Sao Paulo, Brazil. Archives of Environmental Health, 50(2), 159–163.
- Nitschke, M., Tucker, G. R., Hansen, A. L., Williams, S., Zhang, Y., & Bi, P. (2011). Impact of two recent extreme heat episodes on morbidity and mortality in Adelaide, South Australia: A case-series analysis. Environmental Health, 10(1), 42. doi:10.1186/1476-069X-10-42
- Whitaker, H. W., Farington, C. P. F., Spissens, B. S., & Musonda, P. M. (2005). The self-controlled case series method. Statistics in Medicine, 0, 1–31.
- Sadiva, P. S., & Dockery, D. D. (n.d.). Air pollution and mortality in elderly people: A time series study in Sao Paulo, Brazil. Archives of Environmental Health, 50(2), 159–163.
- Guha Mazumder, D. N. G. M. (1998). Arsenic levels in drinking water and the prevalence of skin lesions in West Bengal. Int. J. Epidemiol, 27, 871–877.
- Haque, R. H., Guha Mazumder, D. N. G. M., & Smith, A. S. (2003). Arsenic in drinking water and skin lesions: Dose-response data from West Bengal, India. Epidemiology (Cambridge, Mass.), 14, 174–182.
- Sjögren, B. (1980). A retrospective cohort study of mortality among stainless steel welders. Scandinavian Journal of Work, Environment & Health, 6(3), 197–200. doi:10.5271/sjweh.261