Cigarettes, job searching, and mental health: correlation or causation?

Miles Harden
Fall 2022 — Information Expositions
Nov 30, 2022
Photo by Pawel Czerwinski on Unsplash

According to the CDC, cigarette smoking causes about one in five deaths in the United States, or around 500,000 people each year. Smoking has not only been proven to cause physical harm but can also create mental dependence due to its addictive qualities. Given the wealth of 2021 analytic data available, I set out to discover whether the percentage of adults who smoke correlates with the number of poor mental health days people report, whether or not those days are directly tied to smoking. From this scoped-out analysis, I developed a question: is there a causal link between total poor mental health days and the total number of adults smoking, or are there other major factors driving the smoking rate?

To perform this analysis, I used an analytic dataset covering a variety of variables for the year 2021. I also read in a detailed dataset on employment, unemployment, and workforce variables in case I couldn’t find any correlation for my original question. Once I had all of my data, I started by cleaning and organizing it so the results would be meaningful across the nation.

To make my results meaningful, I grouped all of the data by state. The datasets contain data on many counties in each state, so grouping by state let me look for a broader correlation on a nationwide scale. This involved taking the mean of each statistic across all counties in each state, which returned a clean dataset of 50 state averages for each statistic. With my clean data frames created, I began analyzing, plotting, and searching for correlations in the data.
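The grouping step can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the code behind the article; the `county_rows` values and the `state_averages` helper are hypothetical and made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county-level rows: (state, % adults smoking, poor mental health days).
# These numbers are invented for illustration and are not from the article's datasets.
county_rows = [
    ("Colorado", 14.0, 3.8),
    ("Colorado", 16.0, 4.2),
    ("Kentucky", 24.0, 5.1),
    ("Kentucky", 26.0, 5.5),
    ("Kentucky", 22.0, 4.9),
]


def state_averages(rows):
    """Average each statistic over all counties in a state (unweighted mean)."""
    by_state = defaultdict(list)
    for state, smoking_pct, poor_days in rows:
        by_state[state].append((smoking_pct, poor_days))
    return {
        state: (
            mean(s for s, _ in values),  # average % of adults smoking
            mean(d for _, d in values),  # average poor mental health days
        )
        for state, values in by_state.items()
    }


averages = state_averages(county_rows)
for state, (smoking_pct, poor_days) in sorted(averages.items()):
    print(f"{state}: {smoking_pct:.1f}% smokers, {poor_days:.2f} poor mental health days")
```

Note that a plain mean like this counts every county equally, regardless of population, so a state average can differ from the true statewide rate.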

I first calculated the correlation coefficient between the state-by-state average adult smoker percentage and average poor mental health days, which came out to 0.69. This is a moderately strong correlation; you can see the points plotted in Figure 1 below. Visually, there is a positive, generally linear trend, which suggests that on average, the more Americans who smoke, the more days of poor mental health are reported overall. I genuinely see this as a causal relationship: the more people who smoke and feel dependent on cigarettes, the more who experience the other effects of dependency, such as on days when they don’t have a cigarette or when they feel general addiction-caused stress. I could also argue for a causal relationship in the other direction: the more Americans who experience bad mental health days, the more who want to smoke to “take the edge off”. However, I believe the former is the more accurate explanation, which answers one part of my initial question.

Figure 1: Average % of adults smoking vs average number of poor mental health days, sorted state by state
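For readers who want to see what a correlation coefficient measures, here is a minimal sketch of the Pearson formula in plain Python. It assumes the coefficient reported above is Pearson's r, which the article does not state explicitly; the `pearson_r` helper and the sample values are hypothetical and will not reproduce the 0.69 figure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical state averages, invented for illustration only.
smoking_pct = [12.0, 15.0, 18.0, 21.0, 24.0]
poor_days = [3.9, 4.1, 4.0, 4.6, 4.9]

r = pearson_r(smoking_pct, poor_days)
print(f"r = {r:.2f}")
```

A value near 1 means the points fall close to an upward-sloping line, a value near -1 means a downward-sloping line, and a value near 0 means little linear relationship.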

Beyond this likely causal relationship, I wanted to know whether other major factors influence adult smoking, since in the previous relationship smoking appears to be the variable doing the influencing. To do this, I used the United States unemployment dataset I mentioned earlier. Since the smoking data was from 2021, I used the most recent years of unemployment data to find the percent change in unemployment from 2019 to 2020. I applied the same cleaning method, averaging these percent changes at the state level. I then calculated the correlation coefficient between the average percent change in unemployment by state and the average percent of adults smoking in each state, which came out to -0.39.
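As a rough sketch of the percent-change step, the calculation below assumes the change is measured relative to the 2019 value. The article does not say whether it used unemployment rates or counts, so the `percent_change` helper and the sample figures are hypothetical.

```python
def percent_change(old, new):
    """Percent change from an old value to a new value, relative to the old value."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100.0


# Hypothetical county: unemployment rate of 4.0% in 2019 and 7.0% in 2020.
change = percent_change(4.0, 7.0)
print(f"Change in unemployment, 2019 to 2020: {change:.1f}%")
```

Each county's percent change would then be averaged by state in the same way as the smoking and mental health statistics.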

In this case, the overall correlation was weaker, but there is still a negative, generally linear relationship between the two variables, which you can see in Figure 2 below. I like to think about this relationship from a causal standpoint. It would make sense that the smaller the percent change in unemployment and the steadier people feel day to day, the more people would buy cigarettes or start the habit thanks to their financial stability. It would be hard to argue for causation in the other direction, as there is no obvious way for adult smoking to cause a change in unemployment.

Figure 2: Average % of adults smoking versus average % change in unemployment (2019–2020), state by state
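One common way to compare the strength of the two coefficients is to square them, which gives the share of variance in one variable that a straight-line fit on the other would account for. The sketch below uses the two values reported in this article; squaring them is a standard reading of a correlation coefficient, not something the original analysis computed.

```python
# Correlation coefficients reported in the article.
r_mental_health = 0.69   # smoking vs. poor mental health days
r_unemployment = -0.39   # smoking vs. percent change in unemployment

# Squaring r gives the coefficient of determination for a simple linear fit.
r2_mental_health = r_mental_health ** 2
r2_unemployment = r_unemployment ** 2

print(f"Mental health days: r^2 = {r2_mental_health:.2f}")
print(f"Unemployment change: r^2 = {r2_unemployment:.2f}")
```

Read this way, the mental health relationship accounts for a little under half of the state-to-state variation, while the unemployment relationship accounts for roughly 15%.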

Comparing both unemployment percent change and poor mental health days to the average percent of adults smoking makes it apparent that smoking and poor mental health days are clearly correlated, with smoking likely being a large causal influence on poor mental health days. However, as the unemployment example shows, a correlation rarely has a single cause. There are many factors that influence our nation’s smokers, many of which likely aren’t included in the available data, such as the money and power behind the tobacco industry itself. Overall, I can confidently answer my question and say that poor mental health days are very likely correlated with the share of adults who smoke, but a variety of other factors must also be weighed.
