COVID-19 Overview in the United States

Published in The Book Aisle, Dec 9, 2020

This is the fourth and final part in my series on the pandemic’s impact on the United States. The last three parts were top ten lists that highlighted different aspects of the pandemic, ranging from infections, to deaths, to unemployment. These lists took a look at some of the most extreme outcomes in the hardest hit places. Now, it’s time to bring all of this full circle. Rather than just take a tour of the country, we’ll be taking more of a bird’s eye view of the pandemic and the factors that contribute to it.

First, though, to wrap up loose ends from the previous three parts, here are the top ten states for each of the metrics examined.

Top 10 States with Highest COVID Incidence

10) Florida (3.8 percent)

9) Nebraska (3.8 percent)

8) Tennessee (3.8 percent)

7) Louisiana (4 percent)

6) Alabama (4 percent)

5) Mississippi (4.1 percent)

4) Wisconsin (4.1 percent)

3) Iowa (4.3 percent)

2) South Dakota (5.5 percent)

1) North Dakota (6.2 percent)

Similar to what we saw on the top ten counties list, the South and Midwest are the hotspots for COVID-19 cases. Four of these states are represented on the top ten counties list.

Top 10 States with Highest COVID Fatality

10) Louisiana (3.1 percent)

9) District of Columbia (3.7 percent)

8) Michigan (3.8 percent)

7) Pennsylvania (4 percent)

6) Rhode Island (4.1 percent)

5) New Hampshire (4.2 percent)

4) Connecticut (6.2 percent)

3) Massachusetts (6.4 percent)

2) New York (6.5 percent)

1) New Jersey (6.8 percent)

As expected, most of the states on here are in the Northeast, similar to the top ten counties list; however, only three states are actually represented on that other list.

Top 10 States with Highest March-April Unemployment Increase

10) New Jersey (12.2 percent)

9) Illinois (12.6 percent)

8) Rhode Island (12.9 percent)

7) Massachusetts (13.1 percent)

6) Vermont (13.4 percent)

5) Indiana (13.8 percent)

4) New Hampshire (14.5 percent)

3) Michigan (19.6 percent)

2) Hawaii (21.4 percent)

1) Nevada (23.2 percent)

Once again, the top ten states reflect the geographic breakdown of the top ten counties. All five states that were represented on the top ten counties list made this list.

So on the whole, there aren’t too many surprises when we look at the hardest hit states for each metric. While the counties that made each of the lists are outliers compared to the rest of the country, the level of severity they face isn’t far off from the rest of their corresponding state. Even so, this exercise further emphasizes how the “hotspots” of the pandemic can be defined in multiple ways.

Economic Reopening Plans

One topic that I didn’t discuss in the previous parts is the various economic reopening plans implemented by the states since the pandemic began. Aside from the CARES Act and various guidelines issued by the CDC and other federal agencies, most of the pandemic response has fallen on the states. Given the differences in the political climate and the pandemic’s impact across states, there is substantial variation in the nature of these reopening plans and their effects on the metrics discussed in earlier parts.

While I didn’t find much use for them in the first three parts, I plan to incorporate the differences in these state plans for the analysis in this part. To do this, I will be using data on the reopening plans collected by MultiState, a public policy consulting firm.

MultiState assigns an “openness” score for each state based on eleven factors pertaining to their pandemic response. These factors include whether residents are under a stay-at-home order, how broadly the state defines “essential businesses”, the extent to which various non-essential businesses may operate, whether the reopening plan is statewide or limited to certain regions, and whether local governments may issue stricter guidelines than the state. The score is on a scale from 0 (least open) to 100 (most open). For the purposes of this exercise, I will be using their ratings from October 6, about one month before the election.

Below are the ten states with the most open protocols. As a general note for this article and my 2020 election analysis, I am excluding Alaska because election data is unavailable at the county level (or borough level, as counties are called there).

Top 10 Most Open States

10) Kansas (86)

9) Indiana (86)

8) South Carolina (88)

7) Missouri (90)

6) Iowa (90)

5) Idaho (90)

4) Oklahoma (93)

3) Nebraska (93)

2) South Dakota (96)

1) Florida (96)

COVID Incidence

So what drives the level of COVID infections, or the incidence rate, at the county level? Based on the ten counties discussed in part one, the most impacted counties can be characterized as rural, solidly Republican (with a few exceptions), mostly white but also having sizable pockets of blacks or Native Americans, generally poor with few college graduates, and located in the South and the Midwest. But to what extent do these characteristics drive the infection rate? Which factors are more indicative than others? Which factors can be dismissed as coincidental?

To answer these questions, we will be building an OLS regression model using R. But first, we’ll discuss the variables in more detail.

First, there’s the pre-existing political leaning of the county. For this, we can choose between the county’s Trump vote share and its Clinton vote share. One would think that counties with more Trump support are less likely to take the medical side of the pandemic seriously. This in turn contributes to more relaxed social distancing and mask wearing, which could lead to higher COVID incidence. Despite this expectation, a first glance at the data doesn’t seem to support it. There’s very little correlation between total incidence and either the Clinton vote share (r=0.024) or the Trump vote share (r= -0.007). However, when the data is broken out by time period, the two correlations move in opposite directions as the pandemic progresses. In the early months of the pandemic, the Clinton vote share (r=0.242) is positively correlated with incidence while the Trump vote share (r= -0.215) is negatively correlated. In the summer, the two correlations move closer together as infection hotspots migrate from the more Democratic-friendly Northeast to the more Republican-friendly South and Midwest (r=0.137 for Clinton vote share, r= -0.089 for Trump vote share). And by the fall, the Clinton vote share (r= -0.169) has become negatively correlated with COVID incidence while the Trump vote share (r=0.146) has become positively correlated. Considering this, there doesn’t seem to be an advantage to selecting one candidate’s vote share over the other when looking at total incidence. For the purpose of this regression, I’ll use each of them in separate models.
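For readers who want to see the mechanics behind the r values, here is a minimal Python sketch of a correlation coefficient, assuming the figures quoted are Pearson correlations (the article does not say which type was used, and the original analysis was done in R). The function name and the five imaginary counties are made up for illustration; they are not the county data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up vote shares and incidence rates for five imaginary counties.
trump_share = [0.30, 0.45, 0.55, 0.65, 0.80]
incidence_early = [0.020, 0.015, 0.012, 0.010, 0.006]  # falls as Trump share rises
incidence_fall = [0.010, 0.014, 0.015, 0.019, 0.024]   # rises as Trump share rises

r_early = pearson_r(trump_share, incidence_early)  # negative in this toy data
r_fall = pearson_r(trump_share, incidence_fall)    # positive in this toy data
```

Computing the coefficient separately for each period's incidence rate is what lets a sign flip like the one described above show up, even when the correlation with total incidence is close to zero.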

Next, there are several demographic factors to consider. I’ll be using several variables for race/ethnicity that are correlated with total incidence, such as white (r= -0.226), black (r=0.202), Native American (r=0.137), and Hispanic (r=0.084). There will also be two age variables: Young and Senior. For this analysis, I will be classifying Young as the percentage of the county’s population between the ages of 20 and 29 and I will be classifying Senior as the percentage of the population over the age of 65. While the virus is often associated with a heightened threat to older people, the data indicates that the share of senior residents in the population (r= -0.203) is negatively correlated with COVID incidence while the share of young adults (r=0.154) is positively correlated. This observation is reinforced by the CDC, which reported that while the virus initially affected older people in the early months, by the summer, younger people were accounting for more infections and had become key drivers of transmission.

Other demographic variables include educational attainment, particularly the share of the adult population with less than a high school diploma (r=0.219) and the share with at least a bachelor’s degree (r= -0.127). As expected, educational attainment is negatively correlated with COVID incidence, as those with higher attainment are more likely to be in jobs that can be done remotely, thus limiting face-to-face contact with others. And the other two variables I’ll consider are the share of veterans in the population (r= -0.19) and the share of people with a disability (r= -0.05).

Next, I’ll be using two socioeconomic variables. One is median household income, which is negatively correlated with incidence (r= -0.136). For the purposes of this analysis, I have standardized income values by dividing them by the national average. That way, the regression won’t be measuring individual dollar changes across counties, which would yield a trivially small coefficient value. And the other variable is the poverty rate, which is positively correlated with incidence (r=0.205).
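The rescaling described here can be sketched in a few lines of Python. The dollar figures, the national average of 60,000, and the per-dollar slope are all hypothetical placeholders chosen to keep the arithmetic easy to follow.

```python
def standardize_income(county_income, national_average):
    """Express a county's median household income as a multiple of the national figure."""
    return county_income / national_average

NATIONAL_AVERAGE = 60_000                    # hypothetical, in dollars
county_incomes = [45_000, 60_000, 90_000]    # hypothetical, in dollars
ratios = [standardize_income(v, NATIONAL_AVERAGE) for v in county_incomes]
# ratios is [0.75, 1.0, 1.5]

# Rescaling changes the units of the coefficient, not the relationship itself:
# a (hypothetical) slope per dollar becomes a slope per "one national average".
per_dollar_slope = 2e-7
per_ratio_slope = per_dollar_slope * NATIONAL_AVERAGE
```

A value of 1.0 means the county sits exactly at the national figure, so the coefficient reads as the expected change in incidence for a county whose income rises by one full national average rather than by one dollar.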

And finally, I’ll be looking at the reopening plan of the county’s home state. Because these reopening plans are developed at the state level, all counties within a given state will receive the same values for the variables in this category. The two variables I’m most interested in are the state’s overall openness score (r=0.213) and its level of local preemption (r=0.112). As expected, a higher openness score (i.e. fewer restrictions on businesses operating) is positively correlated with COVID incidence, suggesting that these relaxed provisions allow more opportunities for transmission to take hold.

Together, these variables produce the following OLS regression equation:

Incidence = Intercept + (2016 Trump Vote × x1) + (White × x2) + (Black × x3) + (Native American × x4) + (Hispanic × x5) + (Young × x6) + (Senior × x7) + (Income × x8) + (Poverty × x9) + (Less HS × x10) + (College Grad × x11) + (Open Score × x12) + (Local Preemption × x13) + (Religion × x14) + (Veteran × x15) + (Disability × x16) + Error
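To make the mechanics of an equation like this concrete, here is a minimal Python sketch of an ordinary least squares fit. It is not the author's code (the article's models were built in R); it solves the normal equations directly with the standard library, uses only two made-up predictors instead of sixteen, and does not compute standard errors or significance levels. The function name and data are hypothetical.

```python
def ols_fit(rows, y):
    """Least-squares coefficients for y ~ intercept + predictors.

    rows: one list of predictor values per county; y: one outcome per county.
    Returns [intercept, b1, b2, ...] by solving the normal equations (X'X)b = X'y.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Toy data built from incidence = 0.01 + 0.02*trump + 0.05*young, with no noise,
# so the fit should recover those three numbers.
rows = [[0.2, 0.10], [0.5, 0.12], [0.7, 0.20], [0.4, 0.15], [0.9, 0.11]]
y = [0.01 + 0.02 * trump + 0.05 * young for trump, young in rows]
intercept, b_trump, b_young = ols_fit(rows, y)
```

In R, the same kind of fit would typically come from `lm()`, which also reports the standard errors needed for the significance statements that follow.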

Now that we’ve established the variables, let’s run the regression. The coefficients can be found in Figure 1 below.

Several findings emerge from this exercise. One is the difference in effects between the Clinton and Trump vote shares. The Trump vote share (B=0.01) is positively associated with COVID incidence, meaning that counties with a higher 2016 Trump vote share are expected to have more cases relative to their population. Conversely, the Clinton vote share (B= -0.014) is negatively associated, meaning that counties with a higher 2016 Clinton vote share are expected to have fewer cases relative to their population. Both of these coefficients are statistically significant at 95 percent confidence. This finding is consistent with what we observed in part one, where most of the counties with the highest COVID incidence went for Trump by substantial margins in 2016.

Regarding race and ethnicity, the shares of non-white and Hispanic residents are powerful indicators of COVID incidence. Counties with large shares of black (B=0.046 in Figure 1; 0.052 in Figure 2) and Native American (B=0.062; 0.069) residents are especially vulnerable to infection, as are counties with large shares of Hispanic residents (B=0.016; 0.026). Each of these coefficients is statistically significant at 95 percent confidence. White population (B=0.022; 0.023) is also associated with higher infection; however, this coefficient is smaller than those for black and Native American populations and is only significant at 90 percent confidence.

Next, there’s age, which has some interesting findings. As expected from the correlation, there’s a positive, statistically significant relationship between the share of residents aged 20–29 (B=0.063; 0.053) and COVID incidence, suggesting that young adults are more susceptible to getting infected and possibly transmitting it to others within their cohort. Conversely, while there’s a negative correlation between the share of senior residents and COVID incidence, this doesn’t translate to a statistically significant relationship. Because we cannot reject the null hypothesis (that there’s no relationship at all), we cannot draw many conclusions about how the share of senior citizens affects the incidence rate.

Educational attainment also produces some interesting findings. The share of adults with less than a high school diploma (B=0.041; 0.02) is positively associated with COVID incidence; however, its statistical significance depends on which variables we’re controlling for. It’s statistically significant in Figure 1 (which controls for the Trump vote share), but not significant in Figure 2 (which controls for the Clinton vote share). Conversely, the share of adults with at least a bachelor’s degree (B= -0.019; -0.02) is negatively associated with the incidence rate and is statistically significant in both models. Overall, the evidence is mixed; however, it does indicate that counties with higher educational attainment are expected to have fewer cases relative to their population.

Next, based on these models, it doesn’t appear that socioeconomic factors, such as median household income or poverty rate, have much effect on the COVID incidence rate; however, much of this has to do with the specific variables included in the models. I experimented with multiple OLS models, each with different specifications, and I found that under most models, the poverty rate is a statistically significant indicator of COVID incidence while household income remains a fairly weak indicator. On its own, the poverty rate (B=0.079) is positively associated with incidence, and this statistically significant relationship holds when controlling for household income and most other variables, including educational attainment. But ultimately, it’s only when all these other variables are included in the model simultaneously, as is done above, that the relationship falls apart. This suggests that the combination of race, age, and educational attainment explains most of the variation in COVID incidence that would otherwise be attributed to differences in the poverty rate.

Finally, there are the state reopening plans. What’s interesting about this category is that while the state’s overall openness score (B=0.032; 0.034) is statistically significant and positively associated with COVID incidence, this relationship changes when we add further specifications to the model. Below, I’ve included a regression model that only includes the variables pertaining to state reopening plans.

As you can see, the overall openness score becomes less indicative once we control for these other variables, which are correlated with the score. But we also see that certain aspects of the reopening plans have different effects on COVID incidence. For example, having fewer restrictions on restaurants and retail establishments does not increase the incidence rate. Conversely, having fewer restrictions on personal care facilities, gyms, bars, and large venues does increase the incidence rate. Given the nature of these different establishments in terms of the ability to socially distance, the ability to avoid physical contact, and the ability of air particles to bounce around, these differences in COVID incidence shouldn’t be too surprising. Furthermore, states that employ a more regional approach also experience higher COVID incidence as some parts of the state will likely have more lenient restrictions than others, which not only results in more infections, but can also undermine containment efforts in other parts of the state that employ tighter restrictions. But what’s interesting is that even if we incorporate all of these variables in with the rest of the model, most of them retain their respective relationships with COVID incidence along with their statistical significance, indicating that these relationships hold even when controlling for other variables.

One other question to answer is whether these relationships for different variables hold over time or whether there are seasonal effects. To answer this question, I ran these regression models several additional times, using the seasonal COVID incidence rates as dependent variables. I will be using the same three time periods as I did throughout the rest of this series: early months (before May 31), summer months (June 1 through August 31), and the early fall months (September 1 through November 3).
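One way to build those seasonal dependent variables is to bucket each day's new cases into a period and divide the period total by population. The Python sketch below is only an illustration: the January 1 start date, the choice to count May 31 itself in the early period (the text says "before May 31"), and the function and variable names are all assumptions.

```python
from datetime import date

# Period boundaries as described in the text. The start of the early period
# and the treatment of May 31 itself are assumptions.
PERIODS = [
    ("early", date(2020, 1, 1), date(2020, 5, 31)),
    ("summer", date(2020, 6, 1), date(2020, 8, 31)),
    ("fall", date(2020, 9, 1), date(2020, 11, 3)),
]

def period_of(day):
    """Name of the period containing `day`, or None if it falls outside all three."""
    for name, start, end in PERIODS:
        if start <= day <= end:
            return name
    return None

def seasonal_incidence(daily_new_cases, population):
    """daily_new_cases maps a date to the new cases reported that day."""
    totals = {name: 0 for name, _, _ in PERIODS}
    for day, cases in daily_new_cases.items():
        name = period_of(day)
        if name is not None:
            totals[name] += cases
    return {name: total / population for name, total in totals.items()}

# A made-up county of 10,000 people; the November 10 report is after Election
# Day and is therefore ignored.
cases = {
    date(2020, 4, 15): 20,
    date(2020, 7, 4): 50,
    date(2020, 10, 1): 30,
    date(2020, 11, 10): 99,
}
rates = seasonal_incidence(cases, population=10_000)
```

Each of the three resulting rates can then stand in for total incidence as the outcome in a separate regression.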

First, let’s look at the early months.

During the early months of the pandemic, the political leaning of counties has the opposite relationship from the one in the overall model. In this case, counties with a higher share of Clinton support (B=0.004) have higher COVID incidence while those that lean towards Donald Trump (B= -0.004) have lower COVID incidence. This makes sense, given that in the early months, the pandemic mostly affected the Northeast and major cities, which tend to lean Democratic.

Other demographic variables, such as race and age, have a much more modest effect during this time period as few of their coefficients achieve 95 percent confidence. One relationship that holds during this time period is educational attainment, where COVID incidence is higher in counties with more adults lacking a high school diploma (B=0.02; 0.019) and lower in counties with more adults holding at least a bachelor’s degree (B= -0.007; -0.007). This is interesting, given that the Northeast and other areas affected in the early months tend to have a higher concentration of college graduates than the rest of the country. One reason why the infection rate may be lower for college graduates is, as I mentioned earlier, that college graduates are more likely to be in jobs that can be done remotely.

And for socioeconomic variables, median household income (B=0.006; 0.006) has a statistically significant positive effect on COVID incidence. Similar to political leaning, this effect is largely due to the geographic concentration of infections in the Northeast and major cities, where income tends to be higher than in the South and Midwest, which are largely spared during these early months. The poverty rate, however, has a much more modest effect on the incidence rate.

Overall, the early period of the pandemic is driven largely by demographic factors that are more prevalent in the areas most heavily affected by the virus. In this case, we observe that the political split in COVID incidence involves Democratic counties experiencing more infections, as well as more affluent counties, which are disproportionately located in the Northeast, West Coast, and major cities affected by the pandemic early on.

By the summer, we notice that the relationship between COVID incidence and voting behavior mirrors the overall model. In this case, as there are more outbreaks in the rural South and Midwest, we see that the Trump vote share (B=0.012) becomes associated with higher COVID incidence while the opposite is true for the Clinton vote share (B= -0.013).

We also start to see a stronger relationship between race/ethnicity and COVID incidence. In this case, counties with a higher share of black residents (B=0.034; 0.035) as well as those with a higher share of Hispanic residents (B=0.019; 0.019) experience more COVID cases relative to their population. On the other hand, age doesn’t have much of an effect on COVID incidence, or at least not at the county level. And as for educational attainment, we still see a strong, positive relationship for adults with less than a high school diploma (B=0.035; 0.037); however, there doesn’t seem to be much of a relationship for adults with at least a bachelor’s degree. Even so, the general relationship holds that counties with higher educational attainment will have a lower incidence rate than those with lower educational attainment.

Finally, there doesn’t seem to be much of a relationship between income or poverty and COVID incidence during the summer months.

Overall, the summer sees a large spike in COVID cases in the South and Midwest as public health guidelines are loosened. The trends observed during the summer months are largely a microcosm of the overall trend during the entire pandemic up until Election Day, particularly with incidence being higher in counties that supported Donald Trump, counties that are heavily black or Hispanic, and counties with low educational attainment.

During the fall months, there doesn’t seem to be much of a relationship with Trump or Clinton vote shares. Similarly, there doesn’t seem to be much of a relationship with the share of black or Hispanic residents. One reason for this is that during the early fall months, infection doesn’t seem to be heavily concentrated in one particular region. Rather, there appears to be a more precipitous rise in cases everywhere.

On the other hand, the major difference between the fall months and the rest of the time periods is age. Particularly, the share of young adults (B=0.039; 0.039) is strongly associated with higher COVID incidence. One reason for this could be the start of the academic year at universities, where many campuses allowed students to move into the dormitories. And while these universities have taken measures to prevent outbreaks, there have still been outbreaks on some campuses.

Overall, there isn’t too much to report on the fall months that hasn’t already been discussed, aside from the uptick in cases for young adults.

COVID Fatality

Next, we’ll analyze factors that contribute to the COVID fatality rate. I’ll be using the same model specifications as in the first part, with the inclusion of one additional variable: the COVID incidence rate. Naturally, it wouldn’t make much sense to include the fatality rate when examining the incidence rate, as we’d be suggesting that more deaths contribute to more cases; however, it does make sense to ask how the COVID caseload affects the fatality rate.

After all, one of the justifications for more stringent public health guidelines, such as stay-at-home orders, is that widespread infection would overwhelm hospitals and healthcare facilities. Resources would become overextended, leading to fewer patients receiving optimal treatment (or any treatment), meaning that more patients who otherwise would survive the virus would instead die from it. This rationale suggests that a higher incidence rate would also lead to a higher fatality rate.

But as we saw in parts one and two of this series, the counties with the highest incidence rates tended to have a below average fatality rate. And conversely, the counties with the highest fatality rates weren’t exactly major hotspots for infections (or at least not after the early months of the pandemic). Are these observations merely coincidence, or do we also see this relationship playing out across the rest of the country?

The OLS regression tables are shown in Figure 2 below.

As expected, the COVID incidence rate (B= -0.063; -0.061) is negatively associated with the fatality rate. This relationship is statistically significant at 95 percent confidence, meaning we can reject the null that the incidence rate doesn’t affect the fatality rate. This finding appears to contradict the assertion that higher COVID incidence would increase the fatality rate by overwhelming existing medical care infrastructure; however, there are several explanations for this. One is that in the months since the pandemic started, many places have improved their ability to conduct rapid testing and implement treatment for those with more severe cases. Hospitals have largely expanded their capacity to handle COVID cases (especially more severe cases), meaning that they can handle more cases without becoming overwhelmed. As a result, there are fewer fatalities stemming from the inability to offer at least some treatment.

But I think another critical reason is demographics. Age is a major one. While counties with a higher share of senior citizens do not experience an increase in COVID cases relative to population, these counties are highly vulnerable to fatalities among those that do get infected. Indeed, counties with more senior citizens (B=0.075; 0.072) tend to experience a higher fatality rate. This finding is backed up by the CDC, which notes that the risk of death from COVID increases significantly with age.

Other demographic factors that are statistically significant include the share of black residents (B=0.036; 0.035) and the share of Hispanic residents (B=0.009; 0.008). These trends are notable, as they reflect existing research indicating that blacks and Hispanics are not only at higher risk of getting infected, but also at higher risk of dying from the virus. The Brookings Institution notes that even when controlling for age, blacks and Hispanics die at considerably higher rates than their white counterparts. For example, in the 35–44 age range, the fatality rate of blacks is ten times higher than that of their white counterparts and for Hispanics, the rate is eight times higher.

Then there’s political leaning. Unlike COVID incidence, where we found a strong, enduring relationship for the vote shares of different candidates, no such relationship emerges here with COVID fatalities. This finding is interesting, considering there’s a bit of a correlation with both Clinton vote share (r=0.194) and Trump vote share (r= -0.142). But I think one reason for this is that while political leaning can predict attitudes on the virus’s severity and actions that can lead to infection, political leaning alone cannot predict the virus’s medical toll or the probability of survival. Instead, pre-existing health and demographic factors are much stronger predictors of one’s likelihood of surviving infection. And while certain demographic factors are correlated with political leaning, all that indicates is that to the extent that a county’s political leaning is correlated with its COVID fatality, that correlation is driven more by the demographic composition of its Clinton and Trump voters than by any unique quality attributed to political leaning. To that end, this is another key difference between COVID incidence and COVID fatality.

Then there’s educational attainment where, similar to the incidence rate, there’s a negative relationship with the fatality rate. Counties with a large share of adults with less than a high school diploma (B=0.025; 0.028) are expected to have a higher fatality rate than those with a large share of college graduates. One reason for this is that in addition to being at greater risk of infection, counties with a large share of adults without a high school diploma tend to also have a high poverty rate (r=0.654) and be outside urban areas (r= -0.197). As a result, these people may have more limited options in terms of receiving treatment or otherwise be less healthy to begin with.

As for the state reopening plans, there’s mixed evidence on their effect on COVID fatality. On one hand, the models indicate that the use of a regional approach (B=0.005; 0.005) is positively associated with the fatality rate while the overall openness score (B= -0.005; -0.005) is negatively associated. But on the other hand, a separate regression with each of the reopening variables paints a more complicated picture. In this regression, the overall openness score (B=0.024) is positively associated with the fatality rate while the use of a regional approach is not statistically significant. For the most part, individual provisions of reopening plans are negatively associated with the fatality rate, such as looser restrictions on construction sites, retail establishments, gyms, and bars. But there are still a few provisions that are positively associated with the fatality rate, such as looser restrictions on personal care facilities and restaurants. Overall, I think this is another indication of the difference between the incidence rate and the fatality rate. While the incidence rate may be driven more by specific behaviors and by one’s environment, the fatality rate is driven more by fundamental health and demographic factors that exist independently of these behaviors and environment.

Speaking of socioeconomic factors, what’s interesting here is that while the poverty rate doesn’t appear to drive COVID fatalities, median household income does. In fact, higher median household income (B=0.017; 0.018) is associated with a higher COVID fatality rate. Even more interesting is the fact that in other models with different specifications, the poverty rate is statistically significant and positively associated with fatalities along with median household income, essentially saying that both counties with higher median household income and those with higher poverty experience a higher fatality rate. This is a strange finding, especially considering that there isn’t a strong correlation between median household income and the fatality rate (r= -0.021), although it is consistent with the ten counties with the highest fatality rate, which have higher income levels than the ten counties with the highest incidence rate.

Still, I want to scrutinize this finding further, considering that it does contradict existing literature. For the most part, this positive relationship with median household income holds in other models, giving it a good degree of robustness; however, one specification concerns a variable I have yet to discuss: the Gini Index. The Gini Index is a measure of income inequality within a population (in this case, a county) and it ranges from 0 (complete equality) to 1 (complete inequality). Unlike the average household income, the median household income is largely unaffected by outliers, such as a multi-billionaire living in a predominantly middle class neighborhood. As a result, the median household income alone doesn’t provide much information on the level of economic inequality in a population that could potentially explain these pandemic metrics.
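The contrast between the median and a measure of inequality is easy to see with a toy example. The Python sketch below computes a Gini coefficient with a standard sorted-data formula; the function name and the ten imaginary households are made up, and the article's county-level Gini values come from published data rather than from a calculation like this one.

```python
from statistics import mean, median

def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means perfect equality).

    With n households the largest possible value is (n - 1) / n, which
    approaches 1 as n grows.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Sorted-data formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Nine identical middle-class households, then the same street with one
# very wealthy neighbor added (all figures hypothetical).
neighborhood = [50_000] * 9
with_outlier = neighborhood + [5_000_000]

equal_gini = gini(neighborhood)       # about 0: everyone earns the same
unequal_gini = gini(with_outlier)     # about 0.82
same_median = median(with_outlier)    # still 50,000
inflated_mean = mean(with_outlier)    # 545,000
```

The median does not move when the outlier arrives, while both the mean and the Gini coefficient jump, which is why the median alone says little about inequality.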

I first decided to run a regression with all the variables, including the Gini Index. The table for that is below.

As shown, median household income (B=0.018) is still positively associated with COVID fatality, but so is the Gini Index (B=0.035), indicating that counties with high income inequality experience a higher fatality rate. There’s actually some research that supports this claim, albeit at the state level. But I wanted to go one step further.

I decided to run an additional regression, this time including an interaction term between median household income and the Gini Index (referred to as “equal” in the table below).

This offers a more complex picture of how socioeconomic status affects COVID fatality. Counties with a high median household income do not have a higher fatality rate solely due to having more affluent residents. Rather, it comes from the interaction that income has with income inequality; “affluent” counties with high income inequality will have a higher fatality rate than “affluent” counties with low income inequality.
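With an interaction term in the model, the effect of income is no longer a single number: it depends on the county's Gini Index. A minimal Python sketch of that arithmetic follows. The coefficients are placeholders chosen only to show the mechanics and are NOT the article's estimates, and the function names are hypothetical.

```python
def add_interaction(income_ratio, gini):
    """Predictors for one county, with the income x Gini product appended."""
    return [income_ratio, gini, income_ratio * gini]

def income_slope(b_income, b_interaction, gini):
    """Change in predicted fatality per unit of income at a given Gini value.

    For fatality = ... + b_income*income + b_interaction*income*gini,
    the slope with respect to income is b_income + b_interaction*gini.
    """
    return b_income + b_interaction * gini

# Placeholder coefficients, not taken from the article's tables.
b_income, b_interaction = -0.02, 0.08

slope_low_inequality = income_slope(b_income, b_interaction, gini=0.20)
slope_high_inequality = income_slope(b_income, b_interaction, gini=0.50)
```

With these placeholder numbers, higher income goes with slightly lower fatality where inequality is low and with higher fatality where inequality is high, which is the kind of pattern the paragraph above describes.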

But what’s important to note is that income inequality tends to be lower in counties with a higher median household income. Welch’s t-test results indicate that the mean Gini Index is lower for counties with a median household income above the national average (mean=0.431) than for counties with a median income below the national average (mean=0.449); this difference in means is statistically significant at 95 percent confidence. One reason for this finding is that areas with a higher median income tend to be more expensive than areas with a lower median income, meaning that it’s easier for less affluent people to get priced out of more “affluent” areas while more affluent people can still move to less “affluent” areas in order to save money. As a result, counties with a high median income likely have fewer residents making substantially less than that amount, leading to lower income inequality.
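Welch's t-test compares two group means without assuming the groups have equal variances. Here is a minimal Python sketch of the test statistic and its degrees of freedom; the Gini values are made up, the function name is hypothetical, and turning the statistic into a p-value would additionally require a t distribution from a statistics library.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance over n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up Gini values for a handful of imaginary counties.
gini_high_income = [0.42, 0.43, 0.44, 0.43, 0.42, 0.44]
gini_low_income = [0.44, 0.46, 0.45, 0.47, 0.43, 0.45, 0.46]

t_stat, df = welch_t(gini_high_income, gini_low_income)
# t_stat is negative here because the first group has the lower mean.
```

In R, `t.test()` performs the Welch version by default, since its `var.equal` argument defaults to `FALSE`.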

What does all of this mean? It means that while different regression models may indicate that median household income is positively associated with COVID fatality, that doesn’t mean that affluent people are more likely to die from the virus. Nor does it mean that a higher median income will necessarily drive up the fatality rate. Rather, the fatality rate tends to be higher in counties with both high median income and high income inequality. And given that income inequality is highly correlated with the poverty rate (r=0.521), this suggests that for these counties, poverty may be driving fatality more than affluence.

As a side note, I went back and reran the models for COVID incidence with the Gini Index and the interaction variable and it did not alter the fundamental relationships or ability to reject the null hypothesis with any other variables discussed in the previous section, including income or poverty. More importantly, neither the Gini Index nor the interaction variable was statistically significant for COVID incidence.

So after that long tangent, let’s take a look at how these relationships change during different time periods of the pandemic. First, there’s the early months.

For the most part, the early months see similar trends to the overall relationship with the fatality rate, particularly those with old age and educational attainment. And similar to the overall trend, political leaning doesn’t have much of a relationship with COVID fatality.

The biggest divergence is the strongly positive effect that COVID incidence (B=0.52; 0.5) has on the fatality rate during the early months. This makes sense, given that in the early months, little was known about the virus itself, testing was mostly limited to those with severe symptoms (meaning a greater percentage of cases were severe and potentially deadly), and hospitals were less prepared to handle the caseload. As a result, severe cases represented a greater share of the caseload during the early months, resulting in a higher fatality rate.

One other difference is that race and ethnicity were not as strong indicators of COVID fatalities during the early months.

Overall, the incidence rate is the greatest indicator of the fatality rate during the early months. Some of this can be attributed to the limited testing capacity during the early months, resulting in the more severe, deadly cases being more heavily represented in the reported caseload; however, it’s also indicative of the overall severity of the virus outbreak during the early months.

Next, there’s the summer months.

Regarding political leaning, we see an interesting divergence. Counties with a higher Trump vote share (B= -0.01) are expected to have a lower fatality rate while the opposite is true for counties with a higher Clinton vote share (B=0.019). This differs from the overall trend, where neither political leaning is associated with a change in the fatality rate.

Aside from this, variables that are statistically significant follow the overall trend. COVID incidence (B= -0.195; -0.185), for example, is negatively associated with the fatality rate. While most race variables are not statistically significant, the share of Hispanic residents (B=0.022; 0.019) is positively associated with COVID fatalities. Furthermore, the share of senior citizens (B=0.085; 0.08) is also positively associated with the fatality rate while educational attainment remains negatively associated.

Overall, the summer months are largely similar to the overall trend regarding the effect of key demographics on the fatality rate, such as ethnicity, age, and educational attainment. The main difference during the summer months is the statistically significant, divergent paths that different political leanings take regarding their effect on the fatality rate, with higher Trump support being negatively associated and higher Clinton support being positively associated.

Finally, there are the early fall months.

Similar to the summer months and the overall trend, COVID incidence (B= -0.137; -0.14) is negatively associated with the fatality rate during the early fall months.

In an interesting twist, the relationships that different political leanings have with the fatality rate reverse course from the summer months. Now, counties with a higher Trump vote share (B=0.024) are expected to have a higher fatality rate while counties with a higher Clinton vote share (B= -0.023) are expected to have a lower fatality rate. This may suggest a more causal relationship with political leaning; however, similar to the overall trend, we see many demographic variables become statistically significant during the early fall.

Race/ethnicity variables, such as black (B=0.042; 0.043) and Hispanic (B=0.028; 0.027), are both positively associated with the fatality rate. While there is still an age gap, it isn’t as wide, as both counties with a large share of young adults (B=0.064; 0.064) and those with a large share of senior citizens (B=0.105; 0.107) are positively associated with the fatality rate. And the education gap persists as well: the share of adults without a high school diploma (B=0.033; 0.039) is positively associated with the fatality rate while the share of college graduates (B= -0.024; -0.026) is negatively associated.

Overall, the early fall months of the pandemic largely reflect the overall trend with a negative relationship between the incidence rate and the fatality rate as well as key relationships with different demographics. On the other hand, there are statistically significant opposing relationships concerning different candidates’ vote shares.

Unemployment Rate

Finally, we’ll be looking at the factors that contribute to the increase in unemployment between March and April as well as the overall recovery as of September. I’ll be using the same model specifications for this as I did with the COVID incidence rate and the fatality rate, this time with the inclusion of both these variables as independent variables. This is to determine the extent to which the economic impact is affected by the severity of the virus or whether it stems from other factors. I will also include the Gini Index and the March unemployment rate in the tables below. And given the time-specific nature of these variables, I will not be breaking the model down by different time periods. To account for this, I will be using the incidence and fatality rates in the early months when measuring the unemployment spike and I will be using the overall rates when measuring the recovery.

First, let’s look at the unemployment spike.

Given the nature of the unemployment spike, the trends observed here will largely reflect the state of the pandemic in its early months rather than its entire span. And this is most evident in the political leaning variables. While counties with a high Trump vote share (B= -0.13) are expected to have a lower unemployment increase, counties with a high Clinton vote share (B=0.157) take a more substantial hit. This makes sense, given that in the early months, virus outbreaks and restrictive public health guidelines were largely concentrated in the Northeast, West Coast, and major cities, all places that tend to lean Democratic. This finding is further supported by a t-test, which finds that the mean unemployment increase of counties that Clinton won (u=0.092) is larger than the mean of counties that Trump won (u=0.074) and that this difference is statistically significant.

As for the pandemic metrics, the early COVID incidence rate is not statistically significant. While it’s true that incidence was higher in the areas with the largest unemployment spike, it’s important to note that the early incidence rate was still fairly small as there were fewer cases relative to the population. The fatality rate, however, isn’t necessarily lower just because there are fewer total cases; in fact, when there are fewer cases, one death matters more for the metric. As for this model, the early fatality rate (B= is positively associated with the unemployment increase.

On the other hand, the March unemployment rate (B= -0.19; -0.198) is negatively associated with the pandemic-induced increase. This makes sense, given that the March unemployment rate (r= -0.041) is negatively correlated with the change in the unemployment rate for April. One reason for this is that the pre-pandemic unemployment rate is an indicator of a county’s “natural” economic health. Counties that already had a high unemployment rate tend to have underlying economic problems that exist independently of the pandemic. Rather than being a shock to the economy, the pandemic was merely an extension of issues that were already present. Another reason is simply the fact that there’s less “room” for unemployment to grow when the unemployment rate is starting out from a higher position. This partly ties into the first reason in that for places that are less well-off, there’s less prosperity and fewer jobs to be disrupted. But it can also be a result of the time window that this variable captures. Some places, for example, started seeing their unemployment rise in March or February as the first warning signs emerged, meaning that the “starting” point of this variable isn’t exactly the lowest point. For other places, April isn’t the peak of unemployment due to either a lag in the economic slowdown or to a sustained downturn. For these places, the variable I outline doesn’t capture the full extent of the economic downturn. Despite this limitation, the observation holds that the change in unemployment, the economic shock, was the greatest in places that were faring pretty well before the pandemic.

Demographics paint a pretty interesting picture, especially with race. The county share of white residents (B= -0.127; -0.114), black residents (B= -0.202; -0.213), Native American residents (B= -0.191; -0.181), and Hispanic residents (B= -0.082; -0.086) are all negatively associated with the change in unemployment. Single variable regressions of the March unemployment rate point to distinct racial and ethnic disparities; the share of white residents (B= -0.015) is negatively associated with March unemployment while the shares of black residents (B=0.017), Native American residents (B=0.024), and Hispanic residents (B=0.023) are positively associated. Meanwhile, single variable regressions of the April unemployment rate paint a more muddled picture. The share of white residents (B= -0.012) is negatively associated with April unemployment, but so is the share of Hispanic residents (B= -0.035). Conversely, neither the share of black residents nor the share of Native American residents is statistically significant at 95 percent confidence. Finally, in a single variable regression of the change in unemployment, only the share of Native American residents (B= -0.047) and the share of Hispanic residents (B= -0.059) are statistically significant; both of them are negatively associated.

But I wanted to scrutinize these findings even further. So I experimented with some other model specifications for change in unemployment. After swapping out several education variables and trimming down the model as a whole, I produced the following result.

Indeed, under some circumstances, there can be a positive association between the share of black residents and the change in the unemployment rate; however, this model only explains about 4 percent of the variation in the change in unemployment (low R² score). Furthermore, it excludes some important demographic information, such as political leaning and age. So on the whole, this result suggests two things. One is that because the unemployment rate for non-white racial groups before the pandemic was higher than that of whites, there’s less “room” for the change in unemployment to grow. And the other is that while race is often associated with disparities in economic outcomes, in this case there are multiple variables at play.
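Since the R² score carries a lot of weight in that caveat, here is a minimal sketch of how it is computed for a single-variable regression: one minus the ratio of unexplained variation to total variation. The data points are made up for illustration, and `simple_ols` is a helper I wrote for this example rather than the specification used in my tables.

```python
def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Made-up values (NOT real county data), for illustration only.
share_black = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50]
change_unemp = [0.08, 0.05, 0.09, 0.06, 0.10, 0.07]
intercept, slope, r_squared = simple_ols(share_black, change_unemp)
```

A model can have a positive slope and still leave most of the variation unexplained, which is why a coefficient from a low-R² model deserves extra caution.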

On that note, there is also an educational gap in the change of unemployment. While the share of adults with less than a high school diploma is not statistically significant, the share of college graduates (B= -0.107; -0.114) is negatively associated with the change in unemployment. This gap makes sense, given that those with at least a bachelor’s degree are more likely to be employed in jobs that can be done remotely and in jobs within sectors that are better insulated from hits in consumer spending (education, government, finance, etc.). As a result, areas with a high share of college graduates are less likely to lose a substantial number of jobs that would result in a large unemployment hit.

The other aspect of the economic impact is the recovery, which is defined as the difference between the April unemployment rate and the September unemployment rate, the last full month of data available before the election. For most counties, this will be a negative value as the unemployment rate went down during the summer months. The results of this regression are in Figure 4.

Similar to the April unemployment increase, there are mixed results on the effects of the pandemic’s severity on the economic recovery. On one hand, the summer COVID incidence rate is not statistically significant. On the other hand, the summer COVID fatality rate is significant and has a positive coefficient. Despite this, it’s important to remember that for the recovery variable, negative values correspond to a decrease in unemployment. Because of this, the regression indicates that a higher fatality rate works against the recovery by keeping the unemployment rate up. This is definitely an interesting finding, to say the least. One reason for this discrepancy is that while the incidence rate points to the virus’s spread, the fatality rate attests more to its overall threat. A higher fatality rate communicates that the virus poses more of a threat that requires intervention, thus resulting in stronger restrictions against the economy reopening.
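Because the sign convention for the recovery variable is easy to trip over, here is a small sketch of how I read it. The county rates below are hypothetical, the spike is assumed to be the April rate minus the March rate, and the helper names are mine.

```python
def unemployment_change(start_rate, end_rate):
    """Difference between two unemployment rates (end minus start)."""
    return end_rate - start_rate


def effect_on_recovery(coefficient):
    """Interpret a regression coefficient when the dependent variable is
    September minus April unemployment (negative = unemployment fell)."""
    if coefficient > 0:
        return "weaker recovery"    # pushes the September rate up relative to April
    if coefficient < 0:
        return "stronger recovery"  # pushes the September rate down relative to April
    return "no effect"


# Hypothetical county rates (NOT real data), for illustration only.
march, april, september = 0.04, 0.14, 0.08
spike = unemployment_change(march, april)         # positive: unemployment rose
recovery = unemployment_change(april, september)  # negative: unemployment fell
```

So a variable with a positive coefficient in the recovery regression, such as the fatality rate here, is working against the recovery, while a negative coefficient points to a sharper drop in unemployment.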

Regarding political leaning, the regression points to a moderate gap, where counties with a higher Trump vote share (B= -0.006) experience a larger recovery (i.e. a sharper decrease in their unemployment rate) while counties with a higher Clinton vote share (B=0.006) experience a more modest recovery. This is consistent with the finding in Figure 3, where there’s a similar gap in which counties have a larger increase in the unemployment rate. Furthermore, t-test results find that while the April unemployment rate is higher in counties that Clinton won (u=0.142) than in counties that Trump won in 2016 (u=0.121), there is no statistically significant difference for the size of the recoveries.

For race, while all groups are positively associated with the size of the recovery, the share of white residents (B= -0.104; -0.104) has a stronger effect than the share of black residents (B= -0.052; -0.052), share of Native American residents (B= -0.093; -0.093), and the share of Hispanic residents (B=0.052; 0.051). This evidence is more consistent with the racial disparities noted in existing research. In fact, the share of Hispanic residents produces a positive coefficient for a dependent variable of a negative value, suggesting that this variable contributes to a higher September unemployment rate relative to the April unemployment rate.

The educational gap is more modest, where both the share of adults without a high school diploma (B= -0.053; -0.054) and those with at least a bachelor’s degree (B= -0.026; -0.026) are positively associated with the economic recovery. One reason for this is that, as indicated in Figure 3, counties with a large share of college graduates experience a smaller increase in their unemployment rate during April due to highly educated workers having better access to jobs that can be done remotely, meaning the recovery doesn’t need to cover as much ground.

Socioeconomically, the median household income is not statistically significant with the recovery; however, both the poverty rate (B=0.022; 0.021) and the Gini Index (B=0.055; 0.054) have positive coefficients, indicating a negative relationship with the size of the recovery. This makes sense, given that counties with a higher poverty rate tend to have fewer lucrative jobs available, let alone jobs that can be done remotely. Because of this, it’s more difficult for these counties to quickly rebound from economic shocks. Similarly, the higher income inequality is in a county, the more likely it is that the recovery disproportionately benefits those in jobs that can be done remotely. This leaves a considerable share of the population that is not in these types of jobs and has more difficulty working in jobs that require face-to-face contact, stifling the overall recovery.

Finally, there’s the effect of the state reopening plans on the recovery. The regressions in Figure 4 indicate that the state’s openness score (B= -0.017; -0.017) and its level of regional discretion (B= -0.008; -0.008) are positively associated with the economic recovery; the more open the state’s plan is, the sharper the reduction in the county’s unemployment rate. This makes sense, given that the lifting of restrictions allows businesses to operate and to retain jobs, particularly those that cannot be done remotely. But similar to the other pandemic metrics, I also included the other reopening plan variables in a separate regression, with the results below.

This presents a more interesting picture. Similar to the regression in Figure 4, the state’s overall openness score (B= -0.171) is still associated with a larger recovery, or decrease in the unemployment rate; however, this relationship isn’t found with any of the other reopening variables. The most telling, in my view, is the fact that neither the state’s definition of “essential businesses” nor its treatment of “non-essential businesses” is statistically significant. Meanwhile, certain provisions, such as the openness of construction sites (B=0.085), personal care facilities (B=0.081), and large venues (B=0.045), are negatively associated with the recovery. This appears to argue that counties within states with more open plans do not have a lower unemployment rate as a result; however, a regression of the effect of these variables on the September unemployment rate suggests otherwise.

As indicated, the state openness score and most of the reopening variables are negatively associated with the September unemployment rate, suggesting that counties subject to more open plans are expected to have a lower unemployment rate in September. Furthermore, the state openness score is negatively correlated with counties’ April unemployment rate (r= -0.263). When all these pieces are put together, it appears that while the state reopening plans only had a modest effect on the recovery, part of this is because the counties in states that would adopt more open plans had a lower unemployment rate in April resulting from the pandemic than counties in states that would later adopt less open plans, indicating there was less of a shock to recover from.

Discussion

I’ll admit this was a lot of data to pore through. A lot of tables, regression coefficients, and conflicting evidence. But let’s try to pull it all together and make some sense of all this.

First, the 2016 election results were a powerful and fairly consistent predictor of the pandemic’s severity across the United States. Across the entire time period studied, we see that the Trump vote share is associated with a higher COVID incidence rate, lower April unemployment rate, and a larger decrease in unemployment by September, while the opposite relationships hold true for the Clinton vote share.

When broken down by time period, we notice how the nature of the pandemic’s severity shifts to different regions over time. In the early months, it’s the areas with more Clinton support that experience higher COVID incidence and a larger spike in unemployment, while areas with more Trump support experience the opposite. By the summer months, the script has flipped, as areas with more Trump support start seeing spikes in COVID cases while the areas with more Clinton support get their incidence rates under control. On the other hand, areas with more Clinton support see an uptick in their COVID fatality rate during the summer while the additional cases incurred in Trump-favored counties don’t necessarily produce more fatalities. And in the fall months, the script flips again, with the counties with higher Trump support seeing an uptick in their fatality rates. On the other hand, these Trump counties, many of which are in states with looser restrictions in their economic reopening plans, enjoy a stronger decrease in unemployment.

In an age of intense party polarization, it can be difficult for partisans to understand the worldview of the other side, let alone agree with them. At this point, it appears that the pandemic, a crisis that would seemingly bring us closer together, has also been polarized. The most defining point of disagreement is the extent to which policymakers should prioritize reopening the economy or containing the virus. These results offer some context to the nature of this polarization. Namely, the differences in how the two “Americas” have experienced the pandemic reinforce their worldviews regarding the severity of the virus’s threat and the necessity for restrictive public health guidelines. For many blue counties, the tone was set early and the virus became both infectious and deadly enough to warrant tighter restrictions on in-person gatherings and economic activities. Even during safer periods, these areas have become conditioned to prioritize containing a highly infectious, life-threatening virus. Conversely, many red counties were insulated from the virus during the early months and weren’t compelled to impose shutdowns or harsh restrictions due to the lack of urgency. And by the time the virus started creeping into their communities, the issue had become polarized, fueling a resistance against stricter measures, such as mask mandates or business restrictions. While political leaders and activists were largely responsible for this divergence, the different trajectories experienced in the two “Americas” enabled their messages to be received and reinforced.

And economically, the results demonstrate that the places with the largest increase in unemployment are not necessarily the places with the highest infection rate. This is consistent with the characteristics of the top ten counties discussed in part three of this series; however, this exercise demonstrates that this is representative of the rest of the country. Particularly, while infections early on were mostly concentrated in urban, Northeastern counties, the counties that took the hardest hit economically are those that are reliant on tourism or manufacturing, which only experienced mild outbreaks.

Furthermore, while the state reopening plans don’t appear effective at producing a large decrease in unemployment, much of that is because for many counties in states with more open plans, the initial unemployment rate in April was already lower than those in states that adopted less open plans. This also reinforces the differing political narratives on the pandemic’s impact. For states with more open reopening plans, such actions were viewed as a necessary return to normalcy. While COVID incidence spiked during the summer months in Trump counties, political elites had already established the argument that these infections aren’t deadly. The fact that these areas were spared from the early wave while experiencing less of an economic shock was enough for leaders and residents to conclude that they were on the right path and there was little reason to break off course. The opposite is true for Clinton counties.

Finally, we noticed a number of demographic disparities across the different metrics. For the COVID incidence and fatality rate, there were fairly consistent racial and ethnic disparities, where counties with large black populations experienced more infections, more fatalities, and a smaller recovery in unemployment. There was also an education gap, where counties with large college graduate populations experienced fewer infections, fewer fatalities, and a more robust economic recovery. And similar gaps apply for age as well, where younger counties experienced more cases, while older counties experienced more fatalities. To a large extent, these demographic disparities reflect differences in economic opportunity with respect to the potential for infection. For more well-off places, there are more jobs that can be done remotely, which not only drives down unemployment, but also reduces in-person contact that can lead to more infection. But with age, it demonstrates the different character that the virus can take on in different populations. For younger, more well-off populations, the virus is a contagious, but not very deadly ailment. But for older populations as well as those that struggle to access quality medical care, the virus is a more serious, life-threatening contagion. Because of all these factors, it’s important to emphasize that the virus is not a one-size-fits-all experience for the population.

Conclusion

Overall, in a country as large and diverse as the United States, the pandemic has had a severe, although varied, effect on its population. These divergences not only exacerbate inequities in socioeconomic status and access to quality medical care, but also enable political actors to craft conflicting narratives on the overall threat level of the pandemic and what response that threat commands. It’s often said that large-scale crises can unify populations under an atmosphere of struggle. Some might argue that the United States needs another Great Depression or World War II in order to invigorate people with a patriotic spirit and common purpose. But what this pandemic has demonstrated is that rather than reducing polarization or at least hiding it, the pandemic has become co-opted into existing attitudes that drive this process. To that end, we have learned that extreme polarization cannot be solved through great crises. Some may find this conclusion disappointing; however, it also expands our understanding of the nature of polarization and opens us up to additional ideas moving forward.

So that will wrap up this series on the COVID-19 pandemic. This served as both an introduction to the post-election analysis as well as a fun project to do while the states finished certifying their election results. In the next installment, we will be starting the post-election analysis series, starting with the state of Michigan.

If you enjoyed this article, please give it a like, follow the Book Aisle on WordPress as well as Medium, and share it on your social media. Have a good day.

Written by Adam Martin