COVID-19 and Racial Health Disparities

Big Data at Berkeley
Aug 27, 2020

By Simrin Bhargava, Bella Yavari, Mark Yang

APM Research Lab

Author’s Note: Medium does not allow us to embed our interactive Tableau visualizations, so we have included a link to each one below the corresponding visual. Please use these links: they let you interact directly with the metadata in the visualizations and get a more granular view. Thank you!

County Level Analysis

Background

America is no stranger to racial health disparities, but the inequities have come to a head in the COVID-19 pandemic. According to the CDC, racial minorities in the US experience higher rates of chronic conditions like diabetes, obesity, and cardiovascular disease, making them more vulnerable to COVID-19. Black women in the US are more likely to die from pregnancy-related causes than women of any other race. These disparities, however, are rooted in systemic forms of oppression such as voter suppression, fewer housing opportunities, a broken criminal justice system, underdeveloped public transportation, and numerous other barriers to living a healthy and safe life. Through a COVID-19 lens, racial health disparities are prevalent because minorities are more likely to live in urban areas and work in high-risk environments (e.g. minorities make up about 50% of the workforce in the meatpacking industry, which has been a COVID-19 hotspot).

When we assessed other notable studies, such as the NYTimes racial inequity article, we noticed an overrepresentation of COVID-19 cases among Black and Latino folks, who are three times as likely to contract the virus as the white population (NYTimes). The story of the suburban Bradley family, who entered their local hospital exhibiting COVID-19 symptoms and saw a sea of sick African-American folks, speaks volumes about the racial distribution of the virus in America. Black, Latino, and Native American populations seem to be some of the most vulnerable due to deep-rooted, racially based systems of oppression.

To explore how access to healthcare during the current public health crisis differs along racial lines, we combined the racial breakdown of each county with its number of staffed beds, which we use as a proxy for healthcare accessibility, to examine how these disparities exacerbate COVID-19 deaths.

Process

To begin the process, we pulled data from the NY Times GitHub repository, which has information on the number of COVID-19-related cases and deaths per county and is updated daily. We also gathered datasets on the racial breakdown of US counties and the number of staffed beds per county. The hospital dataset comes from ESRI (Environmental Systems Research Institute) and was published for COVID-19 research; it is used to better understand the average bed capacity and average yearly bed utilization of hospitals across the nation. The racial breakdown comes from the U.S. Census Bureau (Census Data: County Racial Breakdown), which provides county populations broken down by racial group. In cleaning the hospital dataset, we filtered out the hospital types that would not be able to support COVID-19 patients (e.g. Psychiatric Hospital and Religious Non-Medical Healthcare). Once we cleaned and joined the datasets, we calculated the number of staffed beds, cases, and deaths in relative terms (per 10,000 people) so that counties of different sizes could be compared fairly. We also calculated the proportion of Black, white, Hispanic, Asian, and Native populations in each county and added each proportion as a new column.
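The cleaning and normalization steps described above can be sketched roughly as follows. This is a minimal illustration rather than our actual pipeline: the field names (`fips`, `type`, `staffed_beds`, `total`, and so on) and the choice to join on a county FIPS code are assumptions made for the example, and only the two excluded hospital types named in the text are listed.

```python
from collections import defaultdict

# Hospital types assumed unable to support COVID-19 patients
# (only the two examples named in the article; the real list may be longer).
EXCLUDED_TYPES = {"Psychiatric Hospital", "Religious Non-Medical Healthcare"}

# Racial groups for which a population proportion is computed.
GROUPS = ("black", "white", "hispanic", "asian", "native")


def per_10k(count, population):
    """Express a raw count as a rate per 10,000 residents."""
    return count * 10_000 / population


def build_county_table(hospitals, covid, census):
    """Join the three sources on a (hypothetical) county FIPS key.

    hospitals: list of dicts with keys "fips", "type", "staffed_beds"
    covid:     dict fips -> {"cases": int, "deaths": int}
    census:    dict fips -> {"total": int, "black": int, "white": int, ...}
    """
    # Sum staffed beds per county, skipping the excluded hospital types.
    beds = defaultdict(int)
    for hospital in hospitals:
        if hospital["type"] not in EXCLUDED_TYPES:
            beds[hospital["fips"]] += hospital["staffed_beds"]

    table = {}
    for fips, population in census.items():
        if fips not in covid:
            continue  # keep only counties present in both sources
        total = population["total"]
        row = {
            "beds_per_10k": per_10k(beds.get(fips, 0), total),
            "cases_per_10k": per_10k(covid[fips]["cases"], total),
            "deaths_per_10k": per_10k(covid[fips]["deaths"], total),
        }
        # One new column per racial group: its share of the county population.
        for group in GROUPS:
            row[f"prop_{group}"] = population[group] / total
        table[fips] = row
    return table
```

Working in rates per 10,000 rather than raw counts keeps a large county from dominating the maps simply because more people live there.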

Findings

The more red, the higher proportion of a certain racial group. The bigger the circle, the higher the number of staffed beds per 10,000. Source: NY Times (Cases & Deaths), US Census (County Racial Breakdown), ESRI (Staffed Beds). Tableau Links on Staffed Beds: White, Black, Native, Hispanic

We hypothesized that counties with a higher proportion of minority groups would have fewer staffed beds, but this was not always the case. In many areas with few staffed beds (smaller circles), the minority presence is larger (redder circles). This is especially true in Arizona and New Mexico, where we see a low number of staffed beds but a very strong presence of Native Americans (small, dark red circles). However, we also see a low number of staffed beds in mainly white counties in Washington, Oregon, and Nevada, which makes it tricky to isolate racial makeup as the sole reason behind a low number of staffed beds per county. Overall, the visualizations show a tendency toward a stronger minority presence in counties where the relative number of staffed beds is lower, with a few exceptions where counties with relatively high white populations also have few staffed beds.

The more red, the higher proportion of a certain racial group. The bigger the circle, the higher the number of cases per 10,000. Source: NY Times (Cases & Deaths), US Census (County Racial Breakdown). Tableau Links on Cases: White, Black, Native, Hispanic

As with staffed beds, we hypothesized that there would be a higher number of cases in counties with larger Black and Hispanic/Latino populations, but this was not always the case. While counties with more cases per 10,000 (bigger circles) often have a larger proportion of underrepresented minority groups, there are exceptions, particularly in states such as Minnesota and Iowa. Therefore, we cannot confidently conclude that relative cases are higher only in counties with higher proportions of minority groups. Perhaps looking into the number of deaths per county will offer more clarity…

The more red, the higher proportion of a certain racial group. The bigger the circle, the higher the number of deaths per 10,000. Source: NY Times (Cases & Deaths), US Census (County Racial Breakdown). Tableau links on Deaths: White, Black, Native, Hispanic

After finding no consistent relationship between minority proportion and the number of COVID-19 cases, we decided to explore COVID-19 deaths as a better indicator of disparities, because deaths indicate a more probable need for healthcare (only a small percentage of COVID-19 cases lead to hospitalization and the use of staffed beds). Here, we discovered distinct inequities. Zooming in on Native deaths, we noticed that some of the highest relative deaths were in the very red, largely Native-inhabited counties, suggesting an overrepresentation of Native populations in the COVID-19 death toll. Moreover, when we look at Black deaths, we see a similar trend: counties with a higher Black proportion are plagued with high death rates. Another notable feature of the upper right visualization is that the yellow areas with large circles also tend to be majority non-Black minority counties. Based on these visualizations, it is safe to state that a correlation exists between COVID-19 deaths and a county's minority (Black, Hispanic, Native) population.
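We read the relationship above off the maps by eye. One way to put a number on it would be a Pearson correlation between a county's minority proportion and its deaths per 10,000. The sketch below is illustrative only and was not part of our Tableau workflow; the function name `pearson_r` and its inputs are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    xs: e.g. the proportion of a racial group in each county
    ys: e.g. deaths per 10,000 in the same counties, in the same order
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of co-deviations and the root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))

    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)
```

A value near +1 would support the pattern seen on the maps, a value near 0 would not. Either way, a county-level correlation describes counties rather than individuals and does not by itself establish a cause.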

Our findings galvanized us to further understand the reasons behind racial health disparities and the increased vulnerability for certain minorities.

National and State Level Analysis

Background

As COVID-19 has proliferated throughout the US, there have been increasing disparities in the number of infections between ethnic groups. Multiple research organizations have provided insights into why racial minorities are more vulnerable to COVID-19 than the white majority. The key finding from APM Research Lab was that Black Americans experience the highest actual COVID-19 mortality rate throughout the US, with other minority groups (Latino, Indigenous, and Pacific Islander) right behind. APM Research Lab suggested various factors behind the outsized impact of the virus on racial minorities. While these may vary between communities, the impact on minority groups was attributed to a higher likelihood of contracting the virus, driven by factors such as “greater workplace exposures, including the inability to work from home or no access to sick days; living in geographic areas, housing arrangements including congregate settings (such as nursing homes, group homes, treatment centers, correctional facilities), or accessing public transportation where the virus is more easily spread” (APM Research Lab).

In addition to the findings from APM Research Lab, BBC News writer Christine Ro has analyzed the devastating impact of COVID-19 on minority groups. In her article, “Coronavirus: Why some racial groups are more vulnerable”, she writes that in Chicago in April 2020, 72% of people who died from COVID-19 were Black, despite Black residents making up only one-third of the total population. On April 17 in Georgia, white people accounted for 40% of COVID-19 cases, despite making up 58% of the state's population. She attributes this inequity to several factors. The first is income inequality. Ro noted that non-white ethnic groups have less access to economic resources, “whether that means high-earning job or a full pantry” (Ro). Unstable economic conditions are directly related to food insecurity, which is linked to poor health outcomes. In South Africa, even before the pandemic, 91.1% of households were considered vulnerable to hunger, in contrast to 1.3% of white households. Another factor is occupation. Ro described how members of low-income households, which are disproportionately made up of racial minorities, are more likely to have jobs that involve hazardous work in cramped environments. The limited space means more interaction between workers, which may lead to a higher chance of COVID-19 infection among them (Ro). Likewise, the inequity between minorities and the white majority is reaffirmed by the statistical findings of some researchers.

Total Death and Case Counts

To confirm the findings of APM Research Lab and Christine Ro, we broke the racial demographics down into numbers and analyzed the difference between minority and majority groups in the number of people affected by COVID-19. The dataset we used for the analysis was provided by The COVID Tracking Project, as it contains the number of cases and deaths for each racial group. To start the analysis, we first took a general overview of the total cases and deaths for each racial group in the data.

The image above is the bar graph of the total number of deaths counted for each ethnic group within the US. Tableau links: Deaths
The image above is the bar graph of the total number of cases reported for each ethnic group in the US. Tableau links: Cases

Surprisingly, most of the reported cases lacked information about the ethnicity of the patients. While the highest number of deaths was among non-Hispanic whites, most of the reported cases did not carry any information about racial background. The lack of information may derive from the rapid spread of COVID-19 and its incubation period, during which patients contract and spread the virus unknowingly. The number of ‘cases unknown’ may therefore partly reflect patients in the incubation period. After cases with unknown race, non-Hispanic whites came second and the unknown ethnic group came third among the most counted cases. In contrast to the observations made by Ro and APM Research Lab, more white patients were reported with COVID-19 than AIAN, Asian, Black, Hispanic, or NHPI patients. For total deaths across the racial groups, there were fewer ‘unknown deaths’ than unknown cases. This suggests that the racial background of patients was more often recorded when they died than when cases were recorded as the virus first spread.
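The totals per racial group, and the share of records with unknown race, can be computed along these lines. This is a sketch under assumptions: the row layout (`state`, `group`, `cases`, `deaths`) and the `"Unknown"` label are hypothetical and do not reflect the actual column names in the tracking dataset.

```python
from collections import Counter


def totals_by_group(rows, metric):
    """Sum one metric ("cases" or "deaths") over all states, per racial group.

    rows: iterable of dicts like
          {"state": "CA", "group": "Black", "cases": 10, "deaths": 1}
    """
    totals = Counter()
    for row in rows:
        totals[row["group"]] += row[metric]
    return totals


def unknown_share(totals, unknown_label="Unknown"):
    """Fraction of the overall total that has no recorded race."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    return totals[unknown_label] / grand_total
```

Comparing `unknown_share` for cases against the same figure for deaths is one way to quantify the gap described above, where race is recorded far more often for deaths than for cases.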

With this data, we then looked in more depth at how the number of cases and deaths differed among racial groups within each state.

The image above is the visualization of the map of the US and the shade of color represents the size of deaths from COVID-19 infection within each state. Tableau link: Count of Deaths per State
The image above is the visual representation of the map of the US. The shades of color above each state represent the size of the reported COVID-19 cases within the state. Tableau link: Count of Cases per State

The visualizations above show the total count of cases and deaths for each state, represented by the shade of the color. At a high level, they give a general sense of the states that were heavily affected by COVID-19, such as California, Texas, New York, and Louisiana. To analyze the impact of COVID-19 on minority groups within each state in more detail, we used stacked histograms to observe differences between the racial groups in each state.

The image above is the stacked histogram of racial demographic number of deaths from COVID-19 per each state in the US. Tableau link: Deaths Stacked Histogram
The image above is the stacked histogram of racial demographic number of reported cases of COVID-19 per each state in the US. Tableau link: Cases Stacked Histogram

The histograms above show the total number of cases and deaths within each state over the period covered by the data, along with the racial demographics of the patients in each state. As one can see, California recorded the highest number of cases, while New York recorded the highest number of deaths among COVID-19 patients.

Deeper Analysis — California

From the image above, one should note that California’s number of COVID-19-related deaths is relatively low compared to its number of infected patients. This provides some insight into the situation in California. We inferred that the high number of COVID-19 cases in California derives from the state's large, concentrated population. To look more closely at the impact on California, we used a separate histogram.

The image above is the histogram of the racial demographic that was reported with COVID-19 case in California.

The histogram above shows that the Latinx population in California had the most COVID-19 cases, with cases of unknown ethnicity coming in second. As Ro mentioned in her article, this number potentially derives from the limited economic access that the Latinx group has as a racial minority in the US, which directly relates to health conditions that leave people vulnerable to the virus. Other factors, such as poor working conditions and an unhealthy diet, may also contribute to high vulnerability to COVID-19. The Unknown category may be attributable to the sizable homeless population in California, but it also demonstrates how tough it can be to accurately centralize COVID-19 testing data across the state.

The image above is the histogram of the racial demographic of the number of deaths from COVID-19 cases in California.

The histogram above shows the number of deaths in California. For each ethnic group, fewer deaths were reported than infected patients. We inferred that the government was better able to identify the race of patients who died, as there were more deaths with an identified ethnicity than deaths with an unidentified one. The comparison between the histograms of cases and deaths in California suggests that, despite the rapid spread of the virus due to the state's high population, the government was able to take appropriate measures in treating infected patients.

Deeper Analysis — New York

If we take a look at New York, we can see that the state recorded both a high number of deaths and infected cases.

The image above is the histogram of the racial demographic that was reported with COVID-19 cases in New York.

You might be asking yourself why we are showing this bar chart. What these two bars tell us is that the ethnicity of every individual in New York who tested positive for COVID-19 was recorded as unknown. This visualization further magnifies the lack of emphasis on recording ethnicity when testing, and it again highlights the difficulty of centralizing medical testing data. When the pandemic first spread throughout the US, New York was the first state to conduct widespread testing, so one possible explanation for what we see is that the state had to sacrifice collecting ethnicity data in order to conduct tests as quickly as possible.

The image above is the histogram of the racial demographic of the number of deaths from COVID-19 cases in New York.

Although the number of COVID-19-related deaths in New York was smaller than the number of infections, it was still the largest death count of any state in the US. The visualization of cases in New York demonstrates that the state was unable to identify the ethnic background of patients. We suspect this is due to a large population concentrated in a small land area, which led to rapid spread of the virus in a short amount of time. The data may also reflect the high concentration of homeless people in New York, who often stay in confined spaces such as subways and building lobbies. Within these confined spaces, the homeless population could have contributed to the rapid spread of COVID-19, as most New Yorkers use public transportation and work in densely packed buildings.

Conclusion

COVID-19 has undoubtedly affected the entire nation. However, the negative impacts are not distributed evenly. It has proven challenging to acquire accurate data on cases by racial group, which limits any analysis that tracks the number of cases per ethnic group and compares the discrepancies between minorities. However, the number of deaths, for more obvious reasons, has been accounted for, and it is in this data that the racial disparities are apparent. Specific racial minority groups, such as Native Americans and Black people, are much more susceptible to the virus, which could be related to societal systems of oppression and a lack of economic mobility. The COVID-19 pandemic has unearthed the deep-rooted inequalities within our nation, and it is up to us to begin discourse and take action towards solutions.

Follow us on Instagram @bigdata.berkeley and visit our website at bd.berkeley.edu !!!
