Published in Basil Labs

Who is at Most Risk?
COVID-19 Community Risk Factors & the Politicization of the Virus

by Linbo Ruan, Kyle Costello, and Nikita Jakkam. Edited by Maya Crowden
Who is at Most Risk: Population Factors — Kyle Costello
Who is at Most Risk: Transportation — Nikita Jakkam
The Political Nature of the Virus — Linbo Ruan

America will have to make some very tough decisions in the coming months — as individuals, as groups, neighborhoods, and communities, as companies, nonprofits, and educational institutions, and as city, state, and federal decision-makers.

Embedded at the heart of our research at Basil Labs are two fundamental questions:

Who is at most risk?
How do we go forward responsibly?

Our analyses focused on possible contributing factors to higher numbers of COVID-19 cases as well as the increasingly political path the virus has taken in America and the potential repercussions from this path.

These analyses are meant to evaluate normative beliefs about the spread of the virus and begin conversations on how to approach the crisis at hand and how to go forward from here.

Data Sources & Methodology

Data was obtained from the U.S. Census Bureau’s American Community Survey (ACS), Google’s COVID-19 Community Mobility Reports, The New York Times COVID-19 GitHub repository, Johns Hopkins’ Coronavirus Resource Center, Gallup, and Basil Labs’ own testing center locator platform.

Census: Population, transportation, and insurance data (2017)
Google: Mobility and social distancing data (as of April 2020)
NYTimes: COVID-19 number of cases and deaths (as of April 19, 2020)
Gallup: Trump approval % (2017)
Basil Labs: Coronavirus testing center locations (as of April 2020)

The following analyses employ linear regressions in which the dependent variables are the number of COVID-19 cases, the growth rate of COVID-19, and the number of deaths by US county. Because the data sets were collected at different times and the numbers of COVID-19 cases and deaths are changing rapidly, all findings must be taken with a grain of salt.
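As a rough illustration of the kind of single-predictor linear regression described above, the sketch below fits a least-squares line and reports r and r². It is a minimal sketch, not the code used for this study: the function name `fit_simple_regression` is hypothetical, and the input values are placeholders chosen only to make the example run.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r, r_squared), where r is the Pearson
    correlation between x and y.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r, r * r

# Placeholder values for illustration only; these are NOT the study's data.
density = [50.0, 120.0, 300.0, 800.0, 1500.0]   # people per square mile
pct_positive = [4.0, 6.5, 9.0, 18.0, 33.0]      # percent of tests positive

slope, intercept, r, r_squared = fit_simple_regression(density, pct_positive)
print(f"slope={slope:.4f} intercept={intercept:.2f} r={r:.2f} r^2={r_squared:.2f}")
```

The p-values quoted throughout the article would come from a significance test on the slope, which this sketch leaves out.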

While formal, in-depth analyses with the proper rigor will be conducted in the future as more data becomes available, these analyses seek to present preliminary findings, which in turn can help craft responsible policies at the local and national levels to help mitigate this crisis. All variances in the years of data collection reflect the reality of the conditions and data availability during the time this study was developed.

Part 1: Who is at the Most Risk?

Our team sought to better understand these risk factors and examined several potential variables influencing the spread of COVID-19. While we found that some variables, such as the percentage of individuals within a county with health insurance or the average household income by county, were poor predictors of the number of cases or COVID-19-related deaths, two narratives became clear through our analysis.

There are more deaths when the virus isn’t caught early in a community;
Common modes of transportation must be taken into account when shaping policies to mitigate the spread of COVID-19.

i) Population Density
There is a strong, positive and linear correlation between population density in people per square mile and the percent of patients in a state who tested positive for COVID-19 (p<0.001, r²=0.78).

ii) % Positive Tests
There is a moderately strong, positive and linear correlation between the percent of patients who tested positive for COVID-19 and the number of COVID-19 deaths in a state (p<0.001, r²=0.63).

iii) Public Transportation
There is a strong, linear and positive correlation between public transportation usage and the number of positive COVID-19 cases per county in the United States (p<0.001, r²=0.82). Public transportation usage is calculated using the percent of the population commuting to work via public transportation by county.

iv) Commuting
There is a moderate, linear and negative correlation between the percent of the population who have a 30–90 minute commute alone via personal vehicle and the number of positive COVID-19 cases per county.

What Does This Mean?

As the states that were hit hard first by the virus reach their COVID-19 “peak,” legislators and other decision-makers are caught between a rock and a hard place: economic revitalization on one side and the potential for a second wave of cases and deaths on the other.

While there are a number of factors that may influence the number of cases in counties, the correlation between population density and the number of COVID-19 cases, when put in the context of the slow rollout of testing availability in the United States, suggests that COVID-19 spreads more quickly in high-density areas. Thus, when tests are actually made available to these communities, the percentage of individuals who test positive is higher than in other areas.

Sadly, areas with a higher percentage of total individuals who test positive also have higher death rates. Given the amount of time America took to deploy testing across the country, this finding suggests that areas where COVID-19 circulated throughout the community for a longer period of time — thus the higher percentage of positive tests — are areas where the death rate will be higher.

In other words, if the virus entered and circulated within a community earlier than other parts of America before mass testing was deployed, there will be a higher percentage of individuals who test positive when the tests become available. Population density factors into the spread of the virus, and these communities will also be the ones to lose more lives.

TL;DR To reduce deaths, we need to catch the virus early on in communities.

Part 2: The Political Nature of the Virus in America

As we learn more about the virus and factors that influence its spread, states are planning phased reopenings of businesses, and COVID-19 protests have swept the news. Georgia’s Governor Brian Kemp has (in)famously already allowed restaurants to resume dine-in service and certain other businesses like hair salons, gyms, and tattoo parlors to reopen under specific conditions to minimize the virus’ spread. His decision and President Trump’s rebuke, alongside the anti-lockdown protests in many states, illustrate the ever-increasing political nature that the virus has taken in America.

The second component of our analysis focused on this political aspect and we sought to better understand:

The relationship between Trump support and social distancing;
The relationship between social distancing and the COVID-19 growth rate.

i) Residential Mobility
There is a strong, negative linear relationship between residential mobility and Trump support by state (p<0.001, r=-0.80).

Residential mobility data was obtained from the COVID-19 Community Mobility Reports from Google. The reports break each state down by county and highlight the percent change in visits to places like grocery stores, parks, transit stations, retail & recreation areas, residential buildings, and workplaces. The baseline for these data is the median value of the corresponding day of the week from January 3 to February 6, 2020.
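The baseline described above can be sketched in a few lines. Google publishes only the resulting percent changes, so the raw daily visit counts here are hypothetical, as are the function names `weekday_baselines` and `percent_change`. The only detail taken from the text is the baseline window (January 3 to February 6, 2020) and the use of the median for each day of the week.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median

BASELINE_START = date(2020, 1, 3)
BASELINE_END = date(2020, 2, 6)

def weekday_baselines(daily_visits):
    """Median visits for each weekday (0 = Monday) within the baseline window."""
    by_weekday = defaultdict(list)
    for day, visits in daily_visits.items():
        if BASELINE_START <= day <= BASELINE_END:
            by_weekday[day.weekday()].append(visits)
    return {weekday: median(values) for weekday, values in by_weekday.items()}

def percent_change(day, visits, baselines):
    """Percent change in visits relative to the baseline for the same weekday."""
    base = baselines[day.weekday()]
    return 100.0 * (visits - base) / base

# Hypothetical visit counts for illustration: 100 on Mondays, rising by 10 each weekday.
daily_visits = {}
current = BASELINE_START
while current <= BASELINE_END:
    daily_visits[current] = 100 + 10 * current.weekday()
    current += timedelta(days=1)

baselines = weekday_baselines(daily_visits)
print(percent_change(date(2020, 4, 17), 70, baselines))
```

Comparing each day against the same weekday's median keeps ordinary weekly rhythms, such as quieter weekends at workplaces, from showing up as changes in behavior.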

ii) Retail Mobility
There is a strong, positive linear relationship between mobility in retail and recreation areas and Trump support by state (p<0.001, r=0.71).

iii) Retail Mobility & COVID-19 Growth Rate
When examining the growth rate of COVID-19, we found that retail mobility has a higher propensity than any other form of mobility to contribute to the growth rate of COVID-19. In the figure above, the p-value of retail mobility is plotted for each state, with a red dotted line marking a p-value of 0.05 and a blue dotted line marking a p-value of 0.1.

For retail mobility, 24 of 50 states exhibited p-values below 0.1. By comparison, workplace mobility had 8 of 50 states below that threshold, grocery store mobility had 9, transit station mobility had 10, and park mobility had 15. Retail mobility alone had 15 states with p-values below 0.05.

It is important to note that there can be other factors that affect growth rate — this model only considers retail mobility’s effect on the growth rate of COVID-19.

What Does This Mean?

If social distancing were practiced correctly, we would expect the mobility rate of residential areas to increase. However, the relationship between Trump support and residential mobility indicates that states with higher Trump support are not practicing social distancing at rates similar to states with less Trump support.

Our findings also suggest that states with higher Trump support have higher rates of retail and recreation mobility. In fact, Trump support correlates with higher rates of grocery and pharmacy mobility (p<0.001, r=0.67), transit station mobility (p<0.001, r=0.80), and workplace mobility (p<0.001, r=0.79). The only type of mobility for which Trump support does not show a similarly strong relationship is mobility at parks (p<0.07, r=0.27).
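Part 1 reports r², while the mobility correlations in Part 2 are reported as r. For a linear regression with a single predictor, r² is simply the square of the Pearson r, so the two can be put on the same footing. The sketch below does that conversion for the r values quoted in this article; the helper name `variance_explained` is hypothetical.

```python
# Pearson r values as reported in this article (Trump support vs. mobility, by state).
reported_r = {
    "residential": -0.80,
    "retail and recreation": 0.71,
    "grocery and pharmacy": 0.67,
    "transit stations": 0.80,
    "workplaces": 0.79,
    "parks": 0.27,
}

def variance_explained(r):
    """Share of variance explained (r squared) in a one-predictor linear fit."""
    return r * r

for category, r in reported_r.items():
    print(f"{category}: r={r:+.2f}, r^2={variance_explained(r):.2f}")
```

Read this way, the residential and transit relationships (r² of about 0.64) are somewhat weaker than the public transportation finding in Part 1 (r²=0.82), and the parks relationship explains well under a tenth of the variance.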

Troublingly, out of all forms of mobility, retail mobility exhibited the most prevalent relationship to the growth rate of COVID-19. Of the 15 states where this relationship was most significant, twelve were states that Trump won in 2016.

As the pressure on politicians to end the lockdowns increases, some of the first states to do so will likely be states with Republican leadership, states that may follow the pattern of COVID-19 growth in relation to retail mobility.

Part 3: Where Do We Go From Here? What Do Responsible, Data-Driven Policies Look Like?

There are myriad considerations to weigh when deciding whether to reopen a state or city. Two particularly influential variables are:

The growth rate of COVID-19
The impact on businesses, employment, and the economy

Our findings suggest caution in reopening retail. Recent initiatives by governors on opening state parks before confined indoor spaces like workplaces or retail may ease the public while maintaining proper precautionary measures.

Furthermore, findings from this study suggest that immediate detection of the virus in a community is necessary to prevent deaths. While this may seem like an unsurprising analysis, given the availability of testing across the US and the numerous barriers to testing, the reality in America is that immediate detection most likely will not come to pass — this costs lives.

This study is not meant to be rigid nor static in nature. As the crisis continues to evolve and spread to less-impacted areas, we must continue to research and develop responsible, data-driven policies to limit the number of cases and mitigate the loss of life. There is the potential for a second wave in already-hit areas, and there is a potential wave in rural areas this summer and fall. This crisis cannot be treated as a short-term event — it is ongoing, and there is no one silver bullet. Different areas show different propensities to increase the spread of the virus, and our policies and public engagement must reflect this reality.




