Effects of Staying at Home and Population Demographics on the Spread of the Novel Coronavirus
This post is part of the “Immunity Initiative” series, describing datasets and AI tools that could provide the key to beating COVID-19. Noble.AI is making all data and AI in this series available for free for researchers and scientists working to defeat the SARS-CoV-2 virus. Learn more at www.immunity-initiative.org
Introduction
At least for now, COVID-19 has changed how we think about science: the entire tech community has rallied behind biology in an incredible way.¹ ² While treatments are being developed, our primary weapon against the virus is public health orders that reduce person-to-person interactions, notably by requiring businesses to close and individuals to stay at home. To better understand the effects of stay-at-home orders, I used the Apple Mobility dataset to measure how effective these policies have been at getting people to stay at home and “flatten the curve.” I also used 2018 US Census Bureau data to understand how a county's age and race demographics relate to the spread of the virus. The result is a one-of-a-kind dataset that makes it possible to compare the progression of the novel coronavirus with adherence to stay-at-home orders and with the demographics of the areas in which it spreads.
In what follows, I detail where the data came from, what they contain, and how the different datasets were joined. Then I discuss my findings:
- Every policy has a half-life
- COVID-19 disproportionately affects minorities
- COVID-19 disproportionately affects populations based on age distribution
Data
I used the following datasets to better understand the impact of this pandemic on our communities:
Johns Hopkins University (JHU) Daily COVID-19 Reports, which were combined with New York Times (NYT) Daily COVID-19 Reports to create a time series of deaths and cases across counties, states, and countries. These data are organized into three separate files for counties, states, and countries, respectively, but I primarily focused on US county data. (https://github.com/willhaslett/covid-19-growth)
The Keystone Strategy Non-Pharmaceutical Intervention Policy Dataset helps quantify the measures put into place in each county and state to combat the coronavirus through social distancing and stay-at-home orders. (https://github.com/Keystone-Strategy/covid19-intervention-data/)
The Apple Mobility Dataset quantifies how movement patterns have changed over the past four months of the pandemic, using the volume of Apple Maps direction requests relative to a baseline. It is organized into three separate data streams: driving, walking, and public transit. (https://www.apple.com/covid19/mobility)
The US Census Bureau's 2018 county population estimates provide population counts by race and age for every US county. (https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/)
The Simple Maps World Cities Database provided population data for countries and world cities. (https://simplemaps.com/data/world-cities)
Using pandas, I joined these datasets at the county, state, and country levels on FIPS code, state name, and country name, respectively. (Apple’s Mobility data is organized by city, so each city was mapped to every county within 40 miles of it.)
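A minimal pandas sketch of the county-level join and the 40-mile city-to-county mapping follows. It assumes hypothetical column names (`fips` as a zero-padded string, `date`, `city`, `lat`/`lng`, `driving`), represents each county by a single centroid, and averages the series of all cities that map to the same county; none of these details come from the original pipeline.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

def map_cities_to_counties(cities, counties, max_miles=40.0):
    """Pair each city (city, lat, lng) with every county (fips, lat, lng)
    whose centroid lies within max_miles of it."""
    pairs = cities.merge(counties, how="cross", suffixes=("_city", "_county"))
    dist = haversine_miles(pairs["lat_city"], pairs["lng_city"],
                           pairs["lat_county"], pairs["lng_county"])
    return pairs.loc[dist <= max_miles, ["city", "fips"]].reset_index(drop=True)

def build_county_table(cases, apple, city_to_county, census):
    """Join daily cases with county-level mobility and static census columns."""
    # Spread each city's daily series to nearby counties, then average
    # when more than one city maps to the same county (an assumption).
    mobility = (apple.merge(city_to_county, on="city")
                     .groupby(["fips", "date"], as_index=False)["driving"].mean())
    return (cases.merge(mobility, on=["fips", "date"], how="left")
                 .merge(census, on="fips", how="left"))
```

The cross join compares every city with every county, which is simple and fast enough for a few thousand counties; a spatial index would only matter at much larger scales.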
To represent which policies were active on a given day, I used a one-hot style encoding: each county-day is described by a binary array with one entry per policy type, where “1” means that policy type was active and “0” means it was not. Because several policies can be in force at once, an array can contain more than one “1” (strictly speaking, a multi-hot encoding).
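The encoding can be sketched in pandas as follows. This is an illustration under assumptions: the policy table's columns (`fips`, `policy`, `start`, `end`), the policy names, and the convention that `end` is exclusive and missing (`NaT`) while a policy remains in force are all hypothetical, not the Keystone dataset's actual schema.

```python
import pandas as pd

def encode_active_policies(policies, dates, policy_types):
    """One row per (fips, date) with a 0/1 column for each policy type."""
    rows = []
    for fips, group in policies.groupby("fips"):
        for day in dates:
            # Active from the start date up to (not including) the end date;
            # a missing end date means the policy is still in force.
            in_force = (group["start"] <= day) & (group["end"].isna() | (group["end"] > day))
            active = set(group.loc[in_force, "policy"])
            row = {"fips": fips, "date": day}
            row.update({p: int(p in active) for p in policy_types})
            rows.append(row)
    return pd.DataFrame(rows)
```

A plain loop is adequate at county scale; a vectorized alternative would expand each policy into its date range and pivot.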
Result 1: Every Policy has a Half-Life
When stay-at-home orders and business closures took effect in mid-March, Apple driving requests dropped dramatically. As cases began to rise, additional restrictions were enacted to slow the spread of the virus, causing further reductions in travel. However, even though the county’s policies did not change between 03/19/20 and 05/14/20, adherence to them eroded: driving requests increased by almost 25% over that period. I refer to this decline in a policy’s effectiveness over time as its “half-life.” Below, we can see how this “half-life” differs between large and small counties:
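The post uses “half-life” informally. One way to make it concrete, which is our assumption rather than the author's method, is to treat the drop in driving below Apple's baseline (Apple reports driving requests as a percentage of a January 13, 2020 baseline, where 100 means no change) as decaying exponentially, and to fit the decay rate by least squares on its logarithm:

```python
import numpy as np

def policy_half_life(days, driving_pct_of_baseline):
    """Estimate a 'half-life' (in days) for the drop in driving.

    Assumes the reduction below baseline, r(t) = 100 - driving(t), decays as
    r(t) = r0 * exp(-k t), fits log r(t) with a straight line, and returns
    ln(2) / k. Returns inf if the reduction is not shrinking (k <= 0).
    """
    days = np.asarray(days, dtype=float)
    reduction = 100.0 - np.asarray(driving_pct_of_baseline, dtype=float)
    if np.any(reduction <= 0):
        raise ValueError("driving must stay below baseline to take a log")
    slope, _intercept = np.polyfit(days, np.log(reduction), 1)
    k = -slope
    return np.log(2) / k if k > 0 else np.inf
```

Here `days` would count from the date the stay-at-home order took effect. A single exponential is a strong simplification; a real analysis would also account for weekday effects in the mobility data.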
Figure 8 seems to indicate that counties with larger populations (blue) have higher rates of sheltering in place than counties with smaller populations. While not statistically significant, this trend suggests that rural communities may not follow stay-at-home orders as closely as urban areas. More importantly, the gap between the two groups appears to widen from about 5% to as much as 10% as stay-at-home orders continue, suggesting that policy has a different half-life in each group of counties.
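A pandas sketch of the large-versus-small comparison is below. The column names and the default median-population cutoff are hypothetical, since the post does not state how counties were split into the two groups.

```python
import pandas as pd

def mobility_gap_by_size(df, threshold=None):
    """Daily mean driving (% of baseline) for large vs. small counties.

    df has hypothetical columns fips, date, population, driving.
    threshold is the population cutoff; it defaults to the median county.
    The returned gap is small minus large, so a positive gap means small
    counties are driving more (sheltering less) than large ones.
    """
    county_pop = df.drop_duplicates("fips")["population"]
    if threshold is None:
        threshold = county_pop.median()
    group = df["population"].gt(threshold).map({True: "large", False: "small"})
    daily = df.assign(group=group).pivot_table(
        index="date", columns="group", values="driving", aggfunc="mean")
    daily["gap"] = daily["small"] - daily["large"]
    return daily
```

Plotting the `gap` column over time shows directly whether the difference between the two groups widens as orders continue.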
Result 2: COVID-19 Disproportionately Affects Minorities
Figure 9 shows that counties with smaller white population shares seem to have higher incidences of COVID-19. Figure 10 reinforces this insight, showing a significant divergence in disease progression between the two groups. The “whiteness” of a county appears to be inversely correlated with the spread of coronavirus cases: the slope of the red line in Figure 10 is nearly twice that of the blue line. This echoes recent reports that the coronavirus has disproportionately affected minorities, particularly African-Americans and Hispanics.³
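The slope comparison can be sketched by splitting counties on their white population share and fitting a straight line to each group's mean percentage diagnosed over time. The column names, the median split, and the use of `numpy.polyfit` are assumptions made for illustration; the post does not specify how the lines in Figure 10 were fit.

```python
import numpy as np
import pandas as pd

def slope_by_white_share(df, cutoff=None):
    """Fit mean % diagnosed vs. day for less-white and more-white counties.

    df has hypothetical columns fips, day (int), pct_diagnosed, and
    white_share (white population / total population, 0 to 1).
    cutoff is the white-share split point; it defaults to the median county.
    Returns the fitted slope (percentage points per day) for each group.
    """
    shares = df.drop_duplicates("fips")["white_share"]
    if cutoff is None:
        cutoff = shares.median()
    groups = {
        "less_white": df["white_share"] < cutoff,
        "more_white": df["white_share"] >= cutoff,
    }
    slopes = {}
    for name, mask in groups.items():
        daily = df[mask].groupby("day")["pct_diagnosed"].mean()
        x = daily.index.to_numpy(dtype=float)
        slopes[name] = np.polyfit(x, daily.to_numpy(), 1)[0]
    return slopes
```

The ratio `slopes["less_white"] / slopes["more_white"]` is one way to express a statement such as one line being “nearly twice” as steep as the other.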
Result 3: COVID-19 Disproportionately Affects Populations Based on Age Distribution
While Figure 11 may seem sparse and uninformative at first, it shows something very interesting. The bulk of the data is centered around an average age of 40, and many of the outlier points (i.e., counties where the percentage of the population diagnosed with COVID-19 as of May 10, 2020 exceeds 0.02) have an average age between 35 and 45. This suggests that the virus may spread most widely in predominantly middle-aged populations. One possible reason is that a younger population includes many asymptomatic people who either test negative because of the limitations of current tests⁴ or never get tested because they don’t realize they are infected, so their cases go uncounted. At the other end, in an elderly population, infected people may become too ill to stay mobile and spread the virus to others. Together, these effects would concentrate detected spread in counties whose average age is “not too young” and “not too old.”
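The county average age and the outlier filter can be sketched as follows. The input layout (one row per county and 5-year age bin, with hypothetical columns `fips`, `bin_start`, `count`) and the choice to represent each bin by its midpoint, including 87.5 for an open-ended 85+ bin, are assumptions for illustration; the post does not say exactly how the average age was computed.

```python
import pandas as pd

def county_mean_age(age_counts):
    """Approximate each county's mean age from 5-year age-bin counts.

    age_counts has hypothetical columns fips, bin_start (0, 5, ..., 85), count.
    Each bin is represented by its midpoint (bin_start + 2.5).
    """
    midpoint = age_counts["bin_start"] + 2.5
    weighted = age_counts.assign(w=midpoint * age_counts["count"])
    totals = weighted.groupby("fips")[["w", "count"]].sum()
    return totals["w"] / totals["count"]

def flag_outliers(mean_age, pct_diagnosed, threshold=0.02):
    """Counties whose pct_diagnosed exceeds threshold (same units as the data)."""
    df = pd.DataFrame({"mean_age": mean_age, "pct_diagnosed": pct_diagnosed})
    return df[df["pct_diagnosed"] > threshold]

def share_in_band(outliers, lo=35, hi=45):
    """Fraction of outlier counties whose mean age lies in [lo, hi]."""
    return outliers["mean_age"].between(lo, hi).mean()
```

Both inputs to `flag_outliers` are Series indexed by FIPS code, so pandas aligns them by county automatically.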
Conclusion
The amount of global disruption and devastation caused by this pandemic is difficult to grasp. The data presented here help us better understand what we are fighting and how well we are doing it. By merging these datasets for the first time, I was able to see how vital staying at home has been to slowing the spread of the virus. However, staying at home isn’t feasible in the long term, so as society begins to “reopen,” we must ensure that people maintain social distancing, wear face masks, and take other precautions. Once people resume traveling, the Apple Mobility data will no longer serve as an effective proxy for social distancing, so I hope that as policies change, new data will become available to help us quantify the effectiveness of social distancing in the same way I have done here with stay-at-home policies.
When trying to model and understand the progression of such an unknown enemy, it is important to draw on data from multiple sources to capture all the effects of a virus like this one on our society. That said, social distancing and staying at home won’t be our only tools against this virus for long: many vaccines and treatments are currently in development. While there is cause for optimism, the best short-term way to fight the coronavirus is to restrict its ability to spread by staying at home and practicing social distancing. It may also be valuable for policymakers and healthcare providers to focus interventions on communities with large minority and middle-aged populations, given the elevated risk of infection, illness, and death for those groups.
Miles Agus is a sophomore at MIT studying Computer Science and Molecular Biology and an Intern at Noble.AI.
About Noble.AI
Noble.AI is an industry leader in AI-powered software to accelerate science and help researchers make important discoveries more quickly. Founded in 2017 and based in San Francisco and Los Angeles, the company has raised more than $12MM in venture backing.