Effects of Staying at Home and Population Demographics on the Spread of the Novel Coronavirus

Miles Agus
Noble.AI
Published May 22, 2020

This post is part of the “Immunity Initiative” series, describing datasets and AI tools that could provide the key to beating COVID-19. Noble.AI is making all data and AI in this series available for free for researchers and scientists working to defeat the SARS-CoV-2 virus. Learn more at www.immunity-initiative.org

Introduction

At least for now, COVID-19 has changed how we think about science. The entire tech community has rallied behind biology in an incredible way.¹ ² While treatments are being developed, our primary weapon against the virus is public health orders that reduce person-to-person interactions, notably by requiring businesses to close and individuals to stay at home. To better understand the effects of stay-at-home orders, I used the Apple Mobility dataset to measure how effective policy has been at encouraging people to stay at home and “flatten the curve.” Additionally, I used US Census data from 2018 to understand how age and race demographics relate to the spread of the virus. The result is a one-of-a-kind dataset that allows the progression of the novel coronavirus to be compared with adherence to stay-at-home orders and with the demographics of the areas in which it spreads.

In what follows, I detail where the data came from, what they contain, and how the different datasets were joined. Then I discuss my findings:

  1. Every policy has a half-life
  2. COVID-19 disproportionately affects minorities
  3. COVID-19 disproportionately affects populations based on age distribution

Data

I used the following datasets to better understand the impact of this pandemic on our communities:

Johns Hopkins University (JHU) Daily COVID-19 Reports, which were combined with New York Times (NYT) Daily COVID-19 Reports to create a time series of deaths and cases across counties, states, and countries. These data are organized into three separate files for counties, states, and countries, respectively, but I primarily focused on US county data. (https://github.com/willhaslett/covid-19-growth)

Figure 1. One of the files generated by the compilation of NYT and JHU data. Each row refers to the data from a particular county, state, or country on a given day.

The Keystone Strategy Non-Pharmaceutical Intervention Policy Dataset helps quantify the measures put into place in each county and state to combat the coronavirus through social distancing and stay-at-home orders. (https://github.com/Keystone-Strategy/covid19-intervention-data/)

Figure 2. The start and end dates for all coronavirus-related policy (county and state level) up to 4/15/20. It is organized so that there are twelve policies (including “Other”) for each county or state with the start date (if applicable) and end date (if applicable) of each policy. For example, the first row indicates that starting 3/28/20, Alabama placed a limit on gatherings of ten or more people.

The Apple Mobility Dataset quantifies movement patterns over the past four months to describe how they have changed throughout the pandemic. This dataset is organized into three separate data streams: driving, walking, and public transit. (https://www.apple.com/covid19/mobility)

Figure 3. The percentage of Apple Maps routing requests for each mode of transportation (public transit, driving, or walking) in a particular region, relative to a baseline of 1/13/20. This percentage was used as a surrogate for how much people were staying at home to reduce the spread of the coronavirus.

The US Census Bureau's 2018 county population estimates provide population counts by race and age in every US county. (https://www2.census.gov/programs-surveys/popest/datasets/2010-2018/counties/asrh/)

Figure 4. The population of every five-year age cohort from 0 to 85, broken down by race and organized by each county’s FIPS code. It was used to provide state- and county-level demographics.

The Simple Maps World Cities Database provided population data for countries and world cities. (https://simplemaps.com/data/world-cities)

Figure 5. Each row of this dataset details the location and population of a major world city. It was used primarily to join US cities to counties.

These datasets were joined using Pandas at the county, state, and country levels, keyed on FIPS code, state name, and country name, respectively. (Apple’s Mobility data was organized by city, so each city was mapped to all counties within 40 miles.)
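The join described above can be sketched roughly as follows. This is a minimal illustration, not the code used for this post: the toy tables, column names (`fips`, `lat`, `lon`, `driving`), county centroid coordinates, and all numbers are made-up assumptions, and the great-circle (haversine) distance is just one reasonable way to decide which counties fall within 40 miles of a city.

```python
import math

import pandas as pd

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def normalize_fips(series):
    """County FIPS codes are 5-character strings; restore zeros lost to integer parsing."""
    return series.astype(str).str.zfill(5)


# Toy inputs (all values are placeholders, not real data).
cases = pd.DataFrame({"fips": [6037, 6059, 1001], "cases": [100, 50, 10]})
counties = pd.DataFrame({
    "fips": ["06037", "06059", "01001"],
    "population": [10_000_000, 3_000_000, 50_000],
    "lat": [34.20, 33.75, 32.50],      # rough county centroids (assumed)
    "lon": [-118.30, -117.85, -86.60],
})
cities = pd.DataFrame({"city": ["Los Angeles"], "lat": [34.05], "lon": [-118.24], "driving": [55.0]})

# Map each city's mobility value onto every county within 40 miles.
pairs = [
    {"fips": county.fips, "city": city.city, "driving": city.driving}
    for city in cities.itertuples()
    for county in counties.itertuples()
    if haversine_miles(city.lat, city.lon, county.lat, county.lon) <= 40
]
city_to_county = pd.DataFrame(pairs, columns=["fips", "city", "driving"])

# Join everything on the normalized FIPS code.
cases["fips"] = normalize_fips(cases["fips"])
merged = (
    cases.merge(counties[["fips", "population"]], on="fips", how="left")
         .merge(city_to_county, on="fips", how="left")
)
print(merged)
```

Normalizing FIPS codes to zero-padded strings before merging matters because codes such as 06037 lose their leading zero when a CSV column is parsed as integers, which silently drops matches. Counties with no city within range simply end up with a missing mobility value.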

Figure 6. One-Hot Encoding Schema. The array above represents a county with the following policies active: Closing of Public Venues, No Gatherings of Any Size, School Closure, and Mandated Social Distancing.

To represent which policies were active at a particular time, I used a one-hot encoding scheme that allows for simple classification of active policies. Each day in a county is described by one of these arrays, where a “1” means that policy type is active and a “0” means it is not. Several entries can be “1” at once when multiple policies overlap.
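As a rough sketch of this encoding, the function below turns a county's policy start and end dates into one array for a given day. The policy list here is only an illustrative subset of the twelve policy types (the four named in Figure 6 plus “Other”), and the dates are made up; neither is taken from the Keystone dataset itself.

```python
from datetime import date

# Illustrative subset of policy types (assumed ordering, not the dataset's own).
POLICY_TYPES = [
    "Closing of Public Venues",
    "No Gatherings of Any Size",
    "School Closure",
    "Mandated Social Distancing",
    "Other",
]


def active_policy_vector(policies, day):
    """Return a list of 0/1 flags, one per entry in POLICY_TYPES.

    `policies` maps a policy name to a (start, end) pair of dates;
    `end` may be None for a policy that is still in force.
    """
    vector = []
    for name in POLICY_TYPES:
        start, end = policies.get(name, (None, None))
        is_active = start is not None and start <= day and (end is None or day <= end)
        vector.append(1 if is_active else 0)
    return vector


# Hypothetical county with made-up dates.
example_policies = {
    "Closing of Public Venues": (date(2020, 3, 16), None),
    "School Closure": (date(2020, 3, 16), None),
    "No Gatherings of Any Size": (date(2020, 3, 19), None),
    "Mandated Social Distancing": (date(2020, 3, 19), None),
    "Other": (date(2020, 3, 1), date(2020, 3, 10)),
}

print(active_policy_vector(example_policies, date(2020, 3, 17)))
print(active_policy_vector(example_policies, date(2020, 3, 20)))
```

Because the array is a plain list of flags, two days with the same active policies compare equal, which makes it easy to find the dates on which a county's policy mix changed.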

Result 1: Every Policy has a Half-Life

Figure 7. The progression of COVID-19 in Los Angeles County, with the number of new cases reported each day in red, and Apple Mobility Data, used as a surrogate for staying at home, in blue. The legend indicates which policies are active, following the one-hot encoding scheme explained above. Each color represents the implementation of a new coronavirus-related policy.

At the beginning of the coronavirus pandemic in mid-March, Apple driving requests decreased dramatically as stay-at-home orders and business closures took effect. As cases began to rise, further legislation was added to restrict the spread of the virus, causing further reductions in travel. Even though the county’s policy did not change between 03/19/20 and 05/14/20, adherence did not hold steady: driving increased by almost 25% over that period. I refer to this decline in a policy’s effectiveness over time as its “half-life.” Below, we can see how this “half-life” differs between large and small counties:

Figure 8. Counties segmented by size, comparing how larger counties (blue) and smaller counties (red) have responded to the coronavirus.

Figure 8 seems to indicate that counties with larger populations (blue) have higher rates of sheltering in place than counties with smaller populations. While not statistically significant, this trend suggests that rural communities may not follow stay-at-home orders as closely as urban areas. More importantly, the difference between the two groups appears to have grown from about 5% to as much as 10% as stay-at-home orders continued, suggesting a different policy half-life in each group.
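The “half-life” in this post is used informally, but one possible way to put a number on it is sketched below. The assumption here (mine, for illustration only) is that the reduction in mobility relative to baseline decays roughly exponentially while policy stays constant, so a straight-line fit to the logarithm of that reduction gives a decay rate and hence a half-life. The two series are synthetic and do not come from the Apple data.

```python
import math


def fit_half_life(days, reduction):
    """Fit ln(reduction) = ln(r0) - k * day by least squares.

    `reduction` is the percentage drop in mobility below baseline on each day
    and must be strictly positive. Returns (r0, half_life_in_days); the
    half-life is infinite if the reduction is not shrinking.
    """
    logs = [math.log(r) for r in reduction]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    decay_rate = -slope
    r0 = math.exp(mean_y - slope * mean_x)
    half_life = math.log(2) / decay_rate if decay_rate > 0 else math.inf
    return r0, half_life


# Synthetic weekly observations for two hypothetical groups of counties.
days = list(range(0, 57, 7))
group_a = [60 * 0.5 ** (d / 80) for d in days]  # adherence halves every 80 days
group_b = [55 * 0.5 ** (d / 40) for d in days]  # adherence halves every 40 days

_, half_life_a = fit_half_life(days, group_a)
_, half_life_b = fit_half_life(days, group_b)
print(f"group A half-life: {half_life_a:.1f} days")
print(f"group B half-life: {half_life_b:.1f} days")
```

Real mobility curves are noisy and have weekly cycles, so a fit like this would likely need smoothing (for example, a seven-day average) before the estimate means much.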

Result 2: COVID-19 Disproportionately Affects Minorities

Figure 9. Percentage of the population diagnosed with coronavirus each day in a particular county (logarithmic scale), with colors representing the % of a given county’s population that is white.
Figure 10. Counties split into two groups by racial composition: those with majority-white populations and those with minority-white populations. The lines represent the means, for each group, of the % of new cases reported each day. Standard deviations are shaded around the means.

Figure 9 shows that counties with smaller white populations seem to have a higher incidence of COVID-19. Figure 10 reinforces this insight, showing a significant divergence in the progression of the disease between the two groups. The “whiteness” of a county appears to be inversely correlated with the spread of coronavirus cases: the slope of the red line in Figure 10 is nearly twice that of the blue line. This echoes recent reports that the coronavirus has disproportionately impacted minorities, particularly African-Americans and Hispanics.³
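The comparison in Figure 10 can be approximated with a few lines of Python. This is a sketch under assumptions: the county records, field names, and counts below are toy values rather than results, the 50% threshold is my reading of “majority-white,” and the slope is an ordinary least-squares fit to each group's daily mean.

```python
from statistics import mean, pstdev


def daily_new_pct(cumulative, population):
    """Convert cumulative case counts to daily new cases as a % of population."""
    return [
        100 * (today - yesterday) / population
        for yesterday, today in zip(cumulative, cumulative[1:])
    ]


def mean_and_std_by_day(series_list):
    """For each day, return (mean, standard deviation) across counties."""
    return [(mean(day), pstdev(day)) for day in zip(*series_list)]


def least_squares_slope(ys):
    """Slope of the best-fit line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# Toy county records (placeholder numbers only).
counties = [
    {"pct_white": 80, "population": 1000, "cumulative": [0, 1, 3, 6]},
    {"pct_white": 65, "population": 2000, "cumulative": [0, 4, 8, 12]},
    {"pct_white": 40, "population": 1000, "cumulative": [0, 3, 9, 18]},
    {"pct_white": 30, "population": 2000, "cumulative": [0, 2, 6, 12]},
]

groups = {"majority_white": [], "minority_white": []}
for county in counties:
    key = "majority_white" if county["pct_white"] > 50 else "minority_white"
    groups[key].append(daily_new_pct(county["cumulative"], county["population"]))

summary = {}
for key, series_list in groups.items():
    stats = mean_and_std_by_day(series_list)
    means = [m for m, _ in stats]
    summary[key] = {
        "means": means,
        "stds": [s for _, s in stats],
        "slope": least_squares_slope(means),
    }

for key, values in summary.items():
    print(key, values)
```

Working in percent of population rather than raw counts keeps large counties from dominating each group's mean.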

Result 3: COVID-19 Disproportionately Affects Populations Based on Age Distribution

Figure 11. The total number of coronavirus cases as of 5/10/20 versus the average age of each county.

While Figure 11 may seem sparse at first, it shows something very interesting. The bulk of the data is centered around age 40, and many of the outlier points, i.e., the counties that have % population diagnosed with COVID-19 as of May 10, 2020 > 0.02, have an average age between 35 and 45. This suggests that the virus spreads most widely in counties with a more middle-aged population. One possible reason is that a younger population includes too many asymptomatic people, who will either test negative due to the nature of current tests⁴ or not get tested because they don’t realize they are infected. An elderly population, on the other hand, means that infected people become too ill to remain mobile and are therefore less likely to spread the virus to others. This results in a concentration of spreading in counties with an average age that is “not too young” and “not too old.”
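For readers who want to reproduce a county's average age from the five-year cohort counts, here is one simple approach. It is an assumption on my part that each cohort can be represented by its midpoint and that the open-ended 85+ cohort can be placed at 87.5; the cohort counts below are placeholders.

```python
def average_age(cohort_counts, cohort_width=5, open_ended_midpoint=87.5):
    """Population-weighted average age from five-year cohort counts.

    cohort_counts[i] is the population aged 5*i to 5*i + 4, except for the
    last entry, which is the open-ended oldest cohort (85 and over).
    """
    closed_cohorts = len(cohort_counts) - 1
    midpoints = [i * cohort_width + cohort_width / 2 for i in range(closed_cohorts)]
    midpoints.append(open_ended_midpoint)
    total = sum(cohort_counts)
    return sum(m * c for m, c in zip(midpoints, cohort_counts)) / total


# Hypothetical county with the same population in each of 18 cohorts.
uniform_county = [1000] * 18
print(average_age(uniform_county))
```

The midpoint approximation slightly misplaces people within each cohort, but at county scale the error is small compared with the differences between counties.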

Conclusion

The amount of global disruption and devastation caused by this pandemic is difficult to grasp. The data presented here help us better understand what we are fighting and how well we are fighting it. By merging these datasets for the first time, I was able to see how vital staying at home has been in slowing the spread of the virus. However, staying at home isn’t feasible in the long term, so as society begins to “reopen,” we must ensure that people maintain social distancing, wear face masks, and take other precautions. At that point, the Apple Mobility data will no longer serve as an effective surrogate for social distancing, so I hope that new data will become available as policies change, helping us quantify the effectiveness of social distancing in the same way I have done here for stay-at-home policies.

When modeling the progression of such an unfamiliar enemy, it is important to leverage data from multiple sources to understand the full effects of the coronavirus on our society. Social distancing and staying at home won’t be our only way to combat this virus for long: many vaccines and treatments are currently in development. While there is cause for optimism, the best short-term way to fight the coronavirus is to restrict its ability to spread by staying at home and practicing social distancing. It may also be valuable for policymakers and healthcare providers to focus interventions on communities with large minority and middle-aged populations, given the elevated rates of infection observed in those communities.

Miles Agus is a sophomore at MIT studying Computer Science and Molecular Biology and an Intern at Noble.AI.

About Noble.AI

Noble.AI is an industry leader in AI-powered software to accelerate science and help researchers make important discoveries more quickly. Founded in 2017 and based in San Francisco and Los Angeles, the company has raised more than $12MM in venture backing.
