Disease Surveillance of COVID-19

Alexander Hohl
Published in Atlas Insights · 5 min read · Sep 23, 2020

My journey to analyzing the geography of COVID-19 started with a phone call in March 2020, in which Michael Desjardins asked me whether I wanted to be part of a research project on the topic. “Of course!” I answered, as I had been thinking about how to better understand the pandemic since January, when the first cases were reported in the United States. However, I had struggled to envision a project and formulate research questions, as my academic focus at the time was on computational issues rather than epidemiology. I met Michael during our time as PhD students at UNC Charlotte, so starting a conversation about collaborating on research was easy. He had a clear vision and knew what to do to get us started. We got our former PhD advisor Eric Delmelle on board and began to brainstorm project ideas.

We came up with the idea to apply disease surveillance techniques to COVID-19, hoping to contribute to the conversation and to encourage fellow geographers to direct their energies towards this issue. At the time, dashboards showing the spatial (and temporal) distribution of case counts using basic cartographic techniques were already growing in number and variety. However, we identified a need to enhance maps of relative risk with delineations of statistically significant areas of elevated risk, also known as clusters. After applying the prospective space-time scan statistic to the COVID-19 case counts provided by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, we obtained detailed snapshots showing the evolution of COVID-19 outbreaks in the U.S., which differed in geographic extent and relative risk (Figures 1–3).
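To give a feel for what the scan statistic evaluates, here is a minimal sketch of the score computed for a single candidate cluster (a space-time cylinder) under the commonly used Poisson model. It is an illustration under stated assumptions, not the code behind our papers: the function names and the example numbers are made up, expected cases are assumed to be proportional to population, and the Monte Carlo step that determines statistical significance is left out.

```python
import math


def expected_cases(pop_inside, pop_total, total_cases):
    """Expected cases inside a cylinder if risk were the same everywhere.

    Assumption: expected counts are proportional to population.
    """
    return total_cases * pop_inside / pop_total


def poisson_llr(observed, expected, total_cases):
    """Log likelihood ratio of one candidate cylinder (Poisson model).

    observed    -- cases inside the cylinder (c)
    expected    -- expected cases inside under the null hypothesis (mu)
    total_cases -- all cases in the study area and period (C)
    Only cylinders with more cases than expected count as candidates.
    """
    c, mu, C = observed, expected, total_cases
    if c <= mu:
        return 0.0
    inside = c * math.log(c / mu)
    # The outside term vanishes when every case lies inside the cylinder.
    outside = (C - c) * math.log((C - c) / (C - mu)) if C > c else 0.0
    return inside + outside


def relative_risk(observed, expected, total_cases):
    """Risk inside the cylinder divided by the risk outside of it."""
    c, e, C = observed, expected, total_cases
    if C == c:
        return math.inf  # no cases outside the cylinder
    return (c / e) / ((C - c) / (C - e))


# Hypothetical example: 30 of 100 cases fall inside a cylinder
# that holds 10% of the population.
mu = expected_cases(pop_inside=1_000, pop_total=10_000, total_cases=100)
print(poisson_llr(30, mu, 100), relative_risk(30, mu, 100))
```

The scan evaluates this score for many cylinders of varying location, radius, and duration, and the cylinders with the highest log likelihood ratios become the candidate clusters.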

Figure 1: Spatial distribution of emerging space-time clusters of COVID-19 at the county level from March 9th, 2020. Source: Desjardins et al. 2020.
Figure 2: Spatial distribution of emerging space-time clusters of COVID-19 at the county level from March 27th, 2020. Source: Desjardins et al. 2020.
Figure 3: Spatial distribution of emerging space-time clusters of COVID-19 at the county level from April 27th, 2020. Source: Hohl et al. 2020a.

When comparing the results of these three temporal snapshots, it is striking that 1) the number of detected clusters went from 8 (on March 9th, Figure 1) to 26 (March 27th, Figure 2) to 16 (April 27th, Figure 3), and 2) the average relative risk inside the clusters went from 93.8 to 5.7 to 24.5. On March 9th, we saw coastal urban areas affected by the virus, especially in the Pacific Northwest and New York. By March 27th, the pattern had shifted to isolated outbreaks in rural and suburban counties, such as Summit County (UT) or Kershaw County (SC). On April 27th, we observed spillover effects in the form of growing clusters, e.g. in the New Orleans (LA) and Atlanta (GA) regions. In summary, we witnessed a change in the spatial configuration of clusters that ultimately affected the entire nation.

After concluding our first push towards analyzing the geography of COVID-19, we were dissatisfied with having only snapshots in time, while the pandemic was playing out every day in front of our eyes. As COVID-19 case counts are updated daily, we decided to carry out the clustering daily as well, which allowed us to track the evolution of the pandemic at a fine spatial and temporal resolution, and ultimately to depict its morphology within the United States. Figure 4 shows time series graphs of various metrics describing the clusters, their geographic extents, and associated disease risks. In general, we identified an initial period of chaos and growth, especially in terms of the population, number of counties, and relative risk. This period lasted until mid-March and was followed by a phase of stability until the end of the study period.

The results of our daily clusterings can be viewed on covid19scan.net, which features an interactive map that depicts the clusters through time. We leverage the web-mapping capabilities of shinyapps.io and Leaflet, and the clusters are drawn as simple circles that reveal their characteristics in pop-up boxes when the user hovers over them. The map also features a time slider that allows the user to explore the morphology of COVID-19 since the first significant cluster materialized on March 1st.

Figure 4: Cluster characteristics over time. Solid black lines: summary statistic (sum or mean); dashed blue lines: standard deviation. nClus: Number of clusters resulting from the space-time scan statistic; nCty: Number of counties that are part of a cluster; avgDur: Average duration of clusters; avgRad: Average cluster radius; cluPop: Total population within the clusters of a given day; cluObs: Number of observed cases within clusters; cluExp: Number of expected cases within clusters; LLR: Log likelihood ratio; RRclu: Cluster relative risk. Source: Hohl et al. 2020b.

Our work has caught the attention of critics, who argued that our analyses are based on case counts that are confounded by testing effort, which differs from state to state. While this is concerning and true, we still see our work as a legitimate contribution to disease surveillance, as our analyses were among the first published papers on COVID-19 within the field of Geography. Two recent posts in the COVID Atlas Blog (post 1, post 2) shine light on different practices of reporting testing efforts across the nation. The availability of testing data (or lack thereof) poses a challenge. These numbers are not available at the same spatial and temporal granularity as case counts and deaths, so scientists who want to compare the magnitude of outbreaks across regions within the United States are left to piece together testing data from different sources. Oftentimes, the testing numbers are not available for the desired 1) time frame, 2) geographic area, 3) level of aggregation, or 4) data reporting criterion. Therefore, we may focus our future efforts on “filling in” gaps in testing data at the county level with estimates obtained by scaling down the state-level numbers. Machine Learning methods, such as Random Forests, as well as Regularized Generalized Linear Models, have proven successful at such tasks. Predicting testing numbers could be a great contribution geographers make towards understanding the COVID-19 pandemic, especially in the absence of a unified data reporting strategy.
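As a rough illustration of what “scaling down” could look like, the sketch below allocates a state-level test count to counties in proportion to a weight such as population. This is only the simplest possible baseline and not one of the Machine Learning methods mentioned above; the function name, county names, and numbers are hypothetical.

```python
def downscale_tests(state_tests, county_weights):
    """Split a state-level test count across counties by relative weight.

    state_tests    -- number of tests reported for the whole state
    county_weights -- dict mapping county name to a non-negative weight
                      (assumption: population; any covariate could be used)
    Returns a dict of estimated tests per county that sums to state_tests.
    """
    total_weight = sum(county_weights.values())
    if total_weight <= 0:
        raise ValueError("county weights must sum to a positive number")
    return {
        county: state_tests * weight / total_weight
        for county, weight in county_weights.items()
    }


# Hypothetical state with three counties and 1,000 reported tests.
populations = {"County A": 600, "County B": 300, "County C": 100}
estimates = downscale_tests(1_000, populations)
for county, tests in estimates.items():
    print(county, round(tests))
```

A learned model would replace the fixed population weights with predictions based on several county-level covariates, but the constraint that county estimates add up to the reported state total would likely remain useful.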

References:

Desjardins, M. R., Hohl, A., & Delmelle, E. M. (2020). Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters. Applied Geography, 102202.

Hohl, A., Delmelle, E., & Desjardins, M. (2020a). Rapid detection of COVID-19 clusters in the United States using a prospective space-time scan statistic: an update. SIGSPATIAL Special, 12(1), 27–33.

Hohl, A., Delmelle, E. M., Desjardins, M. R., & Lan, Y. (2020b). Daily surveillance of COVID-19 using the prospective space-time scan statistic in the United States. Spatial and Spatio-temporal Epidemiology, 34, 100354.

Alexander Hohl is an assistant professor at the University of Utah
