Fighting the COVID-19 pandemic with data and context

The COVID-19 outbreak is in many ways an outlier. It emerged with unusual speed, spread rapidly across the globe and has elicited a public health response that is unprecedented in recent history. Yet this time, humanity faces a pandemic with a wealth of information that would have been unimaginable just a few years ago. Writing for ZDNet, Larry Dignan called this outbreak the most visualized pandemic ever, and the sheer number of data sets available to the general public (some more accurate, timely and reliable than others) is staggering. Yet in data, as in all narrative endeavors, context is everything.

Each new data source we add to our free, public data set provides additional context. Each new source is vetted for accuracy, and the data is collated, unified and normalized. Keying every source internally to the same widely used conventions, such as normalizing country names to ISO 3166-1 names and country codes, lets users go beyond individual data sources and extract meaning from the raw data itself, regardless of which data set it came from.
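To make the idea of a shared key concrete, here is a minimal sketch of how records from two sources that spell country names differently could be unified under ISO 3166-1 alpha-3 codes. It is an illustration under stated assumptions, not our actual pipeline: the alias table, the helper names to_iso3 and unify, and the toy figures are all hypothetical, and a real implementation would map against the full ISO 3166-1 list.

```python
from collections import defaultdict

# Hypothetical alias table mapping spellings seen in different sources
# to ISO 3166-1 alpha-3 codes. A real pipeline would cover every country.
ISO3_ALIASES = {
    "united states": "USA",
    "united states of america": "USA",
    "us": "USA",
    "korea, south": "KOR",
    "south korea": "KOR",
    "republic of korea": "KOR",
    "italy": "ITA",
}


def to_iso3(name: str) -> str:
    """Normalize a free-text country name to an ISO 3166-1 alpha-3 code."""
    key = name.strip().lower()
    if key not in ISO3_ALIASES:
        # Fail loudly instead of silently creating a second key for a country.
        raise KeyError(f"unmapped country name: {name!r}")
    return ISO3_ALIASES[key]


def unify(*sources: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Merge per-country fields from several sources under one ISO key."""
    merged: dict[str, dict[str, float]] = defaultdict(dict)
    for source in sources:
        for country, fields in source.items():
            merged[to_iso3(country)].update(fields)
    return dict(merged)


# Two hypothetical sources that name the same countries differently.
cases = {"US": {"confirmed": 100.0}, "Korea, South": {"confirmed": 40.0}}
beds = {"United States of America": {"icu_beds": 20.0},
        "Republic of Korea": {"icu_beds": 8.0}}

combined = unify(cases, beds)
print(combined["USA"])  # {'confirmed': 100.0, 'icu_beds': 20.0}
print(combined["KOR"])  # {'confirmed': 40.0, 'icu_beds': 8.0}
```

Raising on an unknown name, rather than passing it through, is a deliberate choice in this sketch: an unmapped spelling would otherwise quietly split one country into two rows.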

Choropleth map of U.S. counties by relative ICU demand from COVID-19, calculated as the number of cases per fully staffed ICU bed (2018).

Our recent visualization, which shows cases relative to the number of staffed intensive care unit (ICU) beds, is an example of this context. Only data unified along the same primary keys and carefully curated for quality can create the contextual understanding this situation demands. By seeing cases in the context of the underlying healthcare system, public health planners can anticipate whether healthcare capacity will be exceeded in a particular location, and enterprises can track the spread of COVID-19, assess supply chain risks and evaluate strategic options.

As an epidemiologist, I know first-hand that a pandemic can often feel like sweeping turmoil, and curated, cleaned data is indispensable for data-driven decision-making. Models and decisions are only as good as the data supporting them.
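The relative ICU demand shown in the map is a simple ratio, but it only works once case counts and bed counts share a county key. The sketch below is a hypothetical illustration, not the code behind the map: the function name icu_demand, the placeholder county identifiers and the toy numbers are assumptions (for U.S. counties the shared key would typically be a five-digit FIPS code). Counties with no staffed ICU beds get None instead of triggering a division by zero.

```python
from typing import Optional


def icu_demand(cases_by_county: dict[str, int],
               staffed_icu_beds_by_county: dict[str, int]) -> dict[str, Optional[float]]:
    """Cases per staffed ICU bed for every county present in both inputs.

    Counties missing from either input are skipped; counties with zero
    staffed ICU beds map to None because the ratio is undefined there.
    """
    demand: dict[str, Optional[float]] = {}
    # Only counties keyed identically in both sources can be compared.
    for county in cases_by_county.keys() & staffed_icu_beds_by_county.keys():
        beds = staffed_icu_beds_by_county[county]
        demand[county] = cases_by_county[county] / beds if beds > 0 else None
    return demand


# Hypothetical toy inputs keyed by placeholder county identifiers.
cases = {"county_a": 500, "county_b": 300, "county_c": 2}
beds = {"county_a": 250, "county_b": 600, "county_c": 0}

for county, ratio in sorted(icu_demand(cases, beds).items()):
    print(county, ratio)
# county_a 2.0
# county_b 0.5
# county_c None
```

A ratio above 1 in this toy example means more cases than staffed ICU beds, which is the kind of signal a planner would look for; whether to flag zero-bed counties separately, as done here, is a design choice rather than part of the original method.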

The data set is freely available and accessible via S3 as well as on the Snowflake Data Exchange. We hope to contribute by providing more information and clarity, rather than merely more data. In the end, it is that information and clarity that will make all the difference.

VP of Special Projects at Starschema, clinical computational epidemiologist, rower. Passionate about the potential of data science to improve the world.
