Starschema Blog

Fighting the COVID-19 pandemic with data and context

The COVID-19 outbreak is in many ways an outlier. It emerged with unusual speed, spread rapidly across the globe and has elicited a public health response that is unprecedented in recent history. Yet this time, humanity faces a pandemic with a wealth of information that would have been unimaginable just a few years ago. Writing for ZDNet, Larry Dignan called this outbreak the most visualized pandemic ever, and the sheer number of data sets available to the general public, some more accurate, timely and reliable than others, is staggering. But in data, as in all narrative endeavors, context is everything.

Each new data source we add to our free, public data set provides additional context. Each new source is vetted for accuracy, and the data is collated, unified and normalized. Keying every source internally to the same widely used conventions (for example, normalizing country names to ISO 3166-1 names and country codes) lets users go beyond individual data sources and extract meaning from the combined data, regardless of which data set a given record came from.
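To make the idea of shared keys concrete, here is a minimal sketch of normalizing source-specific country spellings to ISO 3166-1 alpha-3 codes and then merging records from different sources on that code. The alias table and the function names `to_iso3` and `merge_by_country` are hypothetical illustrations, not the pipeline behind our data set; a real implementation would rely on a complete, maintained mapping.

```python
from collections import defaultdict

# Hypothetical alias table: spellings that different sources use for the same
# country, mapped to ISO 3166-1 alpha-3 codes. Deliberately incomplete.
ISO3_ALIASES = {
    "united states": "USA",
    "united states of america": "USA",
    "us": "USA",
    "korea, south": "KOR",
    "south korea": "KOR",
    "republic of korea": "KOR",
    "iran": "IRN",
    "iran (islamic republic of)": "IRN",
    "czechia": "CZE",
    "czech republic": "CZE",
}


def to_iso3(name: str) -> str:
    """Normalize a source-specific country name to an ISO 3166-1 alpha-3 code."""
    # Lowercase and collapse runs of whitespace so "  South   Korea " matches.
    key = " ".join(name.lower().split())
    try:
        return ISO3_ALIASES[key]
    except KeyError:
        # Fail loudly: silently dropping an unmapped name would hide data loss.
        raise ValueError(f"unmapped country name: {name!r}") from None


def merge_by_country(*sources: dict[str, dict]) -> dict[str, dict]:
    """Combine per-country records from several sources under one ISO key."""
    merged: dict[str, dict] = defaultdict(dict)
    for source in sources:
        for name, fields in source.items():
            merged[to_iso3(name)].update(fields)
    return dict(merged)
```

For example, with toy values, `merge_by_country({"US": {"cases": 1}}, {"United States of America": {"icu_beds": 2}})` returns `{"USA": {"cases": 1, "icu_beds": 2}}`: two sources that spell the country differently end up on a single record. Raising on unmapped names, rather than skipping them, is a design choice that surfaces gaps in the alias table early.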

Choropleth map of U.S. counties by relative ICU demand from COVID-19, calculated as the number of cases per fully staffed ICU bed (2018).

Our recent visualization, which shows cases relative to the number of staffed intensive care unit (ICU) beds, is an example of this context. Only data that is unified along the same primary keys and carefully curated for quality can provide the contextual understanding this situation demands. By seeing cases in the context of the underlying healthcare system, public health planners can anticipate whether healthcare capacity will be exceeded in a particular location, and enterprises can track the spread of COVID-19, assess supply chain risks and evaluate strategic options.

As an epidemiologist, I know first-hand that a pandemic can feel like sweeping turmoil, and curated, cleaned data is indispensable for data-driven decision-making. Models and decisions are only as good as the data supporting them.
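The ICU demand metric behind the map is simple arithmetic (cases divided by staffed ICU beds), but it only works once both data sets share a primary key. The sketch below assumes, for illustration, that counties are keyed by their FIPS codes and that each input is a plain dictionary; the function name `relative_icu_demand` is hypothetical and not part of our published data set.

```python
import math


def relative_icu_demand(
    cases_by_fips: dict[str, int], icu_beds_by_fips: dict[str, int]
) -> dict[str, float]:
    """Cases per staffed ICU bed for every county present in both inputs.

    Assumes both dictionaries are keyed by county FIPS code. Counties that
    appear in only one input are dropped, since no ratio can be computed.
    """
    demand: dict[str, float] = {}
    for fips in cases_by_fips.keys() & icu_beds_by_fips.keys():
        cases = cases_by_fips[fips]
        beds = icu_beds_by_fips[fips]
        if beds > 0:
            demand[fips] = cases / beds
        else:
            # No staffed ICU beds: any case already exceeds local capacity,
            # so report infinity; with zero cases, demand is simply zero.
            demand[fips] = math.inf if cases > 0 else 0.0
    return demand
```

Intersecting the key sets makes the dependence on shared keys explicit: a county spelled or coded differently in one source silently disappears from the result, which is exactly the kind of gap that careful normalization is meant to prevent.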

The data set is freely available and accessible via S3 as well as on the Snowflake Data Exchange. We hope to contribute not merely more data, but more information and clarity. In the end, it is that information and clarity that will make all the difference.
Chris von Csefalvay

VP of Special Projects at Starschema, clinical computational epidemiologist, rower, Golden Retriever dad, Fellow of the Royal Society for Public Health.