Starter Guide to COVID-19 Data

Patrice Metcalf-Putnam
Published in The Startup
4 min read · Apr 10, 2020

This guide covers the data landscape, including the major publicly available coronavirus datasets, free tools and dashboards, and advice for publishing results ethically.

Viz responsibly

With great power comes great responsibility. Data taken out of context can spread misinformation and lead to damaging outcomes, such as people ignoring guidance from health officials. Before making any visualization or abstract model, I highly recommend reading this article from Tableau: 10 considerations before you create another chart about COVID-19. If you take nothing else from the article, remember to:

  • Leave the rate calculations to epidemiologists
  • Make sure to fully understand the context behind the data
  • Don’t publish anything potentially misleading
  • Remember every data point is a human being

Quick start guide to COVID-19 data

This section covers how to quickly start exploring the most-used data source with minimal technical skills.

The tools and resources in this section are based on the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE) COVID-19 data repository, made publicly available for educational and academic purposes only (no commercial use). These data are global, tracking cases, deaths, and recoveries over time by country (and by region or state where available). The data come from a variety of sources, including the World Health Organization (WHO) and government entities.

Using and adapting existing visualizations to fit a story

Datawrapper is a visualization tool meant for people without design or coding skills. The company recently published a blog post with a number of visualizations using the JHU CSSE data.

All 17 of these visualizations can be used as they are, or modified to match a different style guide or chart type. Having never used the tool before, I was able to embed one of these charts in less than five minutes after signing up for a free account, and the data get updated multiple times a day.

If embedding into a Medium article, be sure to embed using the custom link URL, not the full embed code.

Tableau starter workbook & gallery

Tableau Public is a free, cloud-based data visualization tool. Without even signing up for an account, you can interact with visualizations in their COVID-19 visualization gallery.

To modify or make your own visualizations using Tableau Public, you will need to download the application, and anything you save will be publicly available on your profile. Downloading their starter workbook is the easiest way to get started; the data (from JHU CSSE) have already been cleaned and optimized for use in Tableau, and can be refreshed easily.

Embedding charts into articles is technically possible with Tableau Public, but somewhat tricky in practice, depending on the platform (Medium does not support Tableau embeds).

Major data sources and their contents

Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)

Global data compiled from various organizations and government entities, tracking cases, deaths, and recoveries over time. For educational and academic purposes only.

data visualization center | dashboard | mobile dashboard | data | normalized data | API
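If you would rather work with the raw files than a dashboard, here is a minimal sketch of rolling the JHU CSSE time series up to one cumulative count per country per day. The file URL, the four identifier columns, and the M/D/YY date headers are my assumptions about the repository layout, so check them against the repository before relying on this. The sample rows are made up and are not real figures.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed location of the global confirmed-cases file in the JHU CSSE
# repository; verify the path before relying on it.
JHU_CONFIRMED_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

# Assumed identifier columns; every other column is treated as a date (M/D/YY).
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def country_totals(csv_text):
    """Sum cumulative counts across provinces, keyed by (country, ISO date)."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = row["Country/Region"]
        for column, value in row.items():
            # Skip identifier columns, ragged rows, and blank cells.
            if column is None or column in ID_COLUMNS or value in (None, ""):
                continue
            day = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            totals[(country, day)] += int(float(value))
    return dict(totals)


# Made-up sample in the assumed layout (not real figures).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Exampleland,10.0,20.0,1,3
North,Testonia,30.0,40.0,2,5
South,Testonia,31.0,41.0,0,4
"""

if __name__ == "__main__":
    for (country, day), count in sorted(country_totals(SAMPLE).items()):
        print(country, day, count)
```

To run this on live data, download the file (for example with `urllib.request.urlopen(JHU_CONFIRMED_URL)`), decode it as UTF-8, and pass the text to `country_totals`. The counts are cumulative totals, not daily new cases.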

The COVID Tracking Project

US data by state, tracking COVID-19 tests over time, including positive, negative, and pending results. These data are collected almost entirely from local government health authorities. The reported number of cases in the US is artificially low because of limited testing, and data from the COVID Tracking Project help put our testing rate in perspective.

data | API | FAQ | NYT viz
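To see how the positive, negative, and pending counts fit together, here is a rough sketch that summarizes a state-level JSON response. The endpoint and the field names (`state`, `positive`, `negative`, `pending`) are assumptions on my part, so confirm them in the project's API documentation. The sample records are invented.

```python
import json

# Assumed endpoint and field names; check the project's API documentation
# for the current shape before using this.
STATES_ENDPOINT = "https://covidtracking.com/api/states"


def summarize_tests(records):
    """Return completed and pending test counts for each state."""
    summary = {}
    for record in records:
        positive = record.get("positive") or 0
        negative = record.get("negative") or 0
        summary[record["state"]] = {
            "positive": positive,
            # Completed tests are those with a reported result.
            "completed": positive + negative,
            # Stays None when a state does not report pending tests,
            # because "not reported" is not the same as zero.
            "pending": record.get("pending"),
        }
    return summary


# Invented sample records in the assumed shape (not real figures).
SAMPLE_JSON = """[
  {"state": "AA", "positive": 120, "negative": 880, "pending": 15},
  {"state": "BB", "positive": 40, "negative": 360, "pending": null}
]"""

if __name__ == "__main__":
    for state, counts in summarize_tests(json.loads(SAMPLE_JSON)).items():
        print(state, counts)
```

I stop at raw counts here on purpose. In keeping with the advice above, anything that looks like a rate is best left to epidemiologists.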

Cuebiq mobility data

This data intelligence firm is providing US mobility data, essentially showing changes in travel patterns over time. These data are being used to get a sense of how much people across the country are changing their habits and practicing social distancing.

data | Tableau dashboard | about | NYT viz

NYT US County-level data

Most data on COVID-19 are at the state level at best. The New York Times (NYT) started tracking and sharing county-level data in late March 2020, in addition to state-level data (both US-only).

article | data | latest map
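For the county-level file, a common first step is pulling the most recent row for each county. The sketch below assumes the file lives at the URL shown, has the columns `date,county,state,fips,cases,deaths`, and reports cumulative counts. Treat all of that as an assumption to verify against the repository's README. The sample rows are made up.

```python
import csv
import io

# Assumed location and column layout of the NYT county-level file.
NYT_COUNTIES_URL = (
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/"
    "us-counties.csv"
)


def latest_by_county(csv_text):
    """Keep the most recent row for each (state, county) pair."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["state"], row["county"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if key not in latest or row["date"] > latest[key]["date"]:
            latest[key] = {
                "date": row["date"],
                # Some rows have no FIPS code, so keep None instead of "".
                "fips": row["fips"] or None,
                "cases": int(row["cases"]),
                "deaths": int(row["deaths"]) if row["deaths"] else None,
            }
    return latest


# Made-up sample in the assumed layout (not real figures).
SAMPLE = """date,county,state,fips,cases,deaths
2020-04-01,Alpha,Examplestate,99001,10,0
2020-04-02,Alpha,Examplestate,99001,14,1
2020-04-02,Unknown,Examplestate,,3,0
"""

if __name__ == "__main__":
    for (state, county), row in sorted(latest_by_county(SAMPLE).items()):
        print(state, county, row)
```

Keying on state and county name rather than FIPS code is a deliberate choice in this sketch, since rows without a FIPS code would otherwise collapse into one entry.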

Google Cloud BigQuery public datasets

A number of datasets, including the JHU CSSE, have been made available through Google’s BigQuery Public Dataset Program. Queries on these COVID-19 datasets are free until September 15th, 2020. Caution: if you join these datasets with others that are not being provided free as part of this program, you will be charged.

More datasets may be included over time. Currently, there are a number of datasets that include more general demographic data, such as Census data, in addition to coronavirus-specific datasets. Be sure to check licenses for each dataset you use before publishing.

about | datasets | intro to BigQuery | BigQuery API
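Given the caution about charges, it can help to check how much data a query would scan before running it. Here is a sketch using a dry run with the `google-cloud-bigquery` Python client, which is a third-party package that needs Google Cloud credentials. The table name and the column names (`country_region`, `date`, `confirmed`) are my assumptions about how the JHU CSSE data are published in the program, so confirm them in the dataset listing first.

```python
# Assumed table name within the BigQuery public datasets program.
JHU_TABLE = "bigquery-public-data.covid19_jhu_csse.summary"


def build_country_query(table=JHU_TABLE):
    """Build a parameterized query; the column names are assumptions."""
    return (
        "SELECT country_region, date, SUM(confirmed) AS confirmed\n"
        f"FROM `{table}`\n"
        "WHERE country_region = @country\n"
        "GROUP BY country_region, date\n"
        "ORDER BY date"
    )


def estimate_bytes(sql, country):
    """Dry-run the query and return the bytes it would scan, without running it."""
    # Imported here so the query builder works without the package installed.
    from google.cloud import bigquery  # third-party: google-cloud-bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(
        dry_run=True,
        use_query_cache=False,
        query_parameters=[
            bigquery.ScalarQueryParameter("country", "STRING", country)
        ],
    )
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


if __name__ == "__main__":
    print(build_country_query())
```

A dry run only reports bytes scanned. It does not tell you whether a given dataset falls under the free program, so check that separately for every table you join.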

1Point3Acres

One of the data sources for the JHU CSSE global dataset, this site has its own dashboard that includes global cases, as well as personal protective equipment (PPE) demands and testing across the US (testing data from the COVID Tracking Project). To use their data directly, you must request access and get approved.

dashboard | request access

Kaggle machine learning challenges

A number of challenges are focused on gaining a better understanding of COVID-19. These challenges come with a number of datasets, such as the text of thousands of scholarly articles about COVID-19 for use with natural language processing (NLP).

COVID-19 machine learning challenges

The Institute for Health Metrics and Evaluation (IHME)

This independent population health research center at the University of Washington provides historical country- and state-level figures on health resource use, in addition to projections and estimates going forward. Their dashboard is a real-time tracker of what it means to “flatten the curve,” and you can download their most recent estimates.

dashboard | resources | most recent estimates

State or county level stats and dashboards

Many states and counties have their own dashboards. These often provide data at the zip code level.

Santa Clara County (CA) | King County (WA) | NYC

And more?

If you see something missing from this guide, please leave a note!
