Visualizing Hope for a Successful, Data-Driven, COVID-19 Reopening

Published in Nightingale, May 8, 2020

Written by Christian Felix with contributions from Anna Foard

Over the last few months, COVID-19 numbers have taken the national and global spotlight. Throughout this time we’ve all been working toward “flattening the curve” in our communities and beyond. The events of the last couple of weeks tell us that, to varying extents, this has been working: in late April, the White House released new guidance to help state and local officials navigate reopening their economies, and countries in Europe are also starting to re-open their societies where circumstances allow.

While metrics like the overall number of cases and the number of deaths are often easier for the average person to consume and understand, the effectiveness of the measures that have radically re-shaped our lives will ultimately come to be measured not only in raw counts but also in increments of change and growth over time.

Going back to the curve — what is it?

The concept of “flattening the curve” supposes that a certain number of people will become infected with COVID-19. The total is unknown, but social distancing and staying home spread those infections out over time, so that fewer people are infected at once. Because COVID-19 can be deadly and can require hospitalization, the goal is to keep hospitals working below capacity, ultimately avoiding additional deaths among patients who will still need care.

Through the challenges of this pandemic, there has been some incredible work by the data journalism community to help the public understand the importance of flattening the curve, and whether or not we are succeeding in our efforts to flatten it.

John Burn-Murdoch’s log scale chart from the FT will likely be remembered by those of us in the data visualization community long after the pandemic has passed.

Visualizing the Objective

The questions that we have wrestled with over the last few weeks are these:

How do we know that the curve has become sufficiently flat?

What data-driven thresholds are we looking for to know we have reached the point where things can gradually start to be re-opened?

One of the more compelling and helpful pieces to address this question came from the American Enterprise Institute and a team led by former FDA commissioner Dr. Scott Gottlieb in late March. Their roadmap to re-opening contains a four-phased approach with specific thresholds for action to move from phase to phase. One of those specific thresholds is when a state reports a “sustained reduction in cases for at least 14 days (i.e. one incubation period).”

Image is taken from page 3 of the National Coronavirus Response (Gottlieb, Rivers, McClellan, Silvis, and Watson)

Similar language was used by the White House in its guidance, which states that a state or a region should be experiencing a downward trajectory of documented cases within a 14-day period before proceeding to a phased comeback.

This, of course, is not the only target.

Local hospital capacity, availability of testing, and the capability to effectively monitor confirmed cases and trace their contacts are also vitally important factors.

But for those of us working with the publicly available case data from JHU or The New York Times, and looking to provide data visualization resources and tools as a public service, one of the key objectives at this point should be to move beyond solely reporting counts, and also clearly convey whether or not a locality is experiencing a ‘downward trajectory’ or a ‘sustained reduction’ of cases on a consistent basis.

Untangling Terminology

But what precisely is a “downward trajectory” of cases and how are we to know that our county or our region is experiencing one?

There are various ways to define it. This is how we’ve done it:

1. Comparing the three-day moving average rate of case count change to the 14-day moving average rate of change allows us to better understand whether case counts are on a downward trajectory while also accounting for short-term variations in the data.
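One way to make this comparison concrete is sketched below in Python. It is a minimal illustration under stated assumptions, not the pipeline behind the charts in this article: the function names are hypothetical, the moving averages are trailing windows over daily new cases derived from cumulative counts, and a day is flagged as descending simply when the three-day average sits below the 14-day average.

```python
from typing import List, Optional


def daily_new_cases(cumulative: List[int]) -> List[int]:
    """Day-over-day change in a cumulative case series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def is_descending(cumulative: List[int], short: int = 3,
                  long: int = 14) -> List[Optional[bool]]:
    """True on days where the short average is below the long average.

    Assumption: "downward trajectory" is read as short < long.
    Days without a full long window are returned as None.
    """
    new_cases = daily_new_cases(cumulative)
    short_avg = moving_average(new_cases, short)
    long_avg = moving_average(new_cases, long)
    return [
        None if (s is None or l is None) else s < l
        for s, l in zip(short_avg, long_avg)
    ]
```

Using the same trailing window for both averages keeps the two series aligned by date, so each day's flag only depends on data available up to that day.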

Consider the example of Orleans Parish, Louisiana (the state’s county equivalent), where rates of confirmed cases increased steadily through March and into early April:

The spread of COVID-19 in Orleans Parish, Louisiana grew significantly through March and into early April, until around the second week of April. Rates of confirmed cases have been experiencing a sustained downward trajectory ever since.

The objective is to calculate these trends at the county level and then encode and visualize them in such a way that makes it easy for the visualization consumer to understand the sustained trajectory of case counts within their county and the counties around them.

It helps to look at the chart for Orleans Parish once again, this time with an orange-blue diverging color scale applied to represent sustained trajectories:

Color coding is applied to represent sustained/consecutive days above or below the 14-day moving average. Numbers at line ends represent confirmed case counts associated with the corresponding moving averages.
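The consecutive-day counts behind a color encoding like this could be computed roughly as follows. This is a sketch under assumptions: the function names are hypothetical, positive values mean consecutive days above the 14-day average, negative values mean consecutive days below it, and the two-color mapping stands in for whatever diverging scale the charts actually use.

```python
from typing import List, Optional


def signed_streaks(short_avg: List[Optional[float]],
                   long_avg: List[Optional[float]]) -> List[int]:
    """Running count of consecutive days above (+) or below (-) the long average.

    The count resets to 0 when a value is missing or the averages are equal.
    """
    streaks: List[int] = []
    run = 0
    for s, l in zip(short_avg, long_avg):
        if s is None or l is None or s == l:
            run = 0
        elif s > l:
            run = run + 1 if run > 0 else 1
        else:
            run = run - 1 if run < 0 else -1
        streaks.append(run)
    return streaks


def streak_color(streak: int) -> str:
    """Assumed mapping: orange for sustained increase, blue for decrease."""
    if streak > 0:
        return "orange"
    if streak < 0:
        return "blue"
    return "neutral"
```

In a diverging scale, the magnitude of the streak would typically control the intensity of the color, so a long run below the average appears as a deeper blue than a single day below it.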

Visualizing these categories over a 30-day period using strip plots allows the viewer to quickly see the trend status across many different counties and adds tremendous scalability to the visualization.

Total confirmed cases by county and strip plots of consecutive-day trends over a 30-day period ending on April 18 for the top 12 counties by case count

Visualizing the results over time or across points in time becomes particularly compelling, as it allows us to see how counties are “ascending the curve” or “descending the curve”: increasing at a sustained rate (orange) or decreasing at a sustained rate (blue):

Top 12 Counties by Case Count on April 28th and April 21st. There is significantly more blue in the strip plots on April 28th. This is a good thing.

An additional piece of important information is also encoded in the strip plots: each county’s peak date (indicated by the black bar ‘ | ’). This represents the day with the largest case count increase for that county. For many counties in the image above, the peak date is a week or two behind them.
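Finding the peak date as defined here amounts to locating the largest day-over-day increase in a county's cumulative series. The sketch below assumes one cumulative count per date and a hypothetical function name; ties are resolved in favor of the earliest day, which is a choice made for this example rather than something the definition specifies.

```python
from datetime import date
from typing import List


def peak_date(dates: List[date], cumulative: List[int]) -> date:
    """Date with the largest single-day increase in cumulative cases."""
    if len(dates) != len(cumulative):
        raise ValueError("dates and cumulative must have the same length")
    if len(cumulative) < 2:
        raise ValueError("need at least two days to compute an increase")
    increases = [b - a for a, b in zip(cumulative, cumulative[1:])]
    # max() returns the first index on ties, i.e. the earliest peak
    best = max(range(len(increases)), key=increases.__getitem__)
    # increases[i] is the change from day i to day i + 1
    return dates[best + 1]
```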

Fooled by Geography

Maps have played a prominent role in visualizing the spread of COVID-19, and rightfully so, as they allow us to quickly and intuitively understand the content of the data and associate it with a geographic location:

But maps have flaws that need to be addressed. For one, where data exists for Hawaii and Alaska, an Albers USA projection should be used, or some other means of including those states (and possibly Puerto Rico) in the analysis. Tools like mapshaper.org or Development Seed’s Dirty Reprojectors app make this easy enough to accomplish and should be a consideration for the DataViz designer looking to comprehensively convey the data.

Secondly, the county-level choropleth tends to elevate the importance of a county’s square mileage at the expense of the number of cases being measured. In the JHU map shown above, for example, our attention is quickly drawn to Arizona and California when it should instead be focused on New York and New Jersey.

Our county-level choropleth succumbs to this flaw as well. The map below shows rate increases or decreases from March 20 to April 18:

As we progress into May, the map is turning considerably bluer. This is hopeful. However, many parts of the country (even those that have progressed towards reopening) are still light blue or even orange, indicating they may not have yet met the ‘downward trajectory’ criteria for reopening.

In the end, every chart is a bit of a compromise and even the best chart can be complemented by visualizing the data in other ways. We’ve done this by adding metrics that provide insight at the county level into sustained trends.

In addition to the strip plots, sparklines are used to convey, at scale, doubling times for each county.

Doubling times in days are calculated as follows: (x * ln(2)) / ln(y / z), where:

Day 0 = the day the county first surpassed 10 cases
x = the number of days that have passed since Day 0
y = the number of cases on Day x
z = the number of cases on Day 0
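The formula above translates directly into code. The sketch below uses hypothetical function names and makes two assumptions the definition leaves open: a day with no growth since Day 0 is reported as an infinite doubling time, and "surpassed 10 cases" is read as a cumulative count strictly greater than 10.

```python
import math
from typing import List


def doubling_time(x: int, y: float, z: float) -> float:
    """Doubling time in days: (x * ln(2)) / ln(y / z).

    x = days since Day 0, y = cases on Day x, z = cases on Day 0.
    """
    if x <= 0:
        raise ValueError("x must be a positive number of days")
    if z <= 0:
        raise ValueError("z must be positive")
    if y <= z:
        # Assumption: no growth since Day 0 means cases are not doubling
        return math.inf
    return (x * math.log(2)) / math.log(y / z)


def doubling_times(cumulative: List[int], threshold: int = 10) -> List[float]:
    """Doubling time for each day after Day 0 in a cumulative series."""
    day0 = next((i for i, c in enumerate(cumulative) if c > threshold), None)
    if day0 is None:
        return []  # the county never surpassed the threshold
    z = cumulative[day0]
    return [
        doubling_time(x, cumulative[day0 + x], z)
        for x in range(1, len(cumulative) - day0)
    ]
```

As a quick check of the arithmetic: if a county had 10 cases on Day 0 and 40 cases ten days later, cases doubled twice in ten days, and the formula gives (10 * ln(2)) / ln(4) = 5 days.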

Consider the following example using King County, Washington:

Higher doubling times indicate a slower spread, whereas lower doubling times are indicative of faster growth.

Looking at the doubling times across counties and over time is also encouraging. For many counties, doubling times are lengthening, and the days with the lowest doubling times are increasingly weeks in the past.

What does it all mean?

The results are hopeful. But, once again, the case trajectories are just one piece of the puzzle. The other factors (availability of tests, hospital capacity, and the ability to track cases and monitor contacts) are not incorporated into this analysis, and they are critically important. That said, for those of us who have lost loved ones to this virus, or have lost jobs, or have lost our sanity in quarantine, this should prove to be good news. It shows us that in many places across the country, things on the ground are improving and that, whatever a “post-COVID” return to normal looks like, the data seems to warrant a prudent and gradual move toward it.

None of this means that we are out of the woods yet. There is always the possibility of a resurgence or another outbreak. The hope is that elected officials and others who are tasked with determining whether or not to re-open will find this analysis useful as simply one piece of the incredibly complex reopening puzzle, and that at some point soon, as the data warrants, we can all begin to venture towards whatever the new post-COVID normal looks like.