Visualizing Hope for a Successful, Data-Driven, COVID-19 Reopening
Written by Christian Felix with contributions from Anna Foard
Over the last few months, COVID-19 numbers have taken the national and global spotlight. Throughout this time we’ve all been working toward “flattening the curve” in our communities and beyond. The events of the last couple of weeks tell us that, to varying extents, this has been working: in late April, the White House released new guidance to help state and local officials navigate reopening their economies, and countries in Europe are also starting to re-open their societies where circumstances allow.
While metrics like the overall number of cases and the number of deaths are often easier for the average person to consume and understand, the effectiveness of the measures that have radically re-shaped our lives will ultimately come to be measured not only in raw counts but also in increments of change and growth over time.
Going back to the curve — what is it?
The concept of “flatten the curve” supposes that a certain number of people will become infected with COVID-19. The total is unknown, but with social distancing and staying home, those infections are spread out over time, so fewer people are sick at once. Because COVID-19 can be deadly and often requires hospitalization, the goal is to keep hospitals working below capacity, ultimately avoiding additional deaths among those who will still need care.
Through the challenges of this pandemic, there has been some incredible work by the data journalism community to help the public understand the importance of flattening the curve, and whether or not we are succeeding in our efforts to flatten it.
Visualizing the Objective
The questions that we have wrestled with over the last few weeks are these:
How do we know that the curve has become sufficiently flat?
What data-driven thresholds are we looking for to know we have reached the point where things can gradually start to be re-opened?
One of the more compelling and helpful pieces to address these questions came from the American Enterprise Institute and a team led by former FDA commissioner Dr. Scott Gottlieb in late March. Their roadmap to re-opening contains a four-phase approach with specific thresholds for moving from phase to phase. One of those thresholds is when a state reports a “sustained reduction in cases for at least 14 days (i.e. one incubation period).”
Similar language was used by the White House in its guidance, which states that a state or a region should be experiencing a downward trajectory of documented cases within a 14-day period before proceeding to a phased comeback.
This, of course, is not the only target.
Local hospital capacity, availability of testing, and the capability to effectively monitor confirmed cases and trace their contacts are also vitally important factors.
But for those of us working with the publicly available case data from JHU or The New York Times, and looking to provide data visualization resources and tools as a public service, one of the key objectives at this point should be to move beyond solely reporting counts, and also clearly convey whether or not a locality is experiencing a ‘downward trajectory’ or a ‘sustained reduction’ of cases on a consistent basis.
Untangling Terminology
But what precisely is a “downward trajectory” of cases and how are we to know that our county or our region is experiencing one?
There are various ways to define it. This is how we’ve done it:
1. Comparing the three-day moving average of the change in case counts with the 14-day moving average of that change allows us to better understand whether case counts are on a downward trajectory, while also accounting for short-term variations in the data.
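The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "case count change" is read as daily new cases derived from a cumulative series, both averages are trailing windows, and a day is labeled "down" when the 3-day average sits below the 14-day average. The charts in this article may rest on a different exact rule.

```python
from typing import List, Optional


def daily_new_cases(cumulative: List[int]) -> List[int]:
    """Day-over-day change in a cumulative case series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def trailing_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; None until a full window of data exists."""
    averages: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


def trajectory_labels(cumulative: List[int], short: int = 3, long: int = 14) -> List[str]:
    """Label each day by comparing the short and long moving averages.

    Assumed rule (not confirmed by the article): short < long means "down",
    short > long means "up", equal means "flat".
    """
    new_cases = daily_new_cases(cumulative)
    short_ma = trailing_mean(new_cases, short)
    long_ma = trailing_mean(new_cases, long)
    labels = []
    for s, l in zip(short_ma, long_ma):
        if s is None or l is None:
            labels.append("n/a")  # not enough history yet
        elif s < l:
            labels.append("down")
        elif s > l:
            labels.append("up")
        else:
            labels.append("flat")
    return labels
```

Because the 14-day window needs two full weeks of daily changes, the first 13 labels are "n/a". The short window reacts quickly to recent days, while the long window damps reporting noise such as weekend dips.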
Consider the example of Orleans County, Louisiana, where rates of confirmed cases increased steadily through March and into early April:
The objective is to calculate these trends at the county level and then encode and visualize them in a way that makes it easy for the visualization consumer to understand the sustained trajectory of case counts within their county and the counties around them.
It helps to look once again at the chart for Orleans County, Louisiana, this time with an orange-blue diverging color scale applied to encode the sustained trajectories:
Visualizing these categories over a 30-day period using strip plots allows the viewer to quickly see the trend status across many different counties and adds tremendous scalability to the visualization.
Visualizing the results over time or across points in time becomes particularly compelling as it allows us to see how counties are “ascending the curve” or “descending the curve”; increasing at a sustained rate (orange) or decreasing at a sustained rate (blue):
An additional piece of important information is also encoded in the strip plots: the county’s Peak Date (indicated by the black bar ‘ | ’ ). This represents the day that experienced the largest case count increase for each county. For many counties in the image above, the peak date is a week or two behind them.
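As a minimal sketch of the Peak Date definition above, the following Python function finds the day with the largest day-over-day increase in a cumulative case series. The function name and the tie-breaking choice (the earliest date wins) are assumptions made for illustration.

```python
from datetime import date
from typing import List, Tuple


def peak_date(dates: List[date], cumulative: List[int]) -> Tuple[date, int]:
    """Return (date, increase) for the largest day-over-day case increase."""
    if len(dates) != len(cumulative) or len(cumulative) < 2:
        raise ValueError("need two or more aligned dates and cumulative counts")
    # Pair each day's increase with the date on which it was reported.
    increases = [
        (cumulative[i] - cumulative[i - 1], dates[i])
        for i in range(1, len(cumulative))
    ]
    # max() returns the first maximum it sees, so ties go to the earliest date.
    best_increase, best_date = max(increases, key=lambda pair: pair[0])
    return best_date, best_increase
```

Note that a peak computed this way can still move later if a county's largest increase has not happened yet, so it is best read as "the peak so far".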
Fooled by Geography
Maps have played a prominent role in visualizing the spread of COVID-19, and rightfully so, as they allow us to quickly and intuitively understand the content of the data and associate it with a geographic location:
But they do have flaws that need to be addressed. For one, in instances where data exists for Hawaii and Alaska, an Albers USA projection should be used, or some other means of including those states (and possibly Puerto Rico) in the analysis. Tools like mapshaper.org or Development Seed’s Dirty Reprojectors app make this easy enough to accomplish and should be a consideration for the DataViz designer looking to comprehensively convey the data.
Secondly, the county-level choropleth tends to elevate the importance of the square mileage of the county at the expense of the number of cases being measured. Using the JHU map shown above as an example, our perception is quickly drawn to Arizona and California where it should instead be focused on New York and New Jersey.
Our county-level choropleth succumbs to this flaw as well. The map below shows rate increases or decreases from March 20 to April 18:
In the end, every chart is a bit of a compromise and even the best chart can be complemented by visualizing the data in other ways. We’ve done this by adding metrics that provide insight at the county level into sustained trends.
Doubling times in days are calculated as follows: (x*ln(2))/ln(y/z), where:
Day 0 = the day the county first surpassed 10 cases
x = the number of days that have passed since Day 0
y = the number of cases on Day x
z = the number of cases on Day 0
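The formula and definitions above translate directly into Python. This is a sketch rather than the exact code behind the charts: the function name and the choice to return None when the doubling time is undefined (no Day 0 yet, no days elapsed, or no growth since Day 0) are assumptions.

```python
import math
from typing import List, Optional


def doubling_time_days(cumulative: List[int], threshold: int = 10) -> Optional[float]:
    """Doubling time as of the last day in the series: x * ln(2) / ln(y / z).

    Day 0 is the first day the cumulative count surpasses `threshold`.
    """
    # Index of Day 0, or None if the county never surpassed the threshold.
    day0 = next((i for i, count in enumerate(cumulative) if count > threshold), None)
    if day0 is None:
        return None
    x = len(cumulative) - 1 - day0  # days elapsed since Day 0
    y = cumulative[-1]              # cases on Day x
    z = cumulative[day0]            # cases on Day 0
    if x == 0 or y <= z:
        return None                 # ln(y / z) would be zero or negative
    return x * math.log(2) / math.log(y / z)
```

For instance, a county that goes from 12 cases on Day 0 to 48 cases two days later has quadrupled, so its doubling time is one day. To follow the metric over time, call the function on successively longer slices of the series.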
Consider the following example using King County, Washington:
Looking at the doubling times across counties and over time is also encouraging. For many counties, the doubling time is lengthening (that is, growth is slowing), and the days with the shortest doubling times are increasingly weeks in the past.
What does it all mean?
The results are hopeful. But, once again, the case trajectories are just one piece of the puzzle. The other factors (availability of tests, hospital capacity, and the ability to monitor cases and trace contacts) are not incorporated into this analysis and are critically important. That said, for those of us who have lost loved ones to this virus, or have lost jobs, or have lost our sanity in quarantine, this should prove to be good news. It shows us that in many places across the country, things on the ground are improving and that, whatever a “post-COVID” return to normal looks like, the data seems to warrant a prudent and gradual move toward it.
None of this means that we are out of the woods yet. There is always the possibility of a resurgence or another outbreak. The hope is that elected officials and others who are tasked with determining whether or not to re-open will find this analysis useful as one piece of the incredibly complex reopening puzzle, and that at some point soon, as the data warrants, we can all begin to venture toward whatever the new post-COVID normal looks like.