A COVID-19 Lockdown Visualisation

Reproducing a recent COVID-19 chart by the BBC using Python, Pandas and Seaborn.

barrysmyth
Data Science in Practice
Apr 12, 2020


On Friday a pal of mine tweeted the BBC graph shown below. It charts the type of lockdown in place in European countries over time. I thought it was a very effective visualisation. It conveys a lot of information very efficiently and, like all good visualisations, its aesthetic helps to enhance the core message. For example, the colour of the bars conveys well the extent to which different countries have implemented various lockdown measures. The start of each stack of bars tells us when the measures were implemented, the length tells us how long they have been in effect, and changes in colour tell us how they have evolved. And by comparing the bars to the first-cases line graphs, we can see how quickly countries responded when COVID-19 arrived.

I started thinking … how was this graph created and how could I replicate it using Python? It is clearly made up of two charts: some sort of stacked bar chart, to convey the degree of the lockdown measures, and a line graph to show the date of the first cases. So I decided to begin there …

Before starting on the graph, I needed to get the data. I could have reproduced it from the graph but, as it happens, I recently came across the Oxford COVID-19 Tracker website. Its aim is to "track and compare government responses to the coronavirus outbreak worldwide rigorously and consistently." The team is doing the hard work of collecting and classifying the various measures used by governments around the world, and turning them into a Stringency Index (SI). Briefly, the SI is a simple score that combines seven lockdown indicators into a single value from 0 to 100: 0 means no restrictions and 100 means a full Wuhan-style lockdown. The SI is intended for comparative purposes only; it is not a rating of the appropriateness or effectiveness of a country's response, but it is suitable for our needs.

The SI data is updated daily and made available via a simple API as a JSON feed. This can be converted into a Pandas dataframe in which each row corresponds to a particular date and country. Here is an example subset of this data for the last few days for Ireland. It includes a country code, the date, the number of confirmed COVID-19 cases and deaths, and the corresponding SI value.
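The API call itself is not reproduced in this post, so the following is only a minimal sketch of the JSON-to-dataframe step. It assumes a hypothetical payload: a list of records with the field names `country_code`, `date`, `confirmed`, `deaths` and `stringency`, and the numbers are made-up placeholders, not the Oxford tracker's real schema or data. In practice the string would come from the API response (for example via the `requests` library).

```python
import json

import pandas as pd

# Hypothetical JSON payload: field names and numbers are made-up placeholders,
# NOT the Oxford tracker's actual schema or data.
raw = """
[
  {"country_code": "IRL", "date": "2020-04-09", "confirmed": 10, "deaths": 1, "stringency": 85.0},
  {"country_code": "IRL", "date": "2020-04-10", "confirmed": 20, "deaths": 2, "stringency": 85.0},
  {"country_code": "ITA", "date": "2020-04-09", "confirmed": 30, "deaths": 3, "stringency": 90.0},
  {"country_code": "ITA", "date": "2020-04-10", "confirmed": 40, "deaths": 4, "stringency": 90.0}
]
"""

records = json.loads(raw)

# One row per (date, country); parse the date strings into real datetimes.
df = pd.DataFrame.from_records(records)
df["date"] = pd.to_datetime(df["date"])

# The rows for a single country, most recent first.
ireland = df[df["country_code"] == "IRL"].sort_values("date", ascending=False)
print(ireland)
```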

Back to the BBC graph: I mentioned that it seems to be made up of two graphs, a (horizontally) stacked bar graph and a line graph. In fact, both of these intuitions turned out to be wrong, or at least unhelpful for my implementation. After a bit of experimentation it occurred to me that the best way to reproduce this chart would be as two customised (Seaborn) heatmaps, one for the SI data and one for the first-case data.

A heatmap is a common type of chart that uses colour to represent how some value changes across two dimensions. Here is an example using a simple flights dataset to show the number of flights by month and year. The details are not important other than to say that darker colours correspond to more flights, as per the legend. To produce a heatmap like this in Python, we start with a dataframe representing how the value of interest changes across the rows (months) and columns (years) of the dataframe.
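As a rough illustration of that starting point, here is a minimal sketch that pivots a long-format table into a months-by-years dataframe and passes it to `sns.heatmap`. To keep it self-contained it uses random synthetic counts rather than the real flights data (Seaborn's `sns.load_dataset("flights")` would fetch that, but needs a network connection).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic long-format data (random counts, not the real flights dataset):
# one row per (year, month) with the value we want to colour by.
rng = np.random.default_rng(0)
years = [2017, 2018, 2019]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
long_df = pd.DataFrame(
    [(y, m, int(rng.integers(100, 600))) for y in years for m in months],
    columns=["year", "month", "count"],
)

# Reshape so rows are months, columns are years and cells hold the values.
wide = long_df.pivot(index="month", columns="year", values="count")
wide = wide.reindex(months)  # pivot sorts months alphabetically; restore calendar order

fig, ax = plt.subplots(figsize=(6, 6))
sns.heatmap(wide, cmap="Blues", ax=ax)  # darker cells = larger counts
```

The only real work is getting the dataframe into this wide shape; once rows and columns mean what we want, `sns.heatmap` takes care of the rest.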

For our purposes we need two heatmaps, so we need two dataframes, with countries as rows and dates as columns; for simplicity we use day numbers (counted from 1/1/2020) instead of actual dates. In the SI heatmap, each dataframe entry is the SI value for a given country and day. Below is a subset of this dataframe for China, Italy, and the USA. For this visualisation we are only interested in SI values greater than 20, so all lower values are replaced with null (NaN) values so that they are not shown in the final heatmap. We can see that recent days (day numbers > 92) have higher SI values.
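A minimal sketch of how such a dataframe might be built from long-format data, assuming hypothetical column names `country`, `date` and `stringency` and made-up values. It also assumes that 1 January 2020 counts as day 1; the post only says day numbers are counted from that date.

```python
import numpy as np
import pandas as pd

# Toy long-format SI data (made-up values), one row per (country, date).
long_df = pd.DataFrame({
    "country": ["China", "China", "Italy", "Italy", "USA", "USA"],
    "date": pd.to_datetime(["2020-01-23", "2020-04-02"] * 3),
    "stringency": [45.0, 80.0, 10.0, 90.0, 0.0, 70.0],
})

# Day number counted from 1 Jan 2020 (assumption: 1 Jan = day 1).
long_df["day"] = (long_df["date"] - pd.Timestamp("2020-01-01")).dt.days + 1

# Countries as rows, day numbers as columns, SI values as cells.
si_df = long_df.pivot(index="country", columns="day", values="stringency")

# Keep only SI values above 20; everything else becomes NaN,
# which Seaborn's heatmap leaves blank.
si_df = si_df.where(si_df > 20)
print(si_df)
```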

The following Python code uses this dataframe to produce the heatmap shown below, which is close to what we need, at least in terms of the key elements. We will need to adjust the colours, and we can produce a set of segmented bars, as per the BBC chart, by setting the line width of the heatmap cells. The colour bar on the right can be repositioned as a horizontal bar to act as the main chart legend. After that it should just be a matter of adjusting the axis labels and fonts.
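The original code lives in an embedded gist that is not reproduced here, so this is only a sketch of the two adjustments just described, using a small made-up SI dataframe. Cell borders via `linewidths`/`linecolor` give the segmented-bar look, and `cbar_kws={"orientation": "horizontal"}` turns the colour bar into a horizontal legend. The `"Reds"` palette is a placeholder, not the BBC colours.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Toy SI dataframe: countries x day numbers, NaN where SI <= 20 (made-up values).
si_df = pd.DataFrame(
    [[np.nan, 45.0, 70.0, 80.0],
     [np.nan, np.nan, 90.0, 90.0],
     [np.nan, np.nan, np.nan, 70.0]],
    index=["China", "Italy", "USA"],
    columns=[90, 91, 92, 93],
)

fig, ax = plt.subplots(figsize=(8, 3))
sns.heatmap(
    si_df,
    ax=ax,
    cmap="Reds",          # placeholder palette
    vmin=0, vmax=100,     # the SI runs from 0 to 100
    linewidths=2,         # white gaps between cells -> segmented bars
    linecolor="white",
    cbar_kws={"orientation": "horizontal", "label": "Stringency Index"},
)
ax.set_xlabel("Day number (from 1 Jan 2020)")
ax.set_ylabel("")
```

NaN cells are masked automatically, so countries simply have no bar before their SI rises above the threshold.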

As an aside, rather than using the raw SI values, I ended up using a simpler categorisation, mapping each SI value range to one of 4 categories (1, 2, 3, 4). I did this mainly because it gave me greater control over how SI values were assigned to heatmap colours.
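The post does not say which SI ranges map to which category, so the bin edges below are a hypothetical choice (four equal-width bands above 20). The sketch uses `pd.cut` column by column; values outside the bins, including NaN, stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges: (20, 40] -> 1, (40, 60] -> 2, (60, 80] -> 3, (80, 100] -> 4.
BINS = [20, 40, 60, 80, 100]


def categorise_si(si_df: pd.DataFrame) -> pd.DataFrame:
    """Map each SI value in (20, 100] to a category 1-4; NaN stays NaN."""
    # pd.cut works on 1-D data, so apply it to each column in turn.
    # labels=False returns 0-based bin codes (as floats when NaNs are present),
    # so add 1 to get categories 1..4.
    return si_df.apply(lambda col: pd.cut(col, bins=BINS, labels=False) + 1)


toy = pd.DataFrame({"a": [25.0, 75.0, np.nan], "b": [50.0, 100.0, 21.0]})
print(categorise_si(toy))
```

With categories in place, a `ListedColormap` with exactly four colours gives a one-to-one mapping from category to colour, which is the kind of control the categorisation is meant to provide.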

What about the first-cases graph/heatmap? This requires a similar dataframe of countries and days, marking the day on which each country's first cases were recorded. This time, however, each country has data in only one of its cells, because there is a single date/day for its first confirmed cases.
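One way to build this dataframe, sketched under assumptions: the case counts are in long format with hypothetical columns `country`, `day` and `confirmed` (made-up values below), and the "first cases" day is taken to be the first day with at least one confirmed case.

```python
import numpy as np
import pandas as pd

# Toy long-format case counts (made-up values): one row per (country, day).
long_df = pd.DataFrame({
    "country": ["Italy"] * 3 + ["Ireland"] * 3,
    "day": [30, 31, 32, 60, 61, 62],
    "confirmed": [0, 2, 3, 0, 0, 1],
})

# First day on which each country reports at least one confirmed case.
first_day = long_df[long_df["confirmed"] > 0].groupby("country")["day"].min()

# Countries x days frame that is NaN everywhere except one cell per country.
all_days = sorted(long_df["day"].unique())
first_df = pd.DataFrame(np.nan, index=first_day.index, columns=all_days)
for country, day in first_day.items():
    first_df.loc[country, day] = 1.0

print(first_df)
```

For the overlay to line up, this frame would need the same row and column labels as the SI dataframe, for example via `first_df.reindex(index=si_df.index, columns=si_df.columns)`.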

Here is a sample of this dataframe, showing data for China, Italy, the USA, Spain, and Ireland, and only those columns that correspond to days when first cases were confirmed. Using similar code to the above, we get the following heatmap, which again is close to what we need, apart from a few minor adjustments: the colour bar is not necessary, the colour of the line needs to be changed, and so on.

To produce the final graph, we display these two heatmaps on separate, overlapping axes that share an x-axis. Then we just need to make cosmetic changes to use the correct colours, font sizes, and axis formats. The code to do this is in the following gist.
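The gist is not reproduced here, so the following is only one possible way to get overlapping axes with a shared x-axis: draw the categorised SI heatmap on one axes, then draw the first-case heatmap on `ax.twinx()`, which overlays a transparent axes sharing the same x-axis. The toy dataframes, the four-colour palette and the explicit colour-bar axes are all assumptions for illustration. Passing `cbar_ax` keeps the colour bar from shrinking the main axes, so the twin stays aligned with it.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.colors import ListedColormap

countries = ["China", "Italy", "USA"]
days = [90, 91, 92, 93]

# Toy inputs (made-up values) with identical row and column labels,
# so the two heatmaps line up cell for cell.
si_cat = pd.DataFrame(
    [[np.nan, 2, 3, 4], [np.nan, np.nan, 4, 4], [np.nan, np.nan, np.nan, 3]],
    index=countries, columns=days, dtype=float,
)
first_df = pd.DataFrame(np.nan, index=countries, columns=days)
for country, day in [("China", 90), ("Italy", 91), ("USA", 92)]:
    first_df.loc[country, day] = 1.0

# Main axes on top, a thin axes underneath for the horizontal legend.
fig, (ax, cax) = plt.subplots(
    2, 1, figsize=(8, 3.5), gridspec_kw={"height_ratios": [12, 1], "hspace": 0.6}
)

# Lower layer: categorised SI, one placeholder colour per category 1-4.
si_cmap = ListedColormap(["#fde0c5", "#f9a66c", "#e6550d", "#a63603"])
sns.heatmap(
    si_cat, ax=ax, cmap=si_cmap, vmin=0.5, vmax=4.5,
    linewidths=2, linecolor="white",
    cbar_ax=cax, cbar_kws={"orientation": "horizontal", "ticks": [1, 2, 3, 4]},
)

# Upper layer: a twin axes sharing the x-axis, holding the first-case marks.
ax2 = ax.twinx()
sns.heatmap(first_df, ax=ax2, cmap=ListedColormap(["black"]),
            vmin=0, vmax=1, cbar=False)
ax2.set_yticks([])  # country labels already appear on the lower axes
ax.set_xlabel("Day number (from 1 Jan 2020)")
```

Because both dataframes have the same shape and labels, the two heatmaps end up with identical limits, and NaN cells in the upper layer are transparent, so the SI bars show through.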

The final result is shown below and, I hope you will agree, it is pretty close to the original BBC chart. Since I am using the Oxford SI data, the precise details of the graph are not exactly the same as those of the BBC version, but the two datasets appear to be close, and overall the look and feel of the resulting graph is very close to the original. A few minor details are not quite right, mostly related to the positioning and labelling of the stringency index legend, which is currently hard-coded; it should be straightforward to generalise this in due course.

In the end I think this has been a useful exercise. It was interesting to see how my initial intuitions about how to implement the graph (using bar charts and line graphs) proved to be unhelpful and equally interesting to see how straightforward the implementation turned out to be once I realised the heatmap connection: producing the required dataframes and rendering the heatmaps took only a few lines of code. If you want to find out more about the details, then the complete code behind this post and the visualisation is available here as an annotated Jupyter notebook.


Professor of Computer Science at University College Dublin. Focus on AI/ML and data science with applications in e-commerce, media, and health.