Rising or Falling? Leveraging Bivariate Glyphs to Visualize Trend Changes

An illustration using COVID-19 data

Lucy McLaughlin
Nightingale
Apr 29, 2021


Over the past year, we’ve seen a number of innovative and effective methods for visualizing COVID-19 data. In fact, through the daily press briefings from the UK government and the work of news organisations worldwide, COVID-19 has in many ways thrust the entire field of data visualization into the spotlight. However, if you’ve been following along, you know that the number of cases alone doesn’t tell the whole story. If you want to be able to convey more information using a single visualization, you can add another variable: acceleration, which shows whether daily case numbers are stable, or are rising or falling, and at what rate. But adding this information can be a challenge, especially if you’re attempting to compare multiple geographic regions simultaneously.

When making such a comparison across a dataset where some places have a high case load, some moderate and some low, the area with the high case load immediately jumps out as the area of concern. But what if you were also able to see that the area with the high case load was seeing a steady decrease in cases, while one of the areas with a moderate case load was in fact seeing significant growth? This is the primary purpose of the charts below: to display this extra layer of information and so significantly enhance the viewer’s understanding of the situation. Compare the information gain of the chart on the right versus that on the left.

In the London area, the case load is the highest in the country, but has decreased over the last week. Meanwhile the North West area, with a relatively moderate case load, is still seeing an increase in cases.

This additional data has been shown with the help of what are known as visual entropy glyphs [1]. In previous research [2][3], glyphs like those shown in the left-hand image above were used to represent urban environmental data, such as temperature. Visual entropy glyphs arose out of a need to indicate which sensors were most reliable, by representing the uncertainty associated with each sensor’s measurements.

The central colour represents a first data value, while the enclosing shape represents a second value augmenting the first in some way. The enclosing shapes in the outer ring have measurably varying levels of complexity, and it is this complexity that encodes the “visual entropy.” The concept has been tested in trials in different regions of the world, which has led to a standardized perceptual order. The testing involved a two-alternative forced choice task, in which participants were shown pairs of glyphs and asked to select the glyph in each pair that they perceived, from its shape, as representing the greater value. These bi-directional rank-ordered glyphs can then be used to add a second variable in a wide variety of situations including, in our example, derivatives of a primary data value like case rate acceleration.
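As a rough illustration of the idea, a signed second variable can be binned onto a bi-directional, rank-ordered set of shape levels. The sketch below is not vizent's internal algorithm: the function name glyph_level, the linear binning and the clamping are all assumptions made for the example, and the input values are made up.

```python
def glyph_level(value, max_abs, n_levels):
    """Map a signed value to an integer level in [-n_levels, n_levels].

    0 means "no change"; the sign gives the direction and the magnitude
    gives the rank of the shape (a higher rank standing for a more
    complex outline).
    """
    if max_abs <= 0 or n_levels < 1:
        raise ValueError("max_abs must be positive and n_levels at least 1")
    # Clamp so that values beyond the scale limit use the most extreme shape.
    clamped = max(-max_abs, min(max_abs, value))
    # Linear binning: round to the nearest of n_levels steps on each side.
    return round(clamped / max_abs * n_levels)


# Made-up weekly changes, binned onto four levels in each direction.
levels = [glyph_level(v, max_abs=40.0, n_levels=4)
          for v in (-35.5, -8.0, 0.0, 20.8, 60.0)]
print(levels)
```

The same function applied with the same max_abs and n_levels always gives the same level for the same value, which is what makes glyphs comparable between plots.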

To make it simple to produce a wide range of visualizations using visual entropy glyphs, Nick Holliman, professor of visualization at Newcastle University, and I have developed vizent, a Python library integrated with matplotlib. Through the rest of this article I will use vizent to demonstrate, with code, how visual entropy glyphs can be created and integrated into your own data visualization. If you would like to create your own graphs using vizent, you can get it on PyPI and GitHub.

Using vizent to visualize COVID-19 data

1) A basic example of using vizent

Here is an example of a very simple scatter plot created using vizent. As shown, you can specify a colour value, a shape value (the glyph layer in the outer ring) and a size value for each point. A title, axis labels, and labels for the shape and colour variables have also been specified here, and a colormap has been selected from matplotlib’s library of options.
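A plot like the one described might be produced with something like the following sketch. The function name vizent_plot, the order of its positional arguments, and the keyword names colormap, colour_label, shape_label, title, x_label and y_label are assumptions based on my reading of the vizent README and may differ between versions; the data values are made up.

```python
# Illustrative data: one entry per glyph.
x_values = [0.2, 0.4, 0.6, 0.8]
y_values = [0.3, 0.7, 0.5, 0.9]
colour_values = [1, 4, 7, 10]   # drives the central colour
shape_values = [0, 2, 4, 6]     # drives the shape in the outer ring
size_values = [30, 30, 30, 30]  # one size per glyph

# Keyword names are assumed; check them against your vizent version.
plot_options = dict(
    colormap="viridis",               # any matplotlib colormap name
    colour_label="Colour variable",
    shape_label="Shape variable",
    title="A basic vizent scatter plot",
    x_label="x",
    y_label="y",
)


def draw_basic_plot():
    # Imported lazily so the data preparation above runs without vizent.
    from vizent import vizent_plot  # assumed entry point (pip install vizent)
    vizent_plot(x_values, y_values, colour_values, shape_values, size_values,
                **plot_options)
```

Calling draw_basic_plot() once vizent is installed should draw the figure; nothing is plotted until then.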

A basic example of a visualization created using vizent

2) Plotting on a map using cartopy

Now let’s plot some real-world data. We will show how to visualize COVID-19 trends across the different regions of England, using data from Public Health England [4]. Using vizent, you can plot any geographical data using maps via the cartopy package. Let’s begin by plotting data from the 4th of January in our COVID-19 dataset as an example starting point. The extent parameter in vizent is used to specify the axis limits (the x-axis limits followed by the y-axis limits) of the data you wish to display. We will utilize cartopy from directly within the vizent function call.

From the below dataset, we will utilize ‘lon’ and ‘lat’, the longitude and latitude coordinates of each location, ‘Av7DayPop’, the 7-day average number of cases per 100,000 population, and ‘Av7DayPopWeeklyDiff’, the weekly change in the 7-day average of cases per 100,000 population.
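For readers preparing their own data, the two derived columns can be computed from daily case counts along the following lines. This is a minimal sketch that assumes a simple trailing 7-day mean and a 7-day lag; the exact calculation behind the dataset is not described here, and the function names and numbers are made up for illustration.

```python
def seven_day_rate(daily_cases, population):
    """Trailing 7-day mean of daily cases, per 100,000 population.

    Returns one value per day, starting from the 7th day of data.
    """
    rates = []
    for end in range(7, len(daily_cases) + 1):
        window = daily_cases[end - 7:end]
        rates.append(sum(window) / 7 * 100_000 / population)
    return rates


def weekly_diff(rates):
    """Change in the 7-day rate compared with the rate 7 days earlier."""
    return [rates[i] - rates[i - 7] for i in range(7, len(rates))]


# Made-up daily counts for a hypothetical region of 1,000,000 people:
# a week at 100 cases a day followed by a week at 170 cases a day.
daily = [100] * 7 + [170] * 7
rates = seven_day_rate(daily, population=1_000_000)
diffs = weekly_diff(rates)
print(rates[0], rates[-1], diffs[-1])
```

In this toy series the rate rises from 10 to 17 cases per 100,000 over the week, so the weekly difference for the final day is 7.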

The dataset for the 4th of January
Cartopy provides maps for any location, but in some cases the detail may be insufficient.
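The map plot described in this section can be sketched as follows. The rows are made-up stand-ins for the real dataset, with approximate coordinates, and the extent values are only a plausible bounding box for England. The flag name use_cartopy, the function name vizent_plot and the other keyword names are assumptions that should be checked against your vizent version.

```python
# Made-up rows in the shape of the dataset described above.
rows = [
    {"region": "London", "lon": -0.13, "lat": 51.51,
     "Av7DayPop": 100.0, "Av7DayPopWeeklyDiff": 20.0},
    {"region": "North West", "lon": -2.60, "lat": 53.80,
     "Av7DayPop": 50.0, "Av7DayPopWeeklyDiff": 10.0},
    {"region": "South West", "lon": -3.50, "lat": 50.80,
     "Av7DayPop": 30.0, "Av7DayPopWeeklyDiff": 5.0},
]

lon = [r["lon"] for r in rows]
lat = [r["lat"] for r in rows]
cases = [r["Av7DayPop"] for r in rows]             # colour variable
change = [r["Av7DayPopWeeklyDiff"] for r in rows]  # shape variable
size = [30] * len(rows)

map_options = dict(
    use_cartopy=True,                # assumed flag name
    extent=[-6.0, 2.0, 49.5, 56.0],  # x limits, then y limits
    colour_label="7-day average cases per 100,000",
    shape_label="Weekly change",
    title="Illustrative data, 4 January 2021",
)


def points_outside(extent, xs, ys):
    """Return the (x, y) points that fall outside the given extent."""
    x_min, x_max, y_min, y_max = extent
    return [(x, y) for x, y in zip(xs, ys)
            if not (x_min <= x <= x_max and y_min <= y <= y_max)]


def draw_map():
    # Imported lazily so the checks above run without vizent or cartopy.
    from vizent import vizent_plot  # assumed entry point
    vizent_plot(lon, lat, cases, change, size, **map_options)
```

Checking points_outside(map_options["extent"], lon, lat) before plotting is a cheap way to catch swapped longitude and latitude columns.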

3) Using an image background

This map doesn’t show much detail, and since different regions of England are being compared, it might be useful to see, for example, the corresponding cities and towns. We have included a map of England and Wales from OpenStreetMap in vizent, so this will be used as the background. If you wish to visualize any other area, you can include your own image background by setting use_image=True, then setting image_file as the file path to your image; in that case, image_type should be left unspecified. If using your own image, you can specify the extent as shown when using the cartopy map, and this will allow you to plot coordinate data directly onto your map image.
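The two background choices can be summarised as keyword arguments, as in the sketch below. The parameter names use_image, image_type, image_file and extent come from the text above; the helper background_options, the file name my_map.png and in particular the image_type value "england" are assumptions for illustration only.

```python
def background_options(image_file=None, extent=None):
    """Build keyword arguments for an image background.

    With no image_file, the built-in map is requested through image_type.
    With an image_file, image_type is left out and the extent, if given,
    ties the image to coordinate space.
    """
    if image_file is None:
        # The value "england" is an assumption; check the vizent docs.
        return {"use_image": True, "image_type": "england"}
    options = {"use_image": True, "image_file": image_file}
    if extent is not None:
        options["extent"] = list(extent)  # x limits, then y limits
    return options


built_in = background_options()
# Hypothetical file name and bounding box for a user-supplied map image.
custom = background_options("my_map.png", extent=(-6.0, 2.0, 49.5, 56.0))
print(built_in)
print(custom)
```

Either dictionary would then be unpacked into the plotting call with **, alongside the data and label arguments.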

Using this new background, the data from the 4th, 9th and 15th of January 2021 can now be plotted. These dates have been selected to show three distinct data examples: one of all increasing cases, one of a mixture of increasing and decreasing cases, and one of all decreasing cases. The code example shown is for the 4th of January; only the data file pointer and the date in the figure title need to be changed for the other dates.

Reviewing two datasets side-by-side, note that the scales differ.

However, after plotting only the first two datasets, there’s an immediate problem with the visualizations: the scales have been generated based on the data, and since the datasets differ, the scales don’t match, preventing us from comparing the data directly. The same glyph, for example, represents an increase of 35.5 in one of the above visualizations and an increase of 20.8 in the other.

4) Adjusting the scales

To solve this issue, vizent allows you to specify many attributes of the scale, so it’s possible to ensure the plots will all have a diverging shape scale, even though some of the data is all positive or all negative. You can also increase the number of different glyph shapes on the scale so that you can see smaller intervals, and set the maximum and minimum values for both colour and shape so that these will be the same for each set of data, allowing you to compare the data more easily.

As above, the code example shown here is for the 4th of January, and only the data file and the date in the figure title need to be changed to replicate this for the other days in our dataset. The adjustments to the scale were made by adding the parameters colour_n and shape_n, to control the number of values in each scale, colour_min and colour_max, to control the range of the colour scale, scale_diverges, which ensures positive and negative values are present in the shape scale, and shape_max, which sets the maximum shape value. It is not necessary to set the minimum shape value, as the diverging scale will be symmetrical by default.
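One way to choose these values is to derive them once from all of the datasets that will be compared, then reuse them for every plot. The sketch below does this with made-up numbers; the parameter names are the ones listed above, while the helper shared_scale_options, the default counts and the choice to round the limits outwards to whole numbers are assumptions.

```python
import math


def shared_scale_options(datasets, colour_n=5, shape_n=9):
    """Derive one set of scale parameters covering every dataset."""
    colour_all = [v for d in datasets.values() for v in d["colour"]]
    shape_all = [v for d in datasets.values() for v in d["shape"]]
    return dict(
        colour_n=colour_n,
        shape_n=shape_n,
        colour_min=math.floor(min(colour_all)),
        colour_max=math.ceil(max(colour_all)),
        scale_diverges=True,
        # Only the maximum is set; the diverging scale is symmetrical.
        shape_max=math.ceil(max(abs(v) for v in shape_all)),
    )


# Made-up values: all rising, mixed, and all falling.
datasets = {
    "4 January": {"colour": [98.2, 61.4, 40.3], "shape": [33.2, 12.1, 8.0]},
    "9 January": {"colour": [101.7, 66.0, 38.9], "shape": [-6.3, 19.4, 1.2]},
    "15 January": {"colour": [80.5, 58.3, 30.1], "shape": [-21.4, -7.9, -4.6]},
}

scale_options = shared_scale_options(datasets)
print(scale_options)
```

Passing the same scale_options to every plotting call keeps each glyph tied to the same value range, whichever date is being drawn.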

The scales have now been adjusted to match, allowing for direct comparison.

Now the same set of glyphs will carry the same meaning across each plot.

5) Removing replicated legends

Finally, since several plots are being created and will be displayed as a set, you can consider removing the legend from all but one by setting show_legend=False.
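When generating the plots in a loop, the legend flag can be set per plot, as in this small sketch. The parameter show_legend is the one named above; the helper legend_flags, the title keyword and the choice to keep the legend on the last plot are assumptions for illustration.

```python
dates = ["4 January 2021", "9 January 2021", "15 January 2021"]


def legend_flags(n_plots, keep):
    """Return one boolean per plot, True only for the plot at index keep."""
    return [i == keep for i in range(n_plots)]


# Keep the legend on the last plot only, so it sits at the end of the set.
flags = legend_flags(len(dates), keep=len(dates) - 1)

per_plot_options = [
    dict(title=f"COVID-19 cases, {date}", show_legend=flag)
    for date, flag in zip(dates, flags)
]

for options in per_plot_options:
    print(options)
```

Each dictionary would be merged with the shared data and scale arguments for that date before plotting.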

By removing identical legends, you can leave more room for the data. The three plots shown have been combined manually for easier viewing.

Now you can easily compare the data from all three dates, and do so on a cleaner, less busy layout. We can now observe that on the 4th of January, all neighboring regions were experiencing an increase in cases. By the 9th, some regions continued to see an increase while others were decreasing, or staying approximately level. In this case, we can observe that while the London area has the highest cases per capita, it shows a decrease in cases, whereas the North West area shows the highest rate of increase. This could help to more quickly identify which regions are of most concern by contextualizing the directionality of the case count. By the 15th of January, we see that all regions were experiencing a decrease in cases, with the London area showing the most rapid decline.

This walkthrough, of course, was just an illustration of some of the capabilities of visual entropy glyphs using some pertinent current-day data, and was not intended to cast any epidemiological or public health judgments. For more examples of data viz using visual entropy glyphs, and how to create them using vizent, see the vizent package and associated documentation on GitHub. We hope you will try using vizent when you next come across some analysis involving multiple derivative data points that need to be communicated all on one glyph.

Lucy McLaughlin is a research software engineer working on data visualization research at Newcastle University, UK.

Acknowledgments:

Thanks to the Alan Turing Institute for funding the Newcastle Seedcorn project “Automating visualization” under the EPSRC grant EP/N510129/1, and for Nick Holliman’s Turing Fellowship.

Citations:

[1] “Visual Entropy and the Visualization of Uncertainty”, Holliman et al., arXiv:1907.12879

[2] “Petascale Cloud Supercomputing for Terapixel Visualization of a Digital Twin”, Holliman et al., arXiv:1902.04820

[3] “Designing a Cloud-based 3D Visualization Engine for Smart Cities”, Holliman et al., Electronic Imaging, 2017 (5), 173–178.

Data and licenses:

Public Health England data source:
[4] https://coronavirus.data.gov.uk/details/cases

Public Health England data license:
https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

OSM map license:
https://www.openstreetmap.org/copyright
