Reduce, Reuse, Recycle

Richard Muir
Jul 24, 2017 · 5 min read

Making Data Human #3

In the last post I showed you how to pull a story from a mass of data by deriving percentage changes and indices. But often, calculating summary statistics isn’t enough. We can’t always pick and choose the story. Sometimes we have to show loads of data on the same page.

Here I’m going to expose some techniques you can use to pack your page full of data, to maximise the data-ink ratio and to squeeze every last drop of insight from the space on the page.

Some of these techniques will be reductionist, seeking to summarise or condense a dataset, whilst others will use design principles to clarify, or even highlight the size and scope of the data.

Slope charts

Slope charts are great. They’re simple. No messing about. They answer two questions. How was this thing? And how is it now? A slope chart does away with the intermediate values, instead focussing solely on the magnitude and direction of the change. Is it up or down? Is it big or small?

Slope charts can display loads of different categories while keeping the chart clear and simple. Take some data on life expectancy from the WHO for example: 195 countries times 50 years of life expectancy data comes to almost 10,000 rows (9,750, to be exact). That’s a lot of information on the same chart, but by only plotting the first and last year in the series I can reduce this to around 400 data points.
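As a rough sketch of that reduction, assuming the data is held as a dictionary of yearly values per country (the structure, the function name `to_slope_points` and the numbers are placeholders of mine, not the WHO figures):

```python
# Toy data: {country: {year: life expectancy}}.
# The values are made-up placeholders, not real WHO figures.
series = {
    "Country A": {1960: 50.0, 1985: 58.0, 2010: 66.0},
    "Country B": {1960: 62.0, 1985: 60.0, 2010: 59.0},
}


def to_slope_points(series):
    """Keep only the first and last observation for each category."""
    slopes = {}
    for name, by_year in series.items():
        first_year, last_year = min(by_year), max(by_year)
        slopes[name] = {
            "start": (first_year, by_year[first_year]),
            "end": (last_year, by_year[last_year]),
            # Direction and magnitude of the change: the two things
            # a slope chart is meant to show.
            "change": by_year[last_year] - by_year[first_year],
        }
    return slopes


slopes = to_slope_points(series)
```

Each country collapses to two points plus a signed change, whatever the number of intermediate years.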

That’s still a lot of data though, so to really distil the message it’s important to implement some of the principles of visual hierarchy I discussed in a previous article.

I love using grey in charts. Not because it’s beautiful, but because it’s neutral. Other colours pop and explode in its presence. See those green lines? The red ones? You can’t help but look!

Slope chart showing the change in life expectancy

A slope chart simplifies everything. It boils all the data down to a simple comparison. The user can easily pick out the risers and the fallers, the winners and the losers.
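One way to set up the grey-plus-highlight colouring is to decide the colour of each line before plotting anything. This is only a sketch: the rule (red for any fall, green for the biggest rises, grey for the rest), the colour names and the function `assign_colours` are my assumptions about how it could be done.

```python
GREY, GREEN, RED = "lightgrey", "green", "red"


def assign_colours(changes, top_n=5):
    """Map each category to a colour.

    Falls are red, the top_n largest rises are green and
    everything else fades into neutral grey.
    """
    risers = sorted(
        (name for name, change in changes.items() if change > 0),
        key=changes.get,
        reverse=True,
    )[:top_n]

    colours = {}
    for name, change in changes.items():
        if change < 0:
            colours[name] = RED
        elif name in risers:
            colours[name] = GREEN
        else:
            colours[name] = GREY
    return colours


# Placeholder changes in life expectancy (years), not real data.
changes = {"A": 16.0, "B": -3.0, "C": 4.0, "D": 9.0}
colours = assign_colours(changes, top_n=2)
```

Keeping the rule in one small function makes it easy to swap in a different story later, such as highlighting the countries that dipped and recovered.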

Line charts

That’s not to say that a slope chart is always superior to a line chart. More data does not necessarily make for more confusion. I can still include all of the data points by applying those same principles of visual hierarchy; using grey to fade out the less interesting cases. The intervening points can be plotted without introducing too much messiness or ambiguity. Other stories may even come to the fore:

Line chart showing the change in life expectancy

It’s hard to ignore those big loops at the bottom for Cambodia, Timor-Leste, Rwanda & Sierra Leone. Life expectancy in these countries dropped dramatically but has since recovered, and another version of this chart might highlight these countries instead.

Are the simple winners and losers still the most interesting countries in this dataset? Probably not.
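A full line chart with the same visual hierarchy might be put together along these lines. It’s a sketch under assumptions: the data structure, the helper names and the choice to add grey traces first (so the highlighted lines end up drawn on top) are mine. Plotly is only imported inside `make_figure`, so the data preparation runs without it.

```python
def build_line_traces(series, highlight):
    """Build one trace spec per category.

    series:    {category: {year: value}}
    highlight: {category: colour} for the lines that should stand out
    """
    traces = []
    for name, by_year in series.items():
        years = sorted(by_year)
        is_key = name in highlight
        traces.append({
            "name": name,
            "x": years,
            "y": [by_year[year] for year in years],
            "mode": "lines",
            "line": {
                "color": highlight.get(name, "lightgrey"),
                "width": 2 if is_key else 1,
            },
            # Only the highlighted lines earn a legend entry.
            "showlegend": is_key,
        })
    # Stable sort: grey traces first, highlighted traces last.
    traces.sort(key=lambda trace: trace["showlegend"])
    return traces


def make_figure(traces):
    """Turn the trace specs into a Plotly figure (requires plotly)."""
    import plotly.graph_objects as go

    fig = go.Figure()
    for trace in traces:
        fig.add_trace(go.Scatter(**trace))
    return fig


# Placeholder values, not real life expectancy data.
series = {
    "Country B": {1960: 62.0, 1985: 60.0, 2010: 59.0},
    "Country A": {1960: 50.0, 1985: 58.0, 2010: 66.0},
    "Country C": {1960: 55.0, 1985: 57.0, 2010: 60.0},
}
traces = build_line_traces(series, highlight={"Country B": "red"})
# fig = make_figure(traces); fig.show()
```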

Small multiples charts

Small multiples charts are the illegitimate offspring of a scatterplot matrix and a line chart. Where line charts can be confusing and messy, small multiples clear this right up by plotting each data item on a separate, albeit small chart.

The essence of this chart is one of repetition. I reuse and recycle a single line (or bar or scatter) chart, repeating it for each of the categories. This way, the reader gets to understand the individual situation in each case, but in the context of all the cases as a whole.
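The repetition can be sketched as a grid of identical small charts, one per category. The grid maths below is plain Python. The plotting part uses Plotly’s `make_subplots` with shared axes so every panel is read on the same scale. The function names, the column count and the data layout are my assumptions.

```python
import math


def grid_positions(names, n_cols):
    """Give each category a 1-indexed (row, col) cell, filling row by row."""
    return {
        name: (i // n_cols + 1, i % n_cols + 1)
        for i, name in enumerate(names)
    }


def small_multiples(series, n_cols=3):
    """Repeat one small line chart per category (requires plotly)."""
    from plotly.subplots import make_subplots
    import plotly.graph_objects as go

    names = sorted(series)
    n_rows = math.ceil(len(names) / n_cols)
    fig = make_subplots(
        rows=n_rows,
        cols=n_cols,
        shared_xaxes=True,  # same years on every panel
        shared_yaxes=True,  # same value scale on every panel
        subplot_titles=names,
    )
    for name, (row, col) in grid_positions(names, n_cols).items():
        years = sorted(series[name])
        fig.add_trace(
            go.Scatter(
                x=years,
                y=[series[name][year] for year in years],
                mode="lines",
                line={"color": "grey"},
                showlegend=False,
            ),
            row=row,
            col=col,
        )
    return fig


positions = grid_positions(["a", "b", "c", "d"], n_cols=3)
```

Sharing the axes is what lets the reader compare one small chart against all the others at a glance.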

This type of chart works best when the special cases are highlighted for the reader. Once again, I’ve highlighted the five countries which had the largest increase, and the only two where the life expectancy fell.

Small multiples chart showing the change in life expectancy

This type of chart begs the reader to engage. Where is my country? How does it compare against my neighbours?

There’s a lot of data here, but by using grey for most of it and highlighting only the countries of interest, we can pull an engaging story from this dataset.

Matrix chart

I’ll move away from visualising life expectancy data, instead using a dataset on the Olympics which Andy Kirk of visualisingdata.com provided for a training course I recently attended.

I decided to investigate which countries have been the most successful during the history of the Games. So, in keeping with the theme of reducing the number of data points shown, I started with a chart much like this:

Matrix chart showing the number of medals won at the Olympics

I like the way that each data point looks like a medal. This makes the chart really thematic. But by displaying the total number of medals as a single data point, I’ve lost a sense of the scale of effort and achievement of the individual athletes.

With this in mind I chose to expose the scale of the data in order to highlight the size of the achievements. I plotted each and every medal won by those countries since the (modern) inception of the Olympics:

Matrix chart showing the number of medals won at the Olympics

With a matrix chart you can see the distribution across several different categories. A chart like this acts a bit like a table — you could easily replace the grouping of dots with a single number. You could even argue that doing this would make the chart clearer. But I think that lacks drama.
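Going from a single total to one dot per medal is mostly a matter of expanding a count into coordinates. Here’s a minimal sketch. The wrap-every-ten layout and the name `medal_dots` are assumptions of mine, not necessarily how the chart above was built.

```python
def medal_dots(count, per_row=10):
    """Expand a medal count into (x, y) offsets, one per medal.

    Dots fill a row from left to right and wrap to a new row
    after per_row dots.
    """
    return [(i % per_row, i // per_row) for i in range(count)]


# 23 is a placeholder count, not a real medal total.
dots = medal_dots(23)
```

Each offset can then be added to the position of its cell in the matrix and drawn as a marker, so a bigger haul takes up visibly more space.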

Heatmaps

Heatmaps work in a similar way to matrix charts, allowing us to read the data in a tabular way. This particular heatmap operates in three dimensions: country on one axis, Games on the other and colour for the medal count. We can see the number of medals won by each country at each Olympics since 1896:

Heatmap showing the number of medals won by each country at the Olympics

This chart gives a very high level view of the data. You can see the general trends; the ups and downs. You can see the gaps in the Olympics for WWI & WWII. You can see how China has increased their medal haul over the years at the expense of the USA & Russia. You can also see where the USA and several other countries boycotted the 1980 Moscow Olympics.

By zooming out I’ve allowed the reader to more easily understand the scale of the data and see high-level trends that would be invisible at a more granular level.
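A heatmap needs the data pivoted into a grid of rows and columns. The sketch below assumes the medals arrive as (country, year, count) records. The record layout, the function names and the numbers are placeholders of mine. Missing combinations are left as `None`, which Plotly’s `Heatmap` draws as an empty cell.

```python
def pivot(records):
    """Pivot (country, year, medals) records into a z-matrix.

    Returns the row labels, the column labels and a list of rows.
    Cells with no record are None so that they show up as gaps.
    """
    countries = sorted({country for country, _, _ in records})
    years = sorted({year for _, year, _ in records})
    lookup = {(country, year): medals for country, year, medals in records}
    z = [[lookup.get((country, year)) for year in years] for country in countries]
    return countries, years, z


def make_heatmap(countries, years, z):
    """Draw the pivoted grid as a heatmap (requires plotly)."""
    import plotly.graph_objects as go

    return go.Figure(
        go.Heatmap(
            z=z,
            x=[str(year) for year in years],  # strings keep each Games as its own column
            y=countries,
            colorscale="Greys",
        )
    )


# Placeholder counts, not real medal tallies.
records = [
    ("Country A", 1976, 10),
    ("Country A", 1984, 12),
    ("Country B", 1976, 8),
    ("Country B", 1980, 9),
    ("Country B", 1984, 7),
]
countries, years, z = pivot(records)
# fig = make_heatmap(countries, years, z); fig.show()
```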

Reduce, reuse, recycle . . . and review!

I’ve shown you how to reduce a dataset down to just a few points in order to show a simplified message, using grey coupled with some bright colours to highlight this message. I’ve used a small multiples chart to engage the reader and draw them in to a visualisation by hiding complexity behind repetition.

I’ve condensed a haul of 2,500 medals down to three data points, and by doing so, seen that it’s sometimes better to highlight the size and scale of a dataset rather than hiding it. I’ve also demonstrated how to use a heatmap to give a high level view of a large dataset and expose trends and insights which would otherwise be hidden.

So here you go. Here are six different charts you can use to show stacks of data.

If you like the look of these charts and want to learn how to use Plotly, you can sign up for my Udemy course, Data Visualisation with Plotly and Python, for only £10 by following this link.

Written by Richard Muir

Data analysis and visualisation in Python and D3.
