Animating Visualizations in Python with Matplotlib and ImageIO

How do we represent time when we visualize data? Because our audiences have built-in notions about the flow of time, we have many more expressive options for displaying data that changes over time than we would for another quantitative dimension. Line charts are one example of a visualization that is very powerful for communicating change over time, but what if we incorporate change over time itself into our visualization?

Matplotlib in Python gives us great flexibility in creating charts, and the image data manipulation package ImageIO allows us to bring these charts to life by compiling them into an animated GIF.

First we’ll need some help:
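A minimal set of imports for the steps that follow might look like this. Using pandas for the dataframes is my assumption; Matplotlib and ImageIO are the packages named above.

```python
import glob  # collect the frame files written to disk

import pandas as pd  # assumed: the dataframes in this walkthrough are pandas objects
import matplotlib.pyplot as plt  # charting
from matplotlib.ticker import FuncFormatter  # custom axis tick labels

try:
    # Recent ImageIO releases expose the classic API under imageio.v2.
    import imageio.v2 as imageio
except ImportError:
    import imageio
```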

Next we’ll need some data. I’ve chosen to use GDP (in 2019 US$) and population data from the World Bank. Note that the main World Bank data files have four title rows above the header, but the metadata files that accompany them do not. This data is arranged with each country or category in a row, and the country/category name, code and data source in the first columns. The remainder of the columns are indexed by year and give the GDP (or population) statistic for that entity in that year.
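As a rough sketch of the loading step, pandas can skip the four title rows with the skiprows argument. The inline sample below stands in for a real download; its column names and values are illustrative assumptions, not the actual World Bank figures.

```python
import io

import pandas as pd

# Stand-in for a World Bank data file: four title rows, then the header.
# All names and numbers here are made up for illustration.
sample_csv = io.StringIO(
    "Title row 1\n"
    "Title row 2\n"
    "Title row 3\n"
    "Title row 4\n"
    "Country Name,Country Code,Indicator Name,2018,2019\n"
    "Examplia,EXA,GDP,100.0,110.0\n"
    "Samplestan,SMP,GDP,200.0,190.0\n"
)

# skiprows=4 drops the title rows so the fifth line becomes the header.
gdp = pd.read_csv(sample_csv, skiprows=4)

# Index by country code so rows can be looked up and aligned across files.
gdp = gdp.set_index("Country Code")
print(gdp[["2018", "2019"]])
```

With a real file, the StringIO object would be replaced by the path to the downloaded CSV; the metadata files would be read the same way but without skiprows.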

Once we’ve loaded the data, we want to add a useful categorization of the countries that will make our data more legible to viewers. The metadata files contain the region for each country, so we’ll create a dataframe of the regions, and then associate a color with each region:
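One way to sketch the region-to-color step is a dictionary lookup applied with map(). The metadata column names ("Country Code", "Region"), the sample rows and the color choices below are all assumptions for illustration.

```python
import pandas as pd

# Stand-in for a metadata file; the aggregate row (WLD) has no region.
metadata = pd.DataFrame(
    {
        "Country Code": ["FRA", "BRA", "WLD"],
        "Region": ["Europe & Central Asia", "Latin America & Caribbean", None],
    }
)

# One row per country code, holding its region.
regions = metadata.set_index("Country Code")[["Region"]].copy()

# Hypothetical palette: any valid Matplotlib color names would do.
palette = {
    "Europe & Central Asia": "tab:blue",
    "Latin America & Caribbean": "tab:green",
}

# Look up each region's color; rows without a region get NaN.
regions["Color"] = regions["Region"].map(palette)
print(regions)
```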

In this particular dataset, the World Bank has included region and category aggregates. While these are useful, they will ultimately make our visualization more confusing, so we will remove them:
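A possible way to drop the aggregates, assuming that aggregate rows are the ones with no region in the metadata, is to keep only the codes that do have a region. The toy frames below are illustrative, not real figures.

```python
import pandas as pd

# Toy region table: WLD stands in for an aggregate with no region.
regions = pd.DataFrame(
    {"Region": ["Europe & Central Asia", "Latin America & Caribbean", None]},
    index=["FRA", "BRA", "WLD"],
)

# Toy GDP table indexed by the same codes (numbers are made up).
gdp = pd.DataFrame({"2019": [1.0, 2.0, 99.0]}, index=["FRA", "BRA", "WLD"])

# Codes that have a region are treated as real countries (an assumption).
country_codes = regions.index[regions["Region"].notna()]

# Keep only those rows, preserving the original row order.
gdp = gdp[gdp.index.isin(country_codes)]
print(gdp)
```

The same filter would be applied to the population dataframe so that both tables cover the same set of countries.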

And now we get to the meat of our visualization. We have three dataframes: GDP, population and region color. My first inclination was to create a scatter plot (the code is still listed in the comments), but the result was difficult to interpret. Instead I ended up iterating through each country in the dataset and placing the country’s three-letter code on the plot at a position determined by its population and GDP per capita, using the color for that country’s region.

I made a couple of important formatting decisions. First, I set the x-axis (population) maximum to 100,000,000 and the y-axis (GDP per capita) maximum to 150,000. This excludes a handful of outlier countries (e.g. China, India), but makes the remaining smaller countries much more visible. Second, I formatted the axis ticks as integers with commas separating the thousands and millions. Old-style % format strings have no thousands separator, so this needs to be done via a formatting function.
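The two formatting decisions can be sketched with Matplotlib's FuncFormatter, which calls a function for each tick. The helper name with_commas is hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def with_commas(value, pos):
    """Format a tick value as an integer with thousands separators."""
    return f"{int(value):,}"


fig, ax = plt.subplots()

# Axis limits described above: population on x, GDP per capita on y.
ax.set_xlim(0, 100_000_000)
ax.set_ylim(0, 150_000)

# Apply the same formatter to both axes.
ax.xaxis.set_major_formatter(FuncFormatter(with_commas))
ax.yaxis.set_major_formatter(FuncFormatter(with_commas))

print(with_commas(100_000_000, None))
plt.close(fig)
```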

The resulting plots are written out to files (filenames indexed by year), and we remember to close our plots before we continue iterating!
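Putting the plotting steps together, a minimal sketch of the per-year loop could look like the following. The toy dataframes, the file naming pattern and the figure size are assumptions; the real data would come from the earlier steps.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-ins for the three dataframes (all values are made up).
gdp = pd.DataFrame({"2018": [2.0e12, 5.0e11], "2019": [2.1e12, 5.2e11]},
                   index=["AAA", "BBB"])
population = pd.DataFrame({"2018": [6.0e7, 4.0e7], "2019": [6.1e7, 4.1e7]},
                          index=["AAA", "BBB"])
colors = pd.Series({"AAA": "tab:blue", "BBB": "tab:green"})

gdp_per_capita = gdp / population  # aligned on country code and year
out_dir = tempfile.mkdtemp()
frame_files = []

for year in gdp.columns:
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.set_xlim(0, 100_000_000)
    ax.set_ylim(0, 150_000)
    # Draw each country's code as text instead of a scatter marker.
    for code in gdp.index:
        ax.text(population.loc[code, year], gdp_per_capita.loc[code, year],
                code, color=colors[code], ha="center", va="center")
    ax.set_xlabel("Population")
    ax.set_ylabel("GDP per capita (US$)")
    ax.set_title(year)
    # File names include the year so they sort chronologically.
    path = os.path.join(out_dir, f"gdp_{year}.png")
    fig.savefig(path)
    plt.close(fig)  # release the figure before the next iteration
    frame_files.append(path)
```

Closing each figure inside the loop keeps memory use flat, which matters once the loop covers several decades of data.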

The last piece of code pulls the files together into a final product, and this is where the ImageIO package comes into use. We create a sorted list of the filenames from the previous step and iterate over them, reading each image with imread() and appending it to a writer obtained from get_writer(), which writes the resulting set of images out to a file. get_writer() lets us create a new GIF and gives us a high degree of control over the result (for example, the ‘duration’ argument sets how long each frame is shown, which determines the frame rate of the GIF).
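A sketch of the assembly step, under a few assumptions: the frame file names are hypothetical, the frames are small generated placeholders rather than real charts, and the unit of duration should be checked against your installed ImageIO version, since older releases read it as seconds per frame and newer ones may read it as milliseconds.

```python
import glob
import os
import tempfile

import numpy as np

try:
    # Recent ImageIO releases expose the classic API under imageio.v2.
    import imageio.v2 as imageio
except ImportError:
    import imageio

work_dir = tempfile.mkdtemp()

# Write three placeholder frames so the sketch is self-contained.
for i, year in enumerate(("2017", "2018", "2019")):
    frame = np.full((40, 60, 3), 80 * i, dtype=np.uint8)
    imageio.imwrite(os.path.join(work_dir, f"gdp_{year}.png"), frame)

# Sorting by name puts the frames in chronological order.
filenames = sorted(glob.glob(os.path.join(work_dir, "gdp_*.png")))

gif_path = os.path.join(work_dir, "gdp.gif")
# mode="I" means multiple images; duration's unit depends on the version.
with imageio.get_writer(gif_path, mode="I", duration=0.5) as writer:
    for filename in filenames:
        writer.append_data(imageio.imread(filename))

print(gif_path)
```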

And here is the result!


… And they’re off!

So what can we learn? A few things stand out to me after watching the countries move across the years that would not be immediately visible from looking at cross-sectional charts one by one. First, we see a lot of perpendicular movement: a significant number of countries increase in GDP per capita without increasing in population, while another large group increases in population without increasing in GDP per capita. Second, we can see correlated cyclical movement among neighboring countries, such as France (FRA), Italy (ITA), Germany (DEU) and the UK (GBR).

While there is a lot of room for improvement in this chart, it is visually engaging and allows us to highlight aspects of the data that would be difficult to show with other techniques.

GitHub Repository