Enhanced storytelling with data animations using “gganimate”

Deepsha Menghani
Data Science at Microsoft
Jul 19, 2022

I love data visualizations because I believe that data is only as good as the story it tells. Visualizations are a key tool for sharing insights with stakeholders in ways that make an impact and are easily understood. To improve my storytelling skills, I decided to learn how to create animated plots in R with the gganimate package. In this article, I share the journey I took to:

  • Find a dataset to play with
  • Learn to use the package gganimate
  • Discover and showcase insights with animation
  • Combine geospatial mapping and animation
  • Bring animation to work projects

The dataset

I wanted a dataset with a time component that I could animate and a use case simple enough that anybody could relate to it and play with the insights. I picked the United States census data by state from 1910 to 2020, shown below, because it met both criteria. It also covers all states, which let me compare animations across multiple states to derive insights and learn how to plot maps, another item on my visualization bucket list.

Code

You can find the functions, code, and data to reproduce the plots in this article on GitHub.com at deepshamenghani.

Population density over rank plot

I wanted to plot density versus density ranking for selected states over a period of time, in this case 1910 to 2020. To keep this example simple, the function below, called plot_density_vs_rank, takes the population data and the list of states to plot and returns a ggplot object, as shown below. Note that a higher rank implies a higher population density relative to other states in any given year.
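As a rough sketch of what such a function could look like (not the version in the repository), the code below ranks states by density within each year and plots density against rank. The column names `state`, `year`, and `density`, the helper `add_density_rank`, and the toy values are all assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data with made-up values; the column names are assumptions,
# not the article's actual schema.
population <- data.frame(
  state   = rep(c("A", "B", "C"), times = 2),
  year    = rep(c(1910, 2020), each = 3),
  density = c(10, 20, 30, 50, 40, 60)
)

# Hypothetical helper: rank states by density within each year.
# min_rank() gives rank 1 to the least dense state, so a higher rank
# means a higher density relative to the other states that year.
add_density_rank <- function(data) {
  data %>%
    group_by(year) %>%
    mutate(density_rank = min_rank(density)) %>%
    ungroup()
}

# Hypothetical sketch of plot_density_vs_rank: one path per state,
# ordered by year so the line follows the journey through time.
plot_density_vs_rank <- function(data, states) {
  plot_data <- add_density_rank(data) %>%
    filter(state %in% states) %>%
    arrange(state, year)

  ggplot(plot_data,
         aes(x = density, y = density_rank, colour = state, group = state)) +
    geom_path() +
    geom_point() +
    labs(x = "Population density", y = "Density rank", colour = "State")
}

ranked <- add_density_rank(population)
p <- plot_density_vs_rank(population, c("A", "B"))
```

Ranking before filtering matters here: the rank has to be computed against all states in a year, not only the ones selected for the plot.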

The plot above allows us to understand the density versus ranking for California over the 110-year period. Note that growth in population density doesn’t necessarily imply an increase in ranking because the ranking is relative to the density of other states in any year. Creating a plotly graph allows us to hover over each point to get more details such as the year to which each point corresponds. For example, in 1950, California’s population density was 68 with rank 31, putting it ahead of 30 other states. Because time isn’t along either the x- or y-axis, these labels allow us to trace the journey over the third dimension of time.
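One way to get this hover behavior is to pass a ggplot object to `plotly::ggplotly()` and supply the label through a `text` aesthetic. The sketch below assumes the same hypothetical column names as before and uses made-up values, so it illustrates the mechanism rather than reproducing the article's plot.

```r
library(ggplot2)
library(plotly)

# Made-up journey for a single hypothetical state.
journey <- data.frame(
  state        = "A",
  year         = c(1910, 1960, 2020),
  density      = c(10, 40, 90),
  density_rank = c(5, 20, 22)
)

# ggplot2 warns that `text` is an unknown aesthetic; ggplotly() picks it up
# and uses it as the hover label.
p <- ggplot(journey, aes(x = density, y = density_rank, group = state)) +
  geom_path() +
  geom_point(aes(text = paste0("Year: ", year,
                               "<br>Density: ", density,
                               "<br>Rank: ", density_rank)))

# tooltip = "text" restricts the hover box to the label built above.
interactive_plot <- ggplotly(p, tooltip = "text")
```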

Next, let’s add a few more states to the plot above.

This is where the visualization becomes difficult to read. The plot above carries the same information as before, but with several states on it, it is hard to follow the journey over time and compare how density and ranking evolved across the four states. Animation helps here by surfacing insights that a two-dimensional plot cannot show. I will now use the gganimate package to animate over the third dimension of time. The transition_reveal function not only shows the points over time but also leaves a trace behind to show the journey so far.
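A minimal sketch of this step, assuming a data frame that already has `density` and `density_rank` columns (hypothetical names, made-up values): `transition_reveal(year)` reveals the data along `year`, and `frame_along` is the label variable gganimate exposes for that transition.

```r
library(ggplot2)
library(gganimate)

# Made-up values for two hypothetical states.
trace <- data.frame(
  state        = rep(c("A", "B"), each = 3),
  year         = rep(c(1910, 1960, 2020), times = 2),
  density      = c(10, 40, 90, 30, 35, 45),
  density_rank = c(5, 20, 22, 15, 12, 10)
)

anim <- ggplot(trace,
               aes(x = density, y = density_rank,
                   colour = state, group = state)) +
  geom_path() +    # the path is revealed gradually and stays as a trace
  geom_point() +   # the point marks the current position of each state
  labs(title = "Year: {round(frame_along)}") +
  transition_reveal(year)

# Rendering is a separate, slower step that needs a renderer such as gifski:
# animate(anim, nframes = 50, fps = 10)
```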

Here are insights from the plot above that now become easier to showcase:

  • Washington and California both started with similar low density and ranking in 1910, but California grew much faster and higher than Washington over time.
  • California rose in both density and ranking over time until 1960, after which it started to rise in density without much change in ranking, both of which came close to Pennsylvania by 2020.
  • Alabama, while continuing to rise in density, lowered in ranking over time and was topped by California in 1950 and by Washington in 2000.

Plotting population across US states on a map

Next, I wanted to showcase relative change in population among all states on the US map. The following function called population_dataset_lat_long takes in the dataset and returns a ggplot object.
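The sketch below shows one way such a map could be built; it is an illustration under assumptions, not the article's function. It assumes columns named `state`, `year`, and `population`, uses state outlines from `ggplot2::map_data("state")` (which requires the maps package and covers only the contiguous states), and introduces two hypothetical helpers, `join_population_to_polygons` and `plot_population_map`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: attach one year's population to the polygon rows.
# `polygons` is expected to have region, long, lat, group, and order columns,
# as returned by ggplot2::map_data("state").
join_population_to_polygons <- function(data, polygons, map_year) {
  data %>%
    filter(year == map_year) %>%
    mutate(region = tolower(state)) %>%   # map_data() uses lowercase names
    select(region, population) %>%
    right_join(polygons, by = "region") %>%
    arrange(group, order)                 # keep polygon vertices in drawing order
}

# Hypothetical plotting function: fill each state by its population.
plot_population_map <- function(data, map_year) {
  polygons <- map_data("state")
  join_population_to_polygons(data, polygons, map_year) %>%
    ggplot(aes(x = long, y = lat, group = group, fill = population)) +
    geom_polygon(colour = "white") +
    coord_quickmap() +
    labs(fill = "Population",
         title = paste("Population by state,", map_year))
}

# Placeholder values only, not census figures.
toy_census <- data.frame(
  state      = c("California", "Texas"),
  year       = c(2020, 2020),
  population = c(2, 1)
)

if (requireNamespace("maps", quietly = TRUE)) {
  map_plot <- plot_population_map(toy_census, 2020)
}
```

States missing from the data end up with an `NA` population after the join and are drawn in ggplot2's default grey.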

The plot above shows relative population across states for 2020. To compare how the relative population has changed over time, you would need to plot another map for 1910 and switch between the two. Instead, animating this map into a second GIF lets us see how the relative population changed between the two time periods. Note that the animation below covers just two census data points, 1910 and 2020, unlike the previous animation, which went through every census in that period. I use transition_states in this scenario.
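To illustrate the mechanics of `transition_states` without the map data, the sketch below animates a simple bar chart between two census years. The data values and column names are made up; the same transition can be added to a map plot in the same way. `closest_state` is the label variable gganimate provides for this transition.

```r
library(ggplot2)
library(gganimate)

# Made-up populations for three hypothetical states at two points in time.
snapshots <- data.frame(
  state      = rep(c("A", "B", "C"), times = 2),
  year       = rep(c(1910, 2020), each = 3),
  population = c(5, 2, 1, 6, 9, 8)
)

anim <- ggplot(snapshots, aes(x = state, y = population, fill = population)) +
  geom_col() +
  labs(title = "Census year: {closest_state}") +
  # Each distinct year is one state of the animation; transition_length and
  # state_length set the relative time spent moving versus pausing.
  transition_states(year, transition_length = 2, state_length = 1)
```

Unlike `transition_reveal`, which accumulates data along a continuous variable, `transition_states` tweens between discrete snapshots, which suits a comparison of two years.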

The plot above makes it easier to showcase the insight of how California and Texas went from having relatively lower populations as compared to some states on the East Coast to being two of the more highly populated states by 2020.

Bringing these animations to other projects using gganimate

Animation can help you step back and tell a broader story in many other scenarios, such as:

  • Plotting customer count versus revenue for multiple products and animating over time to compare across product lines.
  • Mapping the relative change in support ticket resolution times across customer service centers.
  • Plotting the relative change in sales cycle timelines pre- and post-marketing campaigns.
  • And a personal one that I am planning to tackle next for myself: Relative change in expense distribution across various buckets of budgets month over month.

An advantage of using gganimate is that with a few lines of code, you can animate any plot over any parameter. It is integrated with ggplot, so you are building on top of plots you already have. However, because rendering is somewhat slow, creating animations on the fly in a live, interactive setting can be a challenge. It works well in reports that are rendered before sharing.
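For the pre-rendered case, one possible workflow is to render the animation once with `animate()` and write it to a file with `anim_save()`, then embed the file in the report. The sketch below assumes the gifski package is installed for the GIF renderer; the data, frame count, and file name are arbitrary choices for illustration.

```r
library(ggplot2)
library(gganimate)

# Made-up values for a single series.
growth <- data.frame(
  year    = c(1910, 1960, 2020),
  density = c(10, 40, 90)
)

anim <- ggplot(growth, aes(x = year, y = density)) +
  geom_line() +
  transition_reveal(year)

# Fewer frames and a smaller canvas render faster; tune to taste.
rendered <- animate(anim, nframes = 20, fps = 10,
                    width = 480, height = 320,
                    renderer = gifski_renderer())

# Save once, then reference the file from the report.
gif_path <- file.path(tempdir(), "density.gif")
anim_save(gif_path, animation = rendered)
```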

Animation adds a key component to an analysis when the third parameter you want to show is not directly visible in your plot. Sometimes it can tell a powerful story, and sometimes it just adds some flair to an otherwise low-key one. But the drama that animations introduce can also be chaotic and, in some cases, overkill. For instance, if I tried to animate the density versus ranking plot for more than four states at a time, it could be very difficult to read. Use animation where it truly makes the insight easier for your stakeholders to understand, or where adding a third dimension drives the point home with more impact.

Deepsha Menghani is on LinkedIn.
