Creating racing bar charts in R with gganimate

Aditya Dahiya
7 min readOct 18, 2023

--

We’re about to embark on a thrilling journey through the world of animated racing bar charts in R - dynamic, action-packed data visualization that showcases the ebb and flow of information over time.

Inspired by the ingenious work of Deepsha Meghnani’s article on TidyTuesday and drawing creative insights from the brilliant minds at datacornering.com, we’ll be crafting our very own data-driven racing bar chart masterpiece.

Our canvas is the nycflights13 dataset, with details on flights departing from New York City's three iconic airports, courtesy of various carriers, all throughout the year 2013.

But that’s not all. We won’t stop at just displaying the numbers. We’ll also throw in some flair by illustrating the average delays associated with each of these carriers, injecting a dose of character into the aviation landscape of the Big Apple.

We’re going to unravel the secrets of creating animated racing bar charts using the formidable ggplot2 and gganimate packages in R. Let’s begin with a step-by-step guide.

library(tidyverse)          # Loading Tidyverse for data wrangling
library(gt) # Loading gt package for beautiful tables
library(gganimate) # For animations
library(nycflights13) # for the flights data-set
library(lubridate) # to handle dates in tidyverse

Some code to get the data and tidy it up: —

# Loading the flights dataset
data("flights")

# Pick out the top nine airline carriers only, to avoid crowding the
# upcoming animated plot
carriers_to_plot <- flights |>

# Count the number of flights for each carrier and sort them in descending order
count(carrier, sort = TRUE) |>

# Select the top 9 carriers based on flight count
slice_head(n = 9) |>

# Extract the 'carrier' column from the result
pull(carrier)

df <- flights |>

# Filter the flights dataset to include only the top 9 carriers
filter(carrier %in% carriers_to_plot) |>

# Create a new 'date' column by combining year, month, and day
# This allows us to make a single date variable, that nicely evolves
# over time in an animated plot
mutate(date = make_date(year = year, month = month, day = day)) |>

# Select only the 'date' and 'carrier' columns
select(date, carrier) |>

# Joining the full names of airlines for the annotations in animated plot
left_join(nycflights13::airlines, by = join_by(carrier)) |>

# Remove the 'carrier' column after joining
select(-carrier) |>

# Rename the 'name' column to 'carrier'
rename(carrier = name) |>

# Count the number of flights for each date and carrier combination
count(date, carrier)

Example 1

The visualization below captures the total number of flights operated by each carrier each month, spanning the entire year from January to December 2013. This is an animated bar chart, evolving over time, rather than a truly “racing” bar chart.

gganim <- df |>

# Create two new columns, 'month' and 'month_anim'
mutate(month = month(date, label = TRUE, abbr = FALSE),
month_anim = month(date)) |>

# Group the data by 'month' and 'month_anim', and count the number
# of flights for each 'carrier'
group_by(month, month_anim) |>
count(carrier, wt = n) |>

# Calculate the rank of each 'carrier' based on the flight count
mutate(rank_car = rank(n)) |>

# Remove grouping information
ungroup() |>

# Create a ggplot object with specific aesthetics for rectangles
ggplot(aes(xmin = 0,
xmax = n,
y = rank_car,
ymin = rank_car - 0.45,
ymax = rank_car + 0.45,
fill = carrier,
label = round(n, 0))) +

# Add filled rectangles with transparency
geom_rect(alpha = 0.5) +

# Add text labels for flight counts
geom_text(aes(x = n, label = as.character(n)), hjust = "left") +

# Add text labels for carriers
geom_text(aes(x = 0, label = carrier), hjust = "left") +

# Adjust the x-axis scale limits
scale_x_continuous(limits = c(0, 5500)) +

# Customize labels and titles
labs(x = NULL, y = NULL, title = "Number of flights each month") +

# Add a label indicating the month
geom_label(aes(label = month),
x = 4500, y = 1,
fill = "white", col = "black",
size = 10,
label.padding = unit(0.5, "lines")) +

# Apply a classic theme
theme_classic() +

# Customize plot appearance
theme(legend.position = "none",
axis.line.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
title = element_text(size = 20, hjust = 0.5)) +

# Create multiple subplots for each month
facet_wrap(~month_anim) +

# Remove facet labels: this allows us to superlay the facets on top of each other
# and then animate the facets
facet_null() +

# Create a time-based animation based on 'month_anim'
transition_time(month_anim)

# Animate the ggplot object with specified settings
animate(gganim,
duration = 30,
fps = 10,
width = 800,
height = 500,
start_pause = 10,
end_pause = 20)

Example 2

The visualization below offers a unique perspective, showcasing the cumulative total of flights operated by each carrier from January to December 2013, steadily building the story month by month. A truly “racing” bar chart.

df1 <- df |>

# Group the data by 'carrier'
group_by(carrier) |>

# Calculate the cumulative sum of 'n' within each carrier group
# This allows us to ahve a cumulative number of flights over time in a
# truly "racing" bar cahrt over time
mutate(cum_n = cumsum(n)) |>

# Remove grouping information
ungroup() |>

# Group the data by 'date'
group_by(date) |>

# Calculate the rank of 'cum_n' within each date group
mutate(day_rank = rank(cum_n, ties.method = "first"))

gganim <- df1 |>

# Create a ggplot object with specific aesthetics for reactangles
ggplot(aes(xmin = 0,
xmax = cum_n,
y = day_rank,
ymin = day_rank - 0.45,
ymax = day_rank + 0.45,
fill = carrier,
label = cum_n)) +

# Add filled rectangles with transparency
geom_rect(alpha = 0.5) +

# Add text labels for cumulative flight counts
geom_text(aes(x = cum_n, label = as.character(cum_n)), hjust = "left") +

# Add text labels for carriers
geom_text(aes(x = 0, label = carrier), hjust = "left") +

# Customize labels and titles. Adding {closest_state} adds the transition
# variable value to the plot title
labs(x = NULL, y = NULL, title = "Number of total flights operated up to {closest_state}") +

# Apply a classic theme
theme_classic() +

# Customize plot appearance
theme(legend.position = "none",
axis.line.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
title = element_text(size = 20, hjust = 0.5)) +

# Create multiple subplots for each date
facet_wrap(~ date) +

# Remove facet labels
facet_null() +

# Transition the plot by 'date'
transition_states(date) +

# Follow the view with a fixed x-axis
view_follow(fixed_x = FALSE)

# Animate the ggplot object with specified settings
animate(gganim,
duration = 40,
fps = 6,
width = 800,
height = 500,
start_pause = 10,
end_pause = 20)

Example 3

In the final visualization below, we delve into the average flights’ arrival delay (in minutes) for each carrier, every month, over the course of the year. What makes this data dance even more exciting is how it ranks carriers from the highest delay to the lowest delay, and as we traverse the months, watch as these rankings twirl and pirouette.

gganim2 <- flights |>

# Filter the flights dataset to include only the top 9 carriers
filter(carrier %in% carriers_to_plot) |>

# Create new columns: 'date' by combining year, month, and day, and
# 'month' to represent the month as a label
mutate(date = make_date(year = year, month = month, day = day),
month = month(date, label = TRUE, abbr = FALSE)) |>

# Select specific columns for the subsequent analysis
select(date, month, carrier, arr_delay) |>

# Group the data by 'month' and 'carrier', and calculate the average arrival delay
group_by(month, carrier) |>
summarize(
avg_delay = mean(arr_delay, na.rm = TRUE)
) |>

# Join the full names of airlines for the annotations in the animated plot
left_join(nycflights13::airlines, by = join_by(carrier)) |>

# Remove the 'carrier' column after joining and rename 'name' to 'carrier'
select(-carrier) |>
rename(carrier = name) |>

# Calculate the rank of average delay, considering ties
mutate(delay_rank = rank(avg_delay, ties.method = "first")) |>

# Create a ggplot object with specific aesthetics for the rectangles
ggplot(aes(xmin = 0,
xmax = avg_delay,
y = delay_rank,
ymin = delay_rank - 0.45,
ymax = delay_rank + 0.45,
fill = carrier
)
) +

# Add filled rectangles with transparency
geom_rect(alpha = 0.5) +

# Add text labels for average delay values
geom_text(aes(x = avg_delay,
label = as.character(round(avg_delay, 1))),
hjust = "left") +

# Add text labels for carriers
geom_text(aes(x = 0, label = carrier), hjust = "left") +

# Add a label indicating the month
geom_label(aes(label = month),
x = 4500, y = 1,
fill = "white", col = "black",
size = 10,
label.padding = unit(0.5, "lines")) +

# Customize labels and titles
labs(x = NULL, y = NULL,
title = "Average flight arrival delay (in minutes) during {closest_state}") +

# Apply a classic theme
theme_classic() +

# Customize plot appearance
theme(legend.position = "none",
axis.line.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
title = element_text(size = 20, hjust = 0.5)) +

# Create multiple subplots for each month
facet_wrap(~ month) +

# Remove facet labels
facet_null() +

# Transition the plot by 'month'
transition_states(month)

# Animate the ggplot object with specified settings
animate(gganim2,
duration = 40,
fps = 10,
width = 800,
height = 500,
start_pause = 10,
end_pause = 20)

Notice that in some bad weather months (like June, July and December), almost every airline has considerable delays. On the contrary, if you like being on time, the best months to fly seem to be September to November.

For more such examples, and my projects, visit my site.

--

--

Aditya Dahiya

Harvard MPH, AIIMS and IAS. Passionate for data visualization & health-financing. Uncovering insights through data. (Facts! Not Opinions!)