Analytics Vidhya
Published in

Analytics Vidhya

Storytelling With Your Graphs In R Using ggplot2

Use text, vertical lines and relevant titles to tell a story of Presidential Approval Ratings in R!

  • Use text — it’s your secret weapon and can be used in more than just the title
  • Consider the Emotional and Political Context — understand how your audience might look at your chart
  • Use animation intelligently — animated GIFs, charts and videos are helpful but should be used to underscore points in your story (note I am planning to do a second blog post specifically on this!)

Step 1: Package & Data Loading

Let’s load the required packages and data in RStudio. I have included all the data on my GitHub repository or you can download it yourself from the Presidency project and FiveThirtyEight for President Trump’s approval ratings. Note I did manually clean some of the excel sheets for ease of use, so downloading from my Github might be easier.

if(!require("readxl")) install.packages("readxl") # Required to read in the data
if(!require("tidyverse")) install.packages("tidyverse") # Our rock in data analysis (includes ggplot2)
if(!require("janitor")) install.packages("janitor") # Cleans up data like no other package
if(!require("ggsci")) install.packages("ggsci") # Provides awesome color palettes
# Used a function found on stackoverflow to combine all the different sheets of an excel file into a list
read_excel_allsheets <- function(filename, tibble = TRUE) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
if(!tibble) x <- lapply(x, as.data.frame)
names(x) <- sheets
x
}
# Combine the different sheets into one list of 13 dataframes
data.list <- read_excel_allsheets("data/PrevPresidentApproval.xlsx")
# Download the separate Trump approval dataset
trump.approval <- read.csv("data/TrumpApproval.csv")

Step 2: Data Manipulation

After loading the packages and the data the next step is data manipulation. For this, we want to label all the datasets, rename the columns and merge the two dataframes (one of the previous presidents and one of President Trump) after ensuring all their columns are the same, as they are from two different sources.

# Create a list with all the president's names
pres.names <- list("Obama", "BushJr", "Clinton", "BushSr", "Reagan", "Carter", "Ford", "Nixon", "Johnson", "Kennedy", "Eisenhower", "Truman", "Roosevelt")
# Apply the list to each dataframe in the original excel list
# This makes up for the sheet names, which originally had the president names
data.list <- Map(cbind, data.list, President = pres.names) # the Map function applies cbind to each dataframe of the list
# The Janitor package helps us clean the names, from which we select all the columns except for the polling start date (taking the end date instead). Then we rename the columns with the rename() function
df <- janitor::clean_names(bind_rows(data.list)) %>%
select(-start_date) %>%
rename(date = end_date, approval = approving, disapproval = disapproving, unsure = unsure_no_data)
df$date <- as.Date.POSIXct(df$date) # We need to change the value from POSIXct to Date
# Now let's clean the trump dataset to match the others and combine it into a new dataframe
# I always create new dataframes in case I want to re-access the earlier data without loading it all in again
trump.approval <- read.csv("data/TrumpApproval.csv")
trump.approval <- trump.approval %>%
filter(subgroup=="Adults") %>% # I chose to take the all adults category as it is more representative of the country
select(modeldate, approve_estimate, disapprove_estimate) %>%
mutate(unsure =(100 - (approve_estimate + disapprove_estimate))) %>% # Create an unsure column
rename(date = modeldate, approval = approve_estimate, disapproval = disapprove_estimate) %>% # rename the other columns
mutate(president="Trump")
trump.approval$date <- as.Date.character(trump.approval$date,"%m/%d/%Y") # Change the date column from character to date format
df2 <- rbind(df, trump.approval) # Combine the data into a new dataframe df2

Step 3: Additional Data Requirements

Before graphing, we need to add in one more detail to make sure we are telling the proper story. This detail is the date each president starts their term. This allows us to create a continuous graph of presidential approval rates with vertical lines at each term start. Be careful of the details though, as not all presidents started on their allotted January 20th date!

# To properly graph these presidents together we need to create a separate vector with the term dates for each president
# To do this we group by the president, arrange the data by the date of the polling and use the slice function to cut off the first polling entry, which is likely in their first year of presidency. Then select the two columns we need (president & date)
term.dates <- df2 %>%
group_by(president) %>%
arrange(date) %>%
slice(1) %>%
select(president, date)
# Every president starts on January 20th, so grab the year of their first poll and change the term.date start to January 20th
term.dates$term.start <- paste0(substring(term.dates$date,1,4), "-01-20")
# But...note the three exceptions to this rule:
# Gerald Ford took over the August 9th, Truman on April 12th, and Johnson on November 22nd after Kennedy was assassinated
term.dates[6,3] <- "1974-08-09"
term.dates[7,3] <- "1963-11-22"
term.dates[13,3] <- "1945-04-12"
term.dates <- term.dates[,-2] # Get rid of the date column
df2 <- merge(df2, term.dates, by = "president") # Merge the term.start into the main dataframe using the merge function
df2$term.start <- as.Date.character(df2$term.start) # Turn the term.start into the date class
df2$days_in_office <- df2$date - df2$term.start # Calculate the number of days in office, which will be relevant for later work!

Step 4: Graphing The Combined Presidential Story

The first plot I will make is a combined plot of approval ratings for each president. To make it easy to read and interpret I add vertical lines at the start of each of their terms using the term.start column. I also use the annotate() function to include labels with each of President’s name at the top of the plot. In my opinion, this looks a lot better than a cluttered legend for this specific graphic. Finally, the subtitle and title are required to tell the story of the graph, so don’t forget to put some thought into those elements!

# For colors I use the simpsons palette from the ggsci package as you need a lot of colours for 14 different presidents!
my_colors <- pal_simpsons("springfield")(16)
theme_set(theme_bw()) # Set the overall graphing theme; bw is my favourite as it makes it easy to compare and has a blank background
combined.plot <- df2 %>%
ggplot(aes(x = date, y = approval, color = as.factor(president))) +
scale_color_simpsons() +
geom_point(alpha=0.7) +
geom_smooth(span = 1, alpha = 0.5) + # Adds a smoothed line to the graph, much more visually appealing than a standard line
geom_vline(data = df2, aes(xintercept = term.start), linetype= 4, color = "black", size=0.5) + # match the xintercept with the term.start dates found in the dataframe you are using (df2)
scale_x_date(limits = as.Date(c("1941-01-20","2021-01-20"))) +
annotate(geom="text", x=as.Date.character(c("2004-6-01", "1990-6-01", "1979-6-01", "1996-6-01", "1958-6-01", "1975-6-01", "1966-6-01", "1962-6-01", "1970-6-01", "2012-6-01", "1984-6-01", "1942-6-01", "1948-6-01", "2019-6-01")), y=c(100), # The x and y depict where you want the annotations to be
label=c('bold("Bush Jr.")', 'bold("Bush Sr.")', 'bold("Carter")', 'bold("Clinton")', 'bold("Eisenhower")', 'bold("Ford")', 'bold("Johnson")', 'bold("Kennedy")', 'bold("Nixon")', 'bold("Obama")', 'bold("Reagan")', 'bold("Roosevelt")', 'bold("Truman")', 'bold("Trump")'), angle = 45, # Angling the labels for effect
color=my_colors[1:14], parse = TRUE) + # Using the my_colors vector we can match the lines and text annotations with the same color
# Add in the labels and titles! use the \n to have the subtitle spill over into the next line
labs(x = "Date",
y = "Approval Rating",
title = "Approval Rating By President over the Years",
subtitle = "Presidential approval ratings have gone from a volatile rollercoaster like in Truman, Nixon \nor the tenures of both Bush's to a more stable, party-centric evaluation \nepitomized by Trump's term",
color = "President") +
theme(plot.title = element_text(face="bold", size =14),
axis.title.x = element_text(face="bold", size = 12),
axis.title.y = element_text(face="bold", size = 12),
legend.position = "none")
combined.plot
Figure 1 — A combined graph showing the approval rating of every president in the past 75 years
  • How to use annotate() to replace a legend, placing the president names right above their stats
  • Match colors between the text and the graphed lines/ points in the plot to keep your reader in tune
  • Tell a story using the subtitle, especially if the title is a boring description of the graph

Step 5: Graphing Individual Presidential Approval Ratings

If you looked at my last post, you might notice individual presidential approval rating plots that include descriptors about major events in their presidency. This goes back to the first two points about storytelling: 1) Use text, and 2) Know your Audience. These graphs do wonders explaining why approval ratings rise and fall, without any additional captions, explanations or anything, all you need is the visual!

nixon.plot <- df2 %>% 
filter(president == "Nixon") %>%
ggplot(aes(x = date, y = approval, color = "green")) +
geom_point(alpha=0.7, color = "green") +
geom_smooth(span = 0.5, alpha = 0.5, color = "green") +
geom_vline(xintercept = as.numeric(as.Date(c("1969-6-8", "1972-11-7", "1972-6-17", "1974-08-08", "1973-4-30"))), linetype= 4, color = "black", size=0.5) +
labs(x = "Date",
y = "Approval Rating",
title = "Richard Nixon's Approval Rating over the Years",
subtitle = "Nixon was a relatively popular president as he withdrew troops from Vietnam and was even able \nto get re-elected. As soon as he took responsibility for the Watergate though, there \ncontinued to be a steep decline in approval ratings") +
annotate(geom="text", x=as.Date.character(c("1969-08-8", "1972-11-7", "1972-1-17", "1974-03-08", "1973-4-30")), y=c(40, 47, 40, 40, 65),
label=c('atop(bold("Announces Plan to"), bold("Leave Vietnam"))', 'bold("Wins Re-Election")', 'atop(bold("DNC Headquarters Is"), bold("Broken Into"))', 'atop(bold("Nixon Officially"), bold("Resigns"))', 'atop(bold("Nixon Takes Responsibility"), bold("For Watergate"))'), color="black", parse = TRUE) +
theme(plot.title = element_text(face="bold", size =14),
axis.title.x = element_text(face="bold", size = 12),
axis.title.y = element_text(face="bold", size = 12),
legend.position = "none")
nixon.plot
Figure 2 — Nixon’s approval rating was fairly stable until the watergate scandal finally took hold
bushjr.plot <- df2 %>% 
filter(president == "BushJr") %>%
ggplot(aes(x = date, y = approval, color = "red")) +
geom_point(alpha=0.7, color = "red") +
geom_smooth(span = 0.5, alpha = 0.5, color = "red") +
geom_vline(xintercept = as.numeric(as.Date(c("2001-9-11", "2003-3-20", "2006-11-07", "2007-12-01"))), linetype= 4, color = "black", size=0.5) +
labs(x = "Date",
y = "Approval Rating",
title = "George W. Bush's Approval Rating over the Years",
subtitle = "Bush Jr. had an ever decreasing approval rating broken up by two jumps: \n1) Right after Sept-11th and 2) When he declared war on Iraq",
color = "President") +
annotate(geom="text", x=as.Date.character(c("2001-9-11", "2003-3-20", "2006-11-07", "2007-12-01")), y=c(40, 80, 60, 50),
label=c('bold("Sept 11th Attack")', 'atop(bold("Iraq War"), bold("Declaration"))', 'atop(bold("2006 Mid-Term"), bold("Elections"))', 'atop(bold("Great Recession"), bold("Begins"))'), color="black", parse = TRUE) +
theme(plot.title = element_text(face="bold", size =14),
axis.title.x = element_text(face="bold", size = 12),
axis.title.y = element_text(face="bold", size = 12),
legend.position = "none")
bushjr.plot
Figure 3 — Bush’s tenure is marked by the 9/11 attacks and the Iraq war, both which saw spikes in his approval rating
obama.plot <- df2 %>% 
filter(president == "Obama") %>%
ggplot(aes(x = date, y = approval, color = "blue")) +
geom_point(alpha=0.7, color = "blue") +
geom_smooth(span = 0.5, alpha = 0.5, color = "blue") +
geom_vline(xintercept = as.numeric(as.Date(c("2010-3-20", "2011-5-2", "2012-11-07", "2015-03-2"))), linetype= 4, color = "black", size=0.5) +
labs(x = "Date",
y = "Approval Rating",
title = "Barack Obama's Approval Rating over the Years",
subtitle = "Obama's popularity peaked both times he was elected, showing his strong ability to campaign. \nHe also got a spike when Bin Laden was killed in 2011") +
annotate(geom="text", x=as.Date.character(c("2010-3-20", "2011-5-2", "2012-11-07", "2015-03-2")), y=c(70, 60, 65, 55),
label=c('bold("Passes ObamaCare")', 'atop(bold("Osama Bin Laden"), bold("Assasinated"))','bold("Wins Re-Election")', 'atop(bold("Clintons Email"), bold("Scandal"))'),
color="black", parse = TRUE) +
theme(plot.title = element_text(face="bold", size =14),
axis.title.x = element_text(face="bold", size = 12),
axis.title.y = element_text(face="bold", size = 12),
legend.position = "none")
obama.plot
Figure 4 — Obama rode his two campaigns to the top of the approval rating, and interestingly saw a rise near the end of his presidency
# Now time for Trumptrump.plot <- df2 %>% 
filter(president == "Trump") %>%
ggplot(aes(x = date, y = approval, color = "orange")) +
geom_point(alpha=0.7, color = "orange") +
geom_smooth(method = 'loess', span = 0.5, alpha = 0.5, color = "orange") +
geom_vline(xintercept = as.numeric(as.Date(c("2017-12-22", "2018-11-6", "2019-12-18", "2020-03-25", "2021-01-6"))), linetype= 4, color = "black", size=0.5) +
ylim(32, 50) +
labs(x = "Date",
y = "Approval Rating",
title = "Donald Trump's Approval Rating over the Years",
subtitle = "Trump's highest approval came at the beginning of the pandemic, and the lowest as he \nincited the 2021 capital riots. With a 12 point gap between the two, his approval rating \nwas fairly stable no matter what happened") +
annotate(geom="text", x=as.Date.character(c("2017-12-22", "2018-11-6", "2019-12-18", "2020-03-25", "2020-11-20")), y=c(48, 45, 37, 47, 45),
label=c('atop(bold("Trump Tax Cuts"), bold("Signed Into Law"))', 'atop(bold("Democrats Win Back"), bold("The House"))', 'atop(bold("First Impeachment"), bold("Begins"))', 'atop(bold("COVID Pandemic"), bold("Arrives"))', 'atop(bold("Storming Of"), bold("The Capital"))'),
color="black", parse = TRUE) +
theme(plot.title = element_text(face="bold", size =14),
axis.title.x = element_text(face="bold", size = 12),
axis.title.y = element_text(face="bold", size = 12),
legend.position = "none")
trump.plot
Figure 5 — Trump’s approval is the most stable of all presidents. The graph only shows approval between the 30–50% range, demonstrating the lack of volatility

--

--

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dylan Anderson

Bridging the gap between Data and Strategy | Data Strategy Lead at Redkite, code in R & help you use data smarter | Follow on LinkedIn: https://bit.ly/3IPQCeQ