Stacked Bar Chart: Data Preparation and Visualization

Becaye Baldé
7 min read · Feb 25, 2023


A stacked bar chart requires your data to be in a specific format. Use R to prepare and visualize your data.

Stacked bar chart with ggplot2

TL;DR

In this article, we are going to create a stacked bar chart. One challenge in doing so is to have our data formatted properly. But before transforming the data, we need to understand the structure of a stacked bar chart.

Anatomy of the stacked bar chart

Three elements are important in a stacked bar chart. The first two are:

  • Category: the categories are displayed on the x-axis. For example, the categories could be the most popular social media platforms.
  • Height: the height of each bar represents the value displayed on the y-axis. In the same example, this could be the number of users of each platform.

These two elements give you a plain bar chart. A stacked bar chart adds a third one:

  • Subcategory: each category contains subcategories stacked on top of each other. In our example, gender could be a subcategory, splitting the users of each platform by gender.

Source: EdrawMax

Now let’s create a stacked bar chart.

Load the data

Save the following data as “politics_approval_rates.csv”.

"Issue","Approve","Disapprove","No.Opinion"
"Race relations",52,38,10
"Education",49,40,11
"Terrorism",48,45,7
"Energy policy",47,42,11
"Foreign affairs",44,48,8
"Environment",43,51,6
"Situation in Iraq",41,53,6
"Taxes",41,54,5
"Healthcare policy",40,57,3
"Economy",38,59,3
"Situation in Afghanistan",36,57,7
"Federal budget deficit",31,64,5
"Immigration",29,62,9

Source: FlowingData | Nathan Yau

In this dataset, participants were asked whether they approved or disapproved of how a president handled each of 13 issues.

Read the CSV file with R

data <- read.csv('politics_approval_rates.csv')
df <- data.frame(data)
df

Politics approval rates dataset
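If you would rather not save a file first, here is a minimal sketch that reads a few rows straight from a string. The names csv_text and sample_df are made up for this example, and only the first three rows of the dataset are included.

```r
# Inline copy of the header and the first three rows of the CSV above.
csv_text <- '"Issue","Approve","Disapprove","No.Opinion"
"Race relations",52,38,10
"Education",49,40,11
"Terrorism",48,45,7'

# read.csv() accepts the file content through its `text` argument.
sample_df <- read.csv(text = csv_text)

# Inspect the column names and types before reshaping anything.
str(sample_df)
```

Checking the structure early is a cheap way to confirm that the three opinion columns were read as numbers.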

Now, in order to make a stacked bar chart, we need to identify the 3 elements: the category, subcategory and height.

  1. Category: the issue (Race relations, Education, etc.) will be our main category.
  2. Height: the approval rate for each opinion.
  3. Subcategory: the opinion will be our subcategory. It can be either “approve”, “disapprove” or “no opinion”.

Now, look at the table. Each opinion has its own column, with the approval rates spread across them (a "wide" layout). A stacked bar chart needs one row per issue and opinion instead (a "long" layout), so we have to reshape the table into something like this:

Final data frame

Each issue has 3 opinions with their respective values.

Now let’s code.

Transform the data

The category — issues

First, we replicate each issue 3 times because we have 3 subcategories (see figure above).

issues <- c()

# loop through each of the 13 issues
for (issue in df$Issue) {
  # replicate the issue 3 times and append it
  issues <- c(issues, rep(issue, 3))
}

# 13 * 3 = 39 issues
issues
[1] "Race relations"           "Race relations"           "Race relations"          
[4] "Education" "Education" "Education"
[7] "Terrorism" "Terrorism" "Terrorism"
[10] "Energy policy" "Energy policy" "Energy policy"
[13] "Foreign affairs" "Foreign affairs" "Foreign affairs"
[16] "Environment" "Environment" "Environment"
[19] "Situation in Iraq" "Situation in Iraq" "Situation in Iraq"
[22] "Taxes" "Taxes" "Taxes"
[25] "Healthcare policy" "Healthcare policy" "Healthcare policy"
[28] "Economy" "Economy" "Economy"
[31] "Situation in Afghanistan" "Situation in Afghanistan" "Situation in Afghanistan"
[34] "Federal budget deficit" "Federal budget deficit" "Federal budget deficit"
[37] "Immigration" "Immigration" "Immigration"

Let’s understand the code above:

  • the rep(vector, n) function replicates a vector n times.
  • vector <- c(vector, value) means append “value” to the end of the vector.

So, the following line simply means: replicate “issue” 3 times, then append it to “issues”.

issues <- c(issues, rep(issue, 3))
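As a side note, the same result can probably be obtained without a loop, because rep() has an `each` argument. The sketch below compares both versions on a short, made-up vector (issue_names stands in for df$Issue).

```r
# Hypothetical three-issue vector standing in for df$Issue.
issue_names <- c("Race relations", "Education", "Terrorism")

# Loop version, as in the article.
issues_loop <- c()
for (issue in issue_names) {
  issues_loop <- c(issues_loop, rep(issue, 3))
}

# Vectorized version: `each = 3` repeats every element 3 times in a row.
issues_vec <- rep(issue_names, each = 3)

# Both approaches build the same vector.
same_result <- identical(issues_loop, issues_vec)
same_result
```

The loop is easier to read step by step, while the one-liner avoids growing a vector inside a loop.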

The subcategory — opinions

Next, we create the subcategories. We have 3 opinions: approve, disapprove and no opinion.

opinions <- colnames(df[, 2:4])
opinions
[1] "Approve"    "Disapprove" "No.Opinion"

We want to replicate the subcategories 13 times because there are 13 categories (issues).

Final data frame

Notice that the opinions are replicated in their entirety, whereas the issues were replicated individually.

opinions <- rep(opinions, 13)
opinions
[1] "Approve"    "Disapprove" "No.Opinion" "Approve"    "Disapprove" "No.Opinion" "Approve"   
[8] "Disapprove" "No.Opinion" "Approve" "Disapprove" "No.Opinion" "Approve" "Disapprove"
[15] "No.Opinion" "Approve" "Disapprove" "No.Opinion" "Approve" "Disapprove" "No.Opinion"
[22] "Approve" "Disapprove" "No.Opinion" "Approve" "Disapprove" "No.Opinion" "Approve"
[29] "Disapprove" "No.Opinion" "Approve" "Disapprove" "No.Opinion" "Approve" "Disapprove"
[36] "No.Opinion" "Approve" "Disapprove" "No.Opinion"
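To see why the two replication patterns fit together, here is a small sketch with only two issues. The names issue_names, opinion_names and pairs are chosen for this example.

```r
issue_names <- c("Race relations", "Education")
opinion_names <- c("Approve", "Disapprove", "No.Opinion")

pairs <- data.frame(
  # each issue repeated in place: A A A B B B
  issues = rep(issue_names, each = length(opinion_names)),
  # the whole opinion vector repeated: x y z x y z
  opinions = rep(opinion_names, times = length(issue_names))
)

# Every issue now appears once with every opinion.
pairs
```

If both vectors were replicated the same way, some issue/opinion combinations would appear several times and others not at all.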

The last thing to do is to extract the value for each opinion.

The height — approval rates

We proceed by getting the approval rates, transposing them, then flattening them to a 1D vector.

Get the approval rates

values <- as.matrix(df[2:4])
values
Approve Disapprove No.Opinion
[1,] 52 38 10
[2,] 49 40 11
[3,] 48 45 7
[4,] 47 42 11
[5,] 44 48 8
[6,] 43 51 6
[7,] 41 53 6
[8,] 41 54 5
[9,] 40 57 3
[10,] 38 59 3
[11,] 36 57 7
[12,] 31 64 5
[13,] 29 62 9

Transpose the values

values <- t(values) 
values
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
Approve 52 49 48 47 44 43 41 41 40 38 36 31 29
Disapprove 38 40 45 42 48 51 53 54 57 59 57 64 62
No.Opinion 10 11 7 11 8 6 6 5 3 3 7 5 9

Convert them to a 1D vector

values <- as.vector(values)
values
[1] 52 38 10 49 40 11 48 45  7 47 42 11 44 48  8 43 51  6 41 53  6 41 54  5 40 57  3 38 59  3 36 57  7 31 64  5 29 62  9

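as.vector() reads a matrix column by column. After the transpose, each column holds the three values of one issue, so the resulting vector lists the approve, disapprove and no-opinion rates issue by issue.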
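The effect of the transpose is easier to see on a tiny matrix. This sketch uses the first two rows of the dataset; the names m, flat_by_column and flat_by_issue are made up for the example.

```r
# First two issues of the dataset, one row per issue.
m <- matrix(c(52, 38, 10,
              49, 40, 11),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("Approve", "Disapprove", "No.Opinion")))

# Without transposing, values come out opinion by opinion.
flat_by_column <- as.vector(m)     # 52 49 38 40 10 11

# After transposing, values come out issue by issue.
flat_by_issue <- as.vector(t(m))   # 52 38 10 49 40 11
```

Only the second ordering lines up with the issues and opinions vectors built earlier.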

Load the data into a data frame

# create the data frame from the three vectors built above
final_df <- data.frame(issues, opinions, values)

# rename the third column
colnames(final_df)[3] <- "Approval Rates"

final_df
issues   opinions Approval Rates
1 Race relations Approve 52
2 Race relations Disapprove 38
3 Race relations No.Opinion 10
4 Education Approve 49
5 Education Disapprove 40
6 Education No.Opinion 11
7 Terrorism Approve 48
8 Terrorism Disapprove 45
9 Terrorism No.Opinion 7
10 Energy policy Approve 47
11 Energy policy Disapprove 42
12 Energy policy No.Opinion 11
13 Foreign affairs Approve 44
14 Foreign affairs Disapprove 48
15 Foreign affairs No.Opinion 8
16 Environment Approve 43
17 Environment Disapprove 51
18 Environment No.Opinion 6
19 Situation in Iraq Approve 41
20 Situation in Iraq Disapprove 53
21 Situation in Iraq No.Opinion 6
22 Taxes Approve 41
23 Taxes Disapprove 54
24 Taxes No.Opinion 5
25 Healthcare policy Approve 40
26 Healthcare policy Disapprove 57
27 Healthcare policy No.Opinion 3
28 Economy Approve 38
29 Economy Disapprove 59
30 Economy No.Opinion 3
31 Situation in Afghanistan Approve 36
32 Situation in Afghanistan Disapprove 57
33 Situation in Afghanistan No.Opinion 7
34 Federal budget deficit Approve 31
35 Federal budget deficit Disapprove 64
36 Federal budget deficit No.Opinion 5
37 Immigration Approve 29
38 Immigration Disapprove 62
39 Immigration No.Opinion 9
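If you need to repeat this reshaping, the three steps could be wrapped in one function. The helper below, to_long(), is hypothetical (it does not come from any package) and is only a sketch of the same idea, tried on the first two issues.

```r
# Hypothetical helper: turn a wide data frame into issue/opinion/value rows.
to_long <- function(wide, id_col, value_cols) {
  data.frame(
    issues = rep(wide[[id_col]], each = length(value_cols)),
    opinions = rep(value_cols, times = nrow(wide)),
    value = as.vector(t(as.matrix(wide[value_cols])))
  )
}

# First two rows of the dataset, in the original wide layout.
wide <- data.frame(
  Issue = c("Race relations", "Education"),
  Approve = c(52, 49),
  Disapprove = c(38, 40),
  No.Opinion = c(10, 11)
)

long <- to_long(wide, "Issue", c("Approve", "Disapprove", "No.Opinion"))

# Sanity check: the three opinions of each issue should add up to 100.
totals <- tapply(long$value, long$issues, sum)
totals
```

A check like the last one quickly reveals values that ended up next to the wrong issue.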

Beautiful! Now here comes the fun part, visualization! 🎉

Visualizing the data: the stacked bar chart

Explanation will follow.

# load the ggplot2 library
library(ggplot2)

ggplot(final_df, aes(fill = opinions, y = `Approval Rates`, x = issues)) +
  geom_col(position = "stack") +
  theme(axis.text.x = element_text(angle = 45, margin = margin(t = 30, unit = "pt")))

Stacked bar chart with ggplot2

ggplot()

First, we call the ggplot() function and pass it 2 arguments:

  • The final data frame.
  • aes(): a function that maps the columns of the data frame to the visual properties of the chart (the aesthetics).

aes()

We pass 3 arguments to the aes() function:

  • x: the column whose values become the labels on the x-axis.
  • y: the column that sets the height of the bars.
  • fill: the column that determines the color of each segment of a bar.

geom_col()

We pass 1 argument to geom_col(), the function that draws the bars:

  • position: “stack” to stack the bars. Use “dodge” to group the bars (see figure below).

theme()

We use it to rotate the x labels to make the graph more readable:

  • axis.text.x: rotates the x labels by 45 degrees and adds a top margin of 30 pt.

Tip: In RStudio, put “?” before a function and execute to display help. Ex: ?aes.
You can also place the cursor on a function and press F1.
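To experiment with these pieces, it can help to store the plot in a variable and add the layers one at a time. The sketch below assumes ggplot2 is installed and uses a small data frame (plot_df, with a rate column) made up from the first two issues. The hjust = 1 setting is an optional tweak, not part of the chart above.

```r
library(ggplot2)

# Two issues from the dataset, already in long format.
plot_df <- data.frame(
  issues = rep(c("Race relations", "Education"), each = 3),
  opinions = rep(c("Approve", "Disapprove", "No.Opinion"), times = 2),
  rate = c(52, 38, 10, 49, 40, 11)
)

# Build the plot layer by layer and keep it in a variable.
p <- ggplot(plot_df, aes(x = issues, y = rate, fill = opinions)) +
  geom_col(position = "stack") +
  # hjust = 1 right-aligns the rotated labels under their bars
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Draw the chart.
print(p)
```

Keeping the plot in p makes it easy to try another position or theme setting without retyping everything.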

Grouped bar chart

ggplot(final_df, aes(x = issues, y = `Approval Rates`, fill = opinions)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, margin = margin(t = 30, unit = "pt")))

Grouped bar chart

In this article, we created a stacked bar chart to understand people’s opinions on a president’s policies. The funny thing is that it’s always the same story: We want to visualize our data instantly but we can’t. Instead, we spend 90% of the time understanding and cleaning the data, but only 10% visualizing it. But hey, in the end, it’s worth it.


Becaye Baldé

Becaye is a Junior Data Scientist with a Master's in AI. He loves discovering new things be it in tech or everyday life :)