Stacked Bar Chart: Data Preparation and Visualization

Feb 25, 2023

A stacked bar chart requires your data to be in a specific format. Use R to prepare and visualize your data.

TL;DR

Find the code here.

In this article, we are going to create a stacked bar chart. The main challenge is getting the data into the right format. But before transforming the data, we need to understand the structure of a stacked bar chart.

Anatomy of the stacked bar chart

Three elements matter in a stacked bar chart. The first two are shared with an ordinary bar chart:

Category: the categories are displayed on the x-axis. For example, the categories could be the most popular social media platforms.
Height: the height of each bar represents a value on the y-axis. In the same example, this could be the number of users of each platform.

A stacked bar chart adds a third element:

Subcategory: each category is split into subcategories stacked on top of each other. In our example, gender could be the subcategory, splitting each platform’s users by gender.

Now let’s create a stacked bar chart.

Load the data

Save the following data as “politics_approval_rates.csv”.

"Issue","Approve","Disapprove","No.Opinion"
"Race relations",52,38,10
"Education",49,40,11
"Terrorism",48,45,7
"Energy policy",47,42,11
"Foreign affairs",44,48,8
"Environment",43,51,6
"Situation in Iraq",41,53,6
"Taxes",41,54,5
"Healthcare policy",40,57,3
"Economy",38,59,3
"Situation in Afghanistan",36,57,7
"Federal budget deficit",31,64,5
"Immigration",29,62,9

Source: FlowingData | Nathan Yau

The data comes from a survey in which participants were asked whether they approved or disapproved of how a president handled 13 issues. Each row sums to 100, so the values read as percentages.

Read the CSV file with R

# read.csv() already returns a data frame
df <- read.csv('politics_approval_rates.csv')
df
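To try the steps without saving a file first, the same table can be read from an inline string. This is a minimal sketch that uses only the first three rows of the CSV; the names csv_text and df_small are introduced here for illustration and do not appear in the rest of the article.

```r
# Inline copy of the first three rows of the CSV (shortened for illustration)
csv_text <- '"Issue","Approve","Disapprove","No.Opinion"
"Race relations",52,38,10
"Education",49,40,11
"Terrorism",48,45,7'

# read.csv() accepts a `text` argument in place of a file path
df_small <- read.csv(text = csv_text)

# Inspect the column types and the dimensions (rows, columns)
str(df_small)
dim(df_small)
```

Checking str() early is a cheap way to confirm that the three opinion columns were read as numbers rather than text.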

Now, in order to make a stacked bar chart, we need to identify the 3 elements: the category, subcategory and height.

Category: the issue (Race relations, Education, etc.) will be our main category.
Height: the approval rate for each opinion.
Subcategory: the opinion will be our subcategory. It can be either “approve”, “disapprove” or “no opinion”.

Now, look at the table. Each opinion has its own column, with one row per issue (a wide format). A stacked bar chart needs one row per issue and opinion pair instead (a long format), so we have to reshape the table to get something like this:

Each issue has 3 opinions with their respective values.
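As a rough sketch of that target layout, here is the long format for a single issue, typed in by hand from the first row of the CSV. The name target_example is hypothetical and only serves this illustration.

```r
# Long format: one row per (issue, opinion) pair
target_example <- data.frame(
  issues   = rep("Race relations", 3),
  opinions = c("Approve", "Disapprove", "No.Opinion"),
  values   = c(52, 38, 10)
)

target_example
```

The rest of the article builds exactly these three columns, one at a time, for all 13 issues.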

Now let’s code.

Transform the data

The category — issues

First, we replicate each issue 3 times because we have 3 subcategories (see figure above).

issues <- c()

# loop through each of the 13 issues
for (issue in df$Issue)
  # replicate the issue 3 times 
  issues <- c(issues, rep(issue, 3))

# 13 * 3 = 39 issues
issues

 [1] "Race relations"           "Race relations"           "Race relations"          
 [4] "Education"                "Education"                "Education"               
 [7] "Terrorism"                "Terrorism"                "Terrorism"               
[10] "Energy policy"            "Energy policy"            "Energy policy"           
[13] "Foreign affairs"          "Foreign affairs"          "Foreign affairs"         
[16] "Environment"              "Environment"              "Environment"             
[19] "Situation in Iraq"        "Situation in Iraq"        "Situation in Iraq"       
[22] "Taxes"                    "Taxes"                    "Taxes"                   
[25] "Healthcare policy"        "Healthcare policy"        "Healthcare policy"       
[28] "Economy"                  "Economy"                  "Economy"                 
[31] "Situation in Afghanistan" "Situation in Afghanistan" "Situation in Afghanistan"
[34] "Federal budget deficit"   "Federal budget deficit"   "Federal budget deficit"  
[37] "Immigration"              "Immigration"              "Immigration"

Let’s understand the code above:

rep(x, n) repeats x n times.
vector <- c(vector, value) appends “value” to the end of the vector.

So, the following line simply means: replicate “issue” 3 times, then append it to “issues”.

issues <- c(issues, rep(issue, 3))
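The loop can also be written as a single call, because rep() has an each argument that repeats every element in place. The sketch below compares both versions on a shortened list of issues; issue_names, issues_loop and issues_each are names made up for this example.

```r
# A shortened list of issues, for illustration only
issue_names <- c("Race relations", "Education", "Terrorism")

# Loop version, as in the article
issues_loop <- c()
for (issue in issue_names) {
  issues_loop <- c(issues_loop, rep(issue, 3))
}

# Vectorized version: repeat each element 3 times
issues_each <- rep(issue_names, each = 3)

# Both approaches produce the same character vector
identical(issues_loop, issues_each)
```

The vectorized form avoids growing a vector inside a loop, which matters little for 13 issues but is the more common R idiom.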

The subcategory — opinions

Next, we create the subcategories. We have 3 opinions: approve, disapprove and no opinion.

opinions <- colnames(df[, 2:4])
opinions

[1] "Approve"    "Disapprove" "No.Opinion"

We want to replicate the subcategories 13 times because there are 13 categories (issues).

Notice that the opinions are replicated in their entirety, whereas the issues were replicated individually.

opinions <- rep(opinions, 13)
opinions

 [1] "Approve"    "Disapprove" "No.Opinion" "Approve"    "Disapprove" "No.Opinion" "Approve"   
 [8] "Disapprove" "No.Opinion" "Approve"    "Disapprove" "No.Opinion" "Approve"    "Disapprove"
[15] "No.Opinion" "Approve"    "Disapprove" "No.Opinion" "Approve"    "Disapprove" "No.Opinion"
[22] "Approve"    "Disapprove" "No.Opinion" "Approve"    "Disapprove" "No.Opinion" "Approve"   
[29] "Disapprove" "No.Opinion" "Approve"    "Disapprove" "No.Opinion" "Approve"    "Disapprove"
[36] "No.Opinion" "Approve"    "Disapprove" "No.Opinion"
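To see how the two kinds of replication line up, here is a small sketch with two issues only. It also derives the repeat counts from the vector lengths instead of hard-coding 3 and 13; all names in it are hypothetical.

```r
issue_names   <- c("Race relations", "Education")
opinion_names <- c("Approve", "Disapprove", "No.Opinion")

# each: repeat every element in place; times: repeat the whole vector
issues_col   <- rep(issue_names, each = length(opinion_names))
opinions_col <- rep(opinion_names, times = length(issue_names))

# Side by side, every issue is paired with every opinion exactly once
data.frame(issues_col, opinions_col)
```

Deriving the counts from length() means the code keeps working if an issue or an opinion is added to the data.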

The last thing to do is to extract the value for each opinion.

The height — approval rates

We proceed by getting the approval rates, transposing them, then flattening them to a 1D vector.

Get the approval rates

values <- as.matrix(df[2:4])
values

      Approve Disapprove No.Opinion
 [1,]      52         38         10
 [2,]      49         40         11
 [3,]      48         45          7
 [4,]      47         42         11
 [5,]      44         48          8
 [6,]      43         51          6
 [7,]      41         53          6
 [8,]      41         54          5
 [9,]      40         57          3
[10,]      38         59          3
[11,]      36         57          7
[12,]      31         64          5
[13,]      29         62          9

Transpose the values

values <- t(values) 
values

           [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
Approve      52   49   48   47   44   43   41   41   40    38    36    31    29
Disapprove   38   40   45   42   48   51   53   54   57    59    57    64    62
No.Opinion   10   11    7   11    8    6    6    5    3     3     7     5     9

Convert them to a 1D vector

values <- as.vector(values)
values

[1] 52 38 10 49 40 11 48 45  7 47 42 11 44 48  8 43 51  6 41 53  6 41 54  5 40 57  3 38 59  3 36 57  7 31 64  5 29 62  9

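as.vector() reads a matrix column by column. After the transpose, each column holds the three values of one issue, so the result lists Approve, Disapprove and No.Opinion for the first issue, then for the second, and so on. This is the same order as the issues and opinions vectors.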
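The transpose is what makes the order come out right. The following sketch uses a two-issue matrix (the first two rows of the data) to show what as.vector() returns with and without it; the name m is made up for this example.

```r
# Two issues (rows) by three opinions (columns)
m <- matrix(c(52, 38, 10,
              49, 40, 11),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("Approve", "Disapprove", "No.Opinion")))

# Without the transpose, values are grouped by opinion
as.vector(m)     # 52 49 38 40 10 11

# With the transpose, values are grouped by issue, as the chart data needs
as.vector(t(m))  # 52 38 10 49 40 11
```

Without t(), the values would no longer line up with the issues and opinions vectors built earlier.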

Load the data into a data frame

# create the data frame from the three vectors built above
final_df <- data.frame(issues, opinions, values)

# rename the third column
colnames(final_df)[3] <- "Approval Rates"

final_df

                     issues   opinions Approval Rates
1            Race relations    Approve            52
2            Race relations Disapprove            38
3            Race relations No.Opinion            10
4                 Education    Approve            49
5                 Education Disapprove            40
6                 Education No.Opinion            11
7                 Terrorism    Approve            48
8                 Terrorism Disapprove            45
9                 Terrorism No.Opinion             7
10            Energy policy    Approve            47
11            Energy policy Disapprove            42
12            Energy policy No.Opinion            11
13          Foreign affairs    Approve            44
14          Foreign affairs Disapprove            48
15          Foreign affairs No.Opinion             8
16              Environment    Approve            43
17              Environment Disapprove            51
18              Environment No.Opinion             6
19        Situation in Iraq    Approve            41
20        Situation in Iraq Disapprove            53
21        Situation in Iraq No.Opinion             6
22                    Taxes    Approve            41
23                    Taxes Disapprove            54
24                    Taxes No.Opinion             5
25        Healthcare policy    Approve            40
26        Healthcare policy Disapprove            57
27        Healthcare policy No.Opinion             3
28                  Economy    Approve            38
29                  Economy Disapprove            59
30                  Economy No.Opinion             3
31 Situation in Afghanistan    Approve            36
32 Situation in Afghanistan Disapprove            57
33 Situation in Afghanistan No.Opinion             7
34   Federal budget deficit    Approve            31
35   Federal budget deficit Disapprove            64
36   Federal budget deficit No.Opinion             5
37              Immigration    Approve            29
38              Immigration Disapprove            62
39              Immigration No.Opinion             9
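The same reshaping can also be done in one step with pivot_longer() from the tidyr package. This is an alternative to the manual approach above, not the method the article uses, and it assumes tidyr is installed. The sketch runs on a two-issue copy of the data; df_small and long_df are names introduced here.

```r
library(tidyr)

# Two-issue copy of the wide table, for illustration only
df_small <- data.frame(
  Issue      = c("Race relations", "Education"),
  Approve    = c(52, 49),
  Disapprove = c(38, 40),
  No.Opinion = c(10, 11)
)

# Turn every column except Issue into (opinions, Approval Rates) pairs
long_df <- pivot_longer(
  df_small,
  cols      = -Issue,
  names_to  = "opinions",
  values_to = "Approval Rates"
)

long_df
```

Building the vectors by hand, as the article does, shows what the reshape actually involves; pivot_longer() is the shorter route once that is clear.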

Beautiful! Now here comes the fun part, visualization! 🎉

Visualizing the data: the stacked bar chart

An explanation follows the code.

# load the ggplot2 library
library(ggplot2)

ggplot(final_df, aes(fill=opinions, y=`Approval Rates`, x=issues)) + 
  geom_col(position="stack") +
  theme(axis.text.x = element_text(angle = 45, margin = margin(t = 30, unit = "pt")))

ggplot()

First, we call the ggplot function and pass it 2 arguments:

The final data frame
aes(): a function that maps columns of the data frame to visual properties of the chart (aesthetics).

aes()

We pass 3 arguments to the aes() function:

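x: the column whose values label the x-axis (the issues).
y: the column that sets the height of the bars (the approval rates).
fill: the column that sets the fill color of each bar segment (the opinions).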

geom_col()

position: “stack” stacks the subcategories on top of each other. Use “dodge” to place them side by side instead (see the grouped bar chart below).
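To see what stacking does to the numbers, the computed bar segments can be inspected with ggplot2's layer_data() function. This is a minimal sketch on a single issue; the data frame one_issue and its rate column are hypothetical names for this example.

```r
library(ggplot2)

# One issue with its three opinions (values from the Education row)
one_issue <- data.frame(
  issues   = "Education",
  opinions = c("Approve", "Disapprove", "No.Opinion"),
  rate     = c(49, 40, 11)
)

p <- ggplot(one_issue, aes(x = issues, y = rate, fill = opinions)) +
  geom_col(position = "stack")

# layer_data() returns the values ggplot2 computed for the layer;
# ymin and ymax are the bottom and top of each stacked segment
segments <- layer_data(p)
segments[, c("ymin", "ymax")]
```

Each segment starts where the previous one ends, so the top of the stack equals the sum of the three rates.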

theme()

We use it to rotate the x labels to make the graph more readable:

axis.text.x: rotates the x-axis labels by 45 degrees and adds a top margin of 30 pt.

Tip: In RStudio, put “?” before a function and execute to display help. Ex: ?aes.
You can also place the cursor on a function and press F1.
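One thing to be aware of: ggplot2 sorts a character x variable alphabetically, so the bars do not follow the order of the CSV. A possible way to keep the original order, sketched here on a small hypothetical data frame (small_df, with a rate column), is to convert the column to a factor whose levels follow the order of first appearance.

```r
library(ggplot2)

# Small long-format data frame, for illustration only
small_df <- data.frame(
  issues   = rep(c("Race relations", "Education"), each = 2),
  opinions = rep(c("Approve", "Disapprove"), times = 2),
  rate     = c(52, 38, 49, 40)
)

# unique() keeps values in order of first appearance,
# so the factor levels follow the order of the data
small_df$issues <- factor(small_df$issues, levels = unique(small_df$issues))

# Build the plot; print(p) (or typing p in the console) draws it
p <- ggplot(small_df, aes(x = issues, y = rate, fill = opinions)) +
  geom_col(position = "stack")
```

The same conversion could be applied to final_df$issues before plotting, so the issues appear in the order they have in the CSV, from highest to lowest approval.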

Grouped bar chart

ggplot(final_df, aes(x=issues, y=`Approval Rates`, fill=opinions)) + 
  geom_col(position="dodge") + 
  theme(axis.text.x = element_text(angle = 45, margin = margin(t = 30, unit = "pt")))

In this article, we created a stacked bar chart to understand people’s opinions on a president’s policies. The funny thing is that it’s always the same story: We want to visualize our data instantly but we can’t. Instead, we spend 90% of the time understanding and cleaning the data, but only 10% visualizing it. But hey, in the end, it’s worth it.

References

https://flowingdata.com/visualize-this/