Chart 3: Proportion of males and females in the USA (2010), by age group

The proportion of females and males in the United States (2010), by age group

To become more proficient in R, I’ve been redoing some of the simple statistical analyses that I learned in school and reproducing charts that I find in magazine articles. For the third in my chart series, I’ve chosen a stacked bar chart that shows the proportion of females and males in the United States, by age group.

The chart above is a reproduction. The original, by Randal Olson, can be found (along with several others) in the December 2015 issue of Significance magazine. Dr Olson points out that there are often several ways to present the same information graphically. The best way, he says, depends upon the point you’re trying to make or the features of the data you’re trying to emphasize. This chart makes it very clear that as the population ages, the proportion of males declines precipitously. It tells a story before you even have a chance to read the caption. I love this!

Tech Notes

The chart was prepared entirely in R; no Photoshop or other post-processing was applied. The code appears below.

# Load the data frame
d <- read.csv(file="pop_age_sex.csv", header=TRUE, sep=",")
# Calculate the percentages of males and females:
d <- within(d, pctm <- popm / ( popm + popf ) )
d <- within(d, pctf <- popf / ( popm + popf ) )
# Sanity check: the two shares should sum to one in every row
x <- with(d, pctm + pctf)
# Reshape to long format: one row per (agegroup, variable) pair
library(reshape2)
d.melted <- melt(d, id.vars="agegroup")
library(ggplot2)
library(showtext) # enables the use of OTF fonts
# showtext_auto() and font_add() replace the deprecated
# showtext.auto() and font.add()
font_add("Myriad Pro Regular","MyriadPro-Regular.otf")
showtext_auto(enable=TRUE)
facets <- c("pctf","pctm")
myPalette <- c("pctf"="#ED1D27","pctm"="#1B88B4")
ggplot(
  data=d.melted[d.melted$variable %in% facets,],
  aes(x=agegroup,y=value,fill=variable)
) +
# ggplot2 2.2.0 and later stack the first factor level (pctm) on top;
# reverse=TRUE puts it at the bottom, so the bars match the
# "Female" (top) and "Male" (bottom) text labels
geom_bar(stat="identity", position=position_stack(reverse=TRUE)) +
scale_fill_manual(values=myPalette) +
theme(
  panel.background=element_blank(),
  legend.position="none",
  axis.ticks.x=element_blank(),
  axis.ticks.y=element_blank(),
  axis.text=element_text(
    family="Myriad Pro Regular",
    colour="#333333"
  ),
  axis.title=element_text(
    family="Myriad Pro Regular",
    colour="#333333"
  )
) +
annotate(
  "text",
  label="Female",
  x=20.5,
  y=0.95,
  size=5.5,
  colour="#000000",
  hjust="center",
  family="Myriad Pro Regular"
) +
annotate(
  "text",
  label="Male",
  x=20.5,
  y=0.05,
  size=5.5,
  colour="#000000",
  hjust="center",
  family="Myriad Pro Regular"
) +
scale_x_discrete(
  "Age Group",
  labels=c(
    "0-", "5-", "10-", "15-", "20-", "25-", "30-", "35-", "40-",
    "45-", "50-", "55-", "60-", "65-", "70-", "75-", "80-", "85-",
    "90-", "95-", "100+"
  ),
  expand=c(0,0)
) +
scale_y_continuous(
  "Percentage breakdown",
  expand=c(0,0),
  # explicit breaks so the five labels always line up with them
  breaks=seq(0,1,by=0.25),
  labels=c("0","25","50","75","100")
) +
geom_hline(yintercept=0.25,colour="#000000") +
geom_hline(yintercept=0.50,colour="#000000")
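
For readers unfamiliar with the reshaping step, here is a minimal base-R sketch of the long format that the plot consumes: one row per combination of age group and variable. It is only an illustration and not the reshape2 call used in the listing. It uses two rows copied from my dataset, and the names long and totals are made up for this example.

```r
# Two rows copied from pop_age_sex.csv; column names match the listing.
d <- data.frame(
  agegroup = c("age000to004", "age100to"),
  popm = c(10319427, 9162),
  popf = c(9881935, 44202)
)
d$pctm <- d$popm / (d$popm + d$popf)
d$pctf <- d$popf / (d$popm + d$popf)

# Base-R stand-in for the long format (hypothetical name: long):
# one row per (agegroup, variable) pair, with the share in 'value'.
long <- data.frame(
  agegroup = rep(d$agegroup, times = 2),
  variable = rep(c("pctm", "pctf"), each = nrow(d)),
  value    = c(d$pctm, d$pctf)
)

# Each age group's two shares should add up to one.
totals <- tapply(long$value, long$agegroup, sum)
stopifnot(all(abs(totals - 1) < 1e-12))

print(long)
```

Because each age group contributes exactly two values that sum to one, every stacked bar fills the full height of the chart.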

The data come from the U.S. Census Brief, “Age and Sex Composition: 2010” (May 2011) by Lindsay M. Howden and Julie A. Meyer. My dataset follows:

[pop_age_sex.csv]
agegroup,popm,popf
age000to004,10319427,9881935
age005to009,10389638,9959019
age010to014,10579862,10097332
age015to019,11303666,10736677
age020to024,11014176,10571823
age025to029,10635591,10466258
age030to034,9996500,9965599
age035to039,10042022,10137620
age040to044,10393977,10496987
age045to049,11209085,11499506
age050to054,10933274,11364851
age055to059,9523648,10141157
age060to064,8077500,8740424
age065to069,5852547,6582716
age070to074,4243972,5034194
age075to079,3182388,4135407
age080to084,2294374,3448953
age085to089,1273867,2346592
age090to094,424387,1023979
age095to099,82263,288981
age100to,9162,44202
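
As a quick check on the decline in the male share that the chart shows, the sketch below works it out for three of the rows above. The column name male_pct is my own invention for this example, and the rounding to one decimal place is an arbitrary choice.

```r
# Three rows copied from the dataset above.
d <- data.frame(
  agegroup = c("age000to004", "age065to069", "age100to"),
  popm = c(10319427, 5852547, 9162),
  popf = c(9881935, 6582716, 44202)
)

# Male share of each age group, as a percentage rounded to one decimal place
# (male_pct is a hypothetical column name).
d$male_pct <- round(100 * d$popm / (d$popm + d$popf), 1)

print(d[, c("agegroup", "male_pct")])
# male_pct comes out as 51.1, 47.1 and 17.2
```

Males are a slight majority among the youngest children, a minority by the late sixties, and fewer than one in five among centenarians.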

I thank Dr Olson for his inspiration.