# Chart 3: Proportion of males and females in the USA (2010), by age group

In order to become more proficient in R, I’ve been redoing some of the simple statistical analyses that I learned in school and reproducing charts that I find in magazine articles. For the third in my chart series, I’ve chosen a stacked bar chart that shows the proportion of females and males in the United States, by age group.

The chart above is a reproduction. The original, by Randal Olson, can be found (along with several others) in the December 2015 issue of Significance magazine. Dr Olson points out that there are often several ways to present the same information graphically. The best way, he says, depends upon the point you’re trying to make or the features of the data you’re trying to emphasize. This chart makes it very clear that as the population ages, the proportion of males declines precipitously. It tells a story before you even have a chance to read the caption. I love this!

#### Tech Notes

The chart was prepared entirely in R; no Photoshop or other post-processing was applied. The code appears below.

    # Load the data frame
    d <- read.csv(file="pop_age_sex.csv", header=TRUE, sep=",")

    # Calculate the percentages of males and females:
    d <- within(d, pctm <- popm / ( popm + popf ) )
    d <- within(d, pctf <- popf / ( popm + popf ) )

    # should be all ones:
    x <- with(d, pctm + pctf)

    # Reshape to long format: one row per age group and variable
    require(reshape2)
    d.melted <- melt(d)

    library(ggplot2)

    library(showtext) # enables the use of OTF fonts
    # newer showtext releases spell these showtext_auto() and font_add()
    showtext.auto(enable=TRUE)
    font.add("Myriad Pro Regular","MyriadPro-Regular.otf")

    facets <- c("pctf","pctm")
    myPalette <- c("pctf"="#ED1D27","pctm"="#1B88B4")

    ggplot(
      data=d.melted[d.melted$variable %in% facets,],
      aes(x=agegroup,y=value,fill=variable)
    ) +
    # ggplot2 2.2.0 and later stack in reverse group order; if the male share
    # ends up on top, add position=position_stack(reverse=TRUE) to geom_bar()
    geom_bar(stat="identity") +
    scale_fill_manual(values=myPalette) +
    theme(
      panel.background=element_blank(),
      legend.position="none",
      axis.ticks.x=element_blank(),
      axis.ticks.y=element_blank(),
      axis.text=element_text(
        family="Myriad Pro Regular",
        colour="#333333"
      ),
      axis.title=element_text(
        family="Myriad Pro Regular",
        colour="#333333"
      )
    ) +
    annotate(
      "text",
      label="Female",
      x=20.5,
      y=0.95,
      size=5.5,
      colour="#000000",
      hjust="center",
      family="Myriad Pro Regular"
    ) +
    annotate(
      "text",
      label="Male",
      x=20.5,
      y=0.05,
      size=5.5,
      colour="#000000",
      hjust="center",
      family="Myriad Pro Regular"
    ) +
    scale_x_discrete(
      "Age Group",
      labels=c(
        "0-", "5-", "10-", "15-", "20-", "25-", "30-", "35-", "40-",
        "45-", "50-", "55-", "60-", "65-", "70-", "75-", "80-", "85-",
        "90-", "95-", "100+"
      ),
      expand=c(0,0)
    ) +
    scale_y_continuous(
      "Percentage breakdown",
      expand=c(0,0),
      labels=c("0","25","50","75","100")
    ) +
    geom_hline(yintercept=0.25,colour="#000000") +
    geom_hline(yintercept=0.50,colour="#000000")
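
The code above computes `x`, the sum of the two shares, but never tests it. Because `pctm` and `pctf` are complementary, every element of `x` should equal one. Here is a minimal sketch of how that check could be made explicit. It uses three rows copied from the dataset in this post, and `shares_ok` is a name I made up for illustration.

```r
# Three rows copied from pop_age_sex.csv, enough to illustrate the check
d <- data.frame(
  agegroup = c("age000to004", "age050to054", "age100to"),
  popm = c(10319427, 10933274, 9162),
  popf = c(9881935, 11364851, 44202)
)

# Same share calculation as in the post
d <- within(d, pctm <- popm / (popm + popf))
d <- within(d, pctf <- popf / (popm + popf))

# The two shares should sum to one in every row. all.equal() tolerates
# floating-point rounding, which a strict == comparison would not.
x <- with(d, pctm + pctf)
shares_ok <- isTRUE(all.equal(x, rep(1, nrow(d))))  # hypothetical name
print(shares_ok)
```

Wrapping the comparison in `stopifnot(shares_ok)` would halt the script before plotting if the input columns were ever misread.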

The data come from the U.S. Census Brief, “Age and Sex Composition: 2010” (May 2011) by Lindsay M. Howden and Julie A. Meyer. My dataset follows:

`[pop_age_sex.csv]`

    agegroup,popm,popf
    age000to004,10319427,9881935
    age005to009,10389638,9959019
    age010to014,10579862,10097332
    age015to019,11303666,10736677
    age020to024,11014176,10571823
    age025to029,10635591,10466258
    age030to034,9996500,9965599
    age035to039,10042022,10137620
    age040to044,10393977,10496987
    age045to049,11209085,11499506
    age050to054,10933274,11364851
    age055to059,9523648,10141157
    age060to064,8077500,8740424
    age065to069,5852547,6582716
    age070to074,4243972,5034194
    age075to079,3182388,4135407
    age080to084,2294374,3448953
    age085to089,1273867,2346592
    age090to094,424387,1023979
    age095to099,82263,288981
    age100to,9162,44202
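
To put a number on the decline the chart shows, here is a small sketch that computes the male share for the youngest and oldest age groups, using the counts from the dataset above. The helper `male_share` is a name I made up for illustration; it is not part of the original code.

```r
# Hypothetical helper: male share of the combined population
male_share <- function(popm, popf) popm / (popm + popf)

youngest <- male_share(10319427, 9881935)  # ages 0 to 4
oldest <- male_share(9162, 44202)          # ages 100 and over

# Shares as percentages, rounded to one decimal place
pct <- round(100 * c(youngest = youngest, oldest = oldest), 1)
print(pct)
```

With these counts, males are roughly 51% of the youngest group and roughly 17% of the centenarians.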

I thank Dr Olson for his inspiration.