The gender gap of baby names

fxp
3 min read · Sep 26, 2018

Names matter. Having the right first name might help you get a job, and first and last names have been used to predict other factors like race or ethnicity.

The most popular baby names of 2017 are Emma and Liam. Yawn, that might be old news for you. Let me take another angle on the topic and show you some compelling evidence that girls' names are less diverse than boys' names.

(Note: This post focuses on the two binary sexes male and female, mostly because that's the classification that the Social Security Administration uses.)

Last letters

The Social Security Administration publishes statistics on the first names of newborns. They show the top 1,000 on their web page, but you can download a more complete dataset that includes every name given to at least five babies in a year.
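For readers who want to follow along, here is a minimal sketch of how such a file could be read. It assumes one headerless line per name in the form `name,sex,count`, which is my understanding of the national files; the `NameRecord` type, the `parse_records` helper, and the sample counts are made up for illustration and are not taken from the project's source code.

```python
import csv
import io
from typing import Iterator, NamedTuple


class NameRecord(NamedTuple):
    name: str
    sex: str    # "F" or "M"
    count: int  # number of babies given this name in that year


def parse_records(text: str) -> Iterator[NameRecord]:
    """Parse headerless 'name,sex,count' lines (assumed layout) into records."""
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        name, sex, count = row
        yield NameRecord(name, sex, int(count))


# Toy input with invented counts, only to show the shape of the data.
sample = "Emma,F,100\nOlivia,F,90\n\nLiam,M,95\n"
records = list(parse_records(sample))
```

For a real file, you would pass the file's contents (for example from `open(path).read()`) instead of the toy string.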

Ladies first, let's look at the girls. Interestingly, the largest share by far (almost 40%) have names that end in the letter a. Think of Emma, Olivia, Ava, Isabella, Sophia, Mia: they are the top 6 names in 2017 and all end in a. This doesn't even include names that end in the same sound, like Hannah or Sarah. The second-most-common letter, e, trails far behind at less than 20% nowadays. (Charlotte is ranked #7 in 2017.)
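The share of names ending in each letter can be computed along these lines. This is a sketch under assumptions: it weights every name by the number of babies who received it (rather than counting each distinct name once), the `last_letter_shares` helper is hypothetical, and the counts in `toy` are invented and are not real figures.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def last_letter_shares(
    records: Iterable[Tuple[str, str, int]], sex: str
) -> Dict[str, float]:
    """Return the baby-weighted share of each final letter for one sex."""
    totals: Counter = Counter()
    for name, rec_sex, count in records:
        if rec_sex == sex and name:
            totals[name[-1].lower()] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {letter: n / grand_total for letter, n in totals.items()}


# Invented counts, for illustration only.
toy = [
    ("Emma", "F", 40),
    ("Ava", "F", 20),
    ("Charlotte", "F", 30),
    ("Harper", "F", 10),
    ("Liam", "M", 50),
]
shares = last_letter_shares(toy, "F")
```

Running the same function once per year and per sex would give the kind of time series the charts show.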

Frequency of female names ending in…, over time

The chart looks a bit different for the boys.

Frequency of male names ending in…, over time

There used to be a tight race between the four letters d, n, s and y in the mid-1950s; nowadays, the stats are dominated by names ending in a single letter: n, as in Logan, Benjamin, Mason. The second most popular is a newcomer: r, as in Oliver.

But all of the above examples rank only #5 to #9 on 2017’s hit list of top boy names; the list is led by Liam, Noah, William and James. So is there a gender gap in baby names? Are the last letters of boy names more varied than those of girl names?

A measure of competition

One way to measure diversity is the Shannon entropy. It measures how much information the last letter of a name conveys. With 26 letters, the highest possible value is log2(26) ≈ 4.7 bits, which would mean that all 26 letters are equally likely; the lowest is 0, which would mean every name ends in the same letter. Let us use it here to express how much diversity we see in the last letters of baby names.
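As a rough illustration, the entropy of a last-letter distribution can be computed as H = -Σ p · log2(p), where p is the share of babies whose name ends in a given letter. The sketch below assumes base-2 logarithms (bits); the `shannon_entropy` helper is hypothetical and is not the project's own code.

```python
import math
from typing import Iterable


def shannon_entropy(counts: Iterable[float]) -> float:
    """Shannon entropy in bits of the distribution given by raw counts."""
    positive = [c for c in counts if c > 0]  # zero counts contribute nothing
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in positive)


# All 26 letters equally likely: the maximum, log2(26) bits.
uniform = shannon_entropy([1] * 26)

# Every name ends in the same letter: no diversity at all.
single = shannon_entropy([100])
```

Raising 2 to the power of the entropy gives an "effective number of letters", which tops out at 26 when every letter is equally likely.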

Entropy of the last letter of baby names, over time. Higher value = more diversity

So are boy names more diverse? The answer is yes: female baby names are less varied in their last letters than male baby names. The jury is still out on why: is this just how the English language works? Or is it related to stricter and less varied role models for girls?

Interestingly, the final-letter gender gap shrank over the years, reaching a minimum in 2011, and has since widened again because male names gained diversity more quickly than female names. (In both cases, it's mostly a decline of names ending in n driving that trend.) That same year, 2011, is also the year when the gender wage gap was smallest. Coincidence? Hard to say.

Please find the source code of this project on GitHub.

fxp

Director of Data Analytics at vroom.com (buy and sell your car online and get it shipped home). I love all things data, bread, and photography.