Who’s In A Name?: Unisex Names From 1980–2017
This past weekend (5/19/2018), Reddit user /u/newishtodc posted a data visualization showing the most common unisex names from 1980–2017, broken down by gender. The visualization piqued my curiosity and encouraged me to recreate the original data set (which used data from the Social Security Administration, aka the SSA) as a data science exercise.
As I stitched together my data frames and tables in R, questions began to emerge:
- What qualifies as a unisex name?
- How are unisex names split over the recorded male and female genders in the SSA data?
- How have names and their gender-slant changed over time?
Gender is a deeply fascinating topic to me and I loved the opportunity to explore these questions in an established, data rich environment, so I dove right in.
Drawing the Line
To begin, I noticed that /u/newishtodc used a threshold of 30,000 people with a given name of male and female gender over the 1980–2017 period to determine what is (or isn’t) a unisex name. This resulted in a 10-name-only data set which was easy to work with, but I couldn’t find any significance to the 30k cutoff. So I went looking for other guidance.
What I found was an article on FiveThirtyEight.com by Andrew Flowers that provided some summary stats regarding unisex names. Andrew listed his methodology for choosing unisex names — which spanned more than a hundred years — as the following:
- At least 100 occurrences of the name in the total data set. This was to avoid including names with exceedingly small footprints (the SSA lists names that have at least 5 recorded births per year).
- At least (a) a third of the total name-bearers had to be male AND (b) a third of the total name-bearers had to be female.
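The two criteria above can be sketched in a few lines of Python. This is only an illustration, not the code behind this article (that work was done in R): the `(year, name, sex, count)` record layout, the function name `find_unisex_names`, and the toy counts are all assumptions made for the example.

```python
from collections import defaultdict


def find_unisex_names(records, min_total=100):
    """Return the names that satisfy both criteria listed above.

    records: iterable of (year, name, sex, count) tuples, sex is "M" or "F".
    This tuple layout is an assumption made for illustration.
    """
    totals = defaultdict(lambda: {"M": 0, "F": 0})
    for _year, name, sex, count in records:
        totals[name][sex] += count

    unisex = set()
    for name, by_sex in totals.items():
        total = by_sex["M"] + by_sex["F"]
        if total < min_total:
            continue  # too few occurrences in the whole data set
        # "At least a third" for each gender, written with integers
        # (3 * part >= total) to avoid floating-point edge cases.
        if 3 * by_sex["M"] >= total and 3 * by_sex["F"] >= total:
            unisex.add(name)
    return unisex


# Toy data invented for this example (not real SSA counts).
sample = [
    (1980, "Riley", "M", 60), (1980, "Riley", "F", 50),  # 110 total, balanced
    (1980, "Mary", "F", 500), (1980, "Mary", "M", 10),   # far too female-heavy
    (1981, "Rare", "M", 30), (1981, "Rare", "F", 30),    # balanced, but only 60 total
]
print(sorted(find_unisex_names(sample)))
```

Only "Riley" survives in the toy data: "Mary" fails the one-third test and "Rare" fails the 100-occurrence test.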
Given FiveThirtyEight’s reputation and the fact that my data set was smaller than Andrew’s (thus meaning his measures would be even more stringent for me), I accepted his unisex criteria as my own and filtered my data appropriately. Then I looked at the birth population of those bearing unisex-names by year and gender, resulting in this:
(note: if any of the graphics appear too small, right click + open in a new tab)
This led me to Takeaway #1: The number of people with unisex names is growing faster than the general population.
As you can see in the chart, from 1980–2017 births with unisex names roughly tripled (~30k to ~90k, an increase of about 200%). By comparison, general births from 1980–2017 only increased about 15%. Even taking moving averages and our criteria into account, the increase among unisex names is far greater.
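For anyone who wants to reproduce this kind of comparison, here is a rough Python sketch of the two steps involved: totaling unisex births per year, then computing the percentage change between two years. The record layout, the function names, and the toy numbers are assumptions for illustration only, not the actual SSA figures.

```python
from collections import defaultdict


def births_by_year(records, names):
    """Total births per year, counting only the given set of names."""
    totals = defaultdict(int)
    for year, name, _sex, count in records:
        if name in names:
            totals[year] += count
    return dict(totals)


def percent_change(start, end):
    """Percentage increase from start to end (negative for a decrease)."""
    return (end - start) / start * 100


# Toy data invented for this example (not real SSA counts).
toy = [
    (1980, "Riley", "M", 25), (1980, "Riley", "F", 15),
    (1980, "Mary", "F", 900),
    (2017, "Riley", "M", 40), (2017, "Riley", "F", 60),
]
yearly = births_by_year(toy, {"Riley"})
print(yearly)                                      # {1980: 40, 2017: 100}
print(percent_change(yearly[1980], yearly[2017]))  # 150.0
```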
Next I wanted to know: Are women or men more likely to have a unisex name? To answer this question, I looked at a stacked version of the chart above:
Which yields Takeaway #2: Unisex name-bearers are roughly equally split between men and women.
There was a bit of a male-slanted bump between 1988 and 2008, but really the split is quite even across all 38 years. Interesting, and definitely not what I was expecting.
Slants, Splits, and Bias, Oh My!
This led me to yet another question, this time regarding the names themselves. We know name-bearers have roughly an equal gender-split, but do these splits themselves have a lean? That is to say, are there more male-slanted unisex names than female? Vice versa? About the same?
I created the following two charts to find out, mirroring the charts above but instead looking at unisex names by year and by gender-slant using this breakdown:
- More than 50% male = Male-dominated Unisex Name
- More than 50% female = Female-dominated Unisex Name
- Exactly 50% male and 50% female = Tie Unisex Name
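As a hedged illustration of that breakdown, the sketch below labels each name in each year and counts how many names fall into each bucket. The record layout, the function names, and the toy counts are assumptions for the example. A name recorded for only one gender in a given year is treated as having zero births for the other.

```python
from collections import Counter, defaultdict


def classify_slant(male, female):
    """Label one name in one year using the breakdown above."""
    if male > female:
        return "male-dominated"
    if female > male:
        return "female-dominated"
    return "tie"


def slant_counts_by_year(records, unisex_names):
    """Count how many unisex names fall into each slant bucket per year."""
    births = defaultdict(lambda: {"M": 0, "F": 0})
    for year, name, sex, count in records:
        if name in unisex_names:
            births[(year, name)][sex] += count

    counts = defaultdict(Counter)
    for (year, _name), by_sex in births.items():
        counts[year][classify_slant(by_sex["M"], by_sex["F"])] += 1
    return {year: dict(buckets) for year, buckets in counts.items()}


# Toy data invented for this example (not real SSA counts).
toy = [
    (2003, "Justice", "M", 50), (2003, "Justice", "F", 50),
    (2003, "Riley", "M", 70), (2003, "Riley", "F", 30),
    (2003, "Casey", "M", 20), (2003, "Casey", "F", 45),
    (2003, "Quinn", "M", 12),  # no female rows this year
]
print(slant_counts_by_year(toy, {"Justice", "Riley", "Casey", "Quinn"}))
```

Note that this counts names, not people, which is the distinction the charts in this section rely on.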
There are two big takeaways here:
Takeaway #4: Diversity in unisex names itself has increased between 1980–2017. Remember, we’re not measuring population here but name diversity, suggesting that perhaps (a) the pool of names has increased over time, or (b) that more names have qualified for unisex status over time.
Takeaway #5: There are more male-slanted unisex names than female-slanted ones. The stacked chart shows that from 1989 onwards there are (mostly) more unisex names that are male-slanted. Does this mean parents feel more comfortable giving a son a unisex name than they do a daughter? Or maybe the increase in unisex name diversity has been driven by parents naming sons, for some reason or another? I’m honestly not sure of the reason but it’s an interesting question to noodle over.
Puzzling over gender-slants in the unisex names themselves made me wonder about the distributions within any given year. I wondered if the male or female -slants were regular — that is, if most of the names had the same level of slanting, so to speak — or if there were outliers and heads/tails of each distribution pulling the stats up or down.
To examine these distributions across so many years, I used boxplots to look at the median, first and third quartiles, and min/max for each set of unisex names by gender-slant for each year. Whew. I know that’s a mouthful, so let’s look at the data for some visual grounding:
Here’s the cheat sheet on how to read these charts:
- The bottommost line is the minimum for that yearly distribution of gender-slanted unisex names, i.e. the name(s) that had the smallest gender-slant. As you can see, these minimum slants are all just a bit above 50%.
- The next highest marker is the first quartile, or the amount of gender-slant found at the 25% mark of the distribution.
- The marker above the first quartile is the median (remember, median != mean/average!)
- The marker above the median is the third quartile, or the amount of gender-slant found at the 75% mark of the distribution. Note that many of these quartiles are also the maximum (see below).
- And, finally, the topmost line is the maximum for that yearly distribution of gender-slanted unisex names, i.e. the name(s) with the most gender-slant. Our maximum gender-slant was almost always 100%, meaning at least one unisex name in that year was given only to males or only to females.
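To make the cheat sheet concrete, here is a small Python sketch that computes those five markers for one group of names in one year. It is an assumption-laden illustration: the slant is taken to be the majority gender's share of births, the quartiles use the "inclusive" method of the standard library's `statistics.quantiles` (other tools may interpolate differently), and the counts are made up.

```python
import statistics


def slant_share(male, female):
    """Share of births held by the majority gender, between 0.5 and 1.0."""
    return max(male, female) / (male + female)


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to compute quartiles")
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Toy (male, female) counts for one year, invented for this example.
toy_counts = [(55, 45), (60, 40), (70, 30), (80, 20), (25, 0)]
shares = [slant_share(m, f) for m, f in toy_counts]
summary = five_number_summary(shares)
print(summary)
```

In this toy group the minimum sits just above 50% and the maximum is 100%, the same shape the boxplots show.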
Got it? Uhh, let’s hope so because we’re moving on to our next takeaway:
Looking at these boxplots brings us to Takeaway #6: Distributions for unisex names by gender-slant are roughly the same, but female-slanted names have been “softening” recently.
It’s hard to say for sure what’s going on with the “softening” in female-slanted names. It could be a recent movement towards more boys being given unisex names, or that some female-slanted unisex names are leaving the name pool (i.e. less diversity among female-slanted unisex names). Interestingly, we see a little of the reverse for male-slanted names at the start of our data set, with very high medians that drop rather quickly. Could it be indicative of generational naming trends? Or is something else entirely going on due to our 1980–2017 window?
Again, it’s hard to say. But these questions, along with the ones I brought up earlier regarding the trend towards male-slanted unisex names over female-slanted ones, would be great for further examination.
The Game of the Name
Lastly, I wanted to look a bit at the names themselves (yes, I’m finally getting to the names). Specifically, I wanted to answer two questions about the Top 10 ranking names from 1980–2017:
- #1 How did the Top 10 rankings for the master set of Top 10 names — that is, all the names that achieved any Top 10 slot from 1980–2017 — change?
- #2 How can we visualize the gender-slant of the Top 10 names over time to understand gender-based popularity of Top 10 names?
Let’s start with #1. I took a master set of Top 10 names from 1980–2017 and created a matrix of their rankings over time, ordering by the names that had the highest aggregate rankings. This yielded:
This nets Takeaway #7: The most consistent, high-ranking unisex names are Riley, Casey, Peyton, Jaime, Skyler, Jessie, and Quinn.
I was going to only list the first five names, but I wanted to highlight some trends, mainly that:
- The name Riley has become the most popular unisex name in recent times, rising from humble beginnings back in 1986 (I wonder what happened then?). Contrast this with name #2, Casey, which has lost its luster as Riley rose to dominance. Fun fact: since 1980 there have been 183,000 more Rileys added to the US population!
- Riley’s popularity trend is mimicked by Peyton and Quinn, while Jaime and Jessie have gone the way of Casey in their waning usage.
- The odd one out here is Skyler, which rose from obscurity to an early aughts bump before fading out again. I know I shouldn’t assume Breaking Bad had anything to do with this but… it’s probably Breaking Bad, right?
A couple of other oddities from this list are some of the one-offs, such as Infant and Baby. I wonder what happened here? My guess would be some sort of short-lived hospital policy of assigning default names to children or something similar. If anyone has more insight though, I’d love to hear it!
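For the curious, the ranking matrix behind question #1 could be built roughly like this. The sketch is in Python rather than the R and Excel I actually used, and several details are assumptions: the record layout, the function names, alphabetical tie-breaking, and in particular the "aggregate ranking" score, which here gives `n` points for a first place down to 1 point for an n-th place.

```python
from collections import defaultdict


def top_n_by_year(records, names, n=10):
    """For each year, list the n most common names (both genders combined)."""
    births = defaultdict(lambda: defaultdict(int))
    for year, name, _sex, count in records:
        if name in names:
            births[year][name] += count

    rankings = {}
    for year, by_name in births.items():
        # Most births first; ties broken alphabetically (an assumption).
        ordered = sorted(by_name.items(), key=lambda item: (-item[1], item[0]))
        rankings[year] = [name for name, _ in ordered[:n]]
    return rankings


def rank_matrix(rankings, n=10):
    """Rows of yearly ranks (None when outside the top n), best names first."""
    years = sorted(rankings)
    master = {name for top in rankings.values() for name in top}
    rows = {
        name: [
            rankings[year].index(name) + 1 if name in rankings[year] else None
            for year in years
        ]
        for name in master
    }

    def score(name):
        # Hypothetical aggregate score: n points for rank 1, 1 point for rank n.
        return sum(n + 1 - rank for rank in rows[name] if rank is not None)

    ordered_names = sorted(master, key=lambda name: (-score(name), name))
    return years, [(name, rows[name]) for name in ordered_names]


# Toy data invented for this example, using a "Top 2" to keep it short.
toy = [
    (1980, "Casey", "M", 60), (1980, "Casey", "F", 40),
    (1980, "Riley", "M", 80), (1980, "Quinn", "F", 50),
    (1981, "Riley", "F", 90), (1981, "Quinn", "M", 85),
    (1981, "Casey", "M", 10),
]
rankings = top_n_by_year(toy, {"Casey", "Riley", "Quinn"}, n=2)
years, matrix = rank_matrix(rankings, n=2)
print(years)
for name, ranks in matrix:
    print(name, ranks)
```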
Now, onto my second question from before: How can we visualize the gender-slant of the Top 10 names over time to understand gender-based popularity of Top 10 names?
Here’s a heatmap visualization I put together to address this question:
I’m going to be immodest for a moment and say that I love this heatmap because it tells an interesting story, which is…
Takeaway #8: Gender-slants for unisex names were quite soft in the 80s, hardened up from 1988–2008 or so, and then got soft again. Which is a familiar time period, no? We spoke about our gender-slant distributions softening above as well, so there definitely seems to be some connection here.
I’m very interested to know why. Again, it could be a data range foible or generational trends, but I’d love to discover any anthropological reasons for this.
Another fun insight from this heatmap is the set of names that flipped, i.e. the ones that were female- or male-slanting and then reversed. I spot Quinn and Riley in that set. Do you see any others?
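If you would rather not squint at a heatmap, a flip can also be detected programmatically. The Python sketch below is one possible approach under stated assumptions: it uses a hypothetical `(year, name, sex, count)` record layout, ignores exact ties, and flags any name whose majority gender differs between at least two years. The counts are invented.

```python
from collections import defaultdict


def flipped_names(records, names):
    """Names whose majority gender is male in some years and female in others."""
    births = defaultdict(lambda: {"M": 0, "F": 0})
    for year, name, sex, count in records:
        if name in names:
            births[(name, year)][sex] += count

    majorities = defaultdict(set)
    for (name, _year), by_sex in births.items():
        if by_sex["M"] != by_sex["F"]:  # skip exact ties
            majorities[name].add("M" if by_sex["M"] > by_sex["F"] else "F")

    return sorted(name for name, seen in majorities.items() if seen == {"M", "F"})


# Toy data invented for this example (not real SSA counts).
toy = [
    (1980, "Quinn", "M", 30), (1980, "Quinn", "F", 10),
    (2017, "Quinn", "M", 20), (2017, "Quinn", "F", 60),
    (1980, "Casey", "M", 50), (1980, "Casey", "F", 40),
    (2017, "Casey", "M", 30), (2017, "Casey", "F", 20),
]
print(flipped_names(toy, {"Quinn", "Casey"}))  # ['Quinn']
```

This simple check would also flag a name that wobbles back and forth around 50%, so a stricter version might require the new majority to hold for several consecutive years.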
One last thing to point out is that there was exactly 1 Top 10 name that had an even tie between males and females. That name? Justice, at spot #9 in 2003. Honestly, I don’t think I’ve ever even met anyone named Justice, so your guess for what’s going on there is as good as mine.
So there you have it, a deep dive into unisex names from 1980–2017. It’s worth noting that while I prepared my data (from the SSA, as mentioned) in R, I did some of the futzing around in Excel once the data set was small enough. Also, because I’m clearly a masochist, I did my data visualizations in Excel as well, though I’ll be the first to say there are better options out there (especially for those boxplots).
I hope you enjoyed our little data voyage and, if you did, please consider hitting the clap button 👏 to help others find this piece too! For more of my work, check out my website http://michaelalwill.com, or find me on LinkedIn or Instagram.