Is Hilary *still* the most poisoned baby name in U.S. history?
This post is a 2020 update to Hilary Parker’s classic 2013 blog post and statistical analysis “Hilary: the most poisoned baby name in US history”. If you’d like to see the full details of my analysis, take a look at my Jupyter notebook. If you’d like to play around with the data yourself, clone my repo and follow the instructions there. And if you’d just like to visually explore the popularity of U.S. baby names over time, I’ve built a simple app for just that purpose, available here.
One of my all-time favourite statistics blog posts is Hilary Parker’s “Hilary: the most poisoned baby name in US history”, from the Not So Standard Deviations blog. I came across it early in my data science career, and was immediately charmed by its mix of curiosity, ingenuity, and statistical rigour.
In the original post, Parker sought to answer a simple question: Is Hilary/Hillary the most poisoned baby name in U.S. history? Years earlier another blogger alleged just that, based on baby name data made available by the Social Security Administration. But the original blogger’s analysis was not the most data-rich or statistically rigorous, and so Parker decided to dig deeper, offering an even more authoritative analysis showing that yes, Hilary is indeed the most poisoned female baby name in U.S. history, suffering a 70% relative loss in popularity between 1992 and 1993, from which it never recovered.
Now, Parker’s blog post was published in 2013, and some things in the world have, well, changed since then. And this led me to wonder: Is Hilary/Hillary still the most poisoned baby name in U.S. history? Might some other name have been even more poisoned in the years since (say, in 2016)?
The results surprised me. There is indeed a new name in the data that is even more poisoned than Hilary/Hillary, though it’s probably not the name you expect. Curious? Then read on.
Let’s begin with a quick recap of Parker’s original analysis, and her method for identifying the most poisoned baby name in U.S. history. Her analysis was based on data from the Social Security Administration, which lists the number of babies registered with each name in every year from 1880 to present.
Using these data, Parker transformed each name’s annual count into a percentage, by dividing each name’s count by the total count of baby names for that year. For example, if there were 100,000 babies born in 1929, and 2,000 of them were named John, then John’s percentage value for that year would be 0.02 (that is, 2%).
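Here is a minimal sketch of that calculation in plain Python. It is an illustration, not the code from the notebook; the function name `add_proportions` and the toy records are assumptions made up for the example.

```python
from collections import defaultdict


def add_proportions(records):
    """Map (year, name) to that name's share of all babies counted that year.

    `records` is a list of (year, name, count) tuples (hypothetical layout).
    """
    totals = defaultdict(int)
    for year, _name, count in records:
        totals[year] += count
    return {(year, name): count / totals[year] for year, name, count in records}


# Toy data matching the John example: 100,000 babies, 2,000 named John.
records = [
    (1929, "John", 2_000),
    (1929, "Mary", 98_000),
]
proportions = add_proportions(records)
print(proportions[(1929, "John")])  # 0.02
```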
Following this, Parker calculated the relative loss in these percentage values from year to year. For example, if John’s percentage value was 0.02 in 1929 and 0.01 in 1930, his relative loss between the years would be –50%, since the percentage value decreased by 0.01, which is 50% of 0.02.
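The arithmetic in the John example can be written as a one-line function. This is only a sketch; `relative_change` is a name chosen for illustration.

```python
def relative_change(previous, current):
    """Relative change from `previous` to `current`; a negative value is a loss."""
    return (current - previous) / previous


change = relative_change(0.02, 0.01)
print(f"{change:.0%}")  # -50%
```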
From here, Parker simply sorted the data by these relative loss values, thereby identifying the names that suffered the greatest relative loss in a single year — in other words, names that were suddenly and severely “poisoned”. Based on this sorting (and after a bit of controlling for fad names), the results were clear: Hilary was the most poisoned baby name in U.S. history, suffering a 70% loss in popularity between 1992 and 1993.
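The sorting step might look something like the sketch below. It assumes the percentage values are already stored in a dictionary keyed by (year, name), and the shares are made up; it also ignores names that drop out of the data entirely, which a real analysis would need to decide how to handle.

```python
def rank_by_one_year_loss(proportions):
    """Return (change, name, year) tuples sorted so the steepest losses come first.

    `proportions` maps (year, name) to that name's share for the year.
    """
    changes = []
    for (year, name), share in proportions.items():
        previous = proportions.get((year - 1, name))
        if previous:  # skip names with no data for the year before
            changes.append(((share - previous) / previous, name, year))
    return sorted(changes)


# Made-up shares for two names across two years.
toy = {
    (1929, "John"): 0.02, (1930, "John"): 0.01,
    (1929, "Mary"): 0.03, (1930, "Mary"): 0.027,
}
ranking = rank_by_one_year_loss(toy)
print(ranking[0][1:])  # ('John', 1930)
```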
Parker’s analysis is, in my opinion, by and large sound: Parker provides a very reasonable definition of poisoning, and an efficient method for identifying the most poisoned names. Nonetheless, in updating Parker’s original analysis, I did make a few small adjustments. First, obviously, was to include the data from the years since Parker’s blog post. But this was only the beginning, as the data we now have access to is not only longer but deeper.
Here’s what I mean: Back in 2013, it seems that the Social Security Administration did not make its full baby name dataset available to the public, forcing Parker to instead scrape the publicly available data from their website. Because of this, Parker’s data were limited to the 1,000 most popular male and female baby names per year, from 1880 to 2011. My dataset, in contrast, includes all names with a count of 5 or more, from 1880 to 2019. This is nice, because it gives me a broader and more definitive pool of data to study.
Nonetheless, I did still have to filter out very unpopular names, since these names would erroneously throw off my results. Think about it this way: A name that had a count of 20 in 1955 and 5 in 1956 would be said to suffer a roughly 75% loss between the years. However, the name was so unpopular to begin with that it really shouldn’t be considered to be in the running for most poisoned name. To truly be poisoned, a name needs to become unpopular from a place of prior popularity. Thus, to filter my data, I considered only those names that were in the top 75% of baby names in the year before their poisoning. (By “top 75%”, I mean that if you took each name’s percentage value for a given year, sorted them all in descending order, and calculated the cumulative sum for each name, the top 75% would be those names with cumulative sum values less than or equal to 0.75.)
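A rough sketch of this cumulative-sum filter for a single year follows. It assumes names are ranked from most to least popular before accumulating, so that the names kept are the popular ones; `top_share_names` and the toy shares are invented for the example.

```python
def top_share_names(shares, cutoff=0.75):
    """Return the most popular names whose running share total stays <= cutoff.

    `shares` maps each name to its share of one year's babies.
    """
    kept, running = set(), 0.0
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for name, share in ranked:
        running += share
        if running > cutoff:
            break
        kept.add(name)
    return kept


# Made-up shares for one year; they sum to 1.
toy_year = {"Mary": 0.5, "Linda": 0.25, "Hilary": 0.125, "Zelda": 0.125}
print(sorted(top_share_names(toy_year)))  # ['Linda', 'Mary']
```

With real data the running total is a sum of many floats, so a name sitting exactly on the cutoff could fall on either side of it.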
Yet it wasn’t enough to simply filter the data to popular names. With this filtering alone, the names with the largest one-year relative loss in percentage in my dataset look like this:
These results are broadly similar to Parker’s initial results. The problem — in addition to the fact that Hilary is all the way down at #8! — is, as Parker also observed, many of these names are “fad names”: names which had a sudden spike in popularity, followed immediately by a similarly sharp drop in popularity. This can be seen clearly if we plot the popularity of these ten names over time:
As we can see here, most of these plots exhibit a “needle pattern”, suddenly jumping up in popularity and then immediately falling back down. These aren’t true cases of poisoning, since such names weren’t popular prior to their spike and thus had no real reputation to be poisoned.
Because of this, such fad names need to be filtered out of our final results as well. One way we could accomplish this is simply through visual inspection, sifting through our plots until we find the genuinely poisoned names. I, however, decided to use a more formalized approach, by ensuring that names were popular, not only in the year immediately before their poisoning, but several years before their poisoning as well. Thus, to filter out fad names, I considered only those names that were in the top 80% of baby names ten years before their poisoning.
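One way to express this kind of check in code is sketched below. It takes the per-year sets of popular names as given (however their cutoffs were chosen) and simply asks whether a name appears in them one year and ten years before its drop. The function name, the data layout, and the placeholder name “Fadname” are all hypothetical.

```python
def is_established(name, loss_year, popular_by_year, lookbacks=(1, 10)):
    """True if `name` was in the popular set at every lookback before `loss_year`.

    `popular_by_year` maps a year to the set of names deemed popular that year.
    """
    return all(
        name in popular_by_year.get(loss_year - years_back, set())
        for years_back in lookbacks
    )


# Made-up popular sets: "Fadname" is popular in 1992 but not in 1983.
popular_by_year = {
    1983: {"Hilary"},
    1992: {"Hilary", "Fadname"},
}
print(is_established("Hilary", 1993, popular_by_year))   # True
print(is_established("Fadname", 1993, popular_by_year))  # False
```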
There was one other important adjustment I made in how I chose to handle the data. In her original analysis, Parker looked only at the data for female baby names. But this isn’t what we should be doing if we want to definitively discover the most poisoned baby name in U.S. history. First, we should be looking at the numbers for both female and male babies. Second, we shouldn’t be distinguishing between male and female baby names at all. If a name is truly poisoned, presumably it should be poisoned for both sexes. (This is especially relevant since Hilary is used as a name by both men and women.) Thus, in my analysis I combined the male and female baby name datasets, adding the counts of any names that appeared on both lists together.
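Combining the two lists amounts to summing counts by year and name while ignoring sex. A minimal sketch, with an assumed row layout and invented counts:

```python
from collections import Counter


def combine_sexes(rows):
    """Sum counts per (year, name), ignoring the sex column.

    `rows` is an iterable of (year, name, sex, count) tuples (hypothetical layout).
    """
    combined = Counter()
    for year, name, _sex, count in rows:
        combined[(year, name)] += count
    return combined


# Invented counts, purely for illustration.
rows = [
    (1992, "Hilary", "F", 100),
    (1992, "Hilary", "M", 7),
    (1992, "Grover", "M", 12),
]
combined = combine_sexes(rows)
print(combined[(1992, "Hilary")])  # 107
```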
There was one more change I ultimately chose to make to Parker’s methodology, but I’ll wait to introduce that until a bit further on, when its relevance will be more clear. For now, let’s start seeing some results!
To cut right to the chase, here is the list of the top ten most poisoned baby names in our updated dataset:
That’s right: If we define “poisoning” as the relative loss of popularity in a single year (as Parker herself did in her original analysis), then Hilary is no longer the most poisoned baby name in U.S. history. That (dis)honour now belongs to Isis, which suffered a 70.8% relative loss of popularity in 2015 (following, presumably, ISIS/ISIL’s ascent to global prominence in 2014), just barely edging out Hilary’s 69.7% relative loss in 1993 (following, presumably, Bill Clinton’s election to the presidency in 1992).
To get a better sense of these results, here are the plots for the top ten names:
A few interesting observations can be made here: First, though Hilary/Hillary doesn’t occupy the top spot, it does occupy the second, fourth, fifth, and sixth spots, suffering strong poisonings across multiple years and for both of its spellings. So even if it’s not the most poisoned name, it’s by any measure still very poisoned. Second, the #3 most poisoned name, Grover, is, like Hilary/Hillary, related to a presidency, though in Grover’s case, the name’s most poisoned year came after the start of Grover Cleveland’s second (nonconsecutive) term. Lastly, it should be noted that none of the other poisonings come anywhere close to the poisonings of Isis and Hilary/Hillary: though Caitlin/Caitlyn, the next most poisoned name, did see a fairly steep drop in 2016 (following, it seems, Caitlyn Jenner’s coming out as trans in 2015), this came after decades of steadily decreasing popularity. Thus, Isis and Hilary/Hillary do seem to be well described as the top most poisoned baby names in U.S. history.
Nonetheless, I didn’t think it was quite right to leave things here. Thus far, we’ve been identifying poisoned names by looking at each name’s relative loss of popularity over a one-year period. Yet it seems to me that it’d make more sense to look at the relative loss over a two-year period, since some names might start to drop in popularity midway through a year, and since a steep two-year drop would seem to reflect a more definitive poisoning. And as Parker noted in her original analysis, Hillary (as opposed to Hilary) never even showed up in her data simply because it took two years to descend from its peak of popularity rather than one. Thus there may be other names we’re missing out on here as well.
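The only change from the one-year metric is the baseline year. The sketch below makes the span a parameter so that both versions can be computed from the same series; the function name and the shares are made up for illustration.

```python
def relative_change_over(shares_by_year, end_year, span=2):
    """Relative change in share between `end_year - span` and `end_year`.

    Returns None when either year is missing or the starting share is zero.
    """
    start = shares_by_year.get(end_year - span)
    end = shares_by_year.get(end_year)
    if not start or end is None:
        return None
    return (end - start) / start


# Made-up shares for one name that halves in popularity each year.
toy_series = {1992: 0.0008, 1993: 0.0004, 1994: 0.0002}
two_year = relative_change_over(toy_series, 1994)
one_year = relative_change_over(toy_series, 1994, span=1)
print(f"{two_year:.0%}")  # -75%
print(f"{one_year:.0%}")  # -50%
```

In this toy series neither single year loses more than 50%, yet the two-year loss is 75%, which is the kind of drop the one-year metric can understate.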
Here, then, are the top ten most poisoned baby names by two-year relative loss of popularity:
Our results have changed! Namely, if we define “poisoning” as the relative loss of popularity over two years, Hilary/Hillary is the most poisoned baby name in U.S. history: Hilary fell in popularity by 87.3% between 1992 and 1994, while Isis dropped 86.6% between 2014 and 2016. Furthermore, Hillary now appears on the list as the third most poisoned name, dropping 83.0% over the same two years that Hilary saw its drop.
What does all this tell us? First, it shows that Parker’s original blog post was completely spot on, even though she was working with a smaller dataset and did not run her analysis on all baby names regardless of sex: Up until 2013, Hilary was indeed the most poisoned baby name in U.S. history.
However, my updated analysis has revealed an even more poisoned name, whose poisoning occurred in the years since Parker’s original analysis: To date (November 2020), Isis is the most poisoned baby name in U.S. history, if “poisoning” is defined as the relative loss of popularity in a single year, barely edging out Hilary/Hillary for the top spot.
But this isn’t the only, or necessarily the best, way to define poisoning. And if we define “poisoning” as the relative loss of popularity across a two-year period, Parker’s original results are reaffirmed: Hilary is the most poisoned baby name in U.S. history.
So, what’s the best metric? Should we look at relative loss over one year or two? Is Hilary or Isis more deserving of the title of most poisoned baby name?
Rather than try to adjudicate these issues, let me point out another relevant fact: Hilary/Hillary, at its height, was a much more popular name than Isis ever was, accounting together for nearly 0.1% of all baby names in 1992. Thus its poisoning is all the more impressive and severe.
Furthermore, between 1992 and 1994, not only was there a concurrent poisoning of both Hilary and Hillary, there was also an analogous poisoning of the first lady’s last name, Clinton. (Though this name was also, of course, the president’s last name, so we shouldn’t rest all the fault on HRC.)
Whether all this makes Hilary/Hillary more deserving of the top title, I’ll leave to the judgment of the reader. Regardless, there’s at least one thing that we can confidently conclude: American parents seem to hate two things, and they hate them nearly equally: terrorists and powerful women.