Typical Names by U.S. State — Data Doodle #6
The BigQuery USA Names public dataset has been on my mind lately. I used it for my Google Cloud Next talk on data visualization. I also wondered whether I could do some kind of analysis that my pregnant friends and family members would find helpful.
Using BigQuery, I created a Google Sheet of names by U.S. state, organized by year and gender. These aren’t the most popular names, nor are they the most unique names. Instead, I ran a query to find “typical” names: names that are somewhat unique to a state, but not too unique.
The name frequency column indicates the probability that someone is from a certain U.S. state, given that they have a specific name and were born in a certain year.
If you meet a man named Pearl who was born in 1917, there’s a 55% probability that he is from Ohio. In that way, Pearl is a very Ohioan name (at least it was in 1917).
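Written as a formula, the name frequency is a conditional probability. In this sketch, N(s, n, y) is a hypothetical shorthand for the number of people with name n born in state s in year y, counted within one gender, on the assumption that the query also groups by gender since the sheet is split that way:

```latex
P(\text{state} = s \mid \text{name} = n,\ \text{year} = y)
  = \frac{N(s, n, y)}{\sum_{s'} N(s', n, y)}
```

The denominator sums over every state, so the Pearl example reads as P(Ohio | Pearl, 1917) ≈ 0.55 for boys: about 55% of the boys named Pearl in 1917 were from Ohio.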
Running the query
The first query I ran calculates how often each name appears in one state versus the rest of the country. I saved the results to a table in my project: usa_names.names_conditional_probabilities.
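The same calculation can be sketched in Python rather than SQL. This is a minimal sketch, not the query itself: it assumes rows shaped like the public dataset's table (state, gender, year, name, number), and the names `NameCount` and `conditional_probabilities` as well as the toy counts are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class NameCount(NamedTuple):
    """One row shaped like the USA Names table: births for a name in one state, year, and gender."""
    state: str
    gender: str
    year: int
    name: str
    number: int


def conditional_probabilities(
    rows: Iterable[NameCount],
) -> Dict[Tuple[str, str, int, str], float]:
    """Return P(state | name, year, gender) keyed by (state, gender, year, name).

    The denominator is the national total for that name, year, and gender,
    so each value compares one state against the whole country.
    Assumes at most one row per (state, gender, year, name), as in the dataset.
    """
    rows = list(rows)
    national_totals: Dict[Tuple[str, int, str], int] = defaultdict(int)
    for row in rows:
        national_totals[(row.gender, row.year, row.name)] += row.number

    return {
        (row.state, row.gender, row.year, row.name):
            row.number / national_totals[(row.gender, row.year, row.name)]
        for row in rows
    }


# Toy counts (made up, not real dataset values).
toy_rows = [
    NameCount("OH", "M", 1917, "Pearl", 6),
    NameCount("IN", "M", 1917, "Pearl", 4),
    NameCount("PA", "M", 1917, "Pearl", 2),
]
print(conditional_probabilities(toy_rows)[("OH", "M", 1917, "Pearl")])  # 0.5
```

Keying the national totals by gender as well keeps boys and girls with the same name from being pooled, which matches a sheet organized by year and gender.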
Then, to filter this down to names that could be considered “typical”, I kept only names with at least a dozen people in that state and for which there is close to a 50% chance that a person with that name is from that state.
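A minimal Python sketch of that filter, assuming each candidate already carries its in-state count and its conditional probability. The dozen-person minimum comes from the text; the ±0.1 band around 50% is a hypothetical stand-in for “close to 50%”, since no exact cutoff is given, and `TypicalCandidate`, `typical_names`, and the placeholder names are illustrative only.

```python
from typing import Iterable, List, NamedTuple


class TypicalCandidate(NamedTuple):
    state: str
    gender: str
    year: int
    name: str
    number: int         # people with this name in this state, year, and gender
    probability: float  # P(state | name, year, gender)


MIN_COUNT = 12   # "at least a dozen in that state"
TARGET = 0.5     # "close to a 50% chance"
TOLERANCE = 0.1  # hypothetical: how far from 50% still counts as "close"


def typical_names(
    candidates: Iterable[TypicalCandidate],
    min_count: int = MIN_COUNT,
    target: float = TARGET,
    tolerance: float = TOLERANCE,
) -> List[TypicalCandidate]:
    """Keep names that are common enough in a state and concentrated there about half the time."""
    return [
        c for c in candidates
        if c.number >= min_count and abs(c.probability - target) <= tolerance
    ]


# Made-up candidates illustrating each case.
candidates = [
    TypicalCandidate("OH", "M", 1917, "NameA", 40, 0.52),   # kept
    TypicalCandidate("OH", "M", 1917, "NameB", 8, 0.50),    # too few people
    TypicalCandidate("OH", "M", 1917, "NameC", 300, 0.95),  # too unique to the state
    TypicalCandidate("OH", "M", 1917, "NameD", 500, 0.08),  # not distinctive to the state
]
print([c.name for c in typical_names(candidates)])  # ['NameA']
```

The two conditions work together: the count floor drops names too rare to say anything about, and the probability band drops both names found everywhere and names found almost only in one state.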
Trying it yourself
You can use BigQuery to run your own analysis of public datasets for free, with no credit card required and up to 1 TB of query processing free each month.