Typical Names by U.S. State — Data Doodle #6

The BigQuery USA Names public dataset has been on my mind lately. I used it for my Google Cloud Next talk on data visualization, and I've been wondering whether I could do some kind of analysis that my pregnant friends and family members would find helpful.

Using BigQuery, I created a Google Sheet of names by U.S. state, organized by year and gender. These aren’t the most popular names, nor are they the most unique names. I ran a query to attempt to find “typical” names: names that are somewhat unique to a state, but not too unique.

If you have one of these names and were born between 1910 & 1920, there’s about a 50% chance you are from Texas.

The name frequency column indicates the probability that someone is from a certain U.S. state, given that they have a specific name and were born in a certain year.
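Written out as a formula, that column is a conditional probability. This is my reading of the description rather than the exact expression behind the sheet: I am assuming gender is part of the condition (the sheet is organized by year and gender), that "from" a state means recorded in that state's rows of the dataset, and the symbol c is notation introduced here for the count in one dataset row.

```latex
P(\text{state} = s \mid \text{name} = n,\ \text{gender} = g,\ \text{year} = y)
  = \frac{c(n, g, y, s)}{\sum_{s'} c(n, g, y, s')}
```

Here c(n, g, y, s) is the number of people with name n and gender g recorded in state s for year y, and the sum in the denominator runs over every state s'.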

If you have one of these names and were born between 1910 & 1920, there’s about a 50% chance you are from Ohio.

If you meet a man named Pearl who was born in 1917, there’s a 55% probability that he is from Ohio. In that way, Pearl is a very Ohioan name (at least it was in 1917).

To get a similar view of a state you care about, create a filter view in the Google Sheet and filter by name, state, gender, or year.

Running the query

The first query calculates how often each name appears in one state versus the rest of the country. I save the results to a table in my project: usa_names.names_conditional_probabilities.
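The query itself isn't reproduced here, but the calculation it describes can be sketched in plain Python on a handful of rows. This is a minimal illustration, not the original BigQuery SQL: the rows are made-up counts (not real data) laid out as (state, gender, year, name, number), and the function name conditional_probabilities is hypothetical.

```python
from collections import defaultdict

# Made-up toy rows, NOT real data: (state, gender, year, name, number).
ROWS = [
    ("OH", "F", 1915, "Alex", 30),
    ("TX", "F", 1915, "Alex", 10),
    ("CA", "F", 1915, "Alex", 20),
    ("TX", "M", 1915, "Sam", 12),
    ("OH", "M", 1915, "Sam", 12),
]


def conditional_probabilities(rows):
    """Return {(name, gender, year, state): P(state | name, gender, year)}.

    Assumes at most one row per (state, gender, year, name) combination.
    """
    # National total for each (name, gender, year) across all states.
    totals = defaultdict(int)
    for state, gender, year, name, number in rows:
        totals[(name, gender, year)] += number

    # Share of that national total contributed by each state.
    probs = {}
    for state, gender, year, name, number in rows:
        probs[(name, gender, year, state)] = number / totals[(name, gender, year)]
    return probs


if __name__ == "__main__":
    for key, p in sorted(conditional_probabilities(ROWS).items()):
        print(key, round(p, 3))
```

In SQL, the same idea would typically be a per-state count divided by a total grouped by name, gender, and year.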

Then, to filter this down to names that could be considered “typical”, I limit the results to names given to at least a dozen people in that state, where there is close to a 50% chance that a person with that name is from that state.
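The filter can be sketched the same way. The minimum of a dozen comes from the description above, but the exact band that counts as "close to 50%" is not stated, so the tolerance of 0.1 below is purely an assumption, as are the function name typical_names and the made-up records.

```python
# Made-up toy records, NOT real data. "probability" stands for
# P(state | name, gender, year) and "number" for the count in that state.
RECORDS = [
    {"name": "Alex", "state": "OH", "number": 30, "probability": 0.50},
    {"name": "Sam", "state": "TX", "number": 8, "probability": 0.50},
    {"name": "Kim", "state": "CA", "number": 40, "probability": 0.95},
    {"name": "Lee", "state": "NY", "number": 200, "probability": 0.05},
]


def typical_names(records, min_count=12, target=0.5, tolerance=0.1):
    """Keep records with enough people and a probability near the target.

    The tolerance is an assumed value; the original cutoff is not given.
    """
    return [
        r
        for r in records
        if r["number"] >= min_count
        and abs(r["probability"] - target) <= tolerance
    ]


if __name__ == "__main__":
    for record in typical_names(RECORDS):
        print(record["name"], record["state"], record["probability"])
```

With these toy records, Sam is dropped for having fewer than a dozen people, Kim for being too concentrated in one state, and Lee for being too widespread.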

Trying it yourself

You can use BigQuery to run your own analysis of public datasets for free, no credit card required, with 1 TB of free queries per month.

Like what you read? Give Tim Swast a round of applause.
