Q#22: Births by state

Given the following dataset, can you find the top state for baby births?

-Credit to: Data Interview Qs

TRY IT YOURSELF

https://colab.research.google.com/drive/1v5uZSkCbsCYmZqvjSaj_JOWMTVJvLmCV?usp=sharing

ANSWER

This question tests your knowledge of data wrangling in Python. This is usually done with the help of libraries such as NumPy or, in this case, pandas.

The pandas library provides the useful DataFrame data structure, so the first step is to load the data into a DataFrame with the pd.read_csv() function.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv", index_col=0)  # index_col=0 uses the first column of the CSV as the row index
df.head()  # See the first five rows
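To see what index_col=0 changes, here is a minimal sketch on a tiny inline CSV. It assumes the downloaded file begins with an unnamed column of row numbers followed by Id, Name, Year, Gender, State and Count; the three rows and their values are made up purely for illustration.

```python
import io

import pandas as pd

# Made-up three-row CSV shaped like the file above (assumed layout):
# an unnamed first column of row numbers, then the named columns.
csv_text = """,Id,Name,Year,Gender,State,Count
0,1,Emma,2004,F,AK,12
1,2,Madison,2004,F,AK,8
2,3,Hannah,2004,F,AK,5
"""

# Without index_col, pandas builds its own 0..n-1 index and keeps the
# unnamed first column as an ordinary column called "Unnamed: 0".
without_index = pd.read_csv(io.StringIO(csv_text))
print(without_index.columns.tolist())

# With index_col=0, that first column becomes the row index instead,
# so it no longer appears among the data columns.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.columns.tolist())
```

Without index_col the row numbers would be duplicated: once as pandas' default index and once as the extra "Unnamed: 0" column.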

Looking at the DataFrame, each row holds the number of babies (Count) given a particular Name in a particular State and Year, so every state spans many rows. To get what we want, the births per state, we need to group the rows by state and then sum the counts within each group. Luckily, these types of operations are common on pandas DataFrames and can be chained together: groupby() groups the rows by the column we want (in this case State), attribute access with .<column_name> (here .Count) selects the column to aggregate, sum() totals the counts within each state, and finally sort_values() orders the result (in ascending order by default). Each step is attached to the result of the previous one with a dot.

df.groupby('State').Count.sum().sort_values(ascending=False)  # ascending=False puts the largest totals first
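As a minimal sketch, the same chain can be run on made-up toy data (the states and counts below are invented, not taken from the dataset). It also shows two shortcuts that return the top state directly: idxmax(), which gives the index label of the largest value, and nlargest(n). The sketch uses bracket selection, toy["Count"], which is equivalent to the .Count attribute access above but also works for column names that contain spaces or clash with DataFrame method names.

```python
import pandas as pd

# Made-up toy data with the same State and Count columns as above:
# several names per state, each with its own count.
toy = pd.DataFrame(
    {
        "Name": ["Emma", "Liam", "Emma", "Liam", "Emma"],
        "State": ["CA", "CA", "TX", "TX", "NY"],
        "Count": [30, 25, 20, 15, 40],
    }
)

# Group rows by state, sum Count within each group, largest total first.
totals = toy.groupby("State")["Count"].sum().sort_values(ascending=False)
print(totals)

# idxmax() returns the index label (the state) of the largest total,
# answering "the top state" without reading the sorted list.
top_state = totals.idxmax()
print(top_state)  # CA (30 + 25 = 55)

# nlargest(n) keeps only the n biggest totals, sorted in descending order.
print(totals.nlargest(2))
```

On the real data, df.groupby('State').Count.sum().idxmax() would return the top state's label in a single line.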

We find that California (as we could have guessed) has the most births. However, since we are data scientists, we can do even better and plot this result by chaining one more DataFrame method: plot().

df.groupby('State').Count.sum().sort_values(ascending=False).plot(kind='bar', figsize=(16, 8), title="Births by State", ylabel="Number of Births")  # The plot() arguments are described in the pandas documentation and are fairly self-explanatory (ylabel needs pandas 1.1 or later)
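With dozens of states the full bar chart gets crowded. A minimal sketch of one alternative is to plot only the top few totals with nlargest(). The per-state totals below are made up and stand in for the groupby result; the sketch assumes matplotlib is installed (pandas plotting relies on it) and selects the non-interactive "Agg" backend so it also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend, so this runs without a display

import pandas as pd

# Made-up per-state totals standing in for the groupby/sum result above.
totals = pd.Series({"CA": 55, "NY": 40, "TX": 35, "FL": 30, "IL": 10}, name="Count")

# nlargest(3) keeps the three biggest totals, already sorted descending,
# so the chart only shows the leading states.
ax = totals.nlargest(3).plot(
    kind="bar",
    figsize=(8, 4),
    title="Births by State (top 3)",
    ylabel="Number of Births",
)

# Series.plot returns a matplotlib Axes, which can be adjusted further.
ax.set_xlabel("State")
```

Because the return value is a regular matplotlib Axes, the chart can also be written to disk with ax.figure.savefig(...) instead of being shown inline.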
