Visualizing Birth Statistics Using D3.js

Dan Bridges
Published in Parallel Thinking
May 22, 2018

I’ve been following Mike Bostock, the creator of the excellent d3.js library that powers many of the data visualizations on the web. He has been touting his new platform, Observable HQ, an online JavaScript notebook environment similar to Mathematica or Jupyter notebooks.

To test out Observable, I decided to look at some of the birth statistics provided by the CDC. I downloaded data from the CDC WONDER Natality repository on ages of mothers when they have their first child. Most of us have heard that people are waiting longer to have their first child, but I was curious what exactly was driving that. Did everybody just delay it by a few years, or were there certain populations of people that drastically changed their behaviour?

First, I processed the natality data from the CDC. I downloaded data for the years 2003 through 2016 and manually formatted it into a CSV file containing the birth count for each mother’s age between 15 and 49, for each year.

Next, to fully answer our questions, we also need demographic data to normalize the birth counts in the CDC data by the total number of women alive at each age. This lets us see the trends of the women giving birth, and not the trends of their parents. A large baby boom will skew the raw data once the baby boomers enter their 20s and 30s and begin having children of their own: total births would appear to increase, when that count really only reflects the size of the underlying population. The NIH supplies US population estimates on a per-age basis for every county in the US, available here. I used Python and pandas to aggregate this data to the national level:

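import pandas as pd
# Import the county-level population estimates
df = pd.read_csv('census.csv')
# Select females (coded as sex == 2 in this file)
df = df.loc[df.sex == 2]
# Sum over all counties for each (year, age) pair;
# every remaining numeric column is summed, so only the population totals are meaningful afterwards
agg = df.groupby(['year', 'age'], as_index=False).sum(numeric_only=True)
# Select the ages that we have natality data for
agg = agg.loc[(agg.age >= 15) & (agg.age <= 49)]
# Export data without writing the row index as an extra column
agg.to_csv('census_condensed.csv', index=False)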

To generate the final condensed data file, containing both natality and population data, I appended the population data from above and divided birth counts by population to obtain the birth rate, an estimate of the share of women having their first child at each age in a given year. After creating the chart with d3 (see the notebook linked below for the full code), I added interaction using Observable’s viewof functionality. The notebook lets you switch between bar and line plots and between total birth count and birth rate, and step through the years.
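The division step can be sketched in pandas as well. This is only a minimal illustration, not the code I used for the final file: the toy numbers, the column names (births, population, birth_rate) and the helper add_birth_rate are all made up for the example.

```python
import pandas as pd

# Hypothetical toy data: first-birth counts per (year, age).
births = pd.DataFrame({
    "year":   [2003, 2003, 2016, 2016],
    "age":    [25, 35, 25, 35],
    "births": [1000, 400, 1200, 600],
})

# Hypothetical toy data: number of women alive per (year, age).
females = pd.DataFrame({
    "year":       [2003, 2003, 2016, 2016],
    "age":        [25, 35, 25, 35],
    "population": [100_000, 100_000, 120_000, 100_000],
})


def add_birth_rate(births, females):
    """Join counts to population on (year, age) and compute births per woman."""
    # validate="one_to_one" raises if either table has duplicate (year, age) rows
    merged = births.merge(females, on=["year", "age"], how="inner",
                          validate="one_to_one")
    merged["birth_rate"] = merged["births"] / merged["population"]
    return merged


rates = add_birth_rate(births, females)
print(rates)
```

In this toy data the birth count at age 25 grows from 1000 to 1200, but the rate stays at 1% because the population grew by the same proportion. At age 35 the population is unchanged, so the rise from 0.4% to 0.6% is a real change in behaviour. That is exactly the distinction the population correction is meant to expose.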

Check out the notebook here: Birth Statistics: Mother’s Age for First Born

Overall, my impression of Observable HQ is fantastic. It lets you quickly build an interactive document and showcases the power of JavaScript data visualizations.

Dan Bridges
Parallel Thinking

Software developer at Beezwax Datatools and former researcher in Physics & Neuroscience.