Story Behind the Viz: The Baby Spike
When you first look at the visualization below, what do you notice first? The big spike? The colors? The shape? Something else?
Does your first impression make you want to look at it more? If yes, why? If no, why not? What do you notice as you look at it longer?
Does anything surprise you?
What features help you interpret what’s going on? Are these effective or not?
Does it make you ask more questions, or want to learn more? Why or why not?
Lastly, does it change any assumptions you had about babies’ births?
Nadieh Bremer and I created these data visualizations for the July 2017 issue of Scientific American’s Graphic Science page. I also wrote a more detailed accompanying blog post Why are So Many Babies Born around 8am?
The heart of the visualization is a radial area chart showing how the number of babies born in each of the 1440 minutes of the day compares to the average. In print, this was accompanied by two other charts showing different time scales. Online, we shared three different minute-of-day charts, one for each method of delivery. These revealed distinct underlying seasonal patterns by delivery method, which combine to create the observed overall daily pattern.
It’s just a one-page article, with 3 charts, and another 3 charts online. Yet, we thought a lot about how we wanted to present this data in a way that was engaging and best fit the story we’d discovered in the data and wanted to share. Perhaps some of these details relate to your answers to the questions I posed at the beginning?
Nadieh and I thought it would be fun to share our process and the story behind the viz. In my post here, I’ve highlighted some of my favorite design insights and decisions made in the course of the project and why we chose them. You can also check out Nadieh’s blog for her insights into the design process, including images showing iteration. There are also some great bonus bloopers!
This project had 5 main stages, and different design elements emerged from each stage. Let’s start back in March of 2016…
1. Finding the data and core story
In March 2016, I downloaded CDC birth data because I wanted to include an example of minute-of-day data from a public dataset to illustrate seasonality in very granular data for my OpenVis Conf talk Everything is Seasonal.
Most public datasets don’t have minute-level granularity, so I breathed a sigh of relief when I found this CDC (Centers for Disease Control and Prevention) dataset showing the number of babies born per minute of day and day of week by year.
I’ve found that most data that has to do with people or nature will have minute-of-day seasonal patterns. So I expected to see something interesting.
But, wow! It was much more striking than I expected, especially when each point of the graph represents a minute and I “faceted” by day of week.
After sharing this chart with my friend Brendan who is a nurse, he responded: “Can you break out natural vs induced vs c-section?”
So, of course I did. And, this is what I saw. Note that all these charts have the same y-axis scale.
As Brendan replied: “Damn that’s a lot of c-sections.”
This visual form worked well enough for the point that I wanted to make in the presentation. Instead of being overwhelming or “too much data”, looking at granular minutely data was actually easier to interpret than aggregated data. It revealed strong, rich patterns. Seeing all the dots reinforces the feeling that this isn’t just some rogue datapoint; something is going on here.
I tweaked the charts just a little bit, and then they were ready for the talk.
I also found daily data, which I aggregated to weekly to remove the day-of-week effects, to illustrate the cycles that we also see annually.
At this point, I’d already established a few key insights that would influence the final visualization published in Scientific American.
1. I showed different seasonal patterns at different levels of granularity (minute of day, day of week, and week of year).
2. It was clear that medical intervention was part of the seasonality story.
3. I loved that showing fully granular data, a dot for every minute, made the pattern feel more real and striking. And, lastly, showing multiple small versions of the chart, “small multiples”, provided context and enabled comparison.
The original form was effective for the talk, and it fit the rest of the presentation in which all the time series were presented with time on the x-axis and count on the y-axis.
However, in revisiting this data to create a stand-alone, printed piece, there were a number of specific challenges that I wanted to address.
For example, the peaks were much more obvious than dips.
It was hard to visually distinguish the more subtle shape of the spontaneous (no c-section/no induction) births vs induced. These are actually quite different, but they look pretty similar in this original form. The importance of this was actually much more clear once we had developed the new visual form, because the differences are so much more obvious.
The week of year and minute of day visualizations didn’t really fit together, as part of the same story or visual form.
The metric itself “the total number of babies born per minute on a particular day of week” was how the data was defined in the data source. But, it’s such an awkward construction to explain! Sometimes it takes another’s perspective to see the obvious. When Nadieh later asked why not just use “average number” instead of “total over the course of the year”, I immediately normalized the data and made the switch. To do this, I had to also adjust for the fact that some days of week occur 52 times and some 53 times in a year.
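A minimal sketch of that 52-versus-53 adjustment in Python follows. The function names and the sample total of 530 births are illustrative assumptions, not the code or data from the project.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_occurrences(year):
    """Count how often each weekday (0=Monday .. 6=Sunday) occurs in a year."""
    counts = Counter()
    day = date(year, 1, 1)
    while day.year == year:
        counts[day.weekday()] += 1
        day += timedelta(days=1)
    return counts


def average_per_day(total_births, weekday, year):
    """Convert a yearly total for one weekday into an average per occurrence."""
    return total_births / weekday_occurrences(year)[weekday]


occurrences = weekday_occurrences(2014)
# 2014 started on a Wednesday, so Wednesday occurs 53 times and the rest 52.
print(occurrences[2], occurrences[0])  # 53 52
# Hypothetical total of 530 births in one Wednesday minute across the year.
print(average_per_day(530, 2, 2014))   # 10.0
```

Dividing by the actual number of occurrences keeps the weekday that appears 53 times from looking slightly busier than it really is.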
Oh, and aesthetics! Beauty isn’t just a “nice-to-have” when engaging a reader’s attention. It makes a big difference to create something that is enjoyable to look at. If you want to look at it, you’ll look at it more! Moreover, when done well, subtle aesthetic details help the reader notice more about the data itself and see a richer, more complex story.
The strengths and weaknesses of the original set of charts, along with the change in context to a polished printed piece, set the stage for later design decisions. But, I’m getting ahead of myself…
A quick aside: why are there so many C-sections?
In the US in 2014, 32% of births were c-sections, 18% were induced, and 50% were spontaneous. Having seen the dramatic peaks at 8:30am and noon on the c-section chart above, it’s probably not surprising to learn that many of these were scheduled. In fact, 75% of c-sections were planned/scheduled and the other 25% were unscheduled.
At this point you might be wondering why there are so many scheduled c-sections and inductions. That’s a great question. Understanding what drives these rates, why they vary from country to country and hospital to hospital, and what they “should” be requires investigating many intersecting factors. There is a suite of questions to ask about what’s driving decision-making and recommendations for hospitals, doctors, insurance, and patients. It’s also not an easy question, either at the population level (what percent of births *should* include intervention) or at the individual level (what should this woman do?). And, there are lives and health at stake in the answers, both for the moment of birth and the recovery afterwards. Historically, childbirth is one of the most dangerous things a woman might do in her lifetime, and being born is one of the most dangerous things we all do in our lifetimes.
The goal of this visualization is to show that many births are scheduled. These “why” questions, however, are out of scope. I hope we’ve piqued your curiosity to learn more, so please check out the articles linked in the appendix of this article!
That said, it’s worth noting that “scheduled/planned” and “elective/voluntary” are not synonymous. For example, a heart surgery might be unscheduled, if somebody shows up in the emergency room or something goes wrong while they are in the hospital for something else. Or a heart surgery, like an angioplasty to clear partially blocked arteries, might be scheduled due to a high risk of something bad happening if a medical problem is not addressed relatively soon. In both cases, the procedure is recommended by doctors and the hospital. It’s just that in one case it’s in reaction to an immediate emergency and in the other there is time to plan ahead for how to best address a known risk. This is similar for c-sections and inductions. There are unscheduled inductions and c-sections due to something that came up unexpectedly during the labor. There are also scheduled inductions and c-sections, which in the US are primarily in response to a medical recommendation. These might be due to factors that are known before labor starts. For example, for a c-section the woman might have diabetes, a heart condition, or a previous birth by c-section, or the baby might be in a “breech” position or not growing well enough (source).
2. A story about seasonality of birth, told graphically at three levels of granularity
A week after I gave the OpenVis Conf talk, I was absolutely thrilled when Amanda Montanez from Scientific American reached out asking if I might be interested in creating a visualization based on the talk. When I was a teenager, my family subscribed to Scientific American and I often discussed articles and ideas from the magazine with my Dad. Therefore, it was a dream come true to be invited to contribute to a well-respected publication which had also been so personally important to me.
I loved how Amanda framed the story in her pitch to the Scientific American editors, as shown below, and I was so excited to run with this idea.
The Seasonality of Birth
This idea was sparked by a data viz talk on seasonality I saw a while back. I had heard before that hospital births tend to spike around the times when doctors’s shifts are ending. However, I’d never seen it visualized…it is quite dramatic! In addition, there interesting patterns if you look at days of the week (fewer births on weekends and holidays) and weeks of the year (more births in late Sept/early Oct than any other time, consistent dips at the beginning of January, etc)....Could be interesting to do a set of time three series, each one zooming in from the previous one to show a more granular level of seasonality. And annotate to explain each.
Because of some schedule constraints in my personal life, including taking some extended time off to travel to Alaska, Oregon, Namibia, Mozambique, Egypt, Chile, and Argentina, it wasn’t until March 2017 that I was ready to follow up.
I was thrilled when Nadieh agreed to collaborate, and made the time for this project despite having a packed schedule of travel, work, and presentations. It worked out even better than I might have imagined, both in terms of leading to a better final product and being a fantastic, fun, creative, thought-provoking experience! In writing this blog post, I reread many of our emails from the time, and enjoyed reliving the energy, shared curiosity, mutual respect, and sense of exploration as we evolved our understanding of the data and form through words and images.
3. Defining the core visual form in a few intense creative in-person working sessions
In just two in-person working sessions in San Francisco, Nadieh and I worked together and established the core visual form we would use.
In particular, we would have 3 radial charts: one for minute-of-day, one for hour-of-week, and one for week-of-year. We also planned to dive into the second data set, by delivery method, at a later date. In each chart, we would focus on the differences compared to average rather than the raw counts, although we had a big design challenge in front of us for how to actually depict that difference to average (area? bars? something else?).
On day 1, I served up munged data from Python while Nadieh was working magic in R. We sat next to each other, in almost constant conversation as we played with various forms. By day 2, Nadieh had written the core of the viz into D3 and we again focussed on quick iterations coupled with lots of discussion. For these quick iterations on a relatively small project it worked well to have a bit of division of labor and focus on whatever enabled us to experiment/explore most quickly.
Technical skills open up a wider design space
While it’s easy to conceive of design and technical skill as independent, it was obvious during these two sessions how Nadieh’s facility in code directly impacted our design decisions. And, as Nadieh reminded me, “it was also often you providing me with a different view on the data within mere minutes that was crucial. During the first parts of our process being able to quickly create different ‘lenses’ on the data and create crude plots of those…is the complementary part to being able to quickly change visual elements later on in the design phase.” Throughout the project there were numerous moments where we were debating if we should go with one idea or another, when we would just stop debating and try it.
If this had been a 30 minute, 1 hour, or 5 hour task, we would have had to make a decision about if it was even worth trying. Or, we’d have just made a guess and gone with that. When it takes just a couple minutes, why not just see what it looks like? In this way, technical skill enables one to explore a much wider and more nuanced design space. This also means we could be more responsive to the data itself, because we could see what the form actually looked like with the real data rather than what we imagined it might look like.
Radial design, with comparison to average rather than to 0
Decisions were rarely all or nothing, but rather nuanced. When we first took a look at radial designs, there were some flaws.
Most importantly, it’s really easy to lose track of the center of the circle with radial line charts as your brain sort of just assumes that the center is in the center of the shape, even if it’s off-set. I had struggled with this in a previous project, weather circles, and recognized it as a challenge here.
Additionally, in both the original rectangular grid charts and our first radial line charts, peaks were more obvious than dips. And, more subtle differences in seasonality got lost.
In discussing how to deal with these issues, we realized that this wasn’t just a visual issue. It was also about the story in the data itself. The story we most wanted to reveal in this data wasn’t a story about the raw number of babies being born at any given minute.
Rather, the story was about how the number of babies born compared to typical. Was it a dip or a spike? More or less than usual?
The insight about visual form came from trying to best match the form to the aspects of the data we most wanted to share. Sketching on paper, I suggested trying comparing to average. Nadieh gave it a shot, and we both liked it.
Instead of a line representing the distance from the center (number of babies per minute) we switched to an area chart representing the percent difference in number of babies born per minute compared to average.
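As a rough illustration of the metric, here is a minimal Python sketch of “percent difference compared to average.” The function name and the three sample counts are made up for the example; the real series has one value per minute.

```python
def percent_diff_from_average(counts):
    """Express each count as a percent difference from the mean of all counts."""
    mean = sum(counts) / len(counts)
    return [100.0 * (c - mean) / mean for c in counts]


# Hypothetical counts for three minutes: below, at, and above the average of 10.
diffs = percent_diff_from_average([5, 10, 15])
print(diffs)  # [-50.0, 0.0, 50.0]
```

Dips come out negative and spikes positive, so both are measured from the same baseline.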
Aligning our visual representation with the story we wanted to tell solved for both the visual issues of “losing track of the center” and “how do we see dips.”
Granted, a rectangular area chart comparing the peaks/dips to average would also have solved both of these problems.
However, the circle had three major benefits over a rectangular form. First, the radial form emphasizes small shifts in what time a peak or dip occurs, since a change in time corresponds to a change in angle. Second, it’s a very compact form, and reads as a cohesive shape that I think enables comparison. Third, in a rectangular chart showing a cyclical pattern, the impression of the chart is heavily influenced by where you (arbitrarily) break the cycle.
Cracking the circle open
Another issue with circles is that there is no start or end. It’s not obvious where the reader should start “reading” the visualization. It’s also unclear where to put the annotation for the grid lines, since there is no “left side” of the chart.
Nadieh solves for this beautifully by “cracking” the circle at the top, a technique she often uses when creating radial designs. I loved it, writing in an email that “the gap at midnight is great — I think it’s small enough that it doesn’t break the circle too much, but large enough to provide room for the annotation. And, it gives the eye a nice place to start.”
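One way to picture the “cracked” circle is as an angle scale that skips a small wedge at the top. The Python sketch below is only an illustration of that idea: the 10-degree gap and the function name are assumptions, not values taken from the actual D3 code.

```python
def minute_to_angle(minute, gap_degrees=10.0, minutes_per_day=1440):
    """Map a minute of the day to degrees clockwise from the top of the circle.

    A wedge of gap_degrees centred on the top (midnight) is left empty, so
    the day starts just right of the gap and ends just left of it.
    """
    usable = 360.0 - gap_degrees
    return gap_degrees / 2.0 + usable * minute / minutes_per_day


print(minute_to_angle(0))     # 5.0, just right of the gap
print(minute_to_angle(720))   # 180.0, noon at the bottom
print(minute_to_angle(1440))  # 355.0, just left of the gap
```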
Hour of Day or Day of Week? Neither!
We knew early on that we would focus on minute of day and week of year. But, the third chart was unclear. Hour of day would just repeat the minute of day story, but not tell it as well. And, day of week only had 7 sparse data points.
Nadieh tested a mixed chart that had a curve for each day’s average value per hour as a baseline, along with scattered points for each hour’s value. But, it just didn’t click.
Finally, we decided to go with an unusual metric of “hour-of-week.” With 168 points per week there was enough data density to show clean patterns.
As you can see in the final chart, rather than feeling too repetitive, the hours-of-week supplemented the minute of day charts. They showed that those peaks were a weekday effect since the missing Saturday and Sunday peaks jumped out in the hour-of-week chart.
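For anyone who wants to build the same metric, hour-of-week is just the weekday and hour folded into a single index from 0 to 167. This Python sketch assumes the week starts on Monday, which is Python’s convention and not necessarily the order used in the published chart.

```python
from datetime import datetime


def hour_of_week(ts):
    """Return an index from 0 to 167: 0 is Monday 00:00, 167 is Sunday 23:00."""
    return ts.weekday() * 24 + ts.hour


# January 6, 2014 was a Monday; January 12, 2014 was a Sunday.
print(hour_of_week(datetime(2014, 1, 6, 8, 30)))   # 8
print(hour_of_week(datetime(2014, 1, 12, 23, 0)))  # 167
```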
Highest peaks breaking the frame
This is a small point, but something that I really enjoyed having the chance to incorporate into this viz.
One of my favorite visualizations of all time is this chart of the prices of cotton in New York in the 1800’s from the US Statistical Atlas published in 1883. There was an obvious challenge: how do you show both the moderate variation in the pre/post-Civil War price of cotton in New York on the same graph as you show the absolutely massive spike in prices during the Civil War? I love the audacity of their answer: create a reasonable frame for the pre/post period and then break it. Oh, and don’t worry about the fact that your spike is now piercing through Chicago in a totally different map.
I love how breaking the frame helped contextualize local variation while also emphasizing the unusualness of the chart.
The minutely baby data had similar characteristics: a big spike along with smaller variation throughout the rest of the day. Therefore, I was psyched when Nadieh agreed with my proposal that the AM spike would break the frame in the overall minute-per-day and C-section charts. She executed this beautifully!
Being boring IS the story: 0% is the center of the circle
One surprising thing is how much more “boring” the final week-of-year chart is in comparison to the others. This isn’t an accident, but is part of the story. Yes, births are more common in September than in January. But, these variations pale in comparison to the difference between a Saturday evening and Monday morning, or between 6am and 8am on a weekday. In all three charts, we pinned the center to 0 and the average to the same radius. Therefore a 5% change in one chart is equivalent to a 5% change in another, making them comparable.
Jen, our wonderful graphics editor, helped us realize that we weren’t communicating this idea clearly in our drafts and suggested that a better legend might do the trick. More on that story in Nadieh’s blog post.
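The shared scale in this section can be sketched as a single radius function used by every chart. This is one reading of “center at 0, average at the same radius”: it assumes the center stands for zero births (100% below average) and uses a made-up average radius of 100 units.

```python
def radius_for(pct_diff, average_radius=100.0):
    """Radius for a percent difference from average.

    Assumes -100% (no births at all) sits at the center of the circle and
    0% (exactly average) sits at average_radius, in every chart.
    """
    return average_radius * (1.0 + pct_diff / 100.0)


# A 5% bump is the same distance from the average ring whichever chart it is in.
print(round(radius_for(5) - radius_for(0), 6))  # 5.0
```

Because the function never looks at which chart it is drawing, a quiet chart stays visibly quiet next to a dramatic one.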
4. Iteration, refinement, and editing with attention to impactful details
In this part of the creation process, Nadieh was the MVP and I played a supporting role.
There are so many exquisite, meaningful details! You can read about them in detail on her blog, but I’ll highlight a few of my favorites here as well.
As soon as she moved to D3, Nadieh added a color gradient to the area chart. And, soon this became a diverging orange to blue spectrum with a sharp cut-off at the yellow baseline splitting the two. The spectrum is perfect since it shows off both dips and peaks, is color-blind safe, and even subtly matches the diurnal cycle, with the dips tending toward blue at night and peaks in the morning/day.
Trusting your gut
One of the most critical aspects of design was how to represent the distance from the average line. Should it be bars, or really bar-like slices? An area chart with a smooth gradient? Bars with a gradient? Or, finally, concentric circles creating discrete slices?
This wasn’t just about the design, but also about trusting your gut.
While experimenting, Nadieh wrote:
The bars … are getting really thin for the inner section and you might end up with a Moiré effect or something (I tried to counter that a little, by making the inner bars actually mini pie chart-like slices that become thinner the more they coming inward)
I really do like the areas, but the perfection of it is somehow bothering me a bit. Like it feels too polished, too slick, without character.
A few iterations later we’d thrown out the bars due to the pervasive Moiré effects that she’d anticipated. Instead, in a middle-of-the-night, jet-lagged epiphany, she’d come up with an approach that essentially discretized the area chart into concentric circles.
We debated back and forth what made the most sense, and for which charts. I reflected back that:
Discrete: good for comparing/exploring/seeing different parts *within* a chart. You can see how the curve of the data cuts into the concentric colored circles and see the shape of each concentric ring. And, it’s easier to get a sense of distance from the average line by the number of rings, which is especially helpful for comparing the magnitude of dips to bumps.
Smoothed: good for comparing *across* charts, because you see each chart as a singular shape. Also, within a single chart, gives more of an impression of the pattern as a whole (in contrast to comparing a particular peak to another peak or dip).
For the larger charts, we want the viewer to do some comparison across charts. But, we also want them to appreciate & explore a lot of what is happening within each chart. For the smallest charts, we want the viewer to primarily be comparing across charts — to see how different the shape is for induced, c-section, and natural. And, to get a sense of the overall shape for each chart itself. We wouldn’t expect them to be asking more detailed questions about how big is the 8:30am peak vs 9pm dip for c-sections.
… I propose smoothed for the small delivery method charts. I am torn between discrete & smooth for the large ones…What do you think? Do you have a preference between smooth color gradient and discrete color gradient? For all charts? For big? For small?
In reply, from Nadieh to me:
About the gradient/concentric circles. I like the circles better for two reasons, one is exactly what you describe, it helps to “fix” the major problem in circle visuals that it is hard to compare height differences along the angles. My second reason is completely emotional, like I said earlier this week, I actually didn’t like the smooth/perfectness of the gradient. It just felt like “too standard” of a design, not unique enough in a way. And I had some difficulty figuring out what else to do, and then I tried the concentric circles and that completely solved that feeling of “wrongness” for me on the design level.
And — that was the answer. We went with discretized concentric circles for the 3 main plots and a smoother gradient for the smaller ones.
It was exactly the right choice, and came from listening to a feeling of “wrongness.” This led Nadieh both to persevere to come up with the winning visual form, and to understand why it was the right answer.
A few more details: dots, dashes, smoothing, and the legend
Nadieh put the labels inside the ring, with subtle tiny breaks in the line to separate each hour, while soft, lightly dashed arcs provide a background grid.
Jen, our visual editor from Scientific American, added the dot to start each month. This created a nice, subtle anchor.
Nadieh introduced loess smoothing to the viz, which was critical for keeping near-minutely granularity without getting distracted by jaggedness. This, combined with the per-minute dots showing the exact values, created a lovely balance of exact detail and overview while maintaining as much detail as possible. It built off of the strengths of my very first charts presented at OpenVis, showing the detail of each of the 1440 minutely data points while putting those dots into a smoother, still-granular context. The only chart we showed the detailed dots on was the minute overview, where they looked almost like a dusting of snow on the chart. For the rest, we stuck just with the smoothed gradient. It just looked right that way.
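For readers who haven’t met loess before, here is a small pure-Python sketch of the general technique: a locally weighted straight-line fit with tricube weights. It is not the implementation or the parameters used in the viz (this post doesn’t record those), the `frac` value is an arbitrary assumption, and for simplicity it ignores the wrap-around at midnight.

```python
def loess(xs, ys, frac=0.5):
    """Smooth ys over xs with locally weighted linear regression.

    frac is the share of points used in each local fit (an assumed default).
    """
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))  # neighbours per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted least-squares line through the neighbourhood.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        # Evaluate the local line at x0.
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# A jagged series that alternates between +1 and -1 around zero.
xs = list(range(20))
jagged = [(-1) ** i for i in xs]
smooth = loess(xs, jagged)
```

In the middle of the jagged series the smoothed value lands close to zero, which is the “overview” half of the balance; the raw dots supply the exact detail.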
5. Writing the blog post
There wasn’t enough room on the printed page for all 6 charts, so on our visual editor Jen’s recommendation we focussed on the 3 levels of seasonality there. And, I was honored that Jen invited me to write an article on Scientific American’s blog going into more detail both visually and textually!
It was especially fun getting to show off how the same visual form could be quite expressive for these more sprite-like delivery method charts as well.
While the results of these design decisions are all visible in the final product, they were made possible by a few important invisible traits.
*A low bar to trying out ideas
*Trusting gut feelings, even before we could explain them or had an alternative design
*Open to changing our minds
*Attention to visual detail
*Focus on the data and drive the design from that
*Good communication, even as (especially as) ideas weren’t yet fully formed
*A great visual editor!
Thanks to Nadieh and Jen for a great collaboration, and to Amanda for getting this project off the ground in the first place!
If you want to read more, check out Nadieh’s blog for more of the story behind the viz!
A lot has been written on this topic in the past few years, including articles in St. Louis Post Dispatch, NYT, Consumer Reports, Kaiser showing the effects of a different payment model, San Diego Tribune, Sacramento Bee reporting C-section rates by hospital in California ranged from 15% to 64%, the LA Times reporting 2014 rates in California hospitals ranging from 12% to 70%, and a report from the Pacific Business Group on Health. Looking beyond the US, C-sections are even more common in Brazil as reported by the Atlantic.