Data Communication Sandbox #1

Sep 19, 2017

I’m playing around with NIS Teen data. NIS Teen is a survey that measures vaccination rates for adolescents age 13–17, for several vaccines. It covers all 50 states and some cities and territories.

After transcribing the data from the published PDFs into a workable format, I started exploring it. For one set, I compared vaccination rates for Tdap, MenACWY, and HPV across Philadelphia, Pennsylvania, the USA, and a few other cities: Chicago, Houston, and NYC.

A profession-standard chart of the data — what you’d expect to find in a journal article or on a government web site — might look like the clustered-column chart, below. It shows data in 3 dimensions: the x-axis, the y-axis, and the clusters. And, at a glance, it helps show that there are big differences in coverage rates for different vaccines.

This leaves out a lot of footnotes that should probably be included with this chart.

But it’s tough to look at. Sure, I could put more work into it: use a slick color scheme (for example, similar colors for 1+ dose of HPV and HPV up-to-date) and organize some of the elements. But that won’t take away its main limitation: it’s hard to compare values in different clusters. That’s the drawback of a clustered-column chart. It doesn’t draw your eye to, say, Philadelphia’s MenACWY coverage rate compared to the USA rate.

I modified it, switching the clusters. This helps you see the disparity across geographies for each vaccine, but with 7 geographies, you’ve got to do a lot of bouncing back and forth between each tower and the legend in order to really read the chart.

And, really, these charts don’t draw your eye to anything. They lay out the data and they make you do the work — and in that way they’re not much of an improvement on a raw spreadsheet.

Because I found these standard approaches to be insufficient, I rolled up my sleeves and got to work making something that would really tell the story in these data.

Below is the exact same data in a different presentation.

To do this, I put some central principles to work:

Find a main message.

Data aren’t just data: they describe something. Analyze the data, figure out what they describe, and pull out the main message that you want to communicate (see also: Don’t just visualize data — communicate it).

Ultimately, I’m using these data to tell a story about how Philadelphia is doing compared to other cities and the rest of the country, so I want to focus my message and my visualization on that.

Clearly communicate the message.

I looked at the data and found that Philadelphia’s coverage rates are higher than national rates. Then I gave the chart a title that tells my audience exactly that. I told them what conclusion they’re supposed to draw from the data.

Design the charts to reinforce the message.

After writing the main message in the title I looked for ways to reinforce it with design. A few easy choices:

Highlight the subject: I wanted the design to push the straightforward title that I wrote. I started with a big bold blue for Philadelphia, a faded red for the country, and a faded grey for the others. This gives visual weight to my primary value (Philly) and secondary value (USA), and it shows what the chart is really about.
Order the data: these are bar charts, with the highest value always on top and the rest going down in order.
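Here is a minimal sketch of those two choices in Python. I'm not claiming this is how the charts in this post were built. The numbers are placeholders rather than NIS Teen values, and the hex colors and the names `style_rows` and `draw_bars` are my own assumptions for illustration.

```python
# Placeholder coverage percentages, NOT real NIS Teen figures.
EXAMPLE_RATES = {
    "Philadelphia": 90.0,
    "USA": 80.0,
    "Chicago": 85.0,
    "Houston": 75.0,
}

# Assumed hex colors: a bold blue, a faded red, and a faded grey.
PRIMARY_COLOR = "#1f5fbf"
SECONDARY_COLOR = "#e3a6a6"
CONTEXT_COLOR = "#cccccc"


def style_rows(rates, primary="Philadelphia", secondary="USA"):
    """Return (name, value, color) tuples sorted from highest to lowest value."""

    def color_for(name):
        if name == primary:
            return PRIMARY_COLOR
        if name == secondary:
            return SECONDARY_COLOR
        return CONTEXT_COLOR

    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, color_for(name)) for name, value in ordered]


def draw_bars(ax, rows):
    """Draw horizontal bars on a Matplotlib Axes, highest value on top."""
    names = [name for name, _, _ in rows]
    values = [value for _, value, _ in rows]
    colors = [color for _, _, color in rows]
    ax.barh(names, values, color=colors)
    ax.invert_yaxis()  # barh puts the first row at the bottom by default
```

`style_rows(EXAMPLE_RATES)` does the ordering and the color assignment without any plotting library. `draw_bars` expects a Matplotlib `Axes`, for example one returned by `plt.subplots()`.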

Get rid of the junk.

I faded out the gridlines a bit and removed unnecessary axis labels (if I have things like city and state names on an axis, nobody really needs a “geographical area” label to tell them what’s going on).
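If you were doing this in Matplotlib (an assumption on my part, since any charting tool has equivalents), the cleanup might look something like the sketch below. The helper name `declutter` and the specific grey are hypothetical.

```python
def declutter(ax):
    """Fade the gridlines and drop the label that the tick text already explains."""
    # Light, thin gridlines on the value axis only, drawn behind the bars.
    ax.grid(True, axis="x", color="#dddddd", linewidth=0.5)
    ax.set_axisbelow(True)
    # City and state names on the ticks make a category-axis label redundant.
    ax.set_ylabel("")
```

Call it once per chart, after the bars are drawn, passing the chart's `Axes` object.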

I also chose to separate out the charts into 4, instead of clustering them all together. It’s a simplification that lets me re-order the Y-axis each time, to preserve the order of the data.
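To show why splitting helps, here is a rough sketch that groups records by vaccine and sorts each group on its own, so every small chart gets its own top-to-bottom order. The records are placeholders rather than NIS Teen values, only two vaccines are shown to keep it short, and `panels_by_vaccine` is a hypothetical helper name.

```python
from collections import defaultdict

# Placeholder long-format records: (vaccine, geography, percent). NOT real data.
EXAMPLE_RECORDS = [
    ("Tdap", "Philadelphia", 90.0),
    ("Tdap", "USA", 88.0),
    ("Tdap", "Houston", 92.0),
    ("MenACWY", "Philadelphia", 91.0),
    ("MenACWY", "USA", 82.0),
    ("MenACWY", "Houston", 80.0),
]


def panels_by_vaccine(records):
    """Group records by vaccine and sort each group from highest to lowest."""
    grouped = defaultdict(list)
    for vaccine, geography, percent in records:
        grouped[vaccine].append((geography, percent))
    return {
        vaccine: sorted(rows, key=lambda row: row[1], reverse=True)
        for vaccine, rows in grouped.items()
    }
```

With these placeholder values, Houston leads the Tdap panel while Philadelphia leads the MenACWY panel. A single clustered chart has to pick one geography order for every vaccine, so it can't show both of those rankings cleanly.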

Go back to the charts above, and then take another look at this one. Which would you rather look at? Which helps you really see what’s going on in the data?

It’s not quite complete — I’m just playing around in a sandbox, after all.

The chart titles use internal jargon, and I’ll need to do a thorough review of the data to see if there are important notes that need to accompany it. And there may be additional room for improvement (hey, if you see some, let me know in the comments): for example, on another look, I should change “rate” to “percentage.”

But it’s a good start, and I’m happy with it.

Written by Matthew Montesano