Part II: One set of data, many stories
Or, why a dual y-axis chart is not a normalized delta chart
In my original post, One Set of Data, Many Stories, I wrote about how I found a particular dual y-axis chart misleading. The core problem was that it had two y-axes for the same metric, with a different scale on each axis.
Isn’t it just a delta chart?
Elijah proposed that the problem isn’t about the dual y-axis, since I could have made it into “a single axis chart by plotting ‘delta in mortality since 2000’.”
I like framing this in terms of deltas explicitly, because the point they seem to be trying to make with the chart is about change compared to an earlier point in time.
I agree that delta charts and dual y-axis charts are perceptually similar when both y-axes use the same scale and only the baselines differ, so that both lines are pinned to a shared reference point.
However, if the y-axis scales are different, then a dual y-axis chart can be perceptually quite different from a delta chart.
Let’s take a look
In the original article, I found very similar data from the CDC and remade the chart as closely as I could. The exact data is slightly different from the original, as it covers a 10-year age range rather than a 5-year one. But it tells the same story.
Because R doesn’t support dual y-axis charts, I’ve made them side-by-side.
I’d argue that side-by-side charts are slightly better than a dual-axis chart because they imply that there is some difference between the charts. However, these charts still share the fundamental problem of the original: implying a direct comparison between two charts which have different y-axis scales for the same type of data. In both the side-by-side and dual y-axis views, it appears that mortality rates for whites and blacks saw similar declines through the early 2000s and then diverged from ~2009 or ~2011.
The fundamental problem isn’t actually that there are two y-axes on the same chart. Rather, the fundamental problem is visually equating two y-axes that have different scales for the same metric.
Here are the same two charts, but using the same y-axis scale for both.
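One way to give two panels the same scale while letting their baselines differ is to use a single axis span for both and center each panel on its own data. The sketch below is only an illustration of that idea, not necessarily how the charts here were drawn. The function name `shared_scale_limits` and the values are hypothetical, not the CDC figures.

```python
def shared_scale_limits(series_list):
    """Give every panel the same axis span, centered on its own data.

    The span is the largest data range among the series, so one unit
    of mortality covers the same vertical distance in every panel.
    """
    span = max(max(s) - min(s) for s in series_list)
    limits = []
    for s in series_list:
        mid = (max(s) + min(s)) / 2.0
        limits.append((mid - span / 2.0, mid + span / 2.0))
    return limits


# Hypothetical values, for illustration only (not the CDC data).
group_a = [170.0, 160.0, 150.0]
group_b = [60.0, 57.0, 59.0]

limits_a, limits_b = shared_scale_limits([group_a, group_b])
```

Both panels end up with the same span, so a drop of 5 deaths per 100,000 looks equally tall in each; only the numbers printed on the axis differ.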
Now, here is the suggested delta chart, showing difference in mortality compared to each line’s 1999 value.
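As a minimal sketch of what the delta chart plots, each value is replaced by its difference from that line’s own 1999 value. The helper name `delta_from_baseline` and the numbers are hypothetical and are not the CDC figures.

```python
def delta_from_baseline(series, baseline_year):
    """Return each year's value minus the value in baseline_year."""
    base = series[baseline_year]
    return {year: value - base for year, value in series.items()}


# Hypothetical values, for illustration only (not the CDC data).
group_a = {1999: 170.0, 2005: 160.0, 2013: 150.0}
group_b = {1999: 60.0, 2005: 57.0, 2013: 59.0}

delta_a = delta_from_baseline(group_a, 1999)
delta_b = delta_from_baseline(group_b, 1999)
```

Both lines start at zero in 1999, so they can share a single y-axis measured in deaths per 100,000 relative to 1999.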
So, yes — the delta chart and the dual y-axis with shared scale do look essentially the same.
But, they look very different from the original chart.
It might be better to compare percent differences rather than absolute differences in this case. Dropping from 50 to 45 deaths per 100,000 people might be more significant than dropping from 150 to 145, because it’s a larger drop as a percent of the mortality rate.
The red lines look the same, as the scale is determined by the minimum and maximum values, both of which come from the red line. The blue line, however, shows greater variation in the percent change chart than in the difference chart.
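The arithmetic behind that example can be checked with a short sketch. The function name `percent_change` is just an illustrative helper.

```python
def percent_change(old, new):
    """Change from old to new, as a percent of old."""
    return (new - old) * 100.0 / old


small_base = percent_change(50, 45)    # same absolute drop of 5
large_base = percent_change(150, 145)  # same absolute drop of 5
```

The absolute drop is 5 in both cases, but it is a 10% drop from 50 and only about a 3.3% drop from 150.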
This is no longer equivalent to a dual y-axis chart with shared y-axis scale.
But, I think that’s a good thing because the goal is to compare later values to the earlier values — and the percent change better captures the meaningful comparison.
Interestingly, you could get a similar visual effect by using a dual axis. This does mean that the y-axes have different scales and different baselines (to align the starting points). In this case, the scales are chosen so that a 5% change from the 1999 value (a difference of 8.72 for red vs 2.8 for blue) covers the same vertical distance on both axes. The baselines are then chosen so that the two starting values sit at the same distance from the bottom of the chart.
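To make that construction concrete, here is a rough sketch of how such axis limits could be derived. The 1999 starting values are the ones implied by the 5% figures above (8.72 / 0.05 and 2.8 / 0.05). The percent window and the helper names `axis_limits` and `height_fraction` are assumptions made for illustration.

```python
def axis_limits(start_value, pct_low, pct_high):
    """Axis limits spanning pct_low..pct_high percent change from start_value."""
    return (start_value * (1 + pct_low / 100.0),
            start_value * (1 + pct_high / 100.0))


def height_fraction(value, limits):
    """Vertical position of value on an axis, from 0 (bottom) to 1 (top)."""
    low, high = limits
    return (value - low) / (high - low)


red_start = 174.4   # implied by the text: 8.72 is 5% of this value
blue_start = 56.0   # implied by the text: 2.8 is 5% of this value
PCT_LOW, PCT_HIGH = -20.0, 5.0  # hypothetical window, not from the charts

red_limits = axis_limits(red_start, PCT_LOW, PCT_HIGH)
blue_limits = axis_limits(blue_start, PCT_LOW, PCT_HIGH)

# Both starting values land at the same height on their own axis...
red_start_height = height_fraction(red_start, red_limits)
blue_start_height = height_fraction(blue_start, blue_limits)

# ...and a 5% drop moves each line down by the same vertical distance.
red_after_drop = height_fraction(red_start - 8.72, red_limits)
blue_after_drop = height_fraction(blue_start - 2.8, blue_limits)
```

Nothing in the resulting axis labels reveals that the limits came from a percent window, which is exactly the problem with this approach.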
While possible, I think this approach is quite problematic. It’s not clear from the charts *why* the y-scales and baseline were chosen. There is no clue that percent change is even part of the story, much less driving the scale and baseline. These decisions appear arbitrary and changeable, even though they were determined by the data.
In contrast, in the chart where both lines use the same percent change y-axis, it is explicit that percent change is driving both the content and the form of the visualization.
Focus on the story: Percent change since 2009
Arguably the point of the original chart was to highlight the divergence since 2009, not since 1999. Adam Pearce pointed out that this could have been achieved by using a delta chart that was both pinned to the 2009 data and focused only on the data from 2009 onward. In this case, I used percent change since 2009. This makes the implicit comparison to 2009 explicit.
This emphasizes that mortality rates have declined for blacks/African Americans while they have risen for whites relative to the 2009 data.
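The rebasing behind this chart can be sketched in a few lines: drop the years before 2009 and express each remaining value as a percent change from the 2009 value. The function name `percent_change_since` and the numbers are hypothetical and are not the CDC figures.

```python
def percent_change_since(series, baseline_year):
    """Percent change from baseline_year, keeping only that year onward."""
    base = series[baseline_year]
    return {year: (value - base) * 100.0 / base
            for year, value in sorted(series.items())
            if year >= baseline_year}


# Hypothetical values, for illustration only (not the CDC data).
series = {1999: 60.0, 2005: 57.0, 2009: 50.0, 2011: 52.0, 2013: 53.0}

rebased = percent_change_since(series, 2009)
```

The baseline year always comes out as 0%, so every line starts from the same point and a single percent-change axis serves both groups.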
This chart doesn’t tell the whole story, of course. No chart does. And, as with any chart, it is worth questioning the extent to which this story is meaningful without the larger context. That said, I do think this chart tells a succinct story that is supported by the data.
Moreover, not only does it avoid the problems of the original chart, but the choice of form reflects the story itself. This is a story about change relative to a point in time. The chart’s form matches the story, and reflects the data.
My hope is that this chart would create a better shared understanding of exactly what aspects of the data we’re focussed on, and thereby create a better conversation about what we should learn or do based on this data.
- A dual y-axis chart is only similar to a normalized delta chart if the y-axis scales are the same
- Splitting a dual y-axis chart into two side-by-side charts doesn’t fundamentally solve the problem of having two y-axes with different scales
- Choosing a chart form that is more appropriate to the story in the data is not just about avoiding “breaking a data visualization rule”, but can also be more effective in focussing on and telling the intended story.
Thank you to Elijah & Adam for the conversation that led to this post!