One set of data, many stories
In March 2017, the Brookings Institution published a paper by Case & Deaton, “Mortality and Morbidity in the 21st Century.”
The report was picked up in the news.
It also inspired a number of blog posts questioning or supporting the findings, including posts from Andrew Gelman (more questioning) and Noah Smith (more supporting). Gelman had also questioned parts of their related 2015 article. Even so, he “was in agreement with Case and Deaton’s main point, even if I thought they were wrong about the direction of the trend and I was skeptical about their comparisons of different education level.” Rather, he argued that “The news media — left, right, and center — had a pre-existing narrative of middle-aged white malaise, and they slotted the Case and Deaton reports into that narrative.”
Some aspects of the article are certainly concerning. For example, they point out that deaths due to drugs, alcohol, and suicide are increasing in the US for men and women aged 50–54. This is not the case in comparable countries.
However, they also present charts that I found quite questionable. One chart in the appendix, for example, shows heart disease mortality rates for women aged 50–54.
At first glance, my immediate takeaway was that the mortality rate for white women (in blue) had risen sharply in recent years and surpassed the rate for black women (in red).
This is not true. Digging in, you might notice that the chart uses two different y-axes for the same mortality metric. On the left, for whites, the axis runs from 50 to 58 deaths per 100K. On the right, for blacks, it runs from 115 to 165 deaths per 100K. So in 2015, the red line for blacks ends in the bottom right corner at 115 deaths per 100K, while the blue line for whites ends in the upper right corner at a much smaller 56 deaths per 100K.
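The mismatch is easy to quantify. Below is a minimal sketch that computes where a value lands as a fraction of the panel height for a given axis range, using only the end points and axis ranges quoted above. The helper name `axis_position` is hypothetical and does not come from the original chart or any plotting library.

```python
def axis_position(value, axis_min, axis_max):
    """Return where `value` sits on a y-axis as a fraction of its height.

    0.0 is the bottom of the axis and 1.0 is the top.
    """
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (value - axis_min) / (axis_max - axis_min)


# 2015 end points and axis ranges as described for the appendix chart.
white_2015, white_axis = 56, (50, 58)
black_2015, black_axis = 115, (115, 165)

# Dual axes: the smaller rate is drawn higher than the larger one.
print(axis_position(white_2015, *white_axis))  # 0.75
print(axis_position(black_2015, *black_axis))  # 0.0

# A shared axis from 0 to 165: the vertical order now matches the data.
shared_axis = (0, 165)
print(round(axis_position(white_2015, *shared_axis), 2))  # 0.34
print(round(axis_position(black_2015, *shared_axis), 2))  # 0.7
```

On the dual-axis chart the rate of 56 is drawn three quarters of the way up while the rate of 115 sits on the floor. On a shared axis the larger rate is drawn roughly twice as high, as it should be.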
You might argue that the trend comparison is what matters, and that these axes allow it. I question that.
One of the best descriptions I’ve heard for data viz is this: when the data is different, the viz should look different, and when the data is similar, the viz should look similar.
If you allow yourself two y-axes for the same metric, each with a different scale and a different base value, you can make many charts from exactly the same data that look very different.
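To see how much the scale and base value matter, consider how much of the panel height a single change occupies. This sketch assumes a hypothetical rise from 54 to 56 deaths per 100K and a few hypothetical axis ranges, running from a tight zoom to zero-based axes. None of these figures are taken from the original chart, and `visual_rise` is an illustrative helper name.

```python
def visual_rise(start, end, axis_min, axis_max):
    """Fraction of the panel height covered by a change from start to end."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (end - start) / (axis_max - axis_min)


# Hypothetical change: 54 -> 56 deaths per 100K.
start, end = 54, 56

# Hypothetical axis choices, from a tight zoom to zero-based axes.
for axis_min, axis_max in [(50, 58), (40, 70), (0, 165), (0, 250)]:
    share = visual_rise(start, end, axis_min, axis_max)
    print(f"y-axis {axis_min}-{axis_max}: rise fills {share:.1%} of the panel")
```

The same two-point change fills a quarter of the panel on a 50–58 axis but only about 1% of it on a 0–165 axis. The data is identical; only the visual impression changes.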
To replicate this, I found very similar data from the CDC. It differs slightly, covering a 10-year age range rather than a 5-year one, but I think you’ll agree it’s close enough. The data is here if you want to play along at home.
I can’t actually use ggplot2 in R to plot two independent y-axes on the same chart, because Hadley Wickham believes “plots with separate y scales…are fundamentally flawed” (from Stack Overflow). So I’ve plotted them side by side. This is for women aged 45–54 in the US, showing deaths due to heart disease per 100K people.
Let’s look at this same data in some other forms. How about with less extreme axes?
Or with the same y-axis?
What if the axis also included zero?
Maybe even a higher maximum y value?
Do you get the same takeaway from each of these charts? Or does your impression change based on the different y-axes? Which one is “right”? Keep in mind that these charts all show EXACTLY the same data.
Here is another view, with the two categories on the same chart, sharing a y-axis.
While there may be times when you want a different y-axis, perhaps to normalize the data in some way, it should be done only with caution, with thought, and with an explanation of why.
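One common form of normalization is to index each series to its first value (set to 100) so that both trends share a single y-axis with a single, clearly labeled meaning. This is offered as an assumption about what a justified case might look like, not as the author’s method. The rates below are hypothetical, not the CDC or Case & Deaton figures, and `index_to_base` is an illustrative helper name.

```python
def index_to_base(series, base=100):
    """Rescale a series so its first value equals `base` (e.g. 100)."""
    if not series or series[0] == 0:
        raise ValueError("series must be non-empty with a non-zero first value")
    first = series[0]
    # Each value is expressed relative to the first, scaled so the first equals base.
    return [value * base / first for value in series]


# Hypothetical rates per 100K, for illustration only.
white = [50, 52, 56]
black = [160, 140, 120]

print(index_to_base(white))  # [100.0, 104.0, 112.0]
print(index_to_base(black))  # [100.0, 87.5, 75.0]
```

The shared axis then reads “percent of starting value.” That makes the trend comparison explicit, but it hides the absolute gap between the groups, which is why the transformation needs to be labeled and explained.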
In all other cases, be wary of how your axis choices may affect the interpretation of the chart.