Misleading With Statistics

How journalists make arguments with distorted data

Eric Portelance
i ❤ data

--

When I was a kid, I remember reading an issue of Mad Magazine that had a gag about statistics. They showed how a theatre owner might create ads with misleading data to persuade an unsuspecting audience to attend.

In one ad, the theatre manager said that “attendance doubled last week!” The accompanying before and after cartoons depicted 2 people in the audience the first week, and 4 people in the second — with one person walking out the door.
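
To put numbers on the gag, here’s a minimal Python sketch of the arithmetic. The function name describe_change is my own, made up for illustration; the 2 and 4 come straight from the cartoon.

```python
def describe_change(before, after):
    """Return the absolute change and the relative change between two counts."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Audience sizes from the cartoon: 2 people one week, 4 the next.
absolute, relative = describe_change(2, 4)
print(f"Relative change: {relative:.0%}")
print(f"Absolute change: {absolute} people")
```

Both numbers are true. The ad simply reports the one that sounds impressive and leaves out the tiny base it was calculated from.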

And yet, years later, I see these types of mistakes all the time in the media. The most common are charts that seem manipulated to support a conclusion the author has already reached. Or, perhaps less nefariously, a chart that was created by someone with no understanding of how to accurately present data and draw conclusions from it.

I’m going to pick on Bloomberg for no other reason than it’s the most recent example I encountered. And they should know better — they report on finance and markets, after all. They also make these things:

Bloomberg Terminal

Case in point, this article: For U.S. Men, 40 Years of Falling Income, by Mark Gimein. He’s the “Companies and Markets editor at Bloomberg.com, and lead writer for the Market Now blog and newsletter.” Sounds like a smart guy, right? Unfortunately he makes many of the same mistakes that I’ve seen in countless other publications.

In the article, he uses U.S. Census data to demonstrate that the median income for men (adjusted for inflation) has declined consistently over the past 40 years. Have a look at the chart he uses to prove his point:

What conclusions do you draw from this chart? It looks pretty grim, right? Look at the slope of that line. We should be extremely concerned!

Maybe not. There are several problems with this chart, and I’ve created new versions based on the same data for illustration purposes. Let’s have a look.

Not enough data points

In the original chart, the author only uses two data points for each age group. What if there was a spike somewhere within those 40 years? Or if the decline only started in the past few years? And what if 1972 and 2012 were outlier years that skew the trend?

Let’s go back to the U.S. Census data and add in all the years between 1972 and 2012 to get more resolution. Here’s what we get:

Adjusted to increase data resolution

That looks a bit different, doesn’t it? Let’s analyze the 45-54 group for a moment. Immediately we can see that their median income was actually relatively stable between 1972 and 1999, contrary to the author’s broad conclusion. If we picked out only two data points in this series (say, 1972 and 1999), we would actually conclude that median incomes had remained stable for that group. All is well. But, like the author’s chart, it would be a misleading version of this story. When we look at the whole time period, it’s true that there was a decline in income for 45-54, but that decline didn’t set in until 2000.
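
Here’s a rough Python sketch of why two data points hide the shape of a series. The numbers are invented for illustration and are NOT Census figures; I only made them flat until 1999 and falling afterwards to mimic the pattern described above. The function names are mine, too.

```python
# Hypothetical values (not real Census data): roughly flat until 1999, then falling.
series = {1972: 50.0, 1981: 50.5, 1990: 49.8, 1999: 50.2, 2006: 47.0, 2012: 44.0}


def endpoint_change(series):
    """Percent change between the first and last year, ignoring everything in between."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return (series[last] - series[first]) / series[first]


def stepwise_changes(series):
    """Percent change between each pair of consecutive years in the series."""
    years = sorted(series)
    return {
        (a, b): (series[b] - series[a]) / series[a]
        for a, b in zip(years, years[1:])
    }


print(f"Endpoints only: {endpoint_change(series):+.1%}")
for (a, b), change in stepwise_changes(series).items():
    print(f"{a}-{b}: {change:+.1%}")
```

The endpoint comparison reports a single decline. The step-by-step view shows that, in this made-up series, nearly all of it happened in the last two intervals.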

As for the other age groups, they declined more steadily, but the 25-34 group rebounded somewhat during the Clinton years.

Another conclusion we might draw from this chart is that median incomes for men were somewhat volatile depending on which political party was in power. Incomes decreased under every U.S. Republican President except Reagan (under whom they went up), and increased under every Democratic President except Obama (it’s too early to tell based on this data). Bloomberg’s analysis didn’t take any of this into account.

Let’s move on to the next problem, because the chart above is misleading, too.

Truncated graph

The second issue with the author’s chart is that the y-axis doesn’t start at zero. Why does this matter? A chart that has truncated the y-axis tends to amplify changes. Let’s see what happens when we make the y-axis go down to zero with the same data as in the previous example:

Chart adjusted for data resolution and y-axis truncation

What do you think now? If you saw this chart on its own, would you conclude that men’s incomes have been falling dangerously for 40 years? Perhaps. This definitely shows a downward trend and an overall decrease in incomes. But the slope is much less pronounced than in Bloomberg’s initial chart or in the chart I created in the previous example.
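
One way to quantify the effect of truncation is to ask how much of the chart’s height a drop takes up on screen. The sketch below is a back-of-the-envelope calculation, not a plotting routine. The function apparent_drop and all the dollar values are hypothetical, chosen only to illustrate the idea.

```python
def apparent_drop(start_value, end_value, axis_min, axis_max):
    """Fraction of the chart's height that the drop visually occupies."""
    return (start_value - end_value) / (axis_max - axis_min)


# Hypothetical incomes (not Census figures).
start, end = 50_000, 44_000

# Same data, two different y-axis ranges.
truncated = apparent_drop(start, end, axis_min=40_000, axis_max=52_000)
zero_based = apparent_drop(start, end, axis_min=0, axis_max=52_000)
actual = (start - end) / start

print(f"Truncated axis: drop spans {truncated:.0%} of the chart height")
print(f"Zero-based axis: drop spans {zero_based:.0%} of the chart height")
print(f"Actual decline in the data: {actual:.0%}")
```

With the zero-based axis, the visual drop is close to the real percentage decline. With the truncated axis, the same decline fills a much larger share of the chart.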

There are a few other comments we can make looking at this new chart. It shows that incomes decreased quite steadily for 25-34 year olds between 1972 and 1993. Incomes then rose throughout the Clinton years, and started dropping again during the Bush era. Why might that be?

For 35-44 year olds, it seems like a pretty steady decline since 1972. And for 45-54, incomes remained relatively stable until 2000, and they’ve been dropping steadily ever since.

Even if we go back to Bloomberg’s two data points, but bring the y-axis down to zero, the change doesn’t seem quite so drastic.

The author’s original two datapoint chart, adjusted to show the whole y-axis

This is still an awful chart, but at first glance the trend seems more gradual than the Bloomberg one with the same data.

Scale

The final point I want to raise is a scale problem. The author chose to show the past 40 years. So what’s the issue? There’s actually an extra 25 years of data that was omitted.

That means Bloomberg’s original chart could be very distorted if both 1972 and 2012 were outlier years. As it turns out, 1972 was an outlier in the sense that it was the year U.S. median income for men peaked.

Let’s look at the entire Census data, going back to 1947:

All available Census data going back to 1947

That’s amazing. From 1947 until 1972 there was steady year-after-year growth in annual income.

What happened in 1972? Bretton Woods and the end of the Gold Standard? I’m not an economist so won’t bother speculating beyond that, but I’m showing you the entire data set to prove how easy it is to manipulate these numbers into supporting different conclusions that may or may not be telling the whole story.

This data now paints a very different picture. We see a steep rise in the post-war years, followed by plateaus and declines at different rates and different times for the groups. It becomes very hard to make broad generalizations about what’s going on here and what caused it.

It’s also worth noting that the gap between the incomes of 25-34 year olds and those of the older groups started to widen, perhaps related to the growth in the service sector and white collar jobs.

To make an extreme point, if we wanted to write the headline “Incomes for U.S. men have risen in the last 65 years,” we could support it with a two datapoint chart, exactly like Bloomberg’s. We need only use the 1947 and 2012 datapoints:

What two datapoints tell us happened to incomes between 1947 and 2012

Looks great! Let’s pat ourselves on the back. The world is a better place and everything is rosy.
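
The same trick can be sketched in a few lines of Python. Again, the values below are invented (they are NOT Census figures) and only mimic the general shape described above: a rise to a peak in 1972 followed by a decline. The name percent_change is my own.

```python
# Hypothetical index values (not real Census data): rise to a 1972 peak, then decline.
incomes = {1947: 25, 1960: 38, 1972: 50, 1990: 47, 2012: 44}


def percent_change(series, start, end):
    """Percent change between two chosen years."""
    return (series[end] - series[start]) / series[start]


# Same end year, two different starting points, two opposite headlines.
for start in (1947, 1972):
    change = percent_change(incomes, start, 2012)
    direction = "risen" if change > 0 else "fallen"
    print(f"Since {start}, incomes have {direction} by {abs(change):.0%}")
```

Nothing about the data changes between the two lines of output. Only the choice of starting year does, and that choice alone flips the headline.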

I’ve seen these types of charts in many publications other than Bloomberg. They aren’t the only ones to make these errors. I write this not to pick on them specifically, but rather to demonstrate how easy it is to manipulate data to tell a story that confirms your pre-existing views.

Similarly, poorly analysed and presented data can lead people to draw the wrong conclusions. I’ve tried to demonstrate both problems here so you can be more vigilant when you see an article like the Bloomberg one I linked to, and so you can be more cautious if you are using data to tell a story.

--

Eric Portelance
i ❤ data

Co-Founder of Slake Brewing in Prince Edward County, Ontario. Previously: Co-Founder of Halo Brewery, Strategist in Digital Product Design