Debunking the 1% fallacy
The recent Yellow Vests movement in France has piqued my curiosity about income inequality in France (and the world at large). Being of the data-driven kind, I started looking for numbers in order to better gauge the state of affairs.
The code and data needed to replicate all the results below are available here (file dive_into_1_percent.R for the code and impots-france.csv for the data).
A first look at the data
Most of the data I could find online broke down income (or total revenue, which also includes capital gains) into deciles or percentiles. This is not surprising, as it seems to be the common way to describe income distribution in the media and public discussions; see for instance how percentiles are used here (USA) or here (France). The graphs presented by newspapers and websites look like this (for France in this case):
These graphs show that the so-called “1%” (famously highlighted by the Occupy movement and its “We are the 99%” slogan) earns disproportionately more than the rest, as the slope of the curve increases dramatically towards the right end. However, this seemingly exponential increase towards high revenues tells me that we’re not seeing the full picture. Indeed, where there is non-linearity, it is usually interesting to take a closer look.
The need for better granularity
I started looking for a more precise breakdown of income and for this I turned to the website published by C. Landais, T. Piketty and E. Saez to supplement their book about income inequality in France (Pour une révolution fiscale — For a fiscal revolution).
There I was able to find a more granular breakdown of the high end of the income spectrum. In particular, starting from the 99th percentile, they give the income needed to belong to each tenth, hundredth and thousandth of a percentile, which allows us to build a more precise picture of the tail of the distribution.
A quick note on this data: it represents income from all sources (salaries, pensions, state aid, capital gains of all kinds, etc.) for the population above 18 years old living in France, regardless of their status. However, I could not easily find the exact source, so I am unsure what year it is from (the book was published in 2011, and I assume the data dates from around then).
Plotting the same percentiles as before (i.e. one point per percentile, without the more granular data we now have for the last percentile) gives the following graph, which shows the same trend as above and suggests that this data is consistent with the earlier sources.
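To make the idea of a finer grid concrete, here is a minimal Python sketch (not the R code from the repository) of the quantile levels such a breakdown could use. The exact layout is an assumption on my part: one point per percentile up to 99%, then nine steps at each finer scale. The function names are made up for illustration.

```python
# Quantile levels are stored as integers in thousandths of a percent
# (1 unit = 0.001%) so that no floating-point rounding creeps in.
# ASSUMPTION: nine steps at each finer scale; the real dataset may differ.

def granular_levels():
    levels = [1000 * p for p in range(1, 100)]         # 1% .. 99%
    levels += [99000 + 100 * j for j in range(1, 10)]  # 99.1% .. 99.9%
    levels += [99900 + 10 * j for j in range(1, 10)]   # 99.91% .. 99.99%
    levels += [99990 + j for j in range(1, 10)]        # 99.991% .. 99.999%
    return levels

def as_percent(level):
    """Convert a level in thousandths of a percent to a percentage."""
    return level / 1000

levels = granular_levels()
print(len(levels))                # number of points on the grid
print(as_percent(levels[-1]))     # finest level, as a percentage
```

Under this assumed layout, 27 of the 126 points sit inside the last percentile, which is why the picture of the tail changes so much once they are plotted.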
This graph confirms the existence of three regimes in the income distribution: first we have the population that has no income at all (just under 10%), then we have a fairly linear regime where income gradually increases, and finally we have the top 5–10% where income seems to increase exponentially.
A visual exploration of the “1%”
Now we’ll start using the extra granular data we just got, plotting the income distribution again with all the fractions of percentiles at the very top included.
Ok, the picture seems very different all of a sudden. It looks like basically everyone is making very little, except a tiny fraction at the very top. Where we previously could distinguish a special regime for the top 5–10%, we now only see a steep increase towards the very end although it is unclear where this exponential growth starts.
We will zoom in progressively to see what the right tail of this graph looks like at different scales, focusing in turn on the top 10%, 1%, 0.1% and finally 0.01%.
Looking at the top 10% only, it is hard to see the difference with the whole population as the graph only skyrockets at the very end. All of a sudden the 1% does not seem to be doing that well anymore.
And indeed, someone just qualifying to be in the top 1% (left-most region of the graph) barely registers above 0 on this graph.
Even looking only at the top 0.1%, we see that a tiny elite has revenue that largely surpasses the rest of this group of super high earners.
And finally looking at the top 0.01% we still see that a minority earns much more than the rest.
If we had even more detailed data (1/10,000th of a percent, for instance), this behavior would presumably continue towards the top. However, at the level of granularity we have (1/1,000th of a percent), each increment already represents only ~500 individuals (out of the 50.4M people included in the dataset).
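The head count behind each level of granularity is simple arithmetic on the 50.4M figure. A small Python sketch (the variable and function names are mine, not taken from the R script):

```python
# Population covered by the dataset, as stated in the article.
POPULATION = 50_400_000

def people_in_share(one_in):
    """Number of people in a slice equal to 1/one_in of the population."""
    return POPULATION // one_in

# Each label is a share of the population; the number is its denominator.
shares = [("1%", 100), ("0.1%", 1_000), ("0.01%", 10_000), ("0.001%", 100_000)]
for label, one_in in shares:
    print(label, people_in_share(one_in))
```

A slice of 0.001% (1/1,000th of a percent) comes out at 504 people, which is the ~500 quoted above. One step finer would leave only about 50 people per increment.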
The 1% fallacy
The point of this article is NOT to advocate to leave the “1%” alone or to say that they are not over-privileged compared to the rest of the population. Rather, I wanted to illustrate graphically that the reality is a bit more complex, as is often the case with non-linear phenomena. In particular we always need to stay aware of the vast differences that can exist in the tail of a non-linear distribution. When plotting them, the choice of granularity of the data is never innocent as the effects are usually dramatic at the extremes.
For instance, the difference in income between the lowest earner of the 1% and the lowest earner of the whole population is €120k, while the difference between the lowest earner of the 0.01% and the lowest earner of the 1% is €1,165k: roughly a 10x difference.
This means that when we discuss the “1%” we unknowingly lump together populations that have very little in common. Someone making €120k a year is living well and belongs to the 1%, but compare that to someone making €6M a year (the threshold to belong to the top 0.001%) and you quickly realize that their lifestyles are completely different. This error stems from the fact that our minds are very bad at thinking non-linearly, and we badly underestimate exponentially increasing phenomena.
Now, I am unable to put a clear threshold on where to split the highest-earning elite from the rest (Paul Krugman argues it should be 0.1% instead), but it feels like we collectively need to look deeper into who exactly the 1% are (or, for that matter, what we think is fair in terms of income difference) instead of falling into the trap of simply using whatever data granularity we are fed by the media. On this point, I am still surprised by how hard it is to find better sources of data; most of the available statistics only break income down into deciles and percentiles. I wonder if this is simply for convenience or if there is a deliberate desire to hide what this distribution really looks like for the super-high earners. Maybe because their lobbying power increases exponentially as well?
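The comparison can be checked in a few lines of Python. The two gaps are the figures quoted above; the only assumption is that the lowest earner of the whole population has an income of zero, which matches the share of people with no income at all seen earlier. Variable names are mine.

```python
# Figures quoted in the article, in euros per year.
lowest_overall = 0             # ASSUMPTION: lowest earner has no income
gap_bottom_to_top_1 = 120_000  # bottom of the population -> entry to the 1%
gap_top_1_to_top_001 = 1_165_000  # entry to the 1% -> entry to the 0.01%

# Entry thresholds implied by those gaps.
threshold_top_1 = lowest_overall + gap_bottom_to_top_1
threshold_top_001 = threshold_top_1 + gap_top_1_to_top_001

# How much larger the second gap is than the first.
ratio = gap_top_1_to_top_001 / gap_bottom_to_top_1

print(threshold_top_1, threshold_top_001)
print(round(ratio, 1))
```

The ratio comes out at about 9.7, so the step from the bottom of the 1% to the bottom of the 0.01% is nearly ten times the step from zero income to the bottom of the 1%.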