The US Life Expectancy Mess, Part 2: What Even Is “Life Expectancy”?

Working Title: “Life Expectancy, It’s Confusing And It’s Bad”

Xenocrypt
Jun 25, 2019

Introduction:

This is a series of articles trying to approach the question of “the United States life expectancy gap compared to Other Rich Countries”, as seen in charts like this one:

As I wrote in Part One, that difference in trend is clear enough looking at individual “Other Rich Countries” or at an aggregate:

What explains the difference in life expectancy trends? I wanted to figure out how to approach this question myself — to quantify the contributions of different causes systematically. But I soon realized I didn’t really know what “life expectancy” even means. In this second article, I’ll try to explain it, along with a couple of other public health metrics — and why they can sometimes give different answers.

What Is Life Expectancy, Anyway?:

Quick: what does “life expectancy” mean? Theoretically, and in popular shorthand, I think it’s usually explained as “the average age a person born today will live”.

According to this article, though, in practice “life expectancy” means something a little less straightforward:

“If we could follow [a] cohort from birth until all members died, we could record the number of individuals alive at each birthday…[but] in practice…one generally works with a period, or current, life table…[with] the mortality experience of persons of all ages in a short period, typically one year or three years…the death probabilities q(x) for every age x are computed for that short period…these q(x)’s are then applied to a hypothetical cohort of 100,000 people over their life span to produce a life table.”

In other words, “life expectancy” is (usually?) calculated from current death rates by age, collapsing multiple cohorts into a “hypothetical cohort”, rather than from looking at a single real cohort and seeing how long people really did or will live.

What does that mean in practice? It might be helpful to look at a much-simplified example. Say that in the nation of Hypotheticalia, people only die at exactly ages 10, 40 and 100. Say as well that, in 2018, the death rate at age 10 was 5%, the death rate at age 40 was 20%, and the death rate at 100 was (by necessity) 100%.

The “life expectancy of Hypotheticalia in 2018” would be:

0.05*10 [5% of a hypothetical cohort of people would die at age 10]

+0.95*0.2*40 [of the remaining 95%, 20% die at age 40]

+0.95*0.8*100 [the remaining 80% of that 95% all die at age 100]

= 84.1 years.

Generalizing, if x is the death rate at age 10 and y the death rate at age 40, the life expectancy is:

10*x + 40*(1-x)*y + 100*(1-x)*(1-y)

= 100 - 90*x - 60*y + 60*x*y.
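As a sanity check, here is a minimal Python sketch of the same calculation. The function name and the list-of-pairs input format are my own choices for illustration, not anything from an official life-table tool; it just walks a hypothetical cohort through the death rates in age order.

```python
def hypothetical_cohort_life_expectancy(death_rates):
    """Average age at death for a hypothetical cohort.

    death_rates: list of (age, probability of dying at that age,
    given survival to that age), sorted by age. The last probability
    should be 1.0 so that everyone eventually dies.
    """
    alive = 1.0        # share of the cohort still alive
    expectancy = 0.0   # running sum of (share dying at age) * age
    for age, rate in death_rates:
        dying = alive * rate
        expectancy += dying * age
        alive -= dying
    return expectancy


# Hypotheticalia in 2018: people only die at exactly 10, 40, or 100.
hypotheticalia_2018 = [(10, 0.05), (40, 0.20), (100, 1.00)]
result = hypothetical_cohort_life_expectancy(hypotheticalia_2018)

# The expanded formula from the text, with x = 0.05 and y = 0.20.
x, y = 0.05, 0.20
closed_form = 100 - 90 * x - 60 * y + 60 * x * y

print(round(result, 1), round(closed_form, 1))
```

Both numbers come out to 84.1, matching the hand calculation above.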

Now, the “life expectancy” in 2018 might be 84.1 years, but would you really, well, expect a Hypotheticalian born in 2018 to live an average of 84.1 years? Not necessarily, unless you’re confident that in ten years the death rate at age 10 will still be 5% and that in forty years the death rate at age 40 will still be 20%. And the Hypotheticalia cohorts born 10, 40, and 100 years ago might actually be very different populations in any number of ways.

Because of this collapsing of different cohorts into a “hypothetical cohort”, taking “life expectancy” literally can lead to absurd results. For example, imagine a one-time disaster that kills roughly 2% of a country’s population, mostly evenly distributed by age. The year of the disaster, death rates by age will jump by roughly 2 percentage points for every group. Since typical “life expectancy” calculations collapse every age group into a “hypothetical cohort”, they’ll basically assume that someone born the year of the disaster will risk dying from it at age 10, and age 20, and age 70, as if the disaster were an annual event.

Suddenly you have the 2010 earthquake in Haiti “reducing life expectancy nearly by half”, before it immediately bounces back up the next year:

(Again, that chart, and most of the data in these articles, is from the very handy Global Health Data Exchange or “GHDx”.)

As horrible as the Haiti earthquake was, again, it “only” killed ~2% of the country’s population. There’s no real sense in which it caused actual life expectancy to drop below 35 for the 2010 birth cohort, and I’m sure no one actually, well, expected that “life expectancy” to hold up as a prediction of “how long Haitians born in 2010 will live”.
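To see how large this effect can be, here is a rough Python sketch. Everything in it is an assumption for illustration: the baseline death rates are a made-up exponential curve (not Haiti’s data), the shock is a flat 2 percentage points at every age, and deaths are assumed to happen mid-year. It compares the “period” number for the disaster year against a cohort that only faces the disaster once, at age 0.

```python
import math


def life_expectancy_from_rates(q):
    """Expected years lived, given annual death probabilities q[age].

    Assumes people who die in a given year live half of it on average.
    The last entry should be 1.0 so that nobody outlives the table.
    """
    alive = 1.0
    years_lived = 0.0
    for qx in q:
        deaths = alive * qx
        years_lived += (alive - deaths) + 0.5 * deaths
        alive -= deaths
    return years_lived


# Made-up baseline: death rates rising exponentially with age (ages 0..110).
baseline = [min(1.0, 0.0001 * math.exp(0.085 * age)) for age in range(110)] + [1.0]

SHOCK = 0.02  # one-time disaster: +2 percentage points at every age

# Period view: the disaster-year rates are applied at EVERY age of a
# hypothetical cohort, as if the disaster recurred every year of life.
disaster_year = [min(1.0, qx + SHOCK) for qx in baseline]

# Cohort view: babies born that year face the disaster only at age 0,
# then (by assumption) the baseline rates for the rest of their lives.
born_in_disaster_year = [min(1.0, baseline[0] + SHOCK)] + baseline[1:]

e_baseline = life_expectancy_from_rates(baseline)
e_period = life_expectancy_from_rates(disaster_year)
e_cohort = life_expectancy_from_rates(born_in_disaster_year)

print(f"baseline:                 {e_baseline:.1f}")
print(f"period, disaster year:    {e_period:.1f}")
print(f"cohort hit once at age 0: {e_cohort:.1f}")
```

With these invented rates, the period number falls by decades while the cohort that actually lived through the disaster loses only a year or two on average.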

So: “Life expectancy” is essentially a way of aggregating a table of current death rates by age into a statistical artifact, a single number with the vaguely intuitive (but not really literally correct) interpretation of “how long people will live”.

(However, sometimes there is more imputation done, especially when countries have more limited official data, and probably some versions of “life expectancy” make more of an attempt at being a prediction. This is where my expertise frankly runs out, but hopefully my explanation is still useful. Again, I am a beginner in this area.)

The Non-Paradox Of The United States And Denmark In 1990:

Can this understanding of “life expectancy” help us address the original question? Let’s consider the United States and Denmark.

It is accurate to say that “the United States used to have higher life expectancy than Denmark but it’s fallen behind since”. Here’s the Global Health Data Exchange if you don’t believe me:

In 1990, according to the site, the United States had a life expectancy of 75.6 and Denmark had a life expectancy of 75, so the United States was doing a bit better than Denmark there.

But since we understand that “life expectancy” is just a summary of death rates by age, let’s look at the full distribution. (It’s always good to look at the full distribution.)

A tool I made summarizes the death rates by age for each country and year on the GHDx site by applying them to the United States in 2017. (This is only for illustrative purposes, since otherwise the chart is dominated visually by death rates for people 95 and older. Note that for “Other Rich” I used my own calculation, which differs slightly on the available countries.)

The United States — in part because of higher death rates for HIV/AIDS, physical violence, and car/transit accidents — had slightly higher death rates than Denmark for every age group from birth to 40–44. But Denmark had much higher death rates at older ages, in part because of significantly higher death rates for cardiovascular diseases. In the rather complicated formula for “life expectancy”, this worked out to Denmark being slightly behind the United States.

Is that the only perspective? There’s another public health metric, the “age-standardized death rate”. That’s just a sum of death rates by age weighted to a fixed sample population.

In 1990, the United States had an “age-standardized death rate” of 655 deaths per 100,000 people; Denmark’s rate was 729. So by this metric, Denmark was doing, I think, significantly worse than the United States.
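Here is a small Python sketch of why the “age-standardized” part matters. All the numbers are invented (three coarse age groups, round rates per 100,000), and the function names are mine. Two places share identical death rates by age but have different age mixes, so their crude death rates differ; applying both to the same standard population removes that difference.

```python
def crude_death_rate(rates_per_100k, population):
    """Overall deaths per 100,000, given death rates and headcounts by age group."""
    deaths = sum(rates_per_100k[g] * population[g] / 100_000 for g in rates_per_100k)
    return deaths / sum(population.values()) * 100_000


def age_standardized_death_rate(rates_per_100k, standard_population):
    """The crude rate a country WOULD have if it had the standard age mix."""
    return crude_death_rate(rates_per_100k, standard_population)


# Invented death rates by age group (deaths per 100,000), shared by both places.
rates = {"0-39": 100, "40-69": 700, "70+": 5000}

# Invented populations: one young, one old, plus a fixed standard population.
young_place = {"0-39": 700_000, "40-69": 250_000, "70+": 50_000}
old_place = {"0-39": 450_000, "40-69": 350_000, "70+": 200_000}
standard = {"0-39": 550_000, "40-69": 330_000, "70+": 120_000}

crude_young = crude_death_rate(rates, young_place)
crude_old = crude_death_rate(rates, old_place)
standardized = age_standardized_death_rate(rates, standard)

print(crude_young, crude_old, standardized)
```

The old place has a much higher crude rate purely because of who lives there. The standardized rate is the same for both, since it only depends on the death rates by age and the fixed weights.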

Are those the only perspectives? There’s also a third metric, the “age-standardized years of life lost rate”. I believe there are several different definitions that public health statisticians can use for “years of life lost”. The definition used here is “the multiplication of deaths and a standard life expectancy at the age of death”, with “the standard life expectancy [] derived from a life table that contains the lowest observed mortality rate at each age that has been observed in any population greater than 5 million”. Basically it’s another weighted sum of death rates by age, but using different weights than “the age-standardized death rate”.

By this metric, in 1990 the United States had 15,889 years of life lost per 100,000 people, compared to 15,745 for Denmark. So the United States was doing slightly worse than Denmark.

Which is the right answer? In 1990, was Denmark doing significantly worse than the United States, a little worse than the United States, or a little better than the United States?

Again: All of these metrics are just derived from death rates by age. I don’t think any of them are “the right answer”; they’re all just different formulas aggregating the same data, different ways of saying the same thing. “In 1990, Denmark had noticeably higher death rates for older people and the United States had somewhat higher death rates for younger people”.

If you plug that into the three formulas for “life expectancy”, “age-standardized death rate”, and “age-standardized years of life lost rate”, then you’ll get three different numbers. The “age-standardized death rate” basically counts every death the same, young and old, so the United States does better relative to Denmark. The “age-standardized years of life lost rate” weights younger deaths higher than older deaths, at least compared to the “age-standardized death rate”, so Denmark does better relative to the United States. “Life expectancy” seems to come out somewhere in between.
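A toy Python version of this, in the spirit of Hypotheticalia: people only die at exactly ages 10, 40, 70, or 100. The two countries, their death rates, the standard-population weights, and the “everyone should live to 100” standard for years of life lost are all assumptions I made up to reproduce the pattern, not the GHDx definitions or data. Deaths at age 100 are left out of the two weighted sums because they are identical for both countries.

```python
# Made-up death rates at ages 10, 40, and 70 (everyone left dies at 100).
# "Younger-risk" stands in for a country with higher death rates at young
# ages; "older-risk" for one with higher death rates at older ages.
younger_risk = {10: 0.03, 40: 0.10, 70: 0.30}
older_risk = {10: 0.02, 40: 0.09, 70: 0.36}

# Assumed standard-population weights for each age.
STANDARD_WEIGHTS = {10: 0.40, 40: 0.35, 70: 0.25}
# Assumed standard: years lost = 100 minus age at death.
STANDARD_YEARS_LEFT = {10: 90, 40: 60, 70: 30}


def life_expectancy(rates):
    """Average age at death for a hypothetical cohort facing these rates."""
    alive, total = 1.0, 0.0
    for age in sorted(rates):
        dying = alive * rates[age]
        total += dying * age
        alive -= dying
    return total + alive * 100  # survivors all die at exactly 100


def standardized_death_rate(rates):
    """Weighted sum of death rates: every death counts the same."""
    return sum(STANDARD_WEIGHTS[a] * rates[a] for a in rates)


def standardized_yll_rate(rates):
    """Weighted sum of death rates times years lost: young deaths count more."""
    return sum(STANDARD_WEIGHTS[a] * rates[a] * STANDARD_YEARS_LEFT[a] for a in rates)


for name, rates in [("younger-risk", younger_risk), ("older-risk", older_risk)]:
    print(
        f"{name}: life expectancy {life_expectancy(rates):.2f}, "
        f"death rate {standardized_death_rate(rates):.4f}, "
        f"YLL rate {standardized_yll_rate(rates):.2f}"
    )
```

With these numbers, the younger-risk country looks clearly better on the death rate, slightly better on life expectancy, and slightly worse on years of life lost, all from the same two tables of death rates by age.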

Conclusion:

As much as I find statistics useful and fascinating, I don’t think you should let statistical formulas do your value judgments for you. When you choose a “public health statistic”, whether it’s “age-standardized death rate” or “age-standardized years of life lost rate” or “life expectancy”, you’re implicitly choosing weights, that is, you’re choosing “how much more important younger deaths are than older deaths”. Calculations like that may be unavoidable when thinking about public health, but you shouldn’t necessarily outsource your moral choices to a formula you don’t understand.

“Life expectancy” is like “incarceration rate”, a perhaps useful single-number summary measure derived from more direct factors: the distribution of sentence lengths and conviction counts for individual offenses in the case of the “incarceration rate”; death rates by age group in the case of “life expectancy”.

These summary measures can be useful to show broad trends or comparisons. All else being equal, a higher incarceration rate means a more punitive system than a lower incarceration rate. All else being equal you’d rather life expectancy be going up than going down. But I don’t think these metrics should be treated as the only valid ones, or that everything in public health/criminal justice should be analyzed in terms of its effect on “life expectancy”/“the incarceration rate” specifically.

Still, I don’t want to over-emphasize the distinctions between these public health metrics. You’ll notice that, for example, they all show the United States doing increasingly worse relative to Denmark from 1990 to 2017. In practice, age/death curves aren’t going to look that different from each other, and the three metrics I mentioned seem to be very highly correlated with each other at a country-year level:

(Though some of this similarity may be because the GHDx uses “model life tables”, not just raw death rates by age — again, parsing that is well beyond my expertise.)

You could imagine other, seemingly plausible metrics, like “simulated probability of living to age 50”, or “years of life lost but assuming everyone should live to age 100”, and they would probably be quite correlated with all of the others, but give even more divergent answers in some cases.

Given that, how much should “life expectancy” really matter? I’ll discuss that briefly in Part 3.

Notes and Sources:
Public health data from:

Global Burden of Disease Collaborative Network.
Global Burden of Disease Study 2017 (GBD 2017) Results.
Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2018.
Available from http://ghdx.healthdata.org/gbd-results-tool.
