The US Life Expectancy Mess, Part 3: The Impossibility Of Truly “Decomposing Changes In Life Expectancy”.

Xenocrypt
Jun 25, 2019

Introduction:

This is going to be a (hopefully) short and somewhat hand-waving note. Early on in this project I tried to look up existing formulas to “decompose changes in life expectancy” into contributions from particular age groups or causes. There IS academic work on this, make no mistake, and since I’m not anything remotely approximating a public health expert, I don’t consider myself qualified to really evaluate it.

However, as a pure amateur, I have two issues. First, I think it’s trying to solve what’s basically an unsolvable problem. Second, I’m not sure the answer should really matter all that much. I’ll try to explain each of these in turn.

The Mathematical Impossibility:

In the last article, I mentioned a nation of “Hypotheticalia”, where people only die at ages 10, 40, and 100. The formula for “life expectancy” was 100 - 90*x - 60*y + 60*x*y, where x is the death rate at age 10 and y is the death rate at age 40. When the death rate at age 10 was 5% and the death rate at age 40 was 20%, this worked out to a “life expectancy” of 84.1 years. Note that the formula is not linear, i.e., there’s the “interaction term” of 60*x*y.
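As a quick check on where that formula comes from, here is a minimal Python sketch. It assumes that y is the death rate among people who survived age 10 (that reading is my assumption, but it is the one that reproduces the formula), and the function names are just made up for illustration.

```python
def life_expectancy(x, y):
    """Expected age at death in Hypotheticalia.

    x: share of people who die at age 10
    y: share of age-10 survivors who die at age 40 (assumed reading)
    Everyone who survives both ages dies at 100.
    """
    die_at_10 = x
    die_at_40 = (1 - x) * y
    die_at_100 = (1 - x) * (1 - y)
    return 10 * die_at_10 + 40 * die_at_40 + 100 * die_at_100


def closed_form(x, y):
    # The formula quoted in the text, including the interaction term.
    return 100 - 90 * x - 60 * y + 60 * x * y


print(round(life_expectancy(0.05, 0.20), 4))  # 84.1
print(round(closed_form(0.05, 0.20), 4))      # 84.1
```

Expanding 10*x + 40*(1-x)*y + 100*(1-x)*(1-y) gives the same expression, and the 60*x*y term appears because only people who survive age 10 are exposed to the age-40 death rate.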

Let’s say that a few years later, the death rate at age 10 is now 10% and the death rate at age 40 is now 30%. The “life expectancy” of Hypotheticalia is now 100 - 90*0.1 - 60*0.3 + 60*0.1*0.3 = 74.8 years. Of the 74.8 - 84.1 = -9.3 years of change in “life expectancy”, how much is “explained” by the change in death rates at age 10, and how much by the change in death rates at age 40?

Just from the formula itself: the -90*x term for deaths at age 10 contributed -90*(0.1 - 0.05) = -4.5 years, the -60*y term for deaths at age 40 contributed -60*(0.3 - 0.2) = -6 years, and the remaining 60*x*y term makes up 60*0.1*0.3 - 60*0.05*0.2 = +1.2 years. That adds up to the -9.3 year change.

Ignoring survivor bias issues for now, -4.5 years can be put under “caused by deaths at age 10” and -6 years can be put under “caused by deaths at age 40”. But how should those remaining +1.2 years be “divided up” between “caused by deaths at age 10” and “caused by deaths at age 40”?
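The term-by-term bookkeeping can be checked with a few lines of Python. This is only a sketch of the arithmetic from the Hypotheticalia example, and the variable names are mine.

```python
x0, y0 = 0.05, 0.20  # earlier death rates at ages 10 and 40
x1, y1 = 0.10, 0.30  # later death rates at ages 10 and 40

term_x = -90 * (x1 - x0)            # change in the -90*x term
term_y = -60 * (y1 - y0)            # change in the -60*y term
term_xy = 60 * (x1 * y1 - x0 * y0)  # change in the 60*x*y interaction term
total = term_x + term_y + term_xy

print(f"x term:   {term_x:+.1f}")   # x term:   -4.5
print(f"y term:   {term_y:+.1f}")   # y term:   -6.0
print(f"x*y term: {term_xy:+.1f}")  # x*y term: +1.2
print(f"total:    {total:+.1f}")    # total:    -9.3
```

The first two pieces each depend on only one death rate, so they are easy to label. The third depends on both, and nothing in the formula says which label it belongs under.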

You can play around with something like (x+dx)*(y+dy) - x*y = y*dx + x*dy + dx*dy, and give x*dy to “y” and y*dx to “x”, and then either ignore dx*dy because it’s a small term or just call it “an interaction term”, whatever. I don’t think there’s really an answer. The formula itself, even in this very simple example, is not, mathematically, entirely decomposable into age group effects, let alone into causes of death effects.
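One way to see the ambiguity concretely is to change one death rate at a time and watch the attribution depend on which one goes first. The sketch below is only an illustration of that point using the Hypotheticalia numbers. It is not a method taken from the academic literature, and the names are made up.

```python
def life_expectancy(x, y):
    # Hypotheticalia formula from the text.
    return 100 - 90 * x - 60 * y + 60 * x * y


x0, y0 = 0.05, 0.20  # earlier death rates
x1, y1 = 0.10, 0.30  # later death rates
dx, dy = x1 - x0, y1 - y0

# Ordering A: move the age-10 rate first, then the age-40 rate.
a_x = life_expectancy(x1, y0) - life_expectancy(x0, y0)
a_y = life_expectancy(x1, y1) - life_expectancy(x1, y0)

# Ordering B: move the age-40 rate first, then the age-10 rate.
b_y = life_expectancy(x0, y1) - life_expectancy(x0, y0)
b_x = life_expectancy(x1, y1) - life_expectancy(x0, y1)

print(f"x first: age 10 {a_x:+.1f}, age 40 {a_y:+.1f}")  # x first: age 10 -3.9, age 40 -5.4
print(f"y first: age 10 {b_x:+.1f}, age 40 {b_y:+.1f}")  # y first: age 10 -3.6, age 40 -5.7

# The three pieces of the interaction term, scaled by 60.
print(f"{60 * y0 * dx:.1f} {60 * x0 * dy:.1f} {60 * dx * dy:.1f}")  # 0.6 0.3 0.3
```

Both orderings add up to the same -9.3 years, but the age-10 share is about 42% in one and about 39% in the other. The difference is exactly the 60*dx*dy piece, which goes to whichever rate is changed second, and here it is a quarter of the whole interaction term rather than something negligible.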

Again, the academic work does of course seem to acknowledge interaction terms and so on, which I believe can sometimes be so small as to shrug off. And in practice I am going to treat life expectancy in the next article a bit like “another weighted sum”. But fundamentally, if you’re trying to be precise, I just don’t see how you can get to a genuine answer like “45% from changes in age-10 deaths, 55% from changes in age-40 deaths”. And if you don’t mind being crude, then I think crude methods can get you pretty much to the “close enough answer”, at least for these introductory articles. So while I did read a bit about the academic work, I’m not going to use those formulas as such here.

Should This Answer Even Matter?:

Putting aside the question of the mathematical possibility of “decomposing changes in life expectancy”, there’s the separate question of how much it’s even worth doing. Which, again, I recognize somewhat goes against this whole article series, but hopefully doesn’t entirely go against it.

To repeat myself from the last article: Life expectancy, as commonly used, does not seem to actually be “how long people are going to live”, or even really an attempt at that. I think of it more as a statistical artifact sort of inspired by that concept, and correlating in most cases with general well-being, but it doesn’t really directly measure anything on its own.

So if “life expectancy” doesn’t really measure anything on its own, then why should we care about how precisely we calibrate the contribution of this or that to changes in “life expectancy” in particular? I think there is some risk of confusing the map and the territory here.

Let’s say that we could, in fact, determine that stomach cancer contributed 55% to a particular decline or gap in life expectancy and motorcycle accidents contributed the remaining 45%. Would that mean that 55% of policy attention should go to stomach cancer and 45% should go to motorcycle accidents? Not really. Would it mean that “stomach cancer was 55% of why things got worse and motorcycle accidents were 45%”? Not really: life expectancy is one metric of many, and other metrics might have different breakdowns.

So, what’s the point, except as a math exercise? I don’t know, honestly, but people seem to care about it, and it is an interesting math exercise. In Part 4, I’ll make an attempt to answer it, very crudely. But because “life expectancy” is only an imperfect proxy for “public health”, “decomposing life expectancy” is only an imperfect proxy for “decomposing public health”.

Notes and Sources:
Public health data from:

Global Burden of Disease Collaborative Network.
Global Burden of Disease Study 2017 (GBD 2017) Results.
Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2018.
Available from http://ghdx.healthdata.org/gbd-results-tool.
