The US Life Expectancy Mess, Part 5: Technical/Apologetic Note.

Xenocrypt
Jun 25, 2019


I want to note and/or defend a couple of technical issues here that I didn’t want to put into the bodies of the other articles.

The first three relate to the only “real math” I did, which was to calculate public health metrics for my constructed “Other Rich Countries” as well as for the hypotheticals. (Calculating death rates by age for “Other Rich Countries” was just an exercise in arithmetic.)

Calculations Of Age-Adjusted Death Rate And Years Of Life Lost Rate:

As I wrote in the articles, I treated Age-Adjusted Death Rate and Age-Adjusted Years Of Life Lost Rate as simple weighted sums of death rates by age group (as the GHDx glossary implies they are). To find the weights, rather than…asking whoever would actually know…I ran a linear regression of each published metric on the death rates by age group, with one observation per country-year. This had an r² of, essentially, 1, so I am confident they are the right weights and the right conceptualization as far as that goes (see below).

However, the GHDx also provides age-adjusted death rates and age-adjusted years of life lost for individual causes, and there are occasional small deviations there from this approach. I assume GHDx may have done a bit of processing on top of a simple weighted sum in these cases, especially for things like neonatal deaths.
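As a rough illustration of the weighted-sum idea and the regression used to recover the weights, here is a minimal Python sketch. It is not the code behind these articles. The data is synthetic, the weights are made up, and the names `fit_age_weights` and `r_squared` are hypothetical. Fitting without an intercept is also an assumption on my part.

```python
import numpy as np


def fit_age_weights(rates_by_age, age_adjusted):
    """Least-squares weights w such that rates_by_age @ w ~= age_adjusted.

    rates_by_age: array of shape (country_years, age_groups)
    age_adjusted: array of shape (country_years,)
    No intercept is fitted (an assumption, matching a pure weighted sum).
    """
    weights, *_ = np.linalg.lstsq(rates_by_age, age_adjusted, rcond=None)
    return weights


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - np.mean(observed)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the real data: 200 country-years, 5 age groups.
rng = np.random.default_rng(0)
rates = rng.uniform(0.0005, 0.05, size=(200, 5))

# Made-up standard-population weights (NOT the real GHDx weights).
true_weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

# The "published" age-adjusted rate, built here as an exact weighted sum.
published = rates @ true_weights

fitted_weights = fit_age_weights(rates, published)
fit_quality = r_squared(published, rates @ fitted_weights)
```

If the published metric really is a plain weighted sum, the fitted weights reproduce it and the r² comes out at essentially 1. Any extra processing on top of the weighted sum would show up as residuals.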

Calculations Of Life Expectancy:

The other calculation I did was to calculate life expectancy for the aggregate of “Other Rich Countries”, as well as for my hypotheticals of the United States with some death rates by age and cause swapped out. For these purposes, I used what I believe is called the “Fergany Method” applied to the death rates by age, as explained here.

To be clear: I do not think this is, exactly, what GHDx did. I believe GHDx used a more complicated “model life table approach” I don’t really understand and certainly can’t critique or evaluate in an informed way. However, the formula I used does agree very closely with their results, on a country-year basis, for nearly all countries and years, and is a particularly good match for “rich countries”.

To illustrate this, below is a correlation graph for the three metrics I looked at (age-adjusted death rate, age-adjusted years of life lost per capita, and life expectancy) by country-year, comparing the results directly from the GHDx tool with the results from my formulas computed directly from death rates by age:

So, while I am sure the actual math the GHDx is using to calculate these metrics is more complex than mine, I think my numbers for the “Other Rich Countries” aggregate should be very close to what GHDx would have come up with.
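For readers who want to see the shape of the life expectancy calculation, here is a minimal Python sketch. It assumes the death rate is constant within each age group, which is how I understand the Fergany method. It is not necessarily the exact formula used for these articles, and it is certainly not the GHDx model life table approach. The function name `life_expectancy_at_birth` and the example rates are hypothetical. Rates are per person-year, so figures quoted per 100,000 would need to be divided by 100,000 first.

```python
import math


def life_expectancy_at_birth(age_starts, death_rates):
    """Life expectancy at birth from death rates by age group.

    age_starts:  starting age of each group, ascending; the last group is open-ended
    death_rates: deaths per person-year in each group (must be positive)
    Assumes a constant death rate within each age group.
    """
    if len(age_starts) != len(death_rates):
        raise ValueError("age_starts and death_rates must have the same length")

    survivors = 1.0      # share of the birth cohort still alive at the group's start
    person_years = 0.0   # years lived per person born, accumulated over groups

    for i, rate in enumerate(death_rates):
        if rate <= 0:
            raise ValueError("death rates must be positive")
        if i == len(death_rates) - 1:
            # Open-ended last group: everyone remaining eventually dies at this rate.
            person_years += survivors / rate
        else:
            width = age_starts[i + 1] - age_starts[i]
            next_survivors = survivors * math.exp(-width * rate)
            # Years lived in the group = deaths in the group / death rate.
            person_years += (survivors - next_survivors) / rate
            survivors = next_survivors

    return person_years


# Sanity check with made-up numbers: a constant rate m in every group gives 1 / m.
age_starts = [0, 1, 5, 15, 45, 65, 85]
rates = [0.0125] * len(age_starts)
baseline = life_expectancy_at_birth(age_starts, rates)  # about 80 years

# A "swapped out" hypothetical: lower the rate for ages 45-64 only.
swapped = list(rates)
swapped[4] = 0.010
hypothetical = life_expectancy_at_birth(age_starts, swapped)
```

Swapping in a lower death rate for one age group raises the result, which is all the hypotheticals in the series do mechanically.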

Fake Hypotheticals:

I mentioned this in the last article, but when I say that a particular cause “explains 10% of the change in the difference between Country A and Country B from year X to year Y”, I only mean that in the literal, arithmetic sense. Like the literal arithmetic:

(Cause_A,Y-Cause_B,Y-Cause_A,X+Cause_B,X)/(Total_A,Y-Total_B,Y-Total_A,X+Total_B,X) = 10%.

Obviously that’s not the same as a real counterfactual or a real proof, where you have to think about incidence effects and cohort effects and survivor bias and all that. For example, if the United States had had lower drug overdose deaths for people aged 30–34 in 2010, maybe that would change the death rates for other causes for people aged 35–39 in 2017 in some way beyond the literal arithmetic, and then that would change the projected “life expectancy”, etc. So, take these figures more as attempts at explanations of the formulas than as real discussions of causality.
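To make the literal arithmetic concrete, here is a small Python sketch of the ratio above. The function name `share_of_gap_change` and all the numbers are made up for illustration and are not taken from the GHDx data.

```python
def share_of_gap_change(cause_a_x, cause_b_x, cause_a_y, cause_b_y,
                        total_a_x, total_b_x, total_a_y, total_b_y):
    """Share of the change in the A-minus-B gap (year X to year Y) due to one cause.

    Purely arithmetic: change in the cause-specific gap divided by the
    change in the total gap. It says nothing about causality.
    """
    cause_change = (cause_a_y - cause_b_y) - (cause_a_x - cause_b_x)
    total_change = (total_a_y - total_b_y) - (total_a_x - total_b_x)
    if total_change == 0:
        raise ZeroDivisionError("the total gap did not change between X and Y")
    return cause_change / total_change


# Made-up rates per 100,000.
# Cause-specific gap: 10 - 8 = 2 in year X, 14 - 9 = 5 in year Y (change of 3).
# Total gap: 800 - 700 = 100 in year X, 820 - 690 = 130 in year Y (change of 30).
share = share_of_gap_change(
    cause_a_x=10, cause_b_x=8, cause_a_y=14, cause_b_y=9,
    total_a_x=800, total_b_x=700, total_a_y=820, total_b_y=690,
)
```

Here the cause accounts for 3 of the 30-point widening in the gap, so `share` is 0.1, the “10%” in the sentence above.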

Conceptual Simplification:

Throughout this series, I’ve implied that death rates by age are “real data” or “raw data”, directly observed and measurable statistics, in contrast to the statistical artifacts of “age-adjusted death rates”, “age-adjusted years of life lost per capita”, and “life expectancy”. Conceptually speaking, or in terms of how they’re defined, I think this is more or less fair. “1,100 Americans between the ages of 75 and 79 out of 100,000 died of cardiovascular disease in 2017” is a somewhat more direct claim than “the life expectancy of an American in 2017 is 79 years” or whatever.

However, in the actual numbers I used from the GHDx, death rates by age are not just reported directly from government vital statistics sources, but are themselves processed and smoothed to some extent. Again, I think this is the “model life table approach”, among other things. When I compared numbers from the CDC to US GHDx numbers, IIRC they did not match exactly. I don’t mean this as either a critique or an endorsement of the GHDx approach; I just wanted to note it. In other words, I think I have more or less the right formulas for getting the GHDx public health metrics I looked at from GHDx’s death rates by age, but I don’t think I’d be able to reproduce GHDx’s death rates by age from raw data. That’s why it’s a huge project! If you want to read their technical methodology appendix, see here (pdf).

Feedback Appreciated:

I am not a public health expert by any means and learned about most of this stuff for the first time while writing these articles. Certainly, if you are such an expert and noticed a mistake or misstatement, please let me know and I will try to edit or update the articles accordingly.

Notes and Sources:
Public health data from:

Global Burden of Disease Collaborative Network.
Global Burden of Disease Study 2017 (GBD 2017) Results.
Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2018.
Available from http://ghdx.healthdata.org/gbd-results-tool.
