Why are the most basic problems in medicine some of the hardest to solve? This series explores the big questions that science is still struggling to answer.

As nasty diseases go, malaria is particularly vile. Caused by a parasitic protozoan — a microscopic one-celled creature that lives inside human blood — and spread by mosquitos, it’s the kind of disease that might seem like nothing much, at first, until the havoc within becomes too much for the body to withstand. Chills give way to fever and nausea and headaches; these symptoms can pass in a few days or weeks. Meanwhile, as the parasite reproduces, it takes refuge inside the kidneys, the lungs, the liver, the brain. At a certain point, it can overwhelm these organs, and the host — the human — will die.

For years, malaria has been considered especially deadly to children, particularly in Africa. In 2010, the World Health Organization’s World Malaria Report put the number of annual global deaths caused by malaria at 655,000, making it among the most deadly of infectious diseases, alongside HIV and tuberculosis.

The WHO number, however, turned out to be wrong, and not for the better. A 2012 analysis by the Institute for Health Metrics and Evaluation, a University of Washington-based group funded by the Gates Foundation, found that the actual number was closer to 1.2 million deaths — meaning that just about twice as many people were dying of the disease as previously thought.

It was a baffling correction. How could the mortality estimate for malaria, a disease that gets a great amount of attention and resources, a disease that has such a long history and distinct pathology, be so off? And how was it possible to get the number right?

There is no lack of challenges in health and medicine, from inventing new medicines to parsing our genomes to spotting cancer at the earliest possible opportunity. But it turns out that one of the hardest problems seems like it should be one of the simplest: how to count disease and death.

Counting is notoriously hard in a great many areas, from crowds at presidential inaugurations to a proper death toll from Hurricane Katrina. Sometimes counting requires massive resources and results in startling accuracy, such as the $13 billion, once-a-decade U.S. census. Sometimes counts are subject to who’d-a-thunk revision, as in a recent near-doubling in the number of penguins in East Antarctica (bumped up to 6 million from 3.6 million). But in few disciplines does it get as complicated, or as controversial, as in statistics around disease and death — or what epidemiologists call morbidity and mortality.

These statistics matter because they have profound consequences for research funding (the WHO budget is about $4 billion annually, while the National Institutes of Health spends $33 billion a year), philanthropic donations (the Gates Foundation spends around $5 billion a year), and politics, both local and global. This makes getting the numbers right of the utmost importance, and makes it all the more remarkable that the numbers can be, time and again, just plain wrong.

The first comprehensive estimates of disease and mortality came from Edwin Chadwick, a 19th-century English social reformer who endeavored to wake the nation up to the scale of misery in its midst. In his landmark work, Report on the Sanitary Condition of the Labouring Population, Chadwick devoted the second page to a striking table: a detailed accounting of the number of deaths by various diseases (typhus, smallpox, consumption, diseases of brain, nerves, and senses, etc.), tallied by county, in the year 1838. This list was a revelation, the likes of which had never been seen before, and it sparked a wave of reform in the UK (including the public health law of 1848) and inspired similar efforts in the US (as well as serving as a source for Friedrich Engels’ The Condition of the Working Class in England, a seminal work of socialist thought).

A linear representation of the number of deaths in London, by disease, by Edwin Chadwick, 1842

Before Chadwick, it was unclear who or what was responsible for monitoring the public’s health. After Chadwick, it became a clear expectation of government: it should have reliable statistics on causes of death, and these statistics would be the data upon which policy could be reliably drafted, from housing regulations to sanitation laws to new government departments and budgets. Today, these sorts of data are so familiar as to be banal, at least in the US and other developed nations. We take as a given that the Centers for Disease Control and Prevention (CDC) tracks the 2,626,418 deaths recorded nationwide in a year, down to the 26,150 deaths from Parkinson’s disease, and we believe these aren’t mere estimates but veritable facts.

These days, there is indeed a great deal of rigor and science behind disease and mortality numbers. Most figures are based on a set of codes known as the International Statistical Classification of Diseases and Related Health Problems, or ICD. Now on its 10th revision (thus ICD-10), the ICD codes are maintained by the WHO to classify disease and used worldwide in patient records and death reports. Importantly, in the US, doctors and hospitals are required to use the codes by Medicare and Medicaid, as well as private insurers, in order to process payments — which means that without an ICD-10 code, patients don’t get treated and physicians don’t get paid.

These codes then get filtered up a chain to create the statistics issued by the CDC, from doctors to hospitals to county health departments to state departments of health to the federal CDC. This is the public health infrastructure, a tremendous system of accountability and measurement, that we take for granted in the United States.

Globally, though, the numbers can become sketchier and less reliable. People in the developing world typically die of different causes than people in wealthy countries: infectious disease versus chronic disease, broadly speaking. Chronic disease, by definition, is often experienced for years, filling up a patient record with evidence, whereas infectious diseases can strike quickly and far from a hospital or doctor. In nations where doctors are in short supply, there’s often a struggle to account for a specific cause of death, making estimates in, say, Africa and remote regions of Asia much more tentative.

In 2016, the WHO joined forces with the IHME to create a more rigorous process for estimating mortality rates. Called GATHER (Guidelines for Accurate and Transparent Health Estimates Reporting), it offers an 18-item checklist of reporting standards for quality mortality data. Of the 183 WHO member states, only 69 nations currently meet these standards, with those falling short including Greece, Poland, Saudi Arabia, and Turkey. Many of the failures — what the WHO classifies under “garbage” data — boil down to ambiguously tallied deaths: a vague accounting of “other respiratory disease,” imprecisely specified cancers, or deaths noted by symptoms rather than cause. The WHO expends considerable resources to get workable figures for China and India. These nations generally lack extensive public health infrastructure but, given that together they account for 35% of the world’s population, it’s important to get something close to right.

Even after significant effort, the counts can still be a ways apart. Various official estimates for HIV-related deaths in South Africa in 2015, for example, range between around 100,000 and 300,000 a year.

Various estimates of deaths from HIV from the United Nations Population Division (UNPD), the Institute for Health Metrics and Evaluation (IHME), the World Health Organization (new method), the Joint United Nations Programme on HIV/AIDS (UNAIDS), and the World Health Organization’s previous method

Ultimately, each of these figures requires sophisticated statistical analyses, computer models, and reasonable guesses. Sometimes one nation’s reliable data becomes a proxy for a neighboring country’s missing data; sometimes authorities will track mortality intensively among a smaller population, and then use those results to estimate the figures for the nation at large. And sometimes more accurate figures don’t demand fancy technology or biostatistics so much as boots-on-the-ground resourcefulness. The 2012 IHME malaria estimate of 1.2 million, for instance, was bolstered by a process known as “verbal autopsy” — which basically means the researchers simply asked the next of kin what people had died of or what symptoms they had before they died. This method turned up thousands of cases that had been missed or miscategorized in the morgues. It turned out, ironically, that in countries where malaria was common, physicians were much more likely to underreport it as the cause of death. Sometimes disease can be so familiar, it seems, that it just disappears.

It’s not just in global health and poorer countries where counting bodies is a challenge. Even in the developed world, experts and authorities get the numbers wrong surprisingly often. Earlier this year, for instance, researchers from Johns Hopkins University found that black women die of cervical cancer at a rate 77% higher than previous estimates; the mortality rate for white women was 47% higher than thought. (The old numbers, it turned out, included women who had had hysterectomies, which would eliminate any risk of such cancer.)

More examples? Last year, a study of chronic obstructive pulmonary disease (COPD) in Italy found that cases were underestimated by about 37%. A rigorous Australian study of obesity rates found far more cases than previously measured; instead of 17% of the population, fully 25% of Australians should be considered obese. A new assessment of fetal alcohol syndrome in Europe surmised there are 119,000 cases of FAS a year, far higher than previous estimates. The researchers based their new figure on an estimate that fully 1 in 4 European women drink during pregnancy, significantly higher than the previous 1 in 10 figure. The results were especially controversial, as they suggest that efforts in Europe to reduce drinking while pregnant weren’t working.

So what’s going on here? After nearly 200 years of accounting for death and disease, why is it so hard to come up with accurate numbers? Part of it is the sheer scale of the problem: every year, more than 56 million humans die — 150,000 a day, around the world. It’s far beyond our present capacity to account for each of those corpses individually. What’s more, autopsy rates are going down in the US and worldwide, making it harder to discern a true cause of death. (A main cause of fewer autopsies: families refuse to let them happen. This is why, for instance, we don’t know the cause of death for Supreme Court Justice Antonin Scalia.)

So the unsung heroes of epidemiology evaluate and approximate, and come up with quite reasonable guesses. Inevitably, though, their statistical extrapolations and creative deductions run into very human problems. Even in developed nations, people make mistakes; wrong diagnoses slip into hospital records and health care providers input incorrect codes. Sometimes the condition in question carries stigma: obesity, drinking during pregnancy, carrying an infectious disease (or a sexually transmitted one). People can be disinclined to self-report these conditions, and providers can be leery of attaching a label to loyal patients.

But this most rudimentary of riddles must be solved, because the count of who dies of what is among the most impactful of numbers, culturally, politically, and scientifically. The closer we grope our way to the ground truth, the more we’ll understand the story of human health writ large — and the better we’ll be able to tell those 150,000 individual stories that end every day, in our midst.

Produced in partnership with Newco Shift.