When Coronavirus Data Becomes Lethal
No problem is too big for big data to solve, unless it isn’t clear how to solve it or what is being solved. Relying on “the data” is one of the most problematic strategies leaders adopt in the midst of a pandemic, because the quality of the data is not clear. When bad data is fed into any process, whether quantitative, qualitative, or arbitrary, the right answers do not come out the other end.
‘The politicization of data’ is the practice of manipulating data for political gain.
Bad data leads decision makers (nations and corporations, consultants, even professors in non-medical fields) to push for the wrong policies.
Wanting to believe the pandemic had come to its end, some even declared early victory. “There Isn’t a Coronavirus ‘Second Wave,’” wrote former U.S. Vice President Michael R. Pence in June 2020 (1). The United Kingdom (2) did much the same, pushing economic recovery initiatives that accelerated a second wave.
One source of confusion is the fatality rate. The confusion comes from:
- Bewilderment around similarly but appropriately named metrics (Infection Fatality Rate vs. Case Fatality Rate)
- Figures that shift as an outbreak unfolds (Numbers Shift During a Pandemic)
- Failure to adapt metrics as new data becomes available (More Data Means More Useful IFR Models)
Infection Fatality Rate vs. Case Fatality Rate
Professionals who study outbreaks of diseases are epidemiologists, and they have two simple metrics for understanding risk of death: Infection Fatality Rate (IFR), the proportion of infected persons who will die, and Case Fatality Rate (CFR), the proportion of diagnosed cases who will die.
The difference between IFR and CFR is that asymptomatic (showing no symptoms) and undiagnosed cases are part of IFR.
- This means IFR will never be higher than CFR, and is usually lower.
- CFR (including people admitted to hospitals) is a figure hospital management pays close attention to. IFR is more useful for policy makers, but both are necessary. Neither is a general-purpose, know-and-predict-everything measurement for decision making.
- Some authorities confuse the two figures and conclude that a disease is more or less lethal than it is. Others stop trusting either number.
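The gap between the two metrics comes down to the denominator. The following is a minimal sketch with made-up numbers, not figures from any real outbreak; the function name and every count in it are assumptions chosen for illustration.

```python
def fatality_rate(deaths: int, denominator: int) -> float:
    """Return deaths as a fraction of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return deaths / denominator


# Hypothetical outbreak figures, for illustration only.
deaths = 50
confirmed_cases = 1_000        # diagnosed cases: the CFR denominator
undetected_infections = 4_000  # asymptomatic or undiagnosed (assumed)
total_infections = confirmed_cases + undetected_infections  # the IFR denominator

cfr = fatality_rate(deaths, confirmed_cases)
ifr = fatality_rate(deaths, total_infections)

print(f"CFR: {cfr:.1%}")  # CFR: 5.0%
print(f"IFR: {ifr:.1%}")  # IFR: 1.0%
```

With the same deaths, the larger denominator pulls IFR below CFR, which is why mixing up the two can make a disease look five times more or less lethal in this toy case.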
Former U.S. President Donald J. Trump’s attitude towards the outbreak is a clear example, but its equal may be the United Kingdom’s £849 million ‘Eat Out to Help Out’ program, which drove people back to restaurants by subsidizing meals. University of Warwick research found that it “drove new infections up by 8% to 17% and accelerated a second wave in the fall.” (2)
Numbers Shift During a Pandemic
In the absence of an epidemic plan, many leaders rely on the data they have, placing greater weight on available measurements. IFR is both useful and troublesome in the midst of a pandemic. It has a major influence on policy decisions, yet neither IFR nor CFR is conclusive until the end of an outbreak. The numbers are always changing because:
- The opportunity to detect cases is different throughout the outbreak. There is always a period of case detection before we know what we are dealing with, before clear case definitions and protocols are established.
- At the beginning of an outbreak, there is a wide range of figures between countries, due to different and inconsistent data gathering methods.
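To see why a fatality figure quoted mid-outbreak is provisional, consider a running CFR computed from cumulative counts. This is a rough sketch; the weekly numbers are invented for illustration and the helper name is an assumption.

```python
# Hypothetical cumulative counts by week (illustrative, not real data).
weeks = [1, 2, 3, 4, 5, 6]
cumulative_cases = [100, 400, 1_200, 3_000, 5_000, 6_000]
cumulative_deaths = [1, 6, 30, 105, 225, 300]


def running_cfr(deaths, cases):
    """CFR 'so far' at each point in time: cumulative deaths / cumulative cases."""
    return [d / c for d, c in zip(deaths, cases)]


cfr_by_week = running_cfr(cumulative_deaths, cumulative_cases)
for week, cfr in zip(weeks, cfr_by_week):
    print(f"week {week}: CFR so far = {cfr:.1%}")
```

In this invented series the figure climbs from 1.0% in week 1 to 5.0% in week 6, so a leader quoting the week 1 number would understate the final figure fivefold.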
Predictions That Don’t Age Well and the People Who Fall for Them
Public health officials rarely comment on pandemic models, because projections built on low-context IFRs and metrics, or on crude methods, lead to predictions that don’t age well.
Some predictions come from sources with impressive but irrelevant credentials, which makes clear the difference between being aware of the information and understanding its context. In other words, they don’t know how to interpret the data. Whenever there’s a major opportunity or problem in the world, people like to board its hype train.
Among them is radiologist Scott Atlas, a special advisor to the White House Coronavirus Task Force, who was criticized by Stanford Medical School colleagues for “falsehoods and misrepresentations of science.”
In June 2020, Radio Taiwan International interviewed Yitzhak Ben-Israel, a Tel Aviv University security studies professor, whose April 2020 paper predicted that COVID-19 stops spreading in 70 days, citing Taiwan as proof (3). A public health expert, Nadav Davidovitch, told The Times of Israel, “(Ben-Israel) is an excellent scientist, yet he has no clue about epidemiology and public health.”
The critical mistake of Ben-Israel’s self-described “simple math” model, to start, is that it overlooks the fact that these COVID-19 transmissions did not happen on the same day. Ben-Israel also ignores the serial interval (the sequence of transmissions and the time between them) as well as other timing factors.
The danger of this and other problems around exponential growth bias (underestimating compound growth processes) is that misinformation has been shown to lower people’s willingness to comply with World Health Organization personal care recommendations, including wearing masks and using hand sanitizer (4). A false sense of safety makes people less careful.
We noted earlier that the quality of data changes (improves) over the lifecycle of an outbreak. Statistical analysis methods such as ‘censoring’ remove this lesser-quality data to better pinpoint what researchers are trying to understand. In a pandemic, excluding early cases that were not directly observed while keeping observed cases, called left-censoring, is a logical boundary. It also helps to describe exponential growth over more manageable intervals, such as the time it takes cases to double, instead of looking for attention-grabbing edge-case results. (5)
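Both ideas can be sketched in a few lines: drop the early, poorly observed points, then summarize growth as a doubling time. The observations, the cutoff day, and the function name below are assumptions for illustration, and the calculation assumes steady exponential growth between the first and last kept points.

```python
import math


def doubling_time(cases_start: float, cases_end: float, days_elapsed: float) -> float:
    """Days for cases to double, assuming steady exponential growth."""
    if cases_start <= 0 or cases_end <= cases_start or days_elapsed <= 0:
        raise ValueError("need positive counts that grow over a positive interval")
    growth_rate = math.log(cases_end / cases_start) / days_elapsed  # per day
    return math.log(2) / growth_rate


# Hypothetical (day, cumulative cases) observations; not real data.
observations = [(0, 3), (5, 10), (10, 100), (17, 400), (24, 1_600)]

# Assumed cutoff: the first day on which cases were directly observed
# under a clear case definition. Earlier points are excluded.
FIRST_RELIABLE_DAY = 10
kept = [(day, cases) for day, cases in observations if day >= FIRST_RELIABLE_DAY]

(first_day, first_cases), (last_day, last_cases) = kept[0], kept[-1]
days_to_double = doubling_time(first_cases, last_cases, last_day - first_day)
print(f"Cases double roughly every {days_to_double:.1f} days")
```

A doubling time is easier to sanity-check against new data than a single long-range forecast, which is one reason it ages better.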
Gathering and Interpreting Data
As it turns out, oversimplifying the situation has its hazards. More robust data is needed, though obtaining it presents additional challenges.
- Some places struggle to count their dead. (6)
- Identifying COVID-19 is another issue. There may be no test at all, only a clinical algorithm (process) for diagnosing the disease. (7,8)
- To calculate a more accurate IFR, serological (blood) testing may be the most important method to detect population exposure, but it cannot always be done in a timely way. (9,10)
- Some countries and regions have more advanced health systems than others. (11,12)
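As a rough illustration of how a serological survey feeds an IFR estimate: the share of people with antibodies is scaled up to the whole population to approximate total infections, and deaths are divided by that figure. The numbers and function name here are hypothetical, and the sketch deliberately ignores test accuracy, sampling error, and reporting delays.

```python
def ifr_from_serosurvey(deaths: int, population: int, seroprevalence: float) -> float:
    """Estimate IFR as deaths / (population * share with antibodies)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < seroprevalence <= 1:
        raise ValueError("seroprevalence must be in (0, 1]")
    estimated_infections = population * seroprevalence
    return deaths / estimated_infections


# Hypothetical region, for illustration only.
population = 500_000
seroprevalence = 0.10  # 10% of sampled residents had antibodies (assumed)
deaths = 300

ifr_estimate = ifr_from_serosurvey(deaths, population, seroprevalence)
print(f"Estimated IFR: {ifr_estimate:.2%}")
```

Each input carries the uncertainties listed above: an undercount of deaths or a late survey shifts the estimate directly.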
Additional Characteristics of COVID-19
Compared to other infectious respiratory illnesses, three things stand out about COVID-19. It has a large number of asymptomatic carriers, it is highly infectious, and it can resemble other illnesses.
- Overestimating case fatality rate. Monitoring, generally, is more focused on patients with symptoms and severe cases, so milder and asymptomatic cases are not detected. This is another reason IFR, which includes these cases, is a key figure.
- Underestimating case fatality rate. Some cases will not be accounted for before death, especially during epidemics that quickly spread.
- Misdiagnosed cases. Cases may be attributed to another disease with a similar clinical presentation, as happened with influenza at the beginning of the pandemic.
More Data Means More Useful IFR Models
Are certain types of people in certain places more likely to fall victim to coronavirus? With more data, a more useful IFR can be created. The formula becomes more nuanced once IFRs for certain groups are factored in (13).
Context is king, and not all IFRs are created equal. Some incorporate factors that have to do with mass gatherings, which can lead to “superspreader” events. Other factors include:
- Underlying medical conditions, called co-morbidities, may increase IFR. (14,15)
- Regional considerations, like New York City (6), Singapore (16) and Hong Kong, where high population density increases transmission of respiratory diseases and the severity of an outbreak.
- Religion. (17)
- Ethnicity. (18)
- Climate. In colder places, there is more close contact, so in Japan, Hokkaido might look at IFR differently than Okinawa.
- Travel. Researchers found that ~74% of cases in the January 2020 initial wave of cases in the United Kingdom originated from travelers to Spain, France, and Italy. (19)
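One way to picture how group-level IFRs change the overall figure is a weighted average: each group's IFR counts in proportion to its share of infections. The sketch below uses age bands as the grouping; the bands, shares, and rates are invented for illustration and are not estimates from the cited studies.

```python
import math


def overall_ifr(groups: dict) -> float:
    """Weighted average of group IFRs, weighted by each group's share of infections."""
    total_share = sum(g["share_of_infections"] for g in groups.values())
    if not math.isclose(total_share, 1.0):
        raise ValueError("shares of infections must sum to 1")
    return sum(g["share_of_infections"] * g["ifr"] for g in groups.values())


# Hypothetical group-specific IFRs (the same in both regions).
GROUP_IFR = {"0-39": 0.0002, "40-64": 0.003, "65+": 0.04}

# Two hypothetical regions that differ only in who gets infected.
younger_region = {
    "0-39": {"share_of_infections": 0.50, "ifr": GROUP_IFR["0-39"]},
    "40-64": {"share_of_infections": 0.30, "ifr": GROUP_IFR["40-64"]},
    "65+": {"share_of_infections": 0.20, "ifr": GROUP_IFR["65+"]},
}
older_region = {
    "0-39": {"share_of_infections": 0.30, "ifr": GROUP_IFR["0-39"]},
    "40-64": {"share_of_infections": 0.30, "ifr": GROUP_IFR["40-64"]},
    "65+": {"share_of_infections": 0.40, "ifr": GROUP_IFR["65+"]},
}

ifr_younger = overall_ifr(younger_region)
ifr_older = overall_ifr(older_region)
print(f"Younger region IFR: {ifr_younger:.2%}")
print(f"Older region IFR:   {ifr_older:.2%}")
```

The same group-level rates produce different overall IFRs once the mix of infected people changes, which is why a single headline figure does not transfer cleanly from one place to another.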
Science-Based Decision Making, Not Data-Based Decision Making
Effective policymaking requires consideration of these contexts, to help leaders decide where and when medical resources are allocated. How should the needs of certain groups be prioritized against others? Understanding how these groups are affected guides policy.
Policy must also have a purpose. For some leaders, it’s to speed up the end of an outbreak and begin an economic recovery. To the public, it could be ending lockdown. Healthcare policy makers have a different purpose. Often, it’s to reduce infection rates, and avoid overwhelming the healthcare system by limiting the number of cases.
Although clear and comparable figures can serve as a campfire for discussion, a shifting basis for comparison makes it all the easier to accidentally shape a metric into communicating something it was not designed to. When this is purposeful, it is called ‘the politicization of data,’ or manipulating data for political gain. This may lead to other incorrect comparisons. As the United States and some other nations learned, when governments stop taking the science seriously (1), the next wave of infections begins.
Data is a tool. In the wrong hands, used in the wrong ways, it can be as dangerous as it is helpful. During a pandemic, the existence of data does not necessarily help organizations make faster, more informed decisions. Instead, an effective epidemic plan grounded in the certainties of science helps shepherd societies through uncertain times.
So what can we rely on, if not data? The answer is not just to look at raw figures, but to monitor trends in those numbers. As Dr. Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases, said, “The virus makes its own timeline… You’ve got to go with what the situation on the ground is.”
- Subsidizing the spread of COVID-19: Evidence from the UK’s Eat-Out-to-Help-Out scheme. Thiemo Fetzer, CAGE Working Paper, University of Warwick
- Yitzhak Ben-Israel, Is the Corona Spread Exponential? Status 2020 Apr
- Banerjee R, Bhattacharya J, Majumdar P. Exponential Growth Prediction Bias and Compliance with the Safety Measures in the Times of COVID-19. IZA Institute of Labor Economics 2020 May
- Laura Lee Johnson, An Introduction to Survival Analysis, Principles and Practice of Clinical Research (Fourth Edition), Academic Press, 2018, 373–381, https://doi.org/10.1016/B978-0-12-849905-4.00026-5.
- Lau H, Khosrawipour T, Kocbach P, Ichii H, Bania J, Khosrawipour V. Evaluating the massive underreporting and undertesting of COVID-19 cases in multiple global epicenters. Pulmonology. 2020. doi:10.1016/j.pulmoe.2020.05.015
- Niehus R, De Salazar PM, Taylor AR, Lipsitch M. Using observational data to quantify bias of traveller-derived COVID-19 prevalence estimates in Wuhan, China. Lancet Infect Dis. 2020;20: 803–808.
- Metcalf CJE, Farrar J, Cutts FT, Basta NE, Graham AL, Lessler J, et al. Use of serological surveys to generate key insights into the changing global landscape of infectious disease. Lancet. 2016;388: 728–730.
- Kritsotakis E. On the Importance of Population-Based Serological Surveys of SARS-CoV-2 Without Overlooking Their Inherent Uncertainties. doi:10.20944/preprints202005.0194.v1.
- Kim G-U, Kim M-J, Ra SH, Lee J, Bae S, Jung J, et al. Clinical characteristics of asymptomatic and symptomatic patients with mild COVID-19. Clin Microbiol Infect. 2020;26: 948.e1–948.e3.
- Nishiura H, Kobayashi T, Miyama T, Suzuki A, Jung S-M, Hayashi K, et al. Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). Int J Infect Dis. 2020;94: 154–155.
- Ghisolfi S, Almas I, Sandefur JC, et al. Predicted COVID-19 fatality rates based on age, sex, comorbidities and health system capacity. BMJ Global Health 2020;5:e003094. doi:10.1136/bmjgh-2020-003094
- Gold MS, Sehayek D, Gabrielli S, Zhang X, McCusker C, Ben-Shoshan M. COVID-19 and comorbidities: a systematic review and meta-analysis. Postgrad Med. 2020; 1–7.
- Jain V, Yuan J-M. Predictive symptoms and comorbidities for severe COVID-19 and intensive care unit admission: a systematic review and meta-analysis. Int J Public Health. 2020;65: 533–546.
- Cummings CL, Kong WY, Orminski J. A typology of beliefs and misperceptions about the influenza disease and vaccine among older adults in Singapore. PLoS One. 2020 May 6;15(5):e0232472. doi: 10.1371/journal.pone.0232472. PMID: 32374754; PMCID: PMC7202625.
- Perez-Saez FJ, Lauer SA, Kaiser L, Regard S, Delaporte E, Guessous I, et al. Serology-informed estimates of SARS-CoV-2 infection fatality risk in Geneva, Switzerland. Lancet Infect Dis. doi:10.1016/S1473-3099(20)30584-3
- Pan D, Sze S, Minhas JS, Bangash MN, Pareek N, Divall P, et al. The impact of ethnicity on clinical outcomes in COVID-19: A systematic review. EClinicalMedicine. 2020;23: 100404.
- du Plessis L, McCrone J, Zarebski A, et al. Establishment & lineage dynamics of the SARS-CoV-2 epidemic in the UK. doi:10.1101/2020.10.23.20218446