Data for the Next Pandemic


Three years into the COVID-19 pandemic, the medical, public health and disaster preparedness communities are trying to distill lessons from the harrowing experience of a global outbreak that claimed little short of seven million lives. There remains considerable disagreement as to what ought or ought not to have been done, what worked and what did not, and how best to approach a coming pandemic.

One of the only points on which there is widespread agreement regarding the global response to the pandemic is that data played a crucial role in tackling the crisis. Where data drove decision-making, outcomes were almost universally better in terms of morbidity and mortality.

In the preface to my recent book, Computational Modeling of Infectious Disease, I wrote:

Computational models of infectious disease can make all the difference in our response to pandemics. As habitat loss and climate change make zoonotic spillover events increasingly more likely, COVID-19 is almost certainly not the last major pandemic of the 21st century. In fact, it is reasonable to assume that such outbreaks will become increasingly frequent. Computational models can be powerful weapons in our fight against pandemics.

The increasing availability and decreasing cost of computing resources mean that large-scale computation has never been this affordable in human history. The average high school student's graphing calculator packs more computing power than was available during the first several decades of computer science. Massively distributed architectures can simulate populations of millions in agent-based models and examine the effectiveness of increasingly complex public health policies in a data-driven and evidence-based manner.

My work was greatly inspired by my experience on two COVID-19 data-related initiatives: The COVID-19 Tracking Project and Starschema’s COVID-19 Epidemiological Data Set. I wrote about the latter here:

While events like COVID-19 or the 1918–19 influenza pandemic (Spanish Flu) are thankfully once-in-a-century anomalies, epidemics are a fact of human life. It has been so for millennia, perhaps ever since the first larger conurbations sprang up in the Neolithic. Somewhere — perhaps in a tropical rainforest, perhaps in the thawing permafrost soil or maybe in one of our own cities — the next pathogen to try humanity's resilience and resourcefulness is slowly emerging.

Despite the advances of modern medicine, the challenges of global epidemics have only become greater. Habitat loss among viral reservoir species increases the likelihood of zoonotic spillover events. Our global trade and transportation networks enable pathogens to make their way around the world in 24 hours. Climate change is disrupting fragile ecosystems, and global poverty, especially urban poverty, exacerbates the problem.

Antibiotic overuse, especially in intensive animal farming, raises the risk of horizontal gene transfer of antibiotic resistance genes to other pathogens. The development of new antivirals has always been a slow, painstaking process, and antibiotic discovery is not keeping up with pathogens: in the last decade, fewer than 20 new antibiotics (and a single non-antibiotic biological, the monoclonal antibody obiltoxaximab) have been approved, and the pipeline looks quite bare.

If that sounds like bad news, there is reason for optimism, too. Data has made all the difference in this pandemic, and modern methods of infectious disease modeling can arm humanity against pathogenic threats:

  • There has never been a better time to create complex spatial models. Thanks to open data, most of the major roads and street networks across the world have been digitally mapped. In my book, I illustrate the utility of this with real-world use cases, such as determining the optimal site for a testing centre based on street networks in Manhattan, or exploring the vulnerability of individual parts of Oxford by their distance from the nearest medical facilities. The resolution of the spatial models we can now build is light-years ahead of what we have been used to.
  • Agent-based simulation, while computationally expensive, allows for deep modeling of complex human behavior. The attitudes, decisions and emotions that drive our destinies are complicated, and not necessarily amenable to simple models. Synthetic populations built from public data can yield remarkably accurate predictions.
  • In a paper jointly authored with my colleague Tamas Foldi in the early days of the pandemic, we argued for a global, data-driven early warning system for pandemic threats. The methods of anomaly detection are now sophisticated enough to bridge the 'diagnostic gap' — the time difference between the emergence of a pathogen and the detection of cases. Because anomaly detection can operate unsupervised, it does not require an exact case definition before it can pick up suspected cases.
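To make the agent-based approach concrete, here is a minimal sketch, not the method from my book: it assumes homogeneous random mixing, a fixed infectious period and purely illustrative parameters, where a real model would use a synthetic population with realistic contact structure.

```python
import random

# Minimal agent-based SIR model. All parameters are illustrative,
# not calibrated to any real pathogen.
def simulate(n_agents=1000, n_initial=5, p_transmit=0.03,
             contacts_per_day=8, days_infectious=7, n_days=100, seed=42):
    rng = random.Random(seed)
    # 0 = susceptible, positive = days of infectiousness remaining, -1 = recovered
    state = [0] * n_agents
    for i in rng.sample(range(n_agents), n_initial):
        state[i] = days_infectious
    history = []
    for _ in range(n_days):
        infected = [i for i, s in enumerate(state) if s > 0]
        # Each infected agent contacts a random subset of the population
        for i in infected:
            for j in rng.sample(range(n_agents), contacts_per_day):
                if state[j] == 0 and rng.random() < p_transmit:
                    state[j] = days_infectious
        # Progress infections; agents recover when their clock runs out
        for i in infected:
            state[i] -= 1
            if state[i] == 0:
                state[i] = -1
        history.append((
            sum(1 for s in state if s == 0),   # susceptible
            sum(1 for s in state if s > 0),    # infected
            sum(1 for s in state if s < 0),    # recovered
        ))
    return history

history = simulate()
s, i, r = history[-1]
print(f"Final state: S={s}, I={i}, R={r}")
```

Even this toy version shows why agent-based models scale poorly: every infected agent's contacts must be simulated individually, which is precisely where massively distributed architectures earn their keep.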
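The 'diagnostic gap' argument can likewise be illustrated with a toy unsupervised detector. This is a simplified sketch, not the system proposed in our paper: it flags days whose counts deviate sharply from a trailing baseline, with the window and threshold chosen purely for illustration. Note that it needs no case definition, only a time series.

```python
import statistics

# Unsupervised anomaly detection on a syndromic time series via
# rolling z-scores. Window and threshold are illustrative assumptions.
def flag_anomalies(series, window=14, threshold=3.0):
    """Flag days deviating from the trailing window by > threshold SDs."""
    flags = []
    for t in range(window, len(series)):
        baseline = series[t - window:t]
        mean = statistics.mean(baseline)
        sd = statistics.pstdev(baseline) or 1.0  # guard against flat baselines
        z = (series[t] - mean) / sd
        flags.append((t, z > threshold))
    return flags

# Synthetic example: a stable baseline with a sudden spike on day 20
counts = [10, 12, 9, 11, 10, 13, 10, 9, 11, 12, 10,
          11, 9, 12, 10, 11, 10, 12, 11, 10, 60]
flagged = [t for t, is_anomaly in flag_anomalies(counts) if is_anomaly]
print(flagged)  # → [20]
```

A production early warning system would of course use far richer signals and models, but the principle is the same: the detector raises an alarm the moment the data departs from its own history, before anyone has named the disease.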

All of this, of course, is premised on the availability of data — high quality, high quantity and high velocity data. Much of the data available during COVID-19 was a patchwork of disparate data sets, which made aggregated data sets like ours particularly useful to researchers, private sector operators and the public health establishment alike.

There is, as yet, no global facility where data can be efficiently shared. The landscape resembles a collection of roadside stalls more than a single, unified marketplace. In our practice, we routinely enable some of the world's largest companies to share data using convenient, secure and well-architected data exchange/data marketplace architectures. There is no reason why the same could — or should — not be done for health data.

We will need it in the next pandemic.





Chris von Csefalvay CPH FRSPH MTOPRA
HCLTech-Starschema Blog

Practice director for biomedical AI at HCLTech, computational epidemiologist board certified in public health, Golden Retriever dad, and wheelchair rugby player.