COVID-19 — from a data scientist’s perspective

Moritz Strube
Mar 13, 2020

As a data scientist I asked myself which picture of the current and the future situation emerges from the publicly available data.

This scanning electron microscope image shows SARS-CoV-2 (round blue objects) emerging from the surface of cells cultured in the lab. SARS-CoV-2, also known as 2019-nCoV, is the virus that causes COVID-19. The virus shown was isolated from a patient in the U.S. Credit: NIAID-RML (Licensed under CC BY 2.0)

The SARS epidemic of 2003

We are witnessing the return of SARS, which spread around the world in 2003. That epidemic affected 26 countries and resulted in more than 8,000 cases. I still remember the seriousness of the situation because I was part of a contingency planning team headed by the medical team of a vaccine production site (founded by Emil von Behring, who received the first Nobel Prize in Physiology or Medicine in 1901 for his discovery of a diphtheria antitoxin).

According to Marc Lipsitch, Professor of Epidemiology and Director of the Center for Communicable Disease Dynamics at the Harvard T.H. Chan School of Public Health, the spread of SARS was stopped by

extremely intense public health interventions in mainland Chinese cities, Hong Kong, Vietnam, Thailand, Canada and elsewhere. These involved isolating cases, quarantining their contacts, a measure of “social distancing,” and other intensive efforts. These worked well for SARS because those who were most infectious were also quite ill in a distinctive way — the sick cases were the transmitters, so isolating the sick curbed transmission. In Toronto, SARS resurged after the initial wave was controlled and precautions were discontinued. This resurgence was eventually linked to a case from the first wave. The resurgence confirms that it was control measures that stopped transmission the first time.

The return of SARS

We’re now witnessing the global spread of the novel coronavirus SARS-CoV-2. More than 118,000 cases of COVID-19, the disease it causes, and 4,291 deaths have been reported from 114 countries, and the WHO has characterized the outbreak as a pandemic.

Germany has seen a fast-growing number of infections since February. While this led to empty supermarket shelves at the beginning of March, the general mood then seemed to calm down.

Of course the situation became worse. Observers of the case numbers may not have noticed that the growth is exponential. While people in the tech industry talk about exponential growth all the time, the general public may find it hard to grasp. I highly recommend this video to understand what exponential growth means.

We have to expect rapid growth of case numbers in Germany, which may follow the case numbers of countries like Italy with a delay of only one or two weeks.

How bad is it?

I often heard people say that COVID-19 is no more dangerous than influenza, and that because we don’t really worry about the seasonal flu waves, it is also safe to ignore COVID-19. To be honest, I also believed this until I read this article in the Economist, which changed my mind. It says that

25–70% of the population of any infected country may catch the disease. China’s experience suggests that, of the cases that are detected, roughly 80% will be mild, 15% will need treatment in hospital and 5% will require intensive care. Experts say that the virus may be five to ten times as lethal as seasonal flu, which, with a fatality rate of 0.1%, kills 60,000 Americans in a bad year. Across the world, the death toll could be in the millions.
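To get a feeling for what these percentages mean in absolute numbers, here is a minimal sketch that plugs them into a country the size of Germany. The population of about 83 million is my assumption, and the sketch treats every infection as a detected case, which is a simplification that pushes the results toward the upper end. The fatality range comes from taking "five to ten times" the 0.1% flu rate.

```python
# Rough scenario arithmetic based on the Economist figures quoted above.
# ASSUMPTIONS (not from the quote): a population of about 83 million, and
# every infection counted as a detected case (an upper-bound simplification).

POPULATION = 83_000_000          # assumed, roughly Germany's population
ATTACK_RATES = (0.25, 0.70)      # share of the population infected (quoted range)
HOSPITAL_SHARE = 0.15            # detected cases needing hospital treatment
ICU_SHARE = 0.05                 # detected cases needing intensive care
FLU_FATALITY = 0.001             # seasonal flu fatality rate (0.1%)
LETHALITY_MULTIPLIERS = (5, 10)  # "five to ten times as lethal"


def scenario(attack_rate: float) -> dict:
    """Return projected counts for one assumed attack rate."""
    infected = POPULATION * attack_rate
    return {
        "infected": infected,
        "hospital": infected * HOSPITAL_SHARE,
        "icu": infected * ICU_SHARE,
        "deaths_low": infected * FLU_FATALITY * LETHALITY_MULTIPLIERS[0],
        "deaths_high": infected * FLU_FATALITY * LETHALITY_MULTIPLIERS[1],
    }


for rate in ATTACK_RATES:
    print(f"attack rate {rate:.0%}:")
    for key, value in scenario(rate).items():
        print(f"  {key:12s} {value:>12,.0f}")
```

Under these assumptions, the 70% scenario gives about 58 million infections, roughly 8.7 million hospital treatments, 2.9 million intensive care cases and between about 290,000 and 580,000 deaths. These are illustrations of the arithmetic, not forecasts.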

I discovered some further unsettling differences between seasonal flu and COVID-19:

  • According to the WHO, the mortality of COVID-19 is currently estimated at 3.4%, compared to 0.1% for seasonal influenza. This means that the mortality is possibly more than 30 times higher.
  • Flu is seasonal, which means that infections decline massively at the end of the season. Whether this is also the case for COVID-19 is unclear.
  • For Influenza a vaccine is available (even if in Germany vaccination coverage doesn’t reach the WHO recommendations).
  • Many people globally have built up immunity to seasonal flu strains.

How bad will it be?

The predictions of the Economist paint a dark picture if you plug in the numbers. For Germany alone, they imply millions of hospital and intensive care treatments and hundreds of thousands of deaths if you assume that up to 70% of Germany’s population will be infected by the virus. Chancellor Merkel just confirmed in a press conference that up to 70% of the population may be infected. But it is not inevitable. The WHO Director-General announced:

We have never before seen a pandemic sparked by a coronavirus. This is the first pandemic caused by a coronavirus. And we have never before seen a pandemic that can be controlled, at the same time. […] We cannot say this loudly enough, or clearly enough, or often enough: all countries can still change the course of this pandemic.

Hopefully China will show for a second time how to contain the spread. The growth of its case numbers has slowed considerably since the end of February. With the drastic measures it put in place, China seems to have reached the inflection point by reducing both exposure and the probability that exposed people become infected.

But in the short term it may make sense for Germany to look at how the disease spreads in countries like Italy, which have a lead of one to two weeks. The following chart shows the number of cases in Italy and in Germany. It also compares cases in Italy with the cases in Germany delayed by eight days. I retrieved the data from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE, available on GitHub.

COVID-19 cases as reported in 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
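The delayed comparison in the chart can be sketched in a few lines. This is not the code from my notebook but a minimal illustration using only the standard library. It takes one reading of the eight-day delay: the count of the lagging country on a given date is paired with the count of the leading country eight days earlier. The function names are hypothetical and the numbers are synthetic toy values, not real case counts.

```python
from datetime import date, timedelta


def delay_series(series: dict, days: int) -> dict:
    """Shift every observation `days` days later on the calendar."""
    return {day + timedelta(days=days): value for day, value in series.items()}


def compare_on_common_dates(reference: dict, other: dict) -> list:
    """Return (date, reference_value, other_value) for dates present in both."""
    common = sorted(set(reference) & set(other))
    return [(day, reference[day], other[day]) for day in common]


# Synthetic toy numbers for illustration only, NOT real case counts.
start = date(2020, 3, 1)
leading = {start + timedelta(days=i): 100 * 2 ** i for i in range(12)}
# The lagging series follows exactly the same path, eight days later.
lagging = {start + timedelta(days=i + 8): 100 * 2 ** i for i in range(12)}

# Move the leading curve eight days later so that both curves can overlap.
aligned = compare_on_common_dates(lagging, delay_series(leading, 8))
for day, lagging_cases, leading_cases in aligned[:3]:
    print(day.isoformat(), lagging_cases, leading_cases)
```

If the two countries really followed the same path with a fixed delay, the paired values would match, as they do for the toy data. With real data the interesting part is how far they drift apart.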

The curve suggests that the growth in Germany is slower than in Italy. In fact, based on this data, the cases in Italy increased from March 4th to March 11th by a factor of approx. 4 (from 3,089 to 12,462), while in the same period the number of cases in Germany increased by a factor of approx. 7 (from 262 to 1,908). The faster growth becomes clearer in a plot which compares the logarithm of the cases (this video explains in detail what this means).

Logarithm of COVID-19 cases as reported in 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
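The growth factors above can be turned into an average daily growth factor and a doubling time. The sketch below uses only the four case counts quoted in the text and assumes constant exponential growth over the seven days, which is a simplification. Under that assumption the logarithm of the cases grows along a straight line, and a steeper line means faster growth.

```python
import math

# Case counts quoted in the text (Johns Hopkins CSSE data, March 4th and March 11th).
CASES = {
    "Italy": (3_089, 12_462),
    "Germany": (262, 1_908),
}
DAYS = 7  # March 4th to March 11th


def growth_summary(first: int, last: int, days: int) -> tuple:
    """Return (growth factor, average daily growth factor, doubling time in days)."""
    factor = last / first
    daily = factor ** (1 / days)
    doubling_days = days * math.log(2) / math.log(factor)
    return factor, daily, doubling_days


for country, (first, last) in CASES.items():
    factor, daily, doubling = growth_summary(first, last, DAYS)
    print(f"{country}: x{factor:.1f} in {DAYS} days, "
          f"x{daily:.2f} per day, doubling every {doubling:.1f} days")
```

With these numbers the cases doubled roughly every 3.5 days in Italy and roughly every 2.4 days in Germany during that week.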

The charts are an indicator of what to expect in case numbers, and possibly also of the impact of SARS-CoV-2 on public life and the economy in Germany. You can access the Colab notebook with which I downloaded the data and created the graphs on Google or GitHub.

We should prepare. I recommend having a look at the WHO Director-General’s opening remarks at the media briefing on COVID-19 and the WHO recommendations. The WHO also provides a guide, Getting your workplace ready for COVID-19. Beware of some dubious viral posts with advice.

If you want to learn more about epidemics and their dynamics I highly recommend the video Exponential growth and epidemics by 3Blue1Brown to get a quick understanding and the MOOC Epidemics — the Dynamics of Infectious Diseases by Coursera to gain in-depth knowledge. And maybe you wonder what happens after the virus? Be careful and stay healthy!

About data sources

The data sources report different numbers. I decided to use the data published by Johns Hopkins CSSE. This example shows how the case numbers reported for Germany for March 12th differ:

Unfortunately, time series are rarely easily available. Time series help us understand the growth parameters and make predictions possible.

About mortality

The large differences in reported mortality (defined as the number of reported deaths divided by the number of reported cases) may have several causes:

  • number of unknown cases
  • test coverage and procedure
  • demographics
  • quality of hospital care
  • number of reported cases and speed of transmission
  • mutation/strain of the virus
  • other factors
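To illustrate the first point in the list, here is a small sketch of how the reported mortality changes if only a fraction of all infections is detected. It uses the global figures quoted earlier (about 118,000 cases and 4,291 deaths). The multipliers for undetected infections are purely hypothetical values chosen for illustration, not estimates.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Reported deaths divided by reported cases."""
    return deaths / cases


# Global figures quoted earlier in the article.
reported_cases = 118_000
reported_deaths = 4_291

naive = case_fatality_rate(reported_deaths, reported_cases)
print(f"reported mortality: {naive:.1%}")

# HYPOTHETICAL multipliers: assumed number of true infections per reported case.
for multiplier in (1, 2, 5, 10):
    adjusted = case_fatality_rate(reported_deaths, reported_cases * multiplier)
    print(f"{multiplier:>2}x infections per reported case -> {adjusted:.2%}")
```

The same number of deaths spread over more infections gives a lower rate, which is one reason why countries with different test coverage can report very different mortality.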

Moritz Strube

I’m an experienced technology manager with deep interest in technology, innovation and science. https://www.linkedin.com/in/moritz-strube/