COVID 2020: A data scientist's perspective

Amitā Kapoor
Mar 18, 2020

I am not a virologist or an expert in epidemiology, just an AI practitioner. So, without further preamble, I will dive directly into the data and the insights it offers.

As of today, 18 March 2020, the number of confirmed cases has crossed 2 lakh, i.e. 2,00,000 (200,000). The image below shows these cases and how they are distributed around the world.

Total number of confirmed cases in every country

The graph above makes it clear that China has the most cases, followed by Italy and then, closely, Iran. The picture does indeed appear grim.

We data scientists love 'normalization'; it helps us compare things on different scales. Here, too, each country has a different population, so instead of the absolute number of confirmed cases, it makes more sense to look at confirmed cases as a percentage of each country's population. The graph is below.

Total number of confirmed cases expressed as a percentage of each country's population

We can see that, by this measure, confirmed cases are less than 0.06% of the population in every country. The worst affected is now Iceland! Yes, you read that correctly. That is a very small number. Indeed, many WhatsApp and Facebook posts going around claim that Corona is nothing more than media hype, trying to convince people that flu and hunger kill more people than Corona will.
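The normalization itself is simple to reproduce. Below is a minimal Python sketch that divides confirmed cases by population and ranks countries by the resulting percentage. The country names and figures are hypothetical, chosen only to show how a small country with few cases can top the list ahead of a large country with many.

```python
def cases_as_percent_of_population(confirmed, population):
    """Return confirmed cases as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * confirmed / population


# Hypothetical figures for illustration only (not real data).
countries = {
    "Country A": (50_000, 1_000_000_000),  # many cases, huge population
    "Country B": (200, 350_000),           # few cases, tiny population
}

shares = {name: cases_as_percent_of_population(cases, pop)
          for name, (cases, pop) in countries.items()}
ranking = sorted(shares, key=shares.get, reverse=True)

for name in ranking:
    print(f"{name}: {shares[name]:.4f}% of the population")
```

Country B comes first (about 0.057%) even though Country A has 250 times as many cases, which is exactly the kind of reordering the normalized graph shows.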

Do not conclude so hastily

“Ignorance is the parent of fear.”

Let us shed the ignorance, and the resulting fear, and try a rational, data-based approach. There are many mathematical models for studying infectious diseases. The most widely used, and quite simple, is the SIR model. Its name comes from the fact that it divides the population into three compartments:

* S, Susceptible: those who can catch the infection.
* I, Infected: those who have been infected and are actual carriers.
* R, Recovered: individuals who were infected earlier and have now fully recovered (and hopefully developed immunity to the infection).

The population shifts from Susceptible to Infected to Recovered. I will not bore you with the math behind it; those interested can refer to the link above and the code here. Just remember the three compartments defined above and three more important parameters defined below:

1. Contact Rate: a measure of how many people one infected person infects per day. It depends on the person's social circle: the more social distancing, the lower this number.
2. Mean Recovery Rate: the reciprocal of the number of days a person stays contagious.
3. Population: the total population.
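For readers who want the math anyway, here is the standard textbook form of the SIR model. It is assumed, not confirmed, to be the form behind the graphs in this post. It writes β for the contact rate, γ for the mean recovery rate, N for the population and D for the number of days a person stays contagious:

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \, \frac{S I}{N} \\
\frac{dI}{dt} &= \beta \, \frac{S I}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
\qquad \text{with } S + I + R = N, \quad \gamma = \frac{1}{D}
```

Adding the three equations gives d(S + I + R)/dt = 0, so the population stays constant. Infections grow only while βS/N > γ; at the start, when S is close to N, this means the ratio R_0 = β/γ (the basic reproduction number) must exceed 1.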

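When we start, almost the whole population is susceptible; by the end, most of it has moved to recovered. (In the SIR model, a fraction that depends on the contact and recovery rates never gets infected at all.) Below is a sample variation of the population from susceptible → infected → recovered.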

Variation of Susceptible, Infected and Recovered with time according to the SIR model
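To make the susceptible → infected → recovered flow concrete, here is a minimal Python sketch. It steps the SIR equations forward with a simple Euler scheme (ten small steps per day) and records the three compartments once a day. The parameter values (1 million people, contact rate 0.3, recovery rate 0.1 per day, one initial case) are assumptions for illustration. This is not the code behind the graph above, and `simulate_sir` is a hypothetical helper.

```python
def simulate_sir(population, contact_rate, recovery_rate,
                 initial_infected=1.0, days=300, steps_per_day=10):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (day, S, I, R) tuples sampled once per day.
    """
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _step in range(steps_per_day):
            new_infections = contact_rate * s * i / population * dt
            new_recoveries = recovery_rate * i * dt
            s -= new_infections                   # S -> I
            i += new_infections - new_recoveries  # net change in I
            r += new_recoveries                   # I -> R
        history.append((day, s, i, r))
    return history


# Hypothetical parameters, chosen only for illustration.
history = simulate_sir(population=1_000_000, contact_rate=0.3,
                       recovery_rate=0.1, days=200)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Peak on day {peak_day} with about {peak_infected:,.0f} infected")
for day, s, i, r in history[::25]:
    print(f"day {day:3d}: S={s:>10,.0f}  I={i:>10,.0f}  R={r:>10,.0f}")
```

Euler stepping is the simplest choice and, at this step size, good enough for a qualitative picture. A library ODE solver such as SciPy's `solve_ivp` would be the more careful option.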

Remember that the model is simple and does not take deaths into account. In real life, however, deaths do happen, as the China data shows.

Before modeling other countries, let us look at China's graph, since we know that China has reached its peak and started to decline:

Active cases, recoveries and deaths in China

The total population of China is 1,404 million, and the population of Wuhan, the epicenter of this infection, is 11 million.

I can see you calculating: only 60,000 out of 11 million!! You are forgetting that from 23rd January 2020 [Source], China implemented an extensive, stringent quarantine, something I cannot imagine in my country even under a curfew or an emergency.

Another important point one may overlook: 60,000 may be a small number compared with the whole population, but these people need health services, and China had to build hospitals to provide that care.

Since the incubation period of the coronavirus can range from 4 to 28 days [Source], it is possible that there were many asymptomatic carriers in whom symptoms appeared only later.

Now let us run the SIR model on the China data with an initial population of 11 million, that is, considering only Wuhan:

Susceptible, Infected and Recovered in Wuhan as per the SIR model

Starting from December 2019, the outbreak peaked in mid-February (~60 days later) and is now declining. The same trend is visible in the modelled graph above.

So the SIR model hit the bull's eye. Now, remember the three numbers:
1. Contact rate
2. Recovery rate
3. Population

While the recovery rate will, on average, remain the same for all countries, the population and the contact rate change. For the graph above, the population was 11 million and the contact rate was 0.3.

Suppose a country with a population of 1,000 million becomes susceptible. If it follows the same level of quarantine as China, it will reach peak infection in 80 days, and the whole cycle will be over in roughly 200 days.

A hypothetical country with a population of 1,000 million enforcing the same level of quarantine as China: SIR model
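The comparison between the Wuhan run and the 1,000-million country can be sketched in the same way. The snippet below is self-contained, so it repeats the Euler stepping. It uses the contact rate of 0.3 quoted above, but the recovery rate of 0.1 per day and the single initial case are assumptions, so the peak days it prints need not match the graphs. `sir_peak_and_end` is a hypothetical helper.

```python
def sir_peak_and_end(population, contact_rate, recovery_rate,
                     initial_infected=1.0, steps_per_day=10, max_days=2000):
    """Return (peak_day, peak_infected, end_day) from a forward-Euler SIR run.

    end_day is the first day after the peak on which fewer than one person
    is still infected (an arbitrary cut-off chosen for this sketch), or None
    if that does not happen within max_days.
    """
    s, i = population - initial_infected, initial_infected
    dt = 1.0 / steps_per_day
    peak_day, peak_infected = 0, i
    for day in range(1, max_days + 1):
        for _step in range(steps_per_day):
            new_inf = contact_rate * s * i / population * dt
            new_rec = recovery_rate * i * dt
            s -= new_inf
            i += new_inf - new_rec
        if i > peak_infected:
            peak_day, peak_infected = day, i
        elif i < 1.0:
            return peak_day, peak_infected, day
    return peak_day, peak_infected, None


# contact_rate=0.3 is the value quoted in the text; recovery_rate=0.1
# (about 10 contagious days) is an assumption.
for name, population in [("Wuhan-sized", 11_000_000),
                         ("1000 million", 1_000_000_000)]:
    peak_day, peak_inf, end_day = sir_peak_and_end(population, 0.3, 0.1)
    print(f"{name}: peak on day {peak_day} "
          f"({peak_inf / population:.0%} infected), over by day {end_day}")
```

In this standard formulation, where new infections scale with S·I/N, starting from one case in a larger population mainly delays the peak (roughly with the logarithm of the population). The fraction of people infected at the peak depends almost entirely on the contact and recovery rates.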

An important point often overlooked when talking in numbers is that, at the peak, this country would need roughly 75 million hospital beds (assuming 15% of the infected require hospitalization), and maybe 10–20% of these would need to be ICU beds. In their absence, PEOPLE WILL DIE irrespective of age!! Yes, irrespective of age, because when breathlessness strikes, hospitals are needed.
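The bed arithmetic is easy to reproduce. The sketch below takes the number of people infected at the peak and applies the 15% hospitalization and 10–20% ICU fractions from the text. The input of 500 million is a back-calculation (75 million ÷ 0.15), not a figure stated in the post, and `hospital_demand` is a hypothetical helper.

```python
def hospital_demand(peak_infected, hospital_fraction=0.15,
                    icu_fraction_range=(0.10, 0.20)):
    """Return (hospital beds, low ICU estimate, high ICU estimate)."""
    beds = peak_infected * hospital_fraction
    icu_low, icu_high = (beds * f for f in icu_fraction_range)
    return beds, icu_low, icu_high


# Back-calculated input: 75 million beds / 0.15 = 500 million infected at
# the peak. This is an assumption, not a number given in the text.
beds, icu_low, icu_high = hospital_demand(500_000_000)
print(f"hospital beds needed: {beds:,.0f}")                   # 75,000,000
print(f"ICU beds needed: {icu_low:,.0f} to {icu_high:,.0f}")  # 7,500,000 to 15,000,000
```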

To make sure you understand what I am saying, let us take the case of Delhi, the capital of India, which has a population of 20 million. At present, Delhi has only 10 cases; there are local transmissions, but no cluster transmission has been reported yet. Delhi hosts 172 hospitals, many of international repute. Assuming each hospital has a capacity of 1,000 beds (I am indeed exaggerating 😃), we have a capacity of 1,72,000 (172,000) beds.


Even if we follow the same level of quarantine as China, at the peak of the infection we will require 1.5 million hospital beds.
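The Delhi comparison reduces to simple arithmetic. This sketch uses only the numbers given above: 172 hospitals, a generous 1,000 beds each, and 1.5 million beds needed at the peak.

```python
hospitals = 172
beds_per_hospital = 1_000   # the post's deliberately generous assumption
peak_demand = 1_500_000     # beds needed at the peak, as stated above

capacity = hospitals * beds_per_hospital
shortfall = peak_demand - capacity
ratio = peak_demand / capacity

print(f"capacity: {capacity:,} beds")            # 172,000
print(f"shortfall at peak: {shortfall:,} beds")  # 1,328,000
print(f"demand is {ratio:.1f}x the capacity")    # 8.7x
```

Python prints Western digit grouping, so 1,72,000 appears as 172,000.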

Social distancing: the solution, but!

With social distancing, the epidemic in Delhi stretches to slightly more than a year. For other countries, depending on the population, the time period may be longer or shorter.
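The effect of social distancing can be sketched by lowering the contact rate in the SIR model. The self-contained snippet below runs Delhi's population of 20 million at three contact rates. The rates 0.2 and 0.15, the recovery rate of 0.1 per day and the single initial case are all assumptions, and `sir_curve` is a hypothetical helper, so treat the printed days as qualitative only.

```python
def sir_curve(population, contact_rate, recovery_rate,
              initial_infected=1.0, days=2000, steps_per_day=10):
    """Return the number infected at the end of each day (forward Euler SIR)."""
    s, i = population - initial_infected, initial_infected
    dt = 1.0 / steps_per_day
    infected = []
    for _day in range(days):
        for _step in range(steps_per_day):
            new_inf = contact_rate * s * i / population * dt
            new_rec = recovery_rate * i * dt
            s -= new_inf
            i += new_inf - new_rec
        infected.append(i)
    return infected


population = 20_000_000   # Delhi, as quoted in the text
recovery_rate = 0.1       # assumption: about 10 contagious days
for contact_rate in (0.3, 0.2, 0.15):  # hypothetical levels of distancing
    curve = sir_curve(population, contact_rate, recovery_rate)
    peak = max(curve)
    peak_day = curve.index(peak) + 1
    # Days on which at least one person is still infected.
    duration = sum(1 for x in curve if x >= 1.0)
    print(f"contact rate {contact_rate}: peak of {peak:,.0f} infected on "
          f"day {peak_day}, about {duration} days with cases")
```

Lowering the contact rate flattens the curve: the peak is lower and arrives later, and the epidemic lasts longer. That is the trade-off behind the longer time period described above.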

Life cannot and should not stop. Thus, in my opinion, social distancing alone will not be enough, and it can create other unforeseen problems.

We will have to rethink how we work, how we commute and how we take care of the people who do not have the luxury of working from home: the healthcare professionals, the plumbers, the maids, the safai karamcharis (sanitation workers), the security guards…

Lastly, I would like to add these two cartoons from the internet. I do not know the source, but whoever made them is spot on.

This pandemic might change more than we ever thought
Maybe Corona is the vaccine and we are the virus

Peter Moss Leukemia AI Research

Free & Open-Source Technologies for the fight against Leukemia.
