COVID-19: an attempt to estimate the true number of cases and the true mortality rate

Maarten
Mar 22, 2020

Following the news and updates around COVID-19 over the past few months, one thing became clear: we lack an awful lot of the data needed to draw conclusions. Wild estimates of the number of real cases, the mortality rate, etc. keep appearing, leaving much room for discussion. In this article we attempt to estimate the true number of cases and the true mortality rate based on several official data sources.

In short, we can draw the following conclusions:

  • The true mortality rate is much lower than often reported, and likely to be somewhere between 0.1% and 0.3%. This is both good and bad news. The good news is obvious, but the bad news is that the number of true cases is much higher than reported.
  • According to the methods below, the true number of cases in Northern Italy is likely to be about 10% of its total population. For Italy as a whole, the number of true cases is probably somewhere between 3 and 10 million, much higher than the 60.000 reported today.
  • Many other developed countries greatly underestimate the number of true cases of COVID-19. In general: the more tests performed, the closer the reported number comes to the true number of cases.

How can we explain the differences?

Let’s start by analyzing the differences. They originate mostly from differences in how countries test and report their numbers. While the UAE and South Korea have tested around 1% of their population, many other countries have tested only around 0.01 to 0.05% of their population (Japan, India, the US and many European countries). This means that over 99.9% of people worldwide have not been tested. We tend to draw conclusions based on a very skewed 0.1% of the total population, leading to an underestimation of the number of real cases, but an overestimation of the mortality rate and the severity of the cases.

It’s important to get a grasp on the estimated cases in each region. Given this number, we can better estimate the mortality rate and distribution of severity of the symptoms. These statistics can be used to estimate the needed capacity in hospitals or the best strategy to deal with COVID-19. We can use models like the one presented by Tomas Pueyo to model and simulate future outcomes, given these parameters.

Let’s start with some important assumptions we use:

  • The number of people hospitalized and the number of deaths due to COVID-19 are recorded well in developed countries. We have quite a clear overview of the people with severe symptoms (intensive care or dead).
  • We underestimate the number of true cases of COVID-19 since we test only a fraction of the total population. The total number of real cases is higher than we currently measure.
  • The mortality rate and the distribution of symptoms are identical across highly developed regions.

What can we learn from Italy?

Italy records numbers per region, which gives us very useful insights into the true number of cases. The Ministry of Health reports the number of cases, the number of deaths and a breakdown of the cases by severity for every region (21). Below is the data as of the 21st of March, as reported by the Italian Ministry of Health.

Statistics from the Italian Ministry of Health

To summarize:

  • About 230.000 tests have been performed.
  • 53.578 people tested positive (23% of tests), 179.644 tested negative (77% of tests).
  • About half (52%) of the cases have only mild symptoms. The other half ended up in hospital or, worse, died.
  • Currently 4.825 people have died (almost 10% of the total cases!) and another 2.857 are in intensive care, of whom about half are believed likely to die soon.
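The percentages in this summary follow directly from the reported counts. A minimal Python sketch of the arithmetic, using only the figures quoted above (the variable names are my own):

```python
# Figures as quoted above (Italian Ministry of Health, 21 March 2020).
positive = 53_578
negative = 179_644
deaths = 4_825

tests = positive + negative          # tests with a known result
positive_rate = positive / tests     # share of tests that came back positive
crude_fatality = deaths / positive   # deaths as a share of confirmed cases only

print(f"tests with a result: {tests}")
print(f"positive rate: {positive_rate:.1%}")
print(f"deaths per confirmed case: {crude_fatality:.1%}")
```

Note that the last figure divides by confirmed cases only, which is exactly the denominator the rest of this article argues is too small.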

And other things we know, or don’t know:

What is interesting are the large differences among regions. Even regions in Northern Italy that have similar demographic characteristics and are very much interconnected report very different numbers. The question is: what can we learn from these numbers? For example, Veneto tested more than 1% of its population, while Piemonte tested only 0.25% of its population. We can see that, in general, where testing covered a larger and more representative part of the population (more tests with a lower ratio of positive tests), more people were detected with only mild or even no symptoms.

Tests, positive cases and the effect on the average severity of cases

If we zoom in on the 4 regions in Northern Italy with more than 4 million inhabitants, we can compare results from which we can interpolate and even extrapolate to the whole population. For the sake of simplicity, assume that the outcome of COVID-19 is binary: either you’re severely ill and end up in hospital or worse (“severe cases”), or you only get mild to no symptoms and recover after a few days to weeks (“mild cases”). Furthermore, severe cases are always recorded, but mild cases are often missed by the officials.

Looking at the data, we find:

  • In Piemonte, 0.25% of the population has been tested; of the positive cases, 67% were hospitalized or died.
  • In Lombardia and Emilia Romagna, about 0.6% of the population has been tested; of the positive cases, almost 50% were hospitalized or died.
  • In Veneto, 1.1% of the population has been tested; of the positive cases, about 30% were hospitalized or died.

If we assume that the true distribution of severe and mild cases is similar over these regions, we can draw the following conclusions.

If Piemonte had tested about 0.6% of its population (about 15.000 extra tests), we assume it would have seen a similar distribution of severe and mild cases as Lombardia and Emilia Romagna (50–50%). Hence, out of the additional 15.000 tests, we expect that about 1.500 would have turned out positive with mild symptoms (about 10% of the additional tests), resulting in a 50–50 distribution between mild and severe cases.
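The scaling step can be sketched as a small calculation: keep the severe cases fixed (they are assumed to be fully recorded) and ask how many extra mild cases are needed to reach a target split. The function name and the example counts below are hypothetical and chosen only to match a 67% severe share; they are not Piemonte's actual case counts.

```python
def extra_mild_cases(severe: int, mild: int, target_severe_share: float) -> float:
    """Extra mild cases needed so that severe cases make up target_severe_share.

    Assumes every additional positive test is a mild case and that the
    number of severe cases stays fixed.
    """
    if not 0 < target_severe_share <= 1:
        raise ValueError("target_severe_share must be in (0, 1]")
    total_needed = severe / target_severe_share
    return max(0.0, total_needed - (severe + mild))


# Hypothetical region with 670 severe and 330 mild recorded cases (67% severe).
to_half = extra_mild_cases(severe=670, mild=330, target_severe_share=0.5)
to_veneto = extra_mild_cases(severe=670, mild=330, target_severe_share=0.3)
print(f"extra mild cases for a 50-50 split: {to_half:.0f}")
print(f"extra mild cases for a 30-70 split: {to_veneto:.0f}")
```

Dividing the extra mild cases by the number of extra tests then gives the share of additional tests that would have to turn out positive.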

Scaling tests in Piemonte to the test size of Lombardia (0.6% of population)

Going further, if Piemonte had tested about 1.1% of its population, the distribution of severe and mild cases would be expected to resemble Veneto’s, with about 70% mild cases. This would mean an additional 2.500 tests would turn positive, all mild cases (again, 10% of the additional tests would be positive).

Scaling tests in Piemonte to the test size of Veneto (1.1% of population)

Similarly, if Emilia Romagna had performed about 25.000 extra tests, increasing the test ratio to 1.1% of the population, we would expect a distribution of mild and severe cases similar to Veneto’s. This would mean about 2.500 additional mild cases would be detected, which is again 10% of the additional tests.
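As a quick check on the scenarios above, the share of extra tests that would have to come back positive can be computed from the stated numbers. This is only a sketch of the ratio; the scenario labels are my own.

```python
# (extra tests, extra mild positives) as stated in the text above.
scenarios = {
    "Piemonte scaled to 0.6% tested": (15_000, 1_500),
    "Emilia Romagna scaled to 1.1% tested": (25_000, 2_500),
}

# Share of the additional tests that would turn out positive.
marginal_rates = {
    name: positives / tests for name, (tests, positives) in scenarios.items()
}

for name, rate in marginal_rates.items():
    print(f"{name}: {rate:.0%} of the extra tests positive")
```

Both scenarios land on the same rate, and this is the figure the next section extrapolates to the population as a whole.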

About 10% of people in Northern Italy infected

The results above suggest that about 10% of Northern Italy has been infected by COVID-19, if we take the additional tests as roughly representative of the untested population. Keep in mind that we’re only interested in the order of magnitude. It could very well be 7%, or 13%, but it’s unlikely to be below 1% or much higher than 20%. These findings are in line with smaller but more representative samples. For example, in the small town of Vò, everyone was tested in 2 consecutive rounds. During the first round, 89 people tested positive (about 3% of the population), the majority of whom had mild or no symptoms. These tests were performed weeks ago, giving the virus the chance to spread further in other regions of Italy, which likely resulted in infection rates today that are higher than the 3% measured weeks ago.

Another creative way of measuring the infection rate in Northern Italy is to look at Serie A players, who are closely monitored. At the moment of writing, 14 Serie A players from three different football clubs (Sampdoria, Juventus and AC Milan) have tested positive. That is about 7% of the squad players of the 10 Serie A clubs in Northern Italy, which is in line with the 10% stated before.

Yet another example is the closed population of the Diamond Princess, with a reported mortality rate of 1.0%. Projecting the Diamond Princess mortality rate onto the age structure of the U.S. population, the death rate among people infected with COVID-19 would be 0.125%.

Mortality rate about 0.1 to 0.3% of true cases

Going back to Veneto: an infection rate of 10% means 500.000 people are infected (total population: 5 million). Again, this could very well be 300.000 or 700.000, but let’s assume 500.000. Of these cases, 146 people have died, 249 are in critical condition and another 942 are hospitalized with serious symptoms. This means that, at the moment of writing, about 0.03% of the infected people have died and another 1.200 (0.24%) are at risk, leading to a mortality rate in the order of 0.1% to 0.3% of the true cases.
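The Veneto numbers can be reproduced with a few lines of Python. The sketch below uses the counts quoted above and also runs the 300.000 and 700.000 scenarios (6% and 14% of 5 million) to show how sensitive the result is to the assumed infection rate; the function name is my own.

```python
# Figures for Veneto as quoted above; the population is rounded to 5 million.
population = 5_000_000
deaths = 146
critical = 249
hospitalized = 942


def veneto_rates(infection_rate):
    """Return (share of infected who died, share of infected still at risk)."""
    infected = population * infection_rate
    died_share = deaths / infected
    at_risk_share = (critical + hospitalized) / infected
    return died_share, at_risk_share


for infection_rate in (0.06, 0.10, 0.14):
    died_share, at_risk_share = veneto_rates(infection_rate)
    print(
        f"infection rate {infection_rate:.0%}: "
        f"died {died_share:.3%}, at risk {at_risk_share:.3%}"
    )
```

How many of the people at risk eventually die is not known at this point, which is why the article gives a range rather than a single number.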

Similar numbers in other countries

There are a few other countries with large test populations. For example, South Korea tested 316.000 people, about 0.6% of its total population, of whom 8.897 tested positive (about 3%). Of course, we have no idea about the status of the other 99.4%. If we follow the insights from the town of Vò, where 70% of the cases were very mild, it’s very likely that only 30% of the real cases have been traced, and that the real number of cases in South Korea is 3 to 4 times higher. This means that the true mortality rate would be somewhere around 0.3%.

The United Arab Emirates has the highest test rate: it tested 1.3% of its total population, leading to 153 positive cases and 2 deaths. Following the same logic as above, the true mortality rate lies somewhere around 0.3%, in line with South Korea and Italy.
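The "3 to 4 times higher" adjustment can be written out explicitly. The sketch below applies it to the South Korean case count and to the UAE figures quoted above; the function names are my own, and the multipliers are the article's assumption rather than measured values.

```python
def estimated_true_cases(confirmed, multiplier):
    """Scale confirmed cases by the assumed ratio of true to traced cases."""
    return confirmed * multiplier


def adjusted_fatality(deaths, confirmed, multiplier):
    """Deaths divided by the estimated true number of cases."""
    return deaths / estimated_true_cases(confirmed, multiplier)


for multiplier in (3, 4):
    korea_cases = estimated_true_cases(8_897, multiplier)
    uae_rate = adjusted_fatality(deaths=2, confirmed=153, multiplier=multiplier)
    print(
        f"x{multiplier}: South Korea about {korea_cases} true cases, "
        f"UAE adjusted mortality {uae_rate:.2%}"
    )
```

With only 2 deaths in the UAE, a single additional death would shift the estimate considerably, so this is an order-of-magnitude check at best.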

Estimates of cases per country

The findings in this article, especially around the true mortality rate, can help to estimate the number of cases 2–3 weeks ago based on the number of new deaths today. Assuming a true death rate of 0.3%, and a slowdown of the spread after social distancing measures, we end up with the following numbers for several countries, following the logic and model presented by Tomas Pueyo.
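The back-calculation can be sketched as follows. This is a simplified illustration of the idea, not Tomas Pueyo's actual model: the function names, the example death count, the 14-day lag and the 7-day doubling time are all hypothetical assumptions; only the 0.3% death rate comes from this article.

```python
def cases_from_deaths(deaths_today, true_death_rate=0.003):
    """Infections that must have existed weeks ago to produce today's deaths."""
    return deaths_today / true_death_rate


def project_forward(cases_then, days_elapsed, doubling_time_days):
    """Grow a past case count to today, assuming a constant doubling time."""
    return cases_then * 2 ** (days_elapsed / doubling_time_days)


# Hypothetical example: 100 deaths today, infections dated 14 days back,
# and an assumed doubling time of 7 days since then.
cases_then = cases_from_deaths(100)
cases_now = project_forward(cases_then, days_elapsed=14, doubling_time_days=7)
print(f"cases two weeks ago: about {cases_then:,.0f}")
print(f"cases today: about {cases_now:,.0f}")
```

A longer doubling time, as expected after social distancing measures, lowers the projected number of cases today without changing the estimate for two weeks ago.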

Estimated cases and expected deaths per country today

Discussion

First of all, the insights above are just another way of looking at the 99.9% who didn’t get tested. It’s in no way the only truth, and I’m open to any suggestions, alterations and different theories.

Furthermore, even though the mortality rate is likely to be much lower than often reported, Northern Italy shows how hard it is to deal with an infection rate of “only” 10%. Letting the virus spread freely would likely cause a collapse of the healthcare system in any given country, even though 98–99% of infected people aren’t hospitalized and (in normal conditions) about 0.3% would die as a result of COVID-19. And even though 0.3% sounds like a fairly low number, let’s not forget this world has over 7 billion inhabitants. 0.3% of 7 billion still comes down to more than 20 million people.

What is interesting to discuss further is the distribution of symptoms over age. It seems that young people deal fairly well with the virus and are able to recover without much medical assistance (there are always exceptions). Since countries like the UK and the Netherlands are opting for group immunity rather than banishing the virus altogether, a possible strategy could be to slowly infect the younger population and isolate the older population. Think of opening schools and allowing the younger population to go to work, keeping the economy running and increasing the infection rate. In this way, group immunity can slowly develop without a spike in severe cases. I leave this up for further discussion, but it would be interesting to discuss different strategies.

Image by politico.eu

Maarten

ML Engineer, data enthusiast and co-founder of two Dutch scale-ups: Deeploy (software to make AI explainable) and Enjins (building AI products for scale-ups)