No, Epidemics do not stop by Magic
Or How to minimise the number of deaths and hospitalisations from COVID-19
- The real number of infected cases is, for most countries, between 4 and 30 times higher than the number of detected cases (for some countries like Italy and France it’s probably even more)
- We need the help of population (herd) immunity to stop this; we are unable to stop it without it
- We can achieve that in a more or less efficient way
- The most efficient is to let the young and healthy get infected almost as quickly as possible (over a period of 2 months), while protecting (isolating) the vulnerable
- This approach saves both lives and the economy
- We need a good model of the situation to be able to make good decisions
Like many of us, I was skeptical at first
When our PM announced the closure of schools on March 11th, when we only had 25 diagnosed cases in the country, I thought they were overreacting — so I started to model this situation to better understand what was going on…
So how does one model this thing?
First, I downloaded all available data on the number of confirmed cases and deaths from the Chinese provinces and all other countries, and started to chart it in all kinds of different ways. Here is how it looked:
Data for Hubei looked bizarre: there was that big jump due to a change in methodology, and then the quick deceleration. This looked neither consistent nor helpful, but the data from other provinces looked very consistent:
So I tried to extract more out of it — the numbers of active cases:
The daily growth rates:
Yes, real-life data is messy, but analyzing it allowed me to find the mechanism that seemed to govern it. The famous post by Tomas Pueyo from March 10th was also very helpful here, especially the following chart:
So this made it clear that there is an approximately 8-day lag between when people get infected and when they get diagnosed; these people infect others during that time, before they get diagnosed and isolated.
Those numbers of days will get calibrated more finely in the model, by comparing its results with real observed data from different countries.
So given this, and the fact that we have daily data, it is possible to reverse engineer the process by which this epidemic spreads: each day we have a number of people in a particular region who are infected and who are infecting others during a certain number of days. Then, a certain proportion of those people never get diagnosed and combat the infection with mild symptoms or even without, while others are diagnosed after a certain number of days (between 6 and 9 for most of them).
This leads us to the main parameter that we can call the “daily infectiousness” — the average number of people that an infected person infects during a day.
And after multiplying this by the average number of days during which a person infects, we get a first order approximation of the average number of people infected by each infected person — this number is called R — when R is above 1 then the epidemic accelerates, when it is below 1 then it slows down.
For the following charts we’ll assume that the average number of days during which a person infects is constant and equal to 8 — this means that R=1 corresponds to a daily infectiousness of 1/8=12.5% — this number comes not only from the above chart, but, more importantly, this assumption allows us to match the observed data in different countries.
So this constitutes our first basic model, which accounted quite precisely for the observed data for most countries just by changing the level of daily infectiousness on the days when policy changes were implemented (travel bans, school and restaurant closures, confinements...).
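The mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration, not the calibrated model behind the charts: the function names, the starting number of cases and the example rates are all hypothetical, and it assumes that every infected person infects others for exactly 8 days, starting the day after their own infection.

```python
def reproduction_number(daily_infectiousness, infectious_days=8):
    """First-order approximation of R: people infected per day times days spent infecting."""
    return daily_infectiousness * infectious_days


def simulate_new_infections(initial_cases, daily_infectiousness_by_day, infectious_days=8):
    """Return the number of new infections per day, starting with the initial cohort.

    Assumption of this sketch: a person infected on day t infects others
    on days t+1 .. t+infectious_days, at the daily rate valid on each of those days.
    """
    new_infections = [float(initial_cases)]
    for rate in daily_infectiousness_by_day:
        # Everyone infected during the last `infectious_days` days is still infecting.
        currently_infectious = sum(new_infections[-infectious_days:])
        new_infections.append(rate * currently_infectious)
    return new_infections


# Hypothetical scenario: 20 days without measures, then 20 days with restrictions.
rates = [0.25] * 20 + [0.10] * 20
series = simulate_new_infections(100, rates)
r_before = reproduction_number(0.25)  # above 1: the epidemic accelerates
r_after = reproduction_number(0.10)   # below 1: the epidemic slows down
```

Changing the rate on the day a policy takes effect, as in the `rates` list, is how a policy change would enter such a sketch; the real model additionally calibrates these values against observed data.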
Most countries were still in the very early stages back then, though South Korea was already slowing down noticeably:
The basic model was already able to account for the data already observed, to predict the growth of the number of reported cases for the following week quite precisely, and to calculate the growth after that for any given daily infectiousness. This allowed me to publish the following charts on March 14th, by using the optimistic assumption that we had managed to reduce the daily infectiousness to 10% from March 12th, which was when restrictions were implemented in many countries:
Unfortunately, reality didn’t correspond to these optimistic assumptions and was much worse.
But those charts necessitate a few more explanations and distinctions:
- what the charts and the data show is not the number of infected cases: many cases go undetected and only a certain proportion of cases are diagnosed. We’ll assume for now that this proportion is constant for each country (though different from country to country)
- and then the cases that are to be detected will only get diagnosed 6 to 9 days after infection
- so the numbers that we see reported in the media and on the charts above are not the numbers of infected, but a certain percentage of the number of infected people from about a week before…
- and that percentage is actually quite low (anywhere between 3% and 50%, depending on the country, meaning that the real numbers of cases are between 2 and over 30 times higher than the reported numbers — we’ll come back to that) and can be estimated using the proportion of test results that come out positive and the observed mortality rates (see Technical Appendix №1 below)
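To make the distinction between infected and reported cases concrete, here is a small Python sketch. It is a simplification under stated assumptions: a single fixed lag (8 days by default) stands in for the 6 to 9 day range, the detected share is constant, and the function names are hypothetical.

```python
def expected_reported_cases(new_infections, detected_share, lag_days=8):
    """Daily reported cases implied by a series of daily new infections.

    reported[t] = detected_share * new_infections[t - lag_days];
    days earlier than the lag report nothing.
    """
    reported = []
    for day in range(len(new_infections)):
        source_day = day - lag_days
        infections = new_infections[source_day] if source_day >= 0 else 0.0
        reported.append(detected_share * infections)
    return reported


def underreporting_factor(detected_share):
    """How many real infections stand behind each detected case."""
    return 1.0 / detected_share


# A detected share of 50% means 2 real cases per reported case,
# while a detected share of 3% means more than 30.
low = underreporting_factor(0.50)
high = underreporting_factor(0.03)
```

Reading the relationship backwards is what matters in practice: today's reported numbers describe, up to the detected share, the infections of about a week ago.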
What we can see on the above charts is how the epidemic slows down when R is below 1 (0.8 has been used for the charts, corresponding to a daily infectiousness of 10%, i.e. one infected person infects on average 0.1 new persons per day, during 8 days) — by that time (mid-March) this had been achieved in China and Korea, by using very different means. They actually managed to reach R of about 0.5, so we could hope back then, that western countries would also quickly manage to reach R<1 and squash this thing with numbers of infected in the thousands, not millions.
Governments had a number of different measures at their disposal to achieve that — again the chart from the second article by Tomas Pueyo is very informative here:
These measures should be applied in the order of decreasing effectiveness relative to their cost, and those costs vary by whole orders of magnitude, with mandatory closures, bans and injunctions being the most costly (see Technical Appendix №2 below).
But then data kept coming in every day and showing that western countries were unable to get R below 1:
Yes, this data is messy again, but it shows clearly that daily infectiousness and R were going down in most countries as additional measures were being implemented, but not enough to go durably below R=1, which is necessary to stop the epidemic, as was achieved for Ebola.
And even if some countries managed to stay slightly below R=1 for a few days, like Austria for example, this was achieved by taking extraordinary measures that couldn’t be kept indefinitely, so as soon as they would be relaxed and normal life would resume, R would go back above 1 and the epidemic would accelerate again…
Also R around 1 observed for an entire country could be masking the fact that it is actually already below 1 in certain areas thanks to herd immunity, while it is still above 1 in places where herd immunity hasn’t been achieved yet, as appeared to be the case when looking closely at the data from Italian regions.
And so it became clear that western countries wouldn’t be able to stop this without some form of herd immunity…
It works like this: when we have 1000 infected people and R=3, then they infect 3000 new people and it accelerates, but if 20% of the people are already immune, then only 2400 out of those 3000 will get infected, and so if 70% of the people are immune, then our first thousand will infect just 900 people, who in turn will infect around 810 people etc… So it decelerates, and after a few weeks the outbreak ends…
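The arithmetic of this example can be reproduced with a short Python sketch. The function names are hypothetical, and the immune share is held fixed between generations, as in the example above. The threshold formula 1 - 1/R is the standard result for a homogeneous population, which is the simplifying assumption used at this point in the text.

```python
def effective_r(r, immune_share):
    """R once a share of the contacts is already immune."""
    return r * (1.0 - immune_share)


def next_generation(infected, r, immune_share):
    """Number of people infected by `infected` cases, given the immune share."""
    return infected * effective_r(r, immune_share)


def herd_immunity_threshold(r):
    """Immune share at which the effective R equals 1 in a homogeneous population."""
    return 1.0 - 1.0 / r


no_immunity = next_generation(1000, 3, 0.0)      # about 3000
some_immunity = next_generation(1000, 3, 0.2)    # about 2400
high_immunity = next_generation(1000, 3, 0.7)    # about 900
following = next_generation(high_immunity, 3, 0.7)  # about 810
```

With R=3 the threshold comes out at two thirds of the population, which is why 70% immunity is enough to make each generation smaller than the previous one.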
After building that mechanism into the first model and accounting for the proportion of cases that go undetected, we obtain charts that look like this:
And by slowing down the process, through social distancing for example, it is possible to “flatten the curve”, which is not to be confused with the quick nipping in the bud that we had considered before:
The main difference between the previous charts and these is that we no longer stop this epidemic with thousands of cases, but millions…
But since we need to build that immunity, i.e. infect and cure a certain proportion of the population so that R is below 1, we can do it in a more or less efficient manner…
Optimising Herd Immunity
So far we have used averages across the population and treated the whole population as if it was homogeneous, while in fact there are large differences between people in both mortality and infectiousness.
The Elderly and people with weakened immune systems have a much higher mortality than the young and healthy.
And we also know that there are people who spread the virus much more than others — they are called the super-spreaders. This can be due to physiological reasons, but mainly it is due to being active and meeting a lot of people. Something that is likely positively correlated with being young and healthy…
Hence the idea to consider the following three groups:
- the Elderly and people with weakened immune systems — between 10 and 20 percent of the population
- those among the young and healthy who wouldn’t mind being infected and who are also most likely to be spreading the virus more than others — between 20 and 30 percent of the population
- and the remaining 50 to 70 percent
To model the spreading of the epidemic among those groups, what we now need is not one daily infectiousness number, but instead a 3 by 3 matrix, that represents the level of interactions of people within each group and between the groups (see Technical Appendix №3 below).
Also we don’t know who the super-spreaders are, but we can assume that people from group number 2, being the most socially active, will be spreading the virus a certain number of times more than the rest — so they will be contributing most to R (see Technical Appendix №4 below).
After taking some reasonable assumptions about the above parameters, the model that follows gives us the following results:
- if we continue with fairly uniform social distancing for a period of 2 months, after which we relax the measures by necessity (visible as the jump on the charts below), which may lead to a peak of new infections after the measures have been relaxed:
- or instead, if we isolate Group 1 (the people most at risk) for two months, which is modeled here as a level of interactions between them and the rest divided by 10, and during that time we ask the volunteering young and healthy to interact just as before, or even more:
The difference in mortality is striking.
And herd immunity can thus be achieved after a much smaller proportion of the population has been infected, maybe as low as 30%, as the immunity would be concentrated on those who contribute the most to R and are young and healthy: the healthy super-spreaders, with a mortality rate below 0.1% (or maybe even less when using drugs like hydroxychloroquine, as practised by Professor Raoult from Marseille). The additional benefit is that it could be achieved relatively quickly, and without closing the economy…
Those young and healthy who would be willing to get infected would be our heroes and they should be visible (wearing a red bracelet for example), so that they know who they are, and try to interact even more among their group, so that almost all of them get infected and gain immunity during those 2 months. Yes, a few of them might die, but only a very small percentage: much less than 1 in a thousand, the data shows.
During that time the rest continue to work as usual and only try to avoid the people with the red bracelets (they can be wearing masks for example) — about 20% of them might also get infected, but this is much less than in the alternative…
And after 2 months we can all come back to our normal lives, after having gained herd immunity with a minimal number of casualties, and not having killed the economy either…
Instead of this, most western countries are effectively trying the first strategy: uniformly slowing down the spreading of the virus through lockdowns. The result will be that a certain proportion of the population will get infected anyway, but it will take more time, during which the virus will spread more evenly. This means that a higher proportion of the population will need to get infected in the end (around 50% or more), and most importantly that it will reach more Elderly and vulnerable people, hence a much higher mortality…
So this is not lives versus the economy — the second strategy allows us to save both!
What is almost as important as minimising the number of deaths is not to exceed the capacity of hospitals and ICUs. Here again, when using the Imperial College data on percentages of hospitalised and ICU cases by age (table on the left), we find that Strategy 1 is much worse:
- Number of hospital and ICU beds needed for Strategy 1:
- Number of hospital and ICU beds needed for Strategy 2:
The number of required ICU beds at the peak is five times lower for Strategy 2 (isolation, synchronised with accelerated immunisation of the young and healthy volunteers). The second, lower peak on the chart above corresponds to a certain number of cases among the vulnerable population after the period of isolation, but in reality it could most probably be avoided, by extending the isolation for a few more days, or by relaxing it progressively…
We basically have two ways to end an epidemic: either by eradicating it completely, which proved impossible in the case of COVID-19, or by creating enough immunity among the population that the resulting R parameter falls under 1, so that an outbreak no longer spirals out of control but ends very quickly.
And this population immunity, called herd immunity, can come either from vaccinating a large proportion of the population, or from having them develop natural immunity, by being infected and cured.
But this can be achieved more or less efficiently: if we manage to have the young and healthy go through the infection, while protecting the vulnerable, then we can achieve this population immunity with a minimal number of casualties. If on top of that we can manage to have as many as possible of the super-spreaders among the infected, then the immunity can be reached with a minimal number of infected, probably as low as 30% of the population (see Technical Appendix №5). And if that 30% is also young and healthy, then the overall population death rate could be lower than 0.01% upon reaching that immunity. (An example of a plan to achieve that can be found in Technical Appendix №7 below)
Many countries have already exceeded that death rate of 100 per 1 million people, and this is a terrible, terrible failure…
…caused by politicians most likely doing what “sounds good” to them, while having no idea about what is going on — and having an idea means having a good, consistent model… (see Technical Appendix №6 below)
And as we have seen, what “sounds good” can be really bad in reality…
(important updates can be found at the bottom)
Technical Appendix № 1 — the Proportion of Detected Cases
Countries that have tested more than 2% of their population have an observed death rate below 1.5% (blue area on graph №1 below, data as of 06/04).
Most of them have an observed death rate of around 0.5%, and Iceland, which has tested the most (over 8% of the population), has an observed death rate of 0.4%. After adjusting for the fact that deaths occur on average a few days after diagnosis, this gives us our best estimate of the death rate in an infected population: 0.5%. It is not impossible that it is actually even lower, with even more completely undetected asymptomatic cases among the young; only representative sample antibody studies will be able to tell us that.
(representative sample studies done in Germany have since confirmed that death rate of around 0.4–0.5%; also the excess number of deaths in the area of Bergamo was around 0.5%, meaning that close to everyone got infected there…)
Countries with an observed death rate above 4% have all tested less than 0.8% of their population (graph №1 yellow area), except for Italy, which has still tested only 1.2% of the population. Their observed death rates are not reliable as a death rate for the infected population (i.e. many infected people haven’t been tested)
Furthermore, the observed death rate by country is highly correlated (about 0.75) with the percentage of tests that came out positive. This is as expected: a large percentage of positive test results means that the testing criteria are narrow, which means that many infected people will not be tested and detected, which increases the death rate among detected cases (graph №2 below). So this high correlation confirms that a high proportion of positive test results and a high observed death rate are largely due to the same reason: a high proportion of cases that go undetected. This allows us to approximate that proportion.
The middle line has a slope of 0.36.
When taking the population death rate of 0.5% as a reference, this relationship gives us a plausible approximation of the proportion of infected people that have been detected in each country. For example, for Poland:
5.2% of positive test results gives us an expected observed death rate of 0.36*5.2%=1.9% (and indeed the actual rate is 2.4%), which indicates that the proportion of infected cases that have been detected is around 25% (0.5%/1.9%).
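The Poland calculation can be written as a small Python sketch, so that it can be repeated for any other country. The function and constant names are hypothetical; the slope and the reference death rate are the values quoted above, and the result is only as good as that linear relationship.

```python
REFERENCE_DEATH_RATE = 0.005  # 0.5%, the estimated death rate among all infected
SLOPE = 0.36                  # slope of the middle line on graph №2


def expected_observed_death_rate(positive_test_share, slope=SLOPE):
    """Death rate among detected cases implied by the share of positive tests."""
    return slope * positive_test_share


def detected_share(positive_test_share,
                   reference_death_rate=REFERENCE_DEATH_RATE,
                   slope=SLOPE):
    """Approximate proportion of infected people who have been detected."""
    return reference_death_rate / expected_observed_death_rate(positive_test_share, slope)


# Poland, with 5.2% of tests coming out positive:
poland_death_rate = expected_observed_death_rate(0.052)  # roughly 1.9%
poland_detected = detected_share(0.052)                  # roughly a quarter
```

Computed without intermediate rounding, the detected share for Poland comes out slightly above 26%, which is the "around 25%" given in the text.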
Finally, a third way that leads to similar results is to compare the age distribution of detected cases between a country testing a representative sample of people, and other countries that mainly test people with symptoms:
When assuming that the real distribution is the same, but that the latter countries discovered a large proportion of the cases among the Elderly (above 80 years old) and only a very small proportion of the cases among the young, then again we get real numbers of infected cases up to 30 times higher than the number of detected ones (or even more, for countries like Italy, France and Belgium).
What this means additionally is that we currently (as of 06/04) already have about 5 million people who have been infected in the US (between 2.5 and 8 million in fact)
Around 3 million in France and Spain
Over 2 million in Italy etc…
EDIT Apr 16: A study in Holland has just shown that 3% of their population already have antibodies for this virus, which means they have already gone through an infection, which is fully consistent with the predictions of this model: that for Holland the number of infected is about 16 times higher than the number of detected cases.
Technical Appendix № 2— the Different Measures
We will no doubt see countless PhD theses evaluating post factum the cost and effectiveness of the different measures that were and could have been implemented.
But evaluating that in real time was very difficult, as it would involve inverting the matrix of the different measures implemented in different countries, while the data on the results was nonexistent or very choppy at best.
Also when all this is over, we will have valuable data on the level of discipline and social promiscuity in different societies.
What we can say nevertheless is that the measures used by Korea and Taiwan worked best, while not being very costly, compared to what most western countries have done.
Technical Appendix № 3— the Matrix of Interactions Between Groups
First of all, thank you to Prof. Siddhartha Mishra from the ETH in Zürich for having checked and confirmed my methodology:
very reasonable assumptions on the interaction matrix, I agree with you that the results are correct
The 3 by 3 matrix of interactions/infections between Group 1 (Elderly and vulnerable, 12% of population), Group 2 (young, healthy and active, 20% of population) and Group 3 (the rest) that I used is this:
8% ; 4% ; 1%
4% ; 45%; 8%
6% ; 51%; 11%
which reflects that:
- there are as many meetings of persons from group A with persons from group B, as there are meetings of persons from group B with persons from group A (symmetry)
- people have most interactions within their groups
- people from Group 2 have on average 3 times more interactions than the others and infect about 50% more at each interaction, which leads to about 5 times more infections than for the rest — this is a fairly strong assumption, but the qualitative results stay the same if that number is lower (more on that in Technical Appendix №4 below)
- after which it is adjusted by the relative number of people in each group
To get the final levels of infectiousness between the groups, the matrix is adjusted by a factor, so as to match the observed initial overall level of infectiousness in the society, before any measures had been undertaken.
Uniform social distancing is modeled as a factor applied to the whole matrix, while the isolation of a group and more or less interaction between groups are modeled by applying a factor to the corresponding term.
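The two kinds of adjustment can be illustrated with a short Python sketch applied to the matrix above. The helper names are hypothetical, and the matrix is taken as printed, before the calibration factor. Which axis stands for the infecting group is not specified here, and this sketch does not depend on it, because it scales both directions of every term linking the isolated group to another group.

```python
BASE_MATRIX = [
    [0.08, 0.04, 0.01],
    [0.04, 0.45, 0.08],
    [0.06, 0.51, 0.11],
]


def scale_uniform(matrix, factor):
    """Uniform social distancing: one factor applied to the whole matrix."""
    return [[value * factor for value in row] for row in matrix]


def scale_between_groups(matrix, group, factor):
    """Isolation of one group: scale every term linking it to a different group.

    Terms within the isolated group, and terms between the other groups,
    are left unchanged.
    """
    size = len(matrix)
    return [
        [
            # True when exactly one of the two indices is the isolated group.
            matrix[i][j] * factor if (i == group) != (j == group) else matrix[i][j]
            for j in range(size)
        ]
        for i in range(size)
    ]


# Uniform distancing that halves all interactions (illustrative factor).
distanced = scale_uniform(BASE_MATRIX, 0.5)

# Isolation of Group 1 (index 0): interactions with the rest divided by 10.
isolated = scale_between_groups(BASE_MATRIX, 0, 0.1)
```

Asking the volunteers of Group 2 to interact more among themselves would, in the same spirit, be a factor above 1 applied to the single within-group term for Group 2.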
Technical Appendix № 4— the Super-Spreaders
We don’t really know what the distribution of infectiousness really looks like — to quote Prof. Mishra:
as long as this is a power law, your assumption will be valid with some \alpha >> 1 — I am not sure it is — if the infection is spread, say sexually like AIDS — then such a distribution is probably true — however for a respiratory infection such as COV19, a plausible distribution would be a 1-sided truncated Gaussian — the superspreaders are there (on the tail of the distribution) but there are far too few of them
but this doesn’t change the fact that we can assume that the young and healthy are also the most socially active, and so they will be spreading the virus more than others.
Technical Appendix № 5 — on Approaching Herd Immunity
What appears when running the models, is that the moment when it actually does make sense to reduce social interactions (by social distancing or otherwise) is just a few days before reaching R<1.
When this is not done, R=1 is reached at the moment when the number of people who are infected and infecting others is at its maximum, and so, before it all dies down, they will still infect a large number of people whose infection is not needed for reaching herd immunity…
While slowing down the spreading of the virus a few days before going under R=1 allows for a sort of soft landing, finishing just under R=1.
When implementing this deceleration, the corresponding hospitalisation and ICU charts look like this (the peaks are even lower):
In practice this would require watching the data very closely day after day, ideally by doing representative sample antibody tests every two or three days…
Technical Appendix № 6— on Modeling
Reality is infinitely complex — at every moment there is a multitude of dynamics at play and no model can account for them all. So the goal is to find the most relevant mechanisms, so that a model has the most predictive power and aligns best with observations and reality — that is when we can consider it to be good, never perfect, but definitely better than when it makes “perfect sense”, but doesn’t calibrate well with reality…
Here we could additionally model the effect of children attending schools, the effect of infections in hospitals etc., but given that we have no data that would allow us to verify the effect of incorporating those dynamics against reality, we have to limit the number of degrees of freedom of the model to the ones we judge as being the most relevant.
In any case, the number of possible dynamics that we could model is infinite, and then there is the effect of the weather (it is not impossible that the summer will reduce the infectiousness and get R temporarily below 1, only to see a second wave of the epidemic in the autumn), and then there are the unknown unknowns. So we just don’t know what really happens, but we can approximate it as best we can…
Technical Appendix № 7— A Grand National Immunity Building Program
First, we need to allow the 5 million Elderly and vulnerable to isolate at home, for two months, and to organise food delivery to their doorstep — yes, this will cost: about 100 million per day — but the current situation costs us about 2bln per day! (in PLN, for Poland)
Then we need to ask 20% of the population to self-identify and come forward as the young, healthy super-spreaders (them, and those who live with them). They not only likely spread the virus more than others, but are also the least likely to die or even get hospitalised. They will be our heroes: we give them a red bracelet, so that they know who they are, and they not only continue business as usual, but try to interact even more among their group, so that almost all of them get infected and gain immunity during those 2 months. Yes, a few of them might die, but only a very small percentage: less than 1 in a thousand, the data shows.
During that time the rest continue to work as usual and only try to avoid the people with the red bracelets — about 20% of them might also get infected, but this is much less than in the alternative…
After 2 months we can all come back to our normal lives, after having gained herd immunity with a minimal number of casualties, and not having killed the economy either…
Technical Appendix № 8— on the Economic Impact
The above analysis didn’t concentrate on the economic and financial aspects of the current crisis. It is worth mentioning though, that most forecasts and analyses of those focus on the linear effects, like the decrease in GDP etc., while the unprecedented magnitude of this crisis, due to the closure of entire countries and sectors of the economy, will lead above all to non-linear effects, that are inherently chaotic and unpredictable… Trigger effects could lead to the collapse of entire systems, so we will need to come back to the basics of what constitutes value, and to rebuild that…
Apr 30 — Updates:
1. The above modeling, for the sake of not diluting the main message, didn’t take into account the following geographical aspects:
- whenever the virus appears on any territory (a country, a region, a city) it will tend to develop fastest in places where the density of population is highest (Lombardy, Paris, New York), hence what will appear to be its initial R0 is the R0 of these most densely populated areas (with their open-space working areas, high density dormitories like in Singapore etc.), and the R0 in the countryside or in smaller cities would actually be smaller…
- what follows is that even if we are probably unable to get R below 1 while maintaining normal lives in those high density places, it may well be possible to keep it below 1 without a large proportion of people being immunised in the lower density places…
- so this means that after having achieved population immunity in the large cities, this may actually be enough to have immunity for the whole population, which may further decrease the percentage of people with immunity needed for population (herd) immunity — and so this is what should be done first, when applying the solution described above…
- this is actually just an extension of the fact that the contribution of different people to R is far from uniform, and so the people who spread the virus more should gain immunity first to achieve population immunity most efficiently — and so this would be the young, active people in large cities — and maybe just having them go through the infection would be enough to stop the epidemic and keep everyone else safe…
- which is what would happen naturally anyway (as those whose immunity will stop the spreading of the virus best would also be the ones who would catch it first for the most part) — and by unnaturally changing the social dynamics we are preventing that from happening most efficiently…
2. Another aspect that has been raised is the durability of the immunity. There is not much of a basis to think it should be any different than for the 2003 SARS-CoV, and hence it should last for at least a few years. But even if this is not the case, there is going to be some kind of distribution of how long it lasts in different people. So if, after population immunity is reached, some people’s immunity ends, then for a short period R might get back slightly above 1 and a few more people will get infected, so that it gets back below 1. In any case this would not lead to another outbreak of an epidemic, and this is the same as what would happen with immunity coming from a vaccine…
3. Just as before, in both aspects above we can see that the way a population naturally deals with such situations (with the more vulnerable protecting themselves more, and the more active and less vulnerable getting infected in a larger proportion, until population immunity is reached) is pretty much the best way — and this is no accident — societies have developed ways to survive best, that are much wiser than the brutalist measures forced on us by decision-makers who can’t even count or listen…
What we found is that the natural ways through which a society deals with such situations can only be slightly improved, through good information and some enhancement to the natural processes of protecting the vulnerable, plus maybe slowing down the spreading of the virus only when the population immunity is close to being reached (as explained in Appendix №5), so as not to overshoot unnecessarily…
Polish translation (unchecked):