COVID: Putting the Puzzle Together
Questioning the standard model of COVID’s trajectory, using original source papers.
Author’s Note: I’ve been paying very close attention to the COVID pandemic, ever since news started coming out of China about it in January. An epidemiologist friend told me early on, very confidently, that it would not be contained and would eventually reach the US, and the rest of the world. Since then, I have watched the news (and social media) coverage with great disappointment. Most of the coverage has focused on extremely surface-level analysis, and seems more concerned with generating clicks than with trying to make sense of what is actually happening. Highly inaccurate predictions and projections are made, usually with no follow-up, leaving most people who are paying attention in a constant state of alarm and panic. I have been fortunate enough to be good friends with several epidemiology and biology PhDs, who have continually pointed me towards relevant scientific literature (and their interpretations of it), instead of towards the news articles written by journalists who are not experts. In this post, I describe several observations from the data that run counter to the prevailing narrative, which is driven by a model with substantial flaws. I then describe important (and largely overlooked) research which explains these observations, and the higher-level implications of this research for the future trajectory of the pandemic.
Note: Death data is the best indicator of trend, because it is the least biased. Testing regimes change, and differ between different locations. Hospitalization data is harder to find, and is also confounded by testing protocols and resumption of elective procedures. Death data lags by 2–3 weeks, so it’s not great for the past few weeks, but if interpreting what happened before then, it’s your best option. Recent data suggests that the average age for cases is dropping, so going forward that will be important to keep in mind when evaluating data, but for historical data it shouldn’t be a problem.
Section 1: Important Questions No One is Asking
1. Why haven’t we seen second waves, post-lockdown in places which were at least moderately hit? And why have we seen pretty quick upticks in places that were only lightly hit?
Why have China (3 deaths/1M, but data from the CCP is pretty suspect; entered lockdown January 23, bars reopened April 9), Belgium (840 deaths/1M; entered lockdown March 18, bars reopened June 8) and Spain (606 deaths/1M; entered lockdown March 15, bars reopened June 6), amongst many others, not had any second waves, despite being reopened for weeks (or longer), with seroprevalence nowhere near the believed herd immunity threshold of 60–75%? [China was in the low single digits in early April; Belgium was at 6% in late April; Spain was at 5% in late April.] Compare this to the relatively quick upticks in places which came out of lockdown without having had a large outbreak to begin with, like Arizona (218 deaths/1M; entered lockdown March 31, bars reopened May 16), Texas (83 deaths/1M; entered lockdown March 19, bars reopened May 18), and Florida (159 deaths/1M; entered lockdown April 1, bars reopened June 5).
Here’s the daily death data (since enough time has elapsed since re-opening to see the trend) for each of these places, with lockdown overlaid in yellow (determined by closing/reopening of bars, for consistency due to phased reopenings):
2. Why is Sweden past peak despite no lockdown or overwhelmed ICUs? And what does this imply for the model we’ve been using to predict impact?
Why does Sweden appear to be past peak, despite not locking down (and a no-mask culture)? And doesn’t the fact that they never overwhelmed ICUs show pretty clearly that Neil Ferguson’s model (which caused the US and the UK to go into lockdown) was wrong? His model only had projections for the US and the UK, and by their definitions Sweden was following a mitigation strategy, not a suppression strategy. While he didn’t provide projections for Sweden, below is his graph for UK ICU beds needed versus capacity (per 100k), with the best mitigation scenario (stricter than what Sweden is actually doing) peaking at needing 11x the available ICU beds. While we don’t know how many ICU beds he would have projected for Sweden, we do know that Sweden has never overwhelmed its ICUs, and that it’s unlikely that he would have projected 10x fewer beds needed in Sweden than in the UK. Furthermore, researchers at Sweden’s Uppsala University applied Ferguson’s methodology to Sweden, with Sweden-appropriate parameters, and estimated that if policies didn’t change (they didn’t), then by July 1, Sweden would see over 81,000 deaths. They went on to claim that if Sweden adopted more aggressive measures (like full lockdown), they would still see around 45,000 deaths. Today is June 24, and Sweden has had 5,200 COVID-associated deaths thus far (and 48,000 total (all cause) deaths thus far for all of 2020). So why is this model still influencing policy decisions anywhere, much less claims about lives saved?
3. Why have weeks of mass demonstrations (beginning five weeks ago) had no impact in places which were at least moderately hit, and appeared to be well past peak?
New York City and Seattle both had large, sustained racial justice demonstrations, and were both sites of large early COVID outbreaks, with few recent cases. The demonstrations were outdoors with a lot of mask wearing, which one would expect to reduce transmission, but people were also pretty tightly packed and there was lots of chanting, talking and yelling. Given the density, and the vocalizations, one would expect at least some impact from them, even if the masks help. In both locations, the protests appear to have had no impact. In New York there is no uptick on any metric (cases, hospitalizations, or deaths), so it’s hard to argue that the protests caused an uptick. In Seattle, they have found the test positive rate of protesters to be just 1%, which is actually lower than the 2.3% average there.
4. Why is the household transmission rate so low (lower than the flu), given that this virus spreads like wildfire?
There have been quite a few studies estimating the household transmission rate for COVID, and they are consistently below the 38% rate for the flu: 11%, 17%, 17%, 30%. This is surprising, given how far and wide COVID has spread, and how quickly.
5. Why do kids appear to catch it at lower levels, and transmit it less, than adults?
And why is it that kids not only appear to get it less severely, but also catch it less often and transmit it less? Most serosurveys haven’t tested children, but the ones that have show them having significantly lower rates of possessing antibodies than the general population [Spain, Geneva, Belgium]. Study after study after study has shown that keeping schools open doesn’t lead to increased outbreaks. Other studies have shown that children are not a large source of transmission. One such study, COVID-19 Transmission and Children: The Child is Not to Blame (Pediatrics; 2020), found:
all children <16 years of age diagnosed at Geneva University Hospital (N=40) underwent contact tracing to identify infected household contacts (HHC). Of 39 evaluable households, in only three (8%) was a child the suspected index case … Of 10 children hospitalized outside Wuhan, China, in only one was there possible child to adult transmission
The text cites example after example; the quote above is just a small sampling. Another study, Age-dependent effects in the transmission and control of COVID-19 epidemic (Nature; 16 June 2020), found:
We estimate that susceptibility to infection in individuals under 20 years of age is approximately half that of adults aged over 20 years … we find that interventions aimed at children might have a relatively small impact on reducing SARS-CoV-2 transmission, particularly if the transmissibility of subclinical infections is low.
Anyone who’s ever hung out with kids, especially little kids, knows that they are designed for efficient spreading of germs and that this is a surprising result.
Section 2: Three Important Pieces of the Puzzle
There are three parts, which together explain a lot of this, all grounded in science and research.
1. The herd immunity threshold is lower when it’s from disease-acquired immunity instead of vaccination. It is likely also lower in less dense areas.
The popular understanding seems to be that we need 60–75% prevalence in order to reach herd immunity, which no place is near (other than maybe a couple towns in Italy). There have been two recent papers saying that the herd immunity threshold is lower when it’s based on acquired immunity (from catching the disease) rather than from vaccines. Marc Lipsitch gives a great summary of the articles here (he describes both as preprints, but one has now been published in Science; tl;dr super-spreaders catch it earlier and are then depleted earlier, so spread slows down pretty quickly):
…the most exposed/susceptible people in the population are more likely to be infected, and their infection is a bigger “hit” to the virus’s transmission because they were more efficient spreaders. So virus transmission disproportionately removes those most useful to it from contributing to future transmission (if they become immune). … the proportion that need to get effectively vaccinated at random in a population to achieve the “herd immunity threshold” remains 1–1/R0. Naturally acquired immunity can get away with less because it is naturally targeted — the high risk people matter most to transmission and are likely to get infected first. Vaccination (at random) doesn’t do that.
And then the papers themselves (they speak for themselves better than me trying to summarize):
A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2 (Science; 23 Jun 2020):
We estimate that if R0 = 2.5 in an age-structured community with mixing rates fitted to social activity then the disease-induced herd immunity level can be around 43%, which is substantially less than the classical herd immunity level of 60% obtained through homogeneous immunization of the population.
Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold (preprint):
heterogeneous populations require less infections to cross their herd immunity thresholds (HITs) than homogeneous (or not sufficiently heterogeneous) models would suggest. We integrate continuous distributions of susceptibility or connectivity in otherwise basic epidemic models for COVID-19 and show that as the coefficient of variation (CV) increases from 0 to 4, the herd immunity threshold declines from over 60% (4, 5) to less than 10%.
It’s worth noting that both of these papers rely on the heterogeneity of populations to reduce the herd immunity threshold. The other side of the coin is extremely homogeneous populations, especially densely packed ones, like those in nursing homes (40% of US deaths, and 0.6% of US population), prisons and homeless shelters (in these cases, almost the whole facility will test positive at the same time, but with a very high asymptomatic rate). I believe that this homogeneity can have the opposite effect, and helps explain why outbreaks in those types of locations can be so much more severe.
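To make the effect of heterogeneity concrete, here is a minimal sketch. It relies on an assumption that is not in the passages quoted above: that susceptibility is gamma-distributed with coefficient of variation CV, for which a commonly cited closed form is HIT = 1 - (1/R0)^(1/(1 + CV^2)). The function name and the CV values are illustrative only; R0 = 2.5 is borrowed from the Science paper quote.

```python
def heterogeneous_hit(r0: float, cv: float = 0.0) -> float:
    """Herd immunity threshold as a fraction of the population.

    ASSUMPTION: susceptibility is gamma-distributed with coefficient of
    variation `cv`, giving HIT = 1 - (1/R0)^(1/(1 + cv^2)).
    With cv = 0 this reduces to the classical 1 - 1/R0.
    """
    if r0 <= 1.0:
        return 0.0  # with R0 <= 1 the outbreak cannot grow in the first place
    return 1.0 - (1.0 / r0) ** (1.0 / (1.0 + cv ** 2))


# R0 = 2.5 is taken from the Science paper quote; the CV values are illustrative.
for cv in (0, 1, 2, 4):
    print(f"CV = {cv}: HIT = {heterogeneous_hit(2.5, cv):.1%}")
```

Under these assumptions, CV = 0 reproduces the classical 60% and CV = 4 lands below 10%, which is the same direction and rough range as the preprint quote. The real papers fit richer models, so treat this only as a way to build intuition.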
Another factor which impacts the herd immunity threshold for a given location is the number of contacts a person has, which is closely tied to density. We know that the classical definition for the herd immunity threshold (HIT) is HIT = 1-(1/R0), where R0 is the average number of new people a contagious person infects, at the start of the epidemic (before any susceptibles are depleted). If we dig a little deeper, we find that the equation for R0 is actually:
R0 = b × S(0) / a
where S(0) is the initial fraction of the population that is susceptible, b is the average rate that an infected individual infects a susceptible person, and a is the recovery rate, which is found by dividing 1 by the number of days that an individual is contagious. b is the important variable here: if a contagious individual interacts with more people, then they will infect more people, and R0 will be higher. This means that in more dense areas, the R0 (and hence the herd immunity threshold) should be higher, due to the simple fact that people interact with more other people. I am reminded of this meme about Finland:
This intuition about density and herd immunity aligns with what we have seen. The worst outbreaks have been in urban areas with high density. I think that people don’t even question this because it’s so intuitive, but I haven’t seen any work formalizing this idea or exploring its implications with respect to COVID (though these slides do touch a little bit on some of these ideas).
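The density argument can be sketched numerically using the standard SIR relation R0 = b × S(0) / a (with the symbols as defined above) together with HIT = 1-(1/R0). The infectious period and the three values of b below are hypothetical numbers chosen only to show the direction of the effect, not estimates for any real place.

```python
def basic_reproduction_number(b: float, infectious_days: float, s0: float = 1.0) -> float:
    """R0 = b * S(0) / a, where a = 1 / (days an individual is contagious)."""
    a = 1.0 / infectious_days
    return b * s0 / a


def classical_hit(r0: float) -> float:
    """Classical herd immunity threshold, HIT = 1 - 1/R0 (zero if R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / r0)


INFECTIOUS_DAYS = 10  # HYPOTHETICAL contagious period

# HYPOTHETICAL infection rates b: more contacts per day means a larger b.
for label, b in (("sparse", 0.15), ("moderate", 0.25), ("dense", 0.35)):
    r0 = basic_reproduction_number(b, INFECTIOUS_DAYS)
    print(f"{label:>8}: b = {b:.2f}, R0 = {r0:.2f}, HIT = {classical_hit(r0):.1%}")
```

Holding the contagious period fixed, a larger b raises R0 and therefore the threshold, which is the formal version of the intuition that denser places need more immunity before spread slows.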
2. Evidence suggests that many people have pre-existing cross-immunity from other coronaviruses.
We know that COVID is a coronavirus, as was SARS1, and as are the various “common colds.” We also know that there are likely other coronaviruses out there. Researchers are publishing studies (linked/described below) showing evidence for cross-immunity with other coronaviruses. If true, it would mean that a lot of people have natural immunity. This wouldn’t show up in serosurveys, but could significantly reduce the pool of susceptibles. One important detail is that this cross-immunity is not binary; it’s not the case that you are either immune or not immune. We know that viral load matters. So, if you have pre-existing cross-immunity, it may well be the case that it will protect you from a light dose of the virus, but not from a heavy dose, or if you get a heavy dose you may not get as sick as you would otherwise. One unanswered question is: if you have cross-immunity and then get exposed to a low dose of the virus, would you likely still test positive on a PCR test as an asymptomatic case, or not at all? The very high prevalence in the prison and homeless shelter studies above makes me think that you would be likely to test positive but be asymptomatic. To my knowledge there are no studies yet on this exact question. But evidence for the general existence of cross-immunity keeps getting stronger. On to the papers:
Targets of T Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and Unexposed Individuals (Cell; 14 May 2020):
we detected SARS-CoV-2-reactive CD4+ T cells in ∼40%–60% of unexposed individuals, suggesting cross-reactive T cell recognition between circulating “common cold” coronaviruses and SARS-CoV-2
Different pattern of pre-existing SARS-COV-2 specific T cell immunity in SARS-recovered and uninfected individuals (preprint) [Update 7/15/2020: No longer a preprint, now published in Nature]:
We then show that SARS-recovered patients (n=23), 17 years after the 2003 outbreak, still possess long-lasting memory T cells … which displayed robust cross-reactivity to SARSCoV-2 NP. Surprisingly, we observed a differential pattern of SARS-CoV-2 specific T cell immunodominance in individuals with no history of SARS, COVID-19 or contact with SARS/COVID-19 patients (n=18). Half of them (9/18) possess T cells targeting the ORF-1 coded proteins NSP7 and 13, which were rarely detected in COVID-19- and SARS-recovered patients … Virus-naïve donors had a different pattern (more widely) reactive cells to NP and NSP, possibly indicating a better chance of aborting infection before it is established.
Presence of SARS-CoV-2 reactive T cells in COVID-19 patients and healthy donors (preprint) [Update 7/31/2020: No longer a preprint, now published in Nature]:
We demonstrate the presence of S-reactive CD4+ T cells in 83% of COVID-19 patients, as well as in 34% of SARS-CoV-2 seronegative healthy donors, albeit at lower frequencies.
SARS-CoV-2 T-cell epitopes define heterologous and COVID-19-induced T-cell recognition (preprint):
Cross-reactive SARS-CoV-2 T-cell epitopes revealed preexisting T-cell responses in 81% of unexposed individuals, and validation of similarity to common cold human coronaviruses provided a functional basis for postulated heterologous immunity in SARS-CoV-2 infection.
These papers in aggregate offer compelling evidence that a large fraction of the population with no exposure to SARS1 or COVID have some degree of pre-existing immunity.
3. Serosurveys are undercounting previous exposure.
Earlier work showed compelling evidence that prevalence in the studied locations may already have been much higher than was believed at the time. A study from Oxford University posited that as much as half of the UK may have already been exposed (I think that this was an overestimate; they were trying to show that the observed data at that point had a wide range of explanations, including some more benign scenarios). A more recent article (I have worked directly with one of the authors, and found him to be an excellent scientist), Using influenza surveillance networks to estimate state-specific prevalence of SARS-CoV-2 in the United States (Science Translational Medicine; 22 June 2020), had the following result:
we show how influenza-like illness (ILI) outpatient surveillance data can be used to estimate the prevalence of SARS-CoV-2. We found a surge of non-influenza ILI above the seasonal average in March 2020 and showed that this surge correlated with COVID-19 case counts across states. If 1/3 of patients infected with SARS-CoV-2 in the US sought care, this ILI surge would have corresponded to more than 8.7 million new SARS-CoV-2 infections across the US during the three-week period from March 8 to March 28, 2020.
If correct, this would correspond to true prevalence being 80x what testing at the time showed (and 5x what all testing in the US until now shows). These results caused their authors (and others) to begin pressing hard for serosurveys: the use of IgG and IgM antibody tests to determine how much of a population had ever been exposed to the virus.
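The quoted estimate hinges on the assumed share of infected people who sought care. The back-of-envelope sketch below is not the paper’s method; it simply inverts the quoted numbers (8.7 million infections at a care-seeking share of 1/3) to get an implied count of excess ILI visits, then shows how the infection estimate would move under other, hypothetical care-seeking shares.

```python
def implied_infections(excess_ili_visits: float, care_seeking_fraction: float) -> float:
    """Scale excess ILI visits up to infections: visits / share who sought care."""
    if not 0.0 < care_seeking_fraction <= 1.0:
        raise ValueError("care_seeking_fraction must be in (0, 1]")
    return excess_ili_visits / care_seeking_fraction


QUOTED_INFECTIONS = 8.7e6      # from the quoted abstract
QUOTED_CARE_SEEKING = 1 / 3    # from the quoted abstract

# Implied excess visits, derived by inverting the quoted figures (an assumption
# of this sketch, not a number reported in the quote).
implied_visits = QUOTED_INFECTIONS * QUOTED_CARE_SEEKING

# HYPOTHETICAL alternative care-seeking shares, to show sensitivity.
for fraction in (1 / 2, 1 / 3, 1 / 4):
    estimate = implied_infections(implied_visits, fraction)
    print(f"care-seeking share {fraction:.2f}: about {estimate / 1e6:.1f} million infections")

# The "80x" comparison implies roughly this many confirmed cases at the time.
print(f"implied confirmed cases: about {QUOTED_INFECTIONS / 80:,.0f}")
```

The lower the share of infected people who actually sought care, the larger the implied number of infections, so the headline figure is only as good as that one assumption.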
As discussed above, when we started getting the results from serosurveys, the numbers were much lower than proponents of this theory likely expected. Since then, there have been several papers which offer strong evidence that the serosurveys are undercounting prevalence, and should really be viewed as a lower bound on the prevalence, rather than the true prevalence. There are two reasons for this:
- The antibody tests are calibrated to hospitalized patients who had severe cases, and antibody production peaks and then drops off. This means that for milder infections, and for infections too far in the past, the antibody tests often return false negatives.
- Some people can mount a response using other parts of their immune system and never produce IgG and IgM antibodies at all.
On to the actual papers:
Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections (Nature Medicine; 18 June 2020):
Forty percent of asymptomatic individuals became seronegative and 12.9% of the symptomatic group became negative for IgG in the early convalescent phase.
We don’t know the proportion of people who are asymptomatic, but it is quite high, probably at least 50%. The prison/homeless shelter studies I linked above had greater than 90% asymptomatic, and the Diamond Princess had a 50% asymptomatic rate despite being disproportionately full of older and comorbid passengers.
Intrafamilial Exposure to SARS-CoV-2 Induces Cellular Immune Response without Seroconversion (preprint):
Exposure to SARS-CoV-2 can induce virus-specific T cell responses without seroconversion. … Our results indicate that epidemiological data relying only on the detection of SARS-CoV-2 antibodies may lead to a substantial underestimation of prior exposure to the virus
Are SARS-CoV-2 seroprevalence estimates biased? (preprint):
Growing evidence suggests that asymptomatic and mild SARS-CoV-2 infections, together comprising >95% of all infections, may be associated with lower antibody titers than severe infections. In addition, antibody levels peak a few weeks after infection and decay gradually. Yet, positive controls used for determining the sensitivity of serological assays are usually limited to samples from hospitalized patients with severe disease, leading to what is commonly known as spectrum bias in estimating seroprevalence in the general population. … Our results suggest that assays with imperfect sensitivity will underestimate the true seroprevalence.
Systemic and mucosal antibody secretion specific to SARS-CoV-2 during mild versus severe COVID-19 (preprint):
15–20% of those who had no detectable [IgG and IgM] antibodies in their blood did have IgA antibodies in their mucosa, and that younger subjects were less likely to demonstrate systemic response.
Taken together, this is strong evidence that the serosurveys are substantially underestimating true prevalence.
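One standard way to see how test sensitivity changes a serosurvey result is the Rogan-Gladen correction, which adjusts the raw positive rate for a test’s sensitivity and specificity. The sketch below is illustrative only: the 5% raw rate echoes the Spain figure mentioned earlier, while the sensitivity and specificity values are hypothetical and not taken from any of the papers above.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Corrected prevalence = (apparent + specificity - 1) / (sensitivity + specificity - 1)."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))  # clamp to a valid proportion


RAW_RATE = 0.05        # raw serosurvey positive rate (Spain figure from earlier)
SPECIFICITY = 1.0      # HYPOTHETICAL: assume no false positives

# HYPOTHETICAL sensitivities: calibrated on severe cases vs. mild community cases.
for sensitivity in (0.95, 0.80, 0.60):
    corrected = rogan_gladen(RAW_RATE, sensitivity, SPECIFICITY)
    print(f"sensitivity {sensitivity:.0%}: corrected prevalence {corrected:.1%}")
```

If real-world sensitivity for mild or older infections is lower than the figure measured on hospitalized patients, the corrected prevalence rises accordingly, which is the spectrum-bias point made in the preprint quoted above.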
Section 3: Putting the Pieces Together
Putting this all together, it’s now clear that an area can reach herd immunity despite low seroprevalence results, because:
- the herd immunity threshold is lower than previously believed,
- some people have natural immunity, leaving fewer people susceptible, and
- the serosurveys underestimate true prevalence.
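To see how the three factors in the list above can combine, here is a toy calculation. It is a deliberate oversimplification: it treats pre-existing immunity as fully protective and simply subtracts it from the threshold, and it treats serosurvey undercounting as a single detection rate. The 43% threshold is the figure quoted from the Science paper; the pre-existing immunity share and the detection rate are hypothetical.

```python
def infections_needed(hit: float, preexisting_immune: float) -> float:
    """Share of the whole population that must be infected to reach the threshold,
    ASSUMING pre-existing immunity is fully protective and counts toward it."""
    return max(0.0, hit - preexisting_immune)


def measured_seroprevalence(true_infected: float, detection_rate: float) -> float:
    """Share a serosurvey would report if it detects only `detection_rate` of past infections."""
    return true_infected * detection_rate


hit = 0.43                 # disease-induced threshold quoted from the Science paper
preexisting_immune = 0.30  # HYPOTHETICAL share with protective cross-immunity
detection_rate = 0.60      # HYPOTHETICAL share of past infections a serosurvey detects

needed = infections_needed(hit, preexisting_immune)
reported = measured_seroprevalence(needed, detection_rate)
print(f"infections needed: {needed:.1%}; serosurvey would report about {reported:.1%}")
```

With these made-up inputs, a population could sit at its threshold while a serosurvey reports a single-digit percentage. Different inputs give very different answers, which is why the relative weights matter.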
All three of these can vary in degree of impact between locations, and I don’t know how to weight their relative importance. Different populations have differing degrees of heterogeneity and pre-existing immunity. This is further confounded by different degrees of viral load due to population density, cultural norms, and other factors. Additionally, different countries’ infections peaked at different points in the past, meaning the amount of undercounting by serosurveys can also vary considerably. Much is still unknown, and undoubtedly there are more pieces of the puzzle still to be found. But let’s use this new information to go back and answer our original questions:
1. Why haven’t we seen second waves, post-lockdown in places which were at least moderately hit? And why have we seen pretty quick upticks in places that were only lightly hit? The locations with no post-lockdown uptick locked down too late to have any impact, and have likely reached herd immunity, or close to it. The lightly hit places still have many more susceptibles ready to be infected, and so will have a noticeable increase once lockdown is lifted. Unfortunately, the threshold (in deaths per million, or some other reasonable metric) for dividing between the two is unclear, so there are plenty of places in that middle ground where it is hard to predict what will happen.
I want to caveat this with the fact that I do believe that some behavior changes have impacted the spread, and some susceptible people self-quarantined early, so in a go-completely-back-to-the-old-normal scenario, I would expect a small uptick of cases as those susceptibles are infected, but nothing like a large second wave (unless a place hadn’t yet experienced a true first wave).
2. Why is Sweden past peak despite no lockdown or overwhelmed ICUs? They have also likely hit the herd immunity threshold. Serosurveys are underestimating their true prevalence. Because they are so naturally distanced (more than half of all Swedish homes consist of one person), the threshold they need to hit is likely lower than in many other places, which has helped them to keep their numbers down, despite not locking down.
3. Why have weeks of mass demonstrations (beginning five weeks ago) had no impact in places which were at least moderately hit, and appeared to be well past peak? These locations have, you guessed it, reached herd immunity.
4. Why is the household transmission rate so low (lower than the flu), given that this virus spreads like wildfire? It’s highly contagious (even more so than people currently think), but many people have natural immunity.
5. And why is it that kids not only appear to get it less severely, but also catch it less often and transmit it less? This one is a bit trickier. We know that kids get sick (in general) more often, especially little kids with common colds. So if they get cross-immunity from those colds, it would make sense that they would have lower rates of COVID, because they would have higher rates of pre-existing immunity. We also know from above that kids are more likely to fight off COVID using IgA antibodies (and other parts of their immune system), not the IgG and IgM antibodies that the serosurveys test for, so they would appear to have lower prevalence than they really do, due to the nature of the serosurvey. Finally, there is an additional biological reason (unrelated to everything above) why they get infected less: they have fewer ACE2 receptors, which are the means by which you actually become infected. Nasal ACE2 Levels and COVID-19 in Children (JAMA; 20 May 2020):
The nasal epithelium is one of the first sites of infection with SARS-CoV-2 … Among a cohort of 305 patients aged 4 to 60 years, older children (10–17 years old; n = 185), young adults (18–24 years old; n = 46), and adults (≥25 years old; n = 29) all had higher expression of ACE2 in the nasal epithelium compared with younger children (4–9 years old; n = 45), and ACE2 expression was higher with each subsequent age group after adjusting for sex and asthma.
Conclusion: COVID is real. It is not a hoax or a conspiracy theory. But, if you are extrapolating from death rates and seroprevalence from the worst hit locations, and the assumption that 60–75% of people will become infected before we acquire herd immunity, you will vastly overestimate the total (future) cost of this pandemic. This will likely lead to poor policy decisions whose (human) costs are not justified.
Appendix: Frequently Asked Questions
1. What about New York?
When people ask this question, they usually mean that NYC got hit pretty hard (0.25% of the total population has died, 21% of the population infected), so why shouldn’t we expect the same fatality rate elsewhere? NYC is unique due to its combination of density, subway system, and winter weather. Even if NYC had the same degree of cross-immunity as the Bay Area, we would expect higher rates of serious disease due to the fact that in the winter everyone is packed pretty closely together at home, work, and during commutes. This would lead to more/faster spread, and higher viral loads, which would diminish the impact of #1 and #2 above (#1 because we would expect higher “overshoot” and #2 because increased viral load would overpower some of the pre-existing immunity; plus higher viral load is just bad on its own regardless). Also, bad policy decisions have life-and-death impact.
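For reference, the fatality rate implied by the two NYC figures above is a one-line calculation. The sketch below also shows how that implied rate would shift if the infected share were higher than the serosurvey figure; the alternative shares are hypothetical, not measurements.

```python
def implied_ifr(population_fatality: float, infected_fraction: float) -> float:
    """Infection fatality rate = share of population that died / share infected."""
    if infected_fraction <= 0.0:
        raise ValueError("infected_fraction must be positive")
    return population_fatality / infected_fraction


NYC_POPULATION_FATALITY = 0.0025  # 0.25% of the total population, from the text
NYC_SEROPREVALENCE = 0.21         # 21% infected, from the text

# The first share is the reported one; the others are HYPOTHETICAL higher values.
for infected in (NYC_SEROPREVALENCE, 0.30, 0.40):
    ifr = implied_ifr(NYC_POPULATION_FATALITY, infected)
    print(f"infected share {infected:.0%}: implied IFR {ifr:.2%}")
```

At the reported 21%, the implied rate is roughly 1.2%. Any undercounting of infections would lower it, and it says nothing by itself about locations with different density, age structure, or policy choices.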
2. What about Sweden?
When people ask this question, they usually mean: why does Sweden have so many more COVID deaths than its Scandinavian neighbors, while also not being near herd immunity (according to serosurveys)? I’m just going to direct you to this twitter thread, which does a better job explaining than I could (and a recent update here). The tl;dr of it is that
- they count more “generously” than their neighbors, and
- they had 46 straight weeks of negative excess deaths, so lots of people who would have died from the flu (or other causes) in a normal year were spared, and then succumbed to COVID instead.
Sweden is now back to no excess deaths. My money is on Sweden’s overall excess deaths for 2020 being not bad overall, compared to its neighbors (and even more so if you did summer to summer instead of Jan 1 — Jan 1).
3. What about places like Bergamo that have 50% seroprevalence?
Given the variability in all three factors above, we can expect variability in outcomes as well. The (very few) places with high seroprevalence likely just got the short end of the stick across the board: less natural cross-immunity (hence more susceptibles); more older people (who are more prone to severe cases, hence make more antibodies, hence produce fewer antibody test false negatives); conditions+culture that lead to higher viral load; and/or a more homogeneous population, which leads to a higher herd immunity threshold.
4. Can seasonality explain any of this?
According to Harvard epidemiologist Marc Lipsitch, “it is not reasonable to expect [expected declines due to warmer summer weather] alone to slow transmission enough to make a big dent.” Also, the fact that we are seeing clear increases in many warm parts of the US (e.g. Arizona, Texas, Florida) is strong evidence against seasonality.
5. What does this mean for long term immunity?
Given that we’re six months in, and there are no confirmed reinfections yet, it seems like we must at least get immunity for that long. The summary article T cells found in COVID-19 patients ‘bode well’ for long-term immunity (Science Magazine, 14 May 2020) says:
“This is encouraging data,” says virologist Angela Rasmussen of Columbia University. Although the studies don’t clarify whether people who clear a SARS-CoV-2 infection can ward off the virus in the future, both identified strong T cell responses to it, which “bodes well for the development of long-term protective immunity,” Rasmussen says.
Personally, I am optimistic about long-term immunity, but it is still unproven.
UPDATE (7/4/2020): I wanted to clarify my thoughts here. I wouldn’t say that I am optimistic that having had it will give you lifelong immunity. What I think is that it will give you some natural immunity, similar to the cross-immunity described above, where if you catch it again you end up with a much milder version. Because of this I do think it will become endemic, the way common colds are, but just won’t be nearly as bad as this initial pass through the population. This is in large part because older people will have already had it, so when they catch it again it will be more mild, and the people who’ve never been exposed before will mostly be young people, who we know have very good outcomes for the most part. Maybe something like another flu?
6. Who should I follow on twitter?
@Alex_Washburne, @inschool4life, @boriquagato, @AskeladdenTX, @mlipsitch, @MLevitt_NP2013, @BallouxFrancois, @NahasNewman, @strong_eric (friend who doesn’t tweet much but has curated a good set of people to follow), @eperlse, and probably others. I’m still very new to actually using twitter. But these people have all influenced my thinking, and almost everything in here I got from one of these people.
7. Who are you?
Definitely not someone who qualifies as an expert in either biology or epidemiology! But I do have a PhD from Stanford from a department with the word “Science” in it, and have spent my entire career using data to make decisions in one way or another, so maybe that counts for something? And I have a surprising (to me) number of friends with PhDs in biology and epidemiology, who have greatly impacted the way I think about this by sharing the science with me, rather than the news.