Covid-19 Math Executive Summary

Anthony Bruce
Sep 4, 2020

My business is data. I have built my career working with management teams across many industries (experts in their fields) to help them understand data, often helping those teams correct misinterpretations about data in their own businesses.

My concern and motivation for writing is this: most of the US population has developed perceptions about COVID-19 that are not consistent with the data.

Three types of mathematical mistakes in interpreting the data are running rampant:

1. Misinterpretations of what terms mean, particularly “cases”. Many have failed to understand that the number of “cases” ≠ the number of infections, leading to many incorrect beliefs.

2. Failures to put numbers in context either through providing simple comparisons or by reporting meaningful ratios (e.g., relative deaths in different age groups or with different medical conditions compared to the population).

3. Shortcomings in sufficiently or properly highlighting patterns in the data. Some leaders in medicine and epidemiology have insufficiently focused on interpreting key patterns in the data.

A more complete understanding of the data leads one to conclude that the worst is over in the US unless we observe widespread re-infections, which appears unlikely in the medium-term.

Four Important but Commonly Misinterpreted Math Facts About COVID

  1. Math Fact 1: Many people in the US are consistently drawing inaccurate conclusions from looking at “cases”: incorrectly believing that the US infection peak was in July rather than March, that the average risk of death is 3% (deaths / cases) rather than 0.4–0.5% overall and ~1 in 1,000 for those under 65, that dramatically more young people are now being infected (they are not), or that France has hit a new high in infections. One needs to focus on infections, not cases. When focused on infections, it is clear that (a) the US is fast approaching a new infection low and (b) there has been no significant second wave in any highly exposed geography.
  2. Math Fact 2: Many believe herd immunity is irrelevant until ~60% of a population is infected. In fact, once a geography has hit a certain level of infection (often approximated by ~500 COVID deaths per million or more), it reaches a point of saturation, and key measures of the disease (deaths, hospitalizations) decline. This effect can be proven to be causal, not correlational. This is a data-driven conclusion, not a theory.
  3. Math Fact 3: Many do not realize that statistically, day schools are poor venues for transmission of COVID-19 (with reasonable protocols followed, including use of masks, transmission is highly unlikely).
  4. Math Fact 4: Most people vastly overestimate fatality risks (and the risk of other negative outcomes). For many, examining the math of fatality risk is eye-opening.

Math Fact 1 — Positive Cases are Not Infections: “Don’t Believe Your Eyes”

When looking at data about coronavirus cases, it is crucial to work out how to interpret the data rather than jump to a conclusion based on what a graph “looks like” it is telling you, and not to trust blindly the summaries others provide.

One chart that pervades US coverage of COVID-19 has been consistently misinterpreted:

Many read this chart and conclude that there were more infections in the US in July/August than in March/April and that US infections are still “high.” That interpretation is incorrect.

The chart above is a chart of positive tests (“cases”), not of infections. The increase in positive tests is driven in part by changes in testing (from testing mostly those at hospitals … to mostly the sick … to symptomatic contacts of the sick … to kids who are “well” wanting to attend programs).

The CDC has conducted dozens of geography-based antibody studies to determine the actual number of infections and how infections relate to cases (positive tests). As one would expect, the CDC data reveal that the degree to which positive tests understate actual infections has been falling. The chart below reflects CDC antibody studies by month and geography; the numbers indicate by what factor positive test data (“cases”) were found to understate actual infections.

In March / April, positive tests (“cases”) understated infections by an estimated ~11–12x. By June, the understatement was down to ~4.5x. Now, the understatement may be as low as ~3.5x.

If we adjust to estimate infections instead of tests, we see the following …

The US is fast approaching a new infection low. Also, far more than 30 million people (not 6–7 million) have been infected.
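The adjustment from tests to infections is simple multiplication. The Python sketch below is purely illustrative: it applies the approximate understatement factors above (~11.5x for March/April, taken as the midpoint of ~11–12x, and ~3.5x now) to a hypothetical, equal number of positive tests in each period, and backs out the average multiplier implied by ~30 million infections against ~6.5 million cases (the midpoint of 6–7 million). The function names and the case count are assumptions for illustration, not the author's model.

```python
def estimated_infections(positive_tests: float, understatement: float) -> float:
    """Scale positive tests ("cases") by an assumed understatement factor."""
    return positive_tests * understatement


def implied_multiplier(infections: float, positive_tests: float) -> float:
    """Average understatement factor implied by an infection estimate."""
    return infections / positive_tests


# Hypothetical: the same number of positive tests in two different periods.
cases = 100_000
spring = estimated_infections(cases, 11.5)  # ~11-12x in March/April
now = estimated_infections(cases, 3.5)      # ~3.5x, the latest estimate above
print(f"Spring infections per {cases:,} cases: {spring:,.0f}")
print(f"Current infections per {cases:,} cases: {now:,.0f}")
print(f"Ratio: {spring / now:.1f}x")

# Average multiplier implied by ~30 million infections vs ~6.5 million cases.
print(f"Implied average multiplier: {implied_multiplier(30e6, 6.5e6):.1f}x")
```

Under these assumptions, identical case counts correspond to roughly 3.3 times as many infections in spring as now, and the ~30 million figure implies an average multiplier of about 4.6x, between the June and spring estimates.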

But even graphs like the one directly above tend to be misinterpreted. Readers, understandably given a simplistic reading, intuit that there has been a “second wave”. In fact, the second increase is driven by the opening up of five states (Florida, Texas, South Carolina, Georgia, and California) and the continuation of their “first waves”.

A better understanding of the data indicates that there has never been a second wave of COVID-19 in any geography that has experienced significant exposure (~500 deaths per million or greater). There have only been “first waves”.

Many countries (e.g., Israel, New Zealand, South Korea, Australia) successfully stopped the virus early. When the virus has re-ignited in such low-exposure geographies, the increases should be interpreted as a re-ignition of the first wave, not a second wave.

Incorrect beliefs that second waves are occurring in highly exposed geographies stem from one of two failures. The first is a failure to segment by geography. For example, some look at total deaths in Louisiana and see two “waves”. More granular segmentation by parish (Louisiana's equivalent of a county) shows a single first wave, predominantly in New Orleans, followed by separate first waves in other geographies after re-openings.
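A toy illustration of the segmentation point, assuming two hypothetical regions that each experience a single bell-shaped wave at different times. The curves, peak days, and heights below are invented, not Louisiana data. Each region has exactly one peak, yet their statewide sum shows two.

```python
import math


def wave(days: int, peak_day: int, height: float, width: float) -> list[float]:
    """Synthetic single-peaked (Gaussian-shaped) daily death curve."""
    return [height * math.exp(-((d - peak_day) / width) ** 2 / 2) for d in range(days)]


def count_peaks(series: list[float]) -> int:
    """Count strict local maxima in a daily series."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1]
    )


DAYS = 200
region_a = wave(DAYS, peak_day=40, height=100.0, width=15.0)   # early outbreak
region_b = wave(DAYS, peak_day=140, height=60.0, width=15.0)   # later, after re-opening
state_total = [a + b for a, b in zip(region_a, region_b)]

print("Peaks in region A:", count_peaks(region_a))
print("Peaks in region B:", count_peaks(region_b))
print("Peaks in state total:", count_peaks(state_total))
```

Segmenting by region recovers the one-wave-per-region picture that the aggregate hides.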

The second, more problematic failure is the misinterpretation across Europe that infection levels are similar to those earlier in the epidemic. For example, France just announced a new “high” number of “cases” (positive tests). To think that France has more infections now than in March is patently absurd, but some observers are indeed conflating “cases” with infections without adjustment.

Some look at “case” graphs such as the one below from Spain and conclude that infections spiked back to prior levels.

In Spain, cases were far more understated early on than in the US (compare the two countries' early deaths-per-case ratios). In March / April, Spain’s case counts likely understated the number of infections by a factor of ~25 (now closer to ~3). Converted to infections, the hump on the left may be 8–10 times the size of the hump on the right. Moreover, recent infections are likely concentrated in geographies that were not as hard hit prior to Spain’s lockdown (i.e., Spain’s “Florida”), as confirmed by the concentration of new deaths in Spain's north-east central region.
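The Spain comparison can be checked with the same arithmetic. Assuming the factors above (~25x early, ~3x now), the sketch below computes what ratio of raw case humps would be consistent with an 8–10x difference in infections. The function name is hypothetical.

```python
EARLY_FACTOR = 25.0  # estimated understatement for Spain, March/April
LATE_FACTOR = 3.0    # estimated understatement for Spain, now


def case_ratio_for(infection_ratio: float) -> float:
    """Early-to-late ratio of raw case humps that yields a given infection ratio."""
    return infection_ratio * LATE_FACTOR / EARLY_FACTOR


for target in (8.0, 10.0):
    print(f"{target:.0f}x infections <-> {case_ratio_for(target):.2f}x cases")
print(f"Equal case humps <-> {EARLY_FACTOR / LATE_FACTOR:.1f}x infections")
```

In other words, an 8–10x infection gap is what these factors produce if the two case humps are roughly the same size (the early hump between about 0.96x and 1.2x the later one).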

But what matters is outcomes. There has not been a substantial increase in hospitalizations or fatalities.

This widespread misinterpretation risks putting Europe and other geographies into a “Casedemic” — a panic caused by a misinterpretation that infections are comparable to prior levels and that underlying risk is therefore as high.

In fact, in highly exposed geographies there are no examples anywhere of a sizable second wave of deaths or hospitalizations, regardless of whether bars/pubs were re-opened (England, Spain), schools were opened (everywhere in Europe), or few restrictions were ever mandated (Sweden). No release of any mandate or restriction has led to renewed substantial growth in any highly exposed geography.

Math Fact 2 — Community Exposure is the Largest Single Driver of Decline

What explains the patterns in places with light policy intervention like Sweden?

Many have been told that the impact of “herd immunity” is immaterial until 60% of a population has received a vaccine or tested positive for antibodies. In fact, prior exposure has been the most influential driver of acceleration or retreat of the disease — with awareness of the disease alone, new infections decline after low levels of population exposure (e.g., 10–15%).

To be clear, this assertion does not suggest that lockdowns do not slow transmission (they clearly do) or that wearing masks does not slow transmission (doing so clearly does). The assertion does not comment on whether restrictions should be increased or be eased. It does not comment on policy. It comments only on math and asserts that: once a saturation point is reached, no further substantial exponential growth will occur.

Appendix 1 to this document will name and define the term Community Exposure Impact: the tendency for disease transmission to decrease as a greater fraction of the population has been exposed.

It will walk through the intuition of why we should expect community exposure impact at low levels of infection due to people’s differences in behavior and susceptibility. It will:

  • Provide a series of explanations to make the dynamics of community exposure impact more intuitive. The intuition should not be that 1 person infects 2–3 others but rather that 1 person has contact with many others, some of whom become infected. Those infected may be less exposed but more susceptible. Once the “dry tinder” has burned (i.e., those individuals have been infected and recovered), it becomes harder for the disease to continue to ignite exponentially.
  • Discuss why traditional modeling missed this effect. We may never before have had such extensive data about a disease as we do for COVID. Conventional wisdom assuming relative similarity in susceptibility may be sufficient for modeling diseases like smallpox or measles (or required for considering vaccination protocols) but is wrong for COVID.
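The “dry tinder” intuition can be made concrete with a textbook susceptible-infected-recovered (SIR) toy model in which susceptibility varies between people. This is a minimal sketch, not the author's appendix analysis: the R0 of 2.5 and the two-group split (20% of people at four times average susceptibility, 80% at one quarter) are invented for illustration. It computes the fraction of the population ever infected at the moment the effective reproduction number first falls to 1, comparing a homogeneous population with a heterogeneous one. Because the most susceptible are infected first, R_eff falls faster in the heterogeneous case.

```python
import math


def r_eff(r0: float, shares: list[float], susceptibility: list[float], phi: float) -> float:
    """Effective reproduction number once cumulative force of infection is phi.

    Assumes a well-mixed population whose share-weighted mean susceptibility is 1,
    so r_eff equals r0 at phi = 0. Group g's susceptible share is n_g * exp(-s_g * phi).
    """
    return r0 * sum(n * s * math.exp(-s * phi) for n, s in zip(shares, susceptibility))


def fraction_ever_infected(shares: list[float], susceptibility: list[float], phi: float) -> float:
    """Share of the whole population no longer susceptible at force of infection phi."""
    return 1.0 - sum(n * math.exp(-s * phi) for n, s in zip(shares, susceptibility))


def threshold_infected(r0: float, shares: list[float], susceptibility: list[float],
                       tol: float = 1e-10) -> float:
    """Fraction ever infected when r_eff first falls to 1 (bisection on phi)."""
    lo, hi = 0.0, 1.0
    while r_eff(r0, shares, susceptibility, hi) > 1.0:  # widen bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if r_eff(r0, shares, susceptibility, mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return fraction_ever_infected(shares, susceptibility, (lo + hi) / 2.0)


R0 = 2.5  # hypothetical basic reproduction number
homogeneous = threshold_infected(R0, [1.0], [1.0])
# Hypothetical split: 20% of people at 4x average susceptibility, 80% at 0.25x.
heterogeneous = threshold_infected(R0, [0.2, 0.8], [4.0, 0.25])
print(f"Homogeneous:   {homogeneous:.1%} infected when R_eff reaches 1")
print(f"Heterogeneous: {heterogeneous:.1%} infected when R_eff reaches 1")
```

With these invented parameters, the homogeneous population reaches the threshold at 60% infected (the familiar 1 − 1/R0), while the heterogeneous one reaches it at roughly 21%. The numbers depend entirely on the assumed parameters; the sketch shows only the direction of the effect.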

Appendix 2 will demonstrate that we see community exposure impact at low levels of infection. It will present evidence that there is consistently a lower ceiling on how far the disease progresses in geographies where there was broad awareness of the disease prior to an accelerated outbreak (unlike in Italy, Wuhan, the US Northeast, and much of Europe). This effect can be proven causal.

  • The case of Sweden. Prior to COVID taking hold, Swedes had likely already changed behavior. Prior to many observed cases, the government communicated about risks to the elderly and encouraged both social distancing and work from home. But there was not a lockdown. Indoor restaurants, gyms, and businesses operated, and schooling continued. Initially, this laxer approach led to an exponential rise in deaths. But without change in policy or behavior, the pattern abruptly shifted (as community exposure impact predicts). Despite limited policy intervention, open schools, and few masks, deaths from COVID have all but stopped in Sweden.
  • Every US geography following a similar post-lockdown approach of limited mandates followed the same pattern: these “other Swedens” went from rapid acceleration to rapid decline with little change in behavior. In these places, knowledge of the virus undoubtedly changed behavior from pre- to post-lockdown. But first-hand anecdotal observations from Florida show no change in behavior from the period of acceleration to the period of deceleration. In Florida, employees of juice bars and convenience stores still don’t wear masks, and sing-along dueling piano bars are still full. Community exposure, rather than increased protective measures, caused the declines below:
  • The appendix will outline how the US lockdown experience was impacted by community exposure. Many have consistently mis-attributed the declines observed to the impact of our actions alone rather than to the impact of our actions combined with community exposure.
    • If the lockdowns alone were responsible for the declines, we would have expected deaths to decline after ~3 weeks of lockdown across all geographies.
    • Instead, despite mobility data and other metrics indicating that similar behavioral changes occurred after lockdowns across many areas (e.g., across NY, MA, IL), results differed as varying exposure predicts:
      • New York declined after 3 weeks [roughly what community exposure predicts].
      • Massachusetts declined after 5 weeks [only after infection levels had risen to points where community exposure impact would be expected].
      • Illinois saw declines only after 7 ½ weeks of lockdown [again, as predicted].
    • The growth rate slowed everywhere soon after lockdowns were put in place (i.e., the lockdowns reduced transmission), but growth continued in all geographies until sufficient infections were present for the transmission declines from lockdowns to be further aided by transmission declines due to community exposure.
  • The appendix will note that the experiences in South America and elsewhere mirror the same pattern and will highlight that no country with a population over 50,000 has experienced more than 1 in 1,000 people dying. If community exposure impact were not the explanation, surely one country somewhere would have sustained the 1–2 weeks of additional exponential growth sufficient to produce a different mortality experience.
  • The appendix will explain why New York, NJ, MA, CT experienced outlier death and infection levels due to being “largely unaware” of the disease early. It will also explain why in geographies with limited exposure (like South Korea, New Zealand, Australia, Israel) additional outbreaks may continue. Math can show that if the US northeast had locked down just one week earlier, fewer deaths attributed to COVID would have occurred. Conversely, locking down earlier in Texas and Florida would have had limited impact.
  • Finally, the appendix will highlight the mathematical implausibility of a second wave absent more numerous observations of re-infection of individuals who previously suffered from the disease.
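The sensitivity to timing described in the bullets above comes from exponential growth: a delay of d days multiplies an unchecked outbreak by 2^(d / doubling time). The sketch below assumes a hypothetical 3-day doubling time; the article does not state one, and real doubling times varied by place and date.

```python
def growth_factor(days: float, doubling_time_days: float) -> float:
    """Multiplier on infections after `days` of unchecked exponential growth."""
    return 2 ** (days / doubling_time_days)


DOUBLING_TIME = 3.0  # hypothetical doubling time, in days
for extra_days in (7, 14):
    print(f"{extra_days} extra days -> x{growth_factor(extra_days, DOUBLING_TIME):.1f}")
```

Under that assumption, one extra week of growth multiplies infections about five-fold and two weeks about 25-fold; by the same formula, the same delay matters far less where the doubling time is long.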

The disease consistently begins a decline once an area is significantly exposed. Any machine learning method, AI algorithm, or basic analysis of the pattern of deaths across countries and states will yield the same result: the largest coefficient, the biggest predictor, the conclusion with the greatest confidence is that once a geography has hit a certain level of infection (approximated by ~500 COVID deaths per million or more), all key measures of the disease (deaths, hospitalizations) start to decline. In lay terms, the disease always burns out.
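The ~500 deaths per million marker is a simple normalization. The helpers below (names hypothetical) convert deaths and population into deaths per million and compare the result with that approximate marker. They also show that the “1 in 1,000” ceiling mentioned in the appendix list corresponds to 1,000 deaths per million, twice the marker. The inputs are invented.

```python
SATURATION_MARKER = 500  # approximate deaths-per-million marker used in this article


def deaths_per_million(deaths: int, population: int) -> float:
    """Cumulative deaths normalized to a population of one million."""
    return deaths * 1_000_000 / population


def past_marker(deaths: int, population: int) -> bool:
    """True if a geography is at or above the approximate marker."""
    return deaths_per_million(deaths, population) >= SATURATION_MARKER


# Hypothetical geography: 5,000 deaths in a population of 8 million.
print(deaths_per_million(5_000, 8_000_000))   # 625.0
print(past_marker(5_000, 8_000_000))          # True
# "1 in 1,000" expressed in the same units.
print(deaths_per_million(1, 1_000))           # 1000.0
```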

The worst is over in the US. The vast majority of major US population centers have hit the threshold above. Therefore, US infections and deaths will continue to decline, regardless of policy. In addition, in highly exposed geographies, a substantial second wave is not mathematically plausible absent widespread reinfections. In such areas, significant increases in deaths, hospitalizations, etc. will not occur unless those that recovered are able to spread the disease again. This could happen. Indeed, the disease may be seasonal. But re-infection is required to create a return to an epidemic state. And warning signs (i.e., many re-infections somewhere in the world) will be seen in advance.

Math Fact 3 — Schools are poor venues for transmission — “Possible” Can Mean Extraordinarily Unlikely

Appendix 3 to this document will walk through why day school-based transmission is improbable. It will cite:

  • Data from COVID-19 positive individuals entering schools: Despite being small in sample size, the data is statistically powerful in proving that a school environment poses far less risk than an adult work environment in terms of transmission likelihood.
  • The absence of transmission examples: The dearth globally of examples of day school transmission and outbreak is itself statistically powerful.
  • Related incidence and medical data about children: Available data generally points in the direction of children being less likely than adults to transmit the disease.
  • The clarity around when exceptions have occurred and why: In the few exceptions where school transmissions occurred, drivers were adult-to-adult transmission, symptomatic individuals allowed to enter the environment, and protective protocols not being observed.
  • The protocols a school can design and implement: A school can design protocols that not only protect against all three failures above but also make transmission highly unlikely even if all individuals were adults.
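The point above that an absence of transmission examples is “statistically powerful” can be quantified with a standard result: if zero transmissions are observed in n independent exposures, the one-sided 95% upper confidence bound on the per-exposure transmission probability is 1 − 0.05^(1/n), approximately 3/n (the “rule of three”). The sketch assumes independent exposures, which clustering in real schools may violate, and the exposure counts are hypothetical.

```python
def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on p after observing 0 events in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)


for n in (30, 100, 1_000):  # hypothetical numbers of exposures
    exact = upper_bound_zero_events(n)
    print(f"n={n:>5}: exact bound {exact:.4f}, rule of three {3 / n:.4f}")
```

The bound shrinks roughly in proportion to 1/n, which is why many exposures with no observed transmission constrain the risk tightly even when each individual observation is small.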

Math Fact 4 — Most people vastly over-estimate fatality risk

Appendix 4 will walk through why (and the degree to which) individuals are overestimating the risk of fatality (or of a bad outcome). Many do not understand that age is a far larger driver of risk than medical pre-conditions. As a group, the workforce (those under 65, including the portion with pre-conditions) has a lower fatality risk than most perceive.

Consider a thought exercise: assume all faculty and staff / administration across an estimated two average-sized schools [~600 students] were to become infected (a preposterous scenario, particularly given an expectation of no school-based transmissions). Based on estimates available from the CDC website, all of those infections, across every faculty and staff member of both schools, would be expected to result in zero fatalities.
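The thought exercise reduces to an expected count. Assuming a hypothetical staff of 100 across the two schools (the article does not give a number) and the ~1 in 1,000 fatality estimate for those under 65 cited earlier, the sketch computes the expected number of deaths and, treating deaths as independent rare events, the Poisson probability of none. In this framing, “expected to result in zero fatalities” corresponds to an expected count well below one.

```python
import math

STAFF = 100                # hypothetical faculty and staff across both schools
IFR_UNDER_65 = 1 / 1_000   # approximate fatality estimate for those under 65

expected_deaths = STAFF * IFR_UNDER_65
# Poisson approximation: probability of no deaths given the expected count.
p_no_deaths = math.exp(-expected_deaths)

print(f"Expected deaths: {expected_deaths:.2f}")
print(f"Probability of zero deaths: {p_no_deaths:.1%}")
```

Doubling the assumed staff doubles the expected count; the result depends on both the staff size and its age mix, since age is the dominant driver of risk.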

Many in the workforce or working in schools will be incredulous at the statement above. Some readers may immediately turn to a favorite tracking website and divide the number of deaths (at this writing, about 180,000) by the number of confirmed cases (at this writing, about 6 million). The calculation yields a figure of 3%. That figure shapes the intuition of many but does not remotely resemble the average fatality risk of infection for those in the workforce.
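The gap comes almost entirely from the denominator. Using the figures in this article (~180,000 deaths, ~6 million confirmed cases, ~30 million or more infections), the sketch compares deaths per case with deaths per infection and shows how many infections the 0.4–0.5% overall estimate would imply. Variable names are illustrative.

```python
DEATHS = 180_000
CONFIRMED_CASES = 6_000_000
ESTIMATED_INFECTIONS = 30_000_000  # "~30 million or more"

naive_cfr = DEATHS / CONFIRMED_CASES
ifr_upper = DEATHS / ESTIMATED_INFECTIONS  # an upper bound if infections exceed 30M

print(f"Deaths / cases:      {naive_cfr:.1%}")
print(f"Deaths / infections: {ifr_upper:.1%} (at most, if infections >= 30M)")

# Infections implied by a 0.4-0.5% overall fatality rate.
for ifr in (0.005, 0.004):
    print(f"IFR {ifr:.1%} implies ~{DEATHS / ifr / 1e6:.0f} million infections")
```

On this arithmetic, the 0.4–0.5% overall estimate is consistent with roughly 36–45 million infections at the time of writing.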

There are four major disconnects in most people’s intuition, all quite important:

1. “Cases” vastly understate infections. There have been ~30 million US infections or more.

2. The “average” risk is shrinking as infection fatality risk has decreased over time.

3. Most in the workforce have not internalized just how concentrated fatality risk is among those not working (>80% of deaths are among people over 65). Risk for those in the workforce is far more remote.

4. Many incorrectly intuit that there is a large sub-group in their organization (including colleagues or themselves) that face notably higher risk due to medical conditions. The statistical increase in risk from an “average” diagnosed medical condition is less than many intuit.

While every individual is certainly different and every medical pre-condition is different, and no medical advice is being offered or implied by these statistics in any way, what the charts above demonstrate is that factors such as diagnosed medical condition (on average) are dwarfed in importance by the impact of age in predicting mortality.

The blunt guidance that “those over 65 or with certain medical conditions are at ‘high risk’” is overly simplistic. Within organizations, many individuals who would be designated ‘high risk’ under this rule may belong to groups at statistically lower risk than an average person (or than older colleagues not identifying as “at risk”).

Why are these four points not more part of the debate?

Other thinkers with impressive credentials have voiced similar views to those in this document, but to date, such views have not been as central in the discussion about COVID as they should be. The implications of the math outlined in this document are significant for informing actions going forward and should be debated far more actively than is happening today.

Dive into the data and math by clicking the links below …

Appendix 1: Math-based proof that the disease declines at levels of infection related to ~500 deaths per million and that the effect is causal (the details)

Appendix 2: The theory behind why community exposure impact happens so early.

Appendix 3: Evidence regarding the unlikelihood of school-based transmission from students to teachers

Appendix 4: More details and math behind estimates of fatality risk

Appendix 5: Comparing COVID to other pandemics and other causes of death

(for my bio, click).
