Caspar Verhey
Mar 13, 2020


While the general conclusion and advice are pretty sound (“there are more people infected than we know”, “act early”; duh), this blogpost makes illogical calculations and draws conclusions that are not even supported by its own results. All in all it reeks of sensationalist cherry-picking and framing of data and results.

I was torn between welcoming the author into the research community and just being my sarcastic self. Since I don’t get paid for this, while the author seems to be a pro at marketing and branding himself, I went for the latter, though I tried to keep it decent and friendly.

Here’s some peer review:

Country Growth

Up until figure 5 I’m okay with it: just some raw data without any mangling, and I don’t see anything weird.

Then comes fig 5, which analyses the growth rate per country based on just 2 consecutive days. Of course we then get some huge outliers. And since it only shows countries with a growth rate > 5%, we only get to see the high outliers (bias alert!). Fig 6 extrapolates that for a week, which compounds each outlier seven times over. Oops.

Here’s the same chart as fig. 5, same data source, except with the growth averaged over 7 days — much more reliable than a single measurement. Check the big differences it makes — in both directions. Just like the original author, I included the countries with >20 cases on March 5. I didn’t exclude countries with a <5% growth rate, since there were only a few.

Fig. 1: Average confirmed cases growth rate between March 3 and March 9. Countries with >20 cases on March 5 only.
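To show why a single day-over-day jump is so much noisier than a weekly average, here is a minimal sketch. The case counts are made up for illustration, and the geometric mean is an assumption on my side about how to average; it is one reasonable choice among several.

```python
def average_daily_growth(counts):
    """Geometric-mean daily growth rate over a series of cumulative counts."""
    days = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / days) - 1

# Hypothetical cumulative confirmed cases for one country, March 3 to March 9.
cases = [20, 22, 23, 35, 36, 40, 41]

# Growth measured on one pair of consecutive days (here the biggest jump).
single_day = cases[3] / cases[2] - 1
# Growth averaged over the whole window.
averaged = average_daily_growth(cases)

print(f"single-day growth: {single_day:.1%}")
print(f"averaged growth:   {averaged:.1%}")
```

Picking the one pair of days with a reporting spike gives a growth rate several times higher than the average over the window, and extrapolating that for a week only makes the gap bigger.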

China

Next there’s this huge bit on the number of true cases for any day x. What the author apparently means is the true date of infection, because it’s still only about patients who later got identified as a case according to the WHO definition. Most people without symptoms or with just very mild symptoms still won’t get identified as a true case.

He just calculates TrueCases(day x) = ConfirmedCases(day x+7) (or somewhere around 7). However he then keeps framing it as if the GoVerNMeNts ArE LyINg about the number of cases — while it’s just a completely different definition of what he calls ‘case’. It’s no secret that the number of cases according to the WHO-definition is an incubation period behind the number of infected people.

Eastern Countries

The explanation given for the lower growth rate in “Japan, Taiwan, Singapore, Thailand or Hong Kong” is experience with SARS-1. By that argument alone, at least the USA, Italy, Germany and especially China should have been prepared too. The full list of countries that had more SARS-1 cases than Japan: China, Canada, Vietnam, United States, Philippines, Germany, Mongolia, France, Australia, Malaysia, Sweden, United Kingdom, Italy, India, South Korea, Indonesia, South Africa, Kuwait, Ireland, Macau, New Zealand, Romania, Russia, Spain, Switzerland.

“Patient 31 was a super-spreader who passed it to thousands of other people.”

This patient had 1,160 contacts, not thousands; at least that’s what it says in the provided source.

Washington State

“But something interesting happened early on. The death rate was through the roof. At some point, the state had 3 cases and one death.”

Duh, with only 3 cases and without looking at a CI, it’s pretty easy to show a super high death rate. Given enough cities in the US with just a few infections, you’re bound to get some with a death rate like this (Google: multiple testing problem, selection bias). If an active reader bothered to calculate a 95% confidence interval for 1 death in 3 cases, they’d find it runs from 0.8% to 90.6%. In other words: completely meaningless and worthless to report.
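For the active reader: the interval above matches an exact (Clopper-Pearson) binomial interval. The sketch below is one way to compute it with only the standard library; the helper names are my own, and the bisection solver is just a simple stand-in for a proper statistics package.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that is decreasing on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(deaths, cases, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    lower, upper = 0.0, 1.0
    if deaths > 0:
        # Lower bound p solves P(X >= deaths) = alpha / 2.
        lower = bisect(lambda p: binom_cdf(deaths - 1, cases, p) - (1 - alpha / 2))
    if deaths < cases:
        # Upper bound p solves P(X <= deaths) = alpha / 2.
        upper = bisect(lambda p: binom_cdf(deaths, cases, p) - alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(deaths=1, cases=3)
print(f"95% CI for 1 death in 3 cases: {lower:.1%} to {upper:.1%}")
```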

Modeling true cases pt. 1

The number of true cases is “modelled” as follows (mostly paraphrased):

If we have one death today, “We know approximately how long it takes for that person to go from catching the virus to dying on average (17.3 days).” With a mortality rate of 1% (assumption) that means there were 100 people infected at that moment. Given that the number of infections doubles every 6.2 days, 100 infections 17.3 days ago means 100 * 2^(17.3/6.2) ≈ 800 today.

First of all this doesn’t take into account that the initial patients per region were infected in another location. One death in Washington doesn’t mean that there are another 799 infected people walking around right over there — in most of the early cases most of these 799 people were probably still walking around in Hubei or Italy. Similarly, some patients might have come to the USA without dying. So there might be 800 patients in the USA, if that’s the scope of where you looked for single deaths.

Second: 100 * 2^(17.3/6.2) = 692, not 800. I don’t know what went wrong there. (send me €€€ to help me buy a new calculator /s)

Third: 17.3 days seems to be the time between symptom onset and death, not between exposure and death (unweighted average from multiple studies). You’d have to add the incubation period ≈ 5.8 days to that.

Fourth: the provided sources list multiple infection doubling times, no reason is given for picking this one, and I can only guess why (it is the fastest one).

Last, the definition of “case” is super important too. If I talk about true cases, I would mean every person who is infected. In that definition, the mortality rate is super low, and the disease severity for most patients is super low too. However confirmed cases are everyone sick enough to seek medical treatment and subsequently diagnosed, or lucky enough to get found through screening of asymptomatic people. Confirmed cases are a population with a much higher mortality rate and more severe outcomes.

The studies that were used to calculate the 6.2 day doubling time use different definitions of ‘case’; some go for lab confirmed cases only while others go for suspected cases. It’s important not to frame a population with many asymptomatic cases as if they have the same growth rate and the same risk of adverse outcomes — because they don’t.
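The arithmetic in the second and third points above is easy to check. This is a minimal sketch of the back-calculation as I understood it from the blogpost; the function name and parameter names are mine, and the 1% mortality rate is the blogpost’s assumption, not a measured value.

```python
def estimated_infections(deaths, mortality, days_since_infection, doubling_time):
    """Back-calculate current infections from deaths, as described in the blogpost."""
    infected_then = deaths / mortality
    return infected_then * 2 ** (days_since_infection / doubling_time)

# As described: 17.3 days from infection to death, doubling every 6.2 days.
as_described = estimated_infections(1, 0.01, 17.3, 6.2)

# Same, but treating 17.3 days as onset-to-death and adding ~5.8 days of incubation.
with_incubation = estimated_infections(1, 0.01, 17.3 + 5.8, 6.2)

print(f"as described:    {as_described:.0f}")
print(f"with incubation: {with_incubation:.0f}")
```

Swapping in a slower doubling time from the same sources changes the result just as drastically, which is the point: the output is mostly a function of which inputs you happen to pick.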

San Francisco Bay Area

or: Modeling true cases pt. 2

The next model is even less solid. Kudos that it’s stated explicitly in the spreadsheet, but it should have been in the blogpost too.

“For the Bay Area, they were testing everybody who had traveled or was in contact with a traveler, which means that they knew most of the travel-related cases, but none of the community spread cases.”

That just doesn’t follow; if they tested people who were in contact with travellers, those people would be the potential community spread cases.

“By having a sense of community spread vs. travel spread, you can know how many true cases there are. I looked at that ratio for South Korea, which has great data. By the time they had 86 cases, the % of them from community spread was 86%.”

That’s fun, applying the ratio derived in South Korea to the Bay Area — as if it’s a constant. It’s not going to be the same ratio for every city, and even for a given city, it’s not going to be a constant over time.

Additionally it assumes that all cases are identified only because they were found during screening of travellers, ignoring that symptomatic patients would also get tested.

France and Paris

or: Applying the models

“France claims 1,400 cases today and 30 deaths. Using the two methods above, you can have a range of cases: between 24,000 and 140,000. The true number of coronavirus cases in France today is likely to be between 24,000 and 140,000.”

Oh wait — I thought we should first validate a model? At least could we just apply them to some previous data from a region in China and see how it performs there before we’ll use it for projections? No? 😔

Anyway: two models with vastly different outcomes are not really an indication that either of them is reliable, or that the true value is going to be somewhere in between them.

I also have no idea how these outcomes were calculated. If I just try what was described before:

First model: 30 * 100 * 2^(17.3/6.2) = 20,754

Second model: 1,400/(1 - 0.86) = 10,000

That’s not anywhere near the results reported by the author. Maybe I missed a step in the method — I’d like to hear if someone can point it out.
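For anyone who wants to rerun the two calculations above, here is a minimal sketch. It only reproduces the steps as I read them in the blogpost (variable names are mine); if the author’s spreadsheet does something else, this will not match it.

```python
deaths, confirmed = 30, 1400  # France, as quoted in the blogpost

# First model: 1% mortality, 17.3 days to death, doubling every 6.2 days.
model_1 = deaths * 100 * 2 ** (17.3 / 6.2)

# Second model: assume 86% of true cases are undetected community spread.
model_2 = confirmed / (1 - 0.86)

print(f"first model:  {model_1:,.0f}")
print(f"second model: {model_2:,.0f}")
```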

Again this second model assumes all 1400 cases are identified by traveller screening only, and that the ratio travel-spread:community-spread is the same as in South Korea. Which doesn’t fly here, as France’s identified cases were travelers who got identified because they were sick, not because of screening.

Don’t believe me?

Or: Can we create yet another model?

“So when Wuhan thought it had 444 cases, it had 27 times more. If France thinks it has 1,400 cases, it might well have tens of thousands”

Wonderful. Of course Wuhan’s diagnosed cases followed some days behind the number of symptomatic people. The ratio between the two however, is meaningless. It will always start out super high, and go down from there.

If you look just a while earlier: when Wuhan thought it had no cases, it had infinite times more 🙀🙀🙀 ZOMG!!

I wanted to illustrate this with the same data that the author showed in figure 11, but with the ratio for multiple dates instead of just one. Unfortunately I couldn’t easily find the raw data from Zunyou et al. used to make it. So instead I graphed all confirmed cases over the same date period in China, but as cumulative cases instead of new cases, and then I plotted that same data 9 days earlier — eyeballing the graph, that seemed to have been somewhere around the delay between symptom onset and diagnosis.

The ratio between total symptomatic people and those diagnosed, for any given date, was as follows:

Fig. 2: Diagnosis ratio. Pointless to report a single one, misleading to then use it as a constant in projections.

Note that this graph is just to illustrate how reporting a single ratio between the two is meaningless by itself. In reality the time between symptom onset and diagnosis varies between patients and is not a static 9 days (which drastically changes the outcome again, making it even more pointless).
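The same effect shows up with entirely synthetic numbers. The sketch below uses a made-up logistic epidemic curve (every parameter is invented, none of it is real Chinese data) and the same fixed 9-day shift to compute the ratio on several dates.

```python
from math import exp

def cumulative_cases(day, total=80000, rate=0.25, midpoint=30):
    """Synthetic logistic epidemic curve; all parameters are made up."""
    return total / (1 + exp(-rate * (day - midpoint)))

DELAY = 9  # assumed fixed onset-to-diagnosis delay in days

ratios = []
for day in range(0, 61, 10):
    diagnosed = cumulative_cases(day)
    symptomatic = cumulative_cases(day + DELAY)  # same curve, 9 days ahead
    ratios.append(symptomatic / diagnosed)
    print(f"day {day:2d}: symptomatic/diagnosed = {ratios[-1]:.2f}")
```

The ratio starts high while the outbreak is still growing fast and drifts towards 1 as growth slows, so whichever single date you quote decides how scary the number looks.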

More honest would be to note that the number of new infections and the number of new confirmed cases had an identical but delayed growth rate. Blindly extrapolating to another country, with over 3x the number of known cases, is just sensationalist BS again.

(Sidebar: the author’s fig. 11 caption mentions he did an “analysis”, but he just added two arrows and two text labels. Big boastful energy. And there’s a link to the source article but it doesn’t mention the authors, which is not respectful of their work.)

Additionally, Wuhan is just the worst place to compare with, as there were already 41 confirmed cases in Wuhan the moment test-kits became available (and probably a lot more true cases), while in France test-kits were available from the very first patient entering the country.

And in case the original author’s point was just to illustrate that confirmed cases != symptomatic cases, he could have said that with a whole lot fewer words and hypothetical projections.

Spain and Madrid

Same feedback applies here.

End of Chapter 1.

There are 3 more chapters, but there doesn’t seem to be much in them for me. And I don’t want to spend more time on this than the original author.

Before anyone responds: all this criticism doesn’t mean I’d advocate the opposite. Just that this blogpost doesn’t add anything to what we knew before reading it, and it shouldn’t be used blindly. Unfortunately the author made it really easy to do just that, with such a convenient spreadsheet.
