Covid-19 — Estimating Incubation Period, Infectiousness, and R From Deaths

Steve Jones
Jun 22, 2020


This article has been superseded. You should read my latest article instead.

Based on analysis of the SARS-CoV-2 NHS hospital deaths data for England, my conclusion is that:

  • In England, the virus started to spread exponentially some time in the last two weeks of January.
  • The R number was initially 4.5 on weekdays, and 3.9 at weekends, but began to fall in the last week of February, dropping to less than one by 16th March.
  • An infected person is infectious for two days (days five and six after infection) and is twice as infectious on the second day as on the first. Alternatively, individuals are infectious for one day, with twice as many being infectious on day six as on day five.

This 1:2 pattern, combined with the differing values of R (caused by weekdays vs weekends, changing social behaviour, and possibly a degree of herd immunity), produces the distinctive pattern of infections seen for Covid-19.

The Process

Time between infection and death varies, so the data is “sharpened” to provide a better estimate of infections. This results in:

Orange = Actual recorded deaths by day from NHS data. Yellow = Sharpened version (more accurately representing infections that will result in death)

I used Excel to define a constrained set of R numbers (where blocks of adjacent R numbers are linked) but with weekdays and weekends independent from each other.

From this data, plot the expected number of infections per day, varying the parameters to obtain the optimum match against the sharpened NHS data. This results in:

  • An approximate set of daily R numbers
  • Seed number (number to start the spread)
  • Distribution of infectiousness (when, following infection, the virus is infectious)
  • Distribution of time between infection and death
Blue line = Calculated number of deaths per day (based on initial estimates optimised to match expected infections)

For the UK this resulted in the following chart for R over time:

Blue = weekdays. Red = weekends and holidays

And an estimate for Infectiousness:

And an estimated time to death (low standard deviation because data was sharpened to better match pattern of infections):

The seed number (number of people infected who would later die) was calculated as 8.02176E-09 for each day between 27th November and 3rd December 2019.

Finally, relax the constraints on R (allowing R to vary between adjacent days).

Iterate to find the optimum set of R numbers (where R remains reasonably constant between similar adjacent days) and where the calculated number of infections matches the expected pattern.

Results

Infectiousness:

Infectiousness (following infection on day zero).

So for one hundred people catching the virus on Sunday, when R = 6, this group will infect an additional 200 people on the following Friday, and another 400 on the Saturday.

Calculated R Number:

The seed number (people initially infected) dropped to 0.000000005.

Assuming that between 0.5% and 2% of all people infected will die, one actual infected person corresponds to between 0.005 and 0.02 in these units (people infected who later die). The start date for the spread of infection is therefore when the calculated number rises to somewhere in the region of 0.005 to 0.02, which for England happened some time around January 24th.

Expected time from infection to death:

…with even lower standard deviation due to the reduction in standard deviation for infectiousness.

The calculated pattern of infections that will result in death is:

Yellow = Sharpened NHS data. Blue = Calculated value

A near perfect match. There is a discrepancy when the numbers first start to rise. I’m not sure why that is yet, and it’s getting too late to think, but I’m guessing it’s because my process uses real numbers (fractions) whereas real world infected only come in whole numbers.

The Important Bit

Infectiousness:

Note days 5, 6 and 7

To use this data to produce a graph of infections, plot a series of points (days) where each one has a designated R number and [number of people infected]. R for each point can be any positive number, as can the [number of people infected] for the first point/day. Subsequent values for [number of people infected] are calculated as the sum of ( 0.3333 x [R five days ago] x [Number of people infected five days ago] ) and ( 0.6666 x [R six days ago] x [Number of people infected six days ago] ). By choosing varying R values close to 5 for the first set of points, and numbers close to 0.5 for the rest, you should see a graph very much like the Covid-19 charts.
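The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used for the article: the function name `simulate_infections` and its parameters are made up for this example, the weights are assumed to be exactly one third and two thirds (the 0.3333 and 0.6666 above, unrounded), and days before the first point are assumed to contribute nothing.

```python
def simulate_infections(r_by_day, seed, w5=1 / 3, w6=2 / 3):
    """Daily infections from the five/six day recurrence.

    r_by_day: one R number per day (any positive numbers).
    seed: number of people infected on the first day.
    w5, w6: assumed share of onward infections caused five and six days
            after infection (hypothetical parameter names).
    """
    infected = []
    for day in range(len(r_by_day)):
        if day == 0:
            infected.append(float(seed))
            continue
        new = 0.0
        # Days that fall before the first point are assumed to add nothing.
        if day >= 5:
            new += w5 * r_by_day[day - 5] * infected[day - 5]
        if day >= 6:
            new += w6 * r_by_day[day - 6] * infected[day - 6]
        infected.append(new)
    return infected


# The worked example from the Results section: 100 people infected on day
# zero with R = 6 infect 200 more on day five and 400 more on day six.
week = simulate_infections([6.0] * 7, seed=100)
print([round(x) for x in week])  # rounds to [100, 0, 0, 0, 0, 200, 400]

# R close to 5 for the first set of points, then close to 0.5 for the rest.
curve = simulate_infections([5.0] * 40 + [0.5] * 60, seed=1.0)
```

Plotting `curve` against the day number should show the rise and fall described above, with a bumpy six day texture because a single seed day is used here.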

(Note: This is just sample data to see how the

If correct, then this information should be extremely useful for understanding and controlling the spread of the virus. There are plenty of practical applications.

If incorrect, please point out my mistakes.

Supplementary Information Based on Feedback I’ve Received So Far

There are three sets of variables:

  • R number by day.
  • Infectiousness (mean, standard deviation, and skew).
  • Period between infection and death (also mean, standard deviation, and skew).
  • Seed isn’t really a variable, in that any value over zero will work, and results in pretty much the same set of results (except maybe for the estimated R number for the first few days).

It has been pointed out that there are a large number of variables and enough to draw an elephant, but unless you use a very short infection period, I challenge you to draw a hedgehog.

I tried a variety of techniques and settings to match the output to the NHS data, and the method I settled on was:

Sub_score_1 = Sum ( ( Calculated_number_of_infections - Expected_number_of_infections ) ^ 2 ) x ( (200 - day number) ^ 2 )

And to keep the R numbers between similar days aligned I used:

Sub_score_2 = Difference between adjacent R numbers raised to the power 4.2

With the overall score made up of about 70% Sub_score_1 and 30% Sub_score_2.

To find the best fit, the process iterates, changing variables in pairs, until further changes make no obvious difference.
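As a rough sketch of how the two sub-scores might be combined, here is one possible Python reading. Several details are assumptions rather than anything stated above: the day weighting is applied to each day's squared error inside the sum, day numbers start at zero, the absolute difference between adjacent R numbers is used (so the power 4.2 is always defined), and the 70/30 split is applied as simple multipliers. All function and parameter names are hypothetical.

```python
def sub_score_1(calculated, expected, horizon=200):
    """Squared error per day, weighted so that earlier days count for more."""
    return sum(
        (c - e) ** 2 * (horizon - day) ** 2
        for day, (c, e) in enumerate(zip(calculated, expected))
    )


def sub_score_2(r_numbers, power=4.2):
    """Penalty for jumps between adjacent R numbers.

    Pass the weekday and weekend series separately, since they are
    treated as independent of each other.
    """
    return sum(abs(a - b) ** power for a, b in zip(r_numbers, r_numbers[1:]))


def overall_score(calculated, expected, r_numbers, w1=0.7, w2=0.3):
    """Lower is better; w1 and w2 are the assumed 70/30 weighting."""
    return w1 * sub_score_1(calculated, expected) + w2 * sub_score_2(r_numbers)
```

A perfect match with a constant R scores zero, and any mismatch or jump in R increases the score.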

It is possible to generate matching graphs with very different sets of numbers. However, these sets of numbers would be relatively small (e.g. differences between adjacent sets of R numbers), or look inconsistent (e.g. a similar solution with time between infection and death exactly seven days longer, also works). I’m also pretty sure the multiple solutions would all share a very low standard deviation for infectiousness.

For “sharpening”, I calculate the difference between a weighted 5-day average multiplied by 4. This brought the standard deviation for Infection to Death (sharpened) down to under 1. It did put a floor of zero on the result, so I suppose the number of infections that lead to death that I calculate will be very slightly high.
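The exact sharpening formula is not spelled out above, so the following is not the spreadsheet calculation. It is a generic unsharp-mask sketch that shares the same ingredients: a weighted 5-day average, a factor of 4 applied to the difference from that average, and a floor of zero. The weights `(1, 2, 3, 2, 1)`, the function name, and the handling of the first and last two days are all assumptions made for illustration.

```python
def sharpen(series, weights=(1, 2, 3, 2, 1), gain=4.0):
    """Unsharp-mask style sharpening of a daily series, floored at zero."""
    n = len(series)
    half = len(weights) // 2
    sharpened = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < n:  # near the ends, average only the days that exist
                total += w * series[j]
                weight_sum += w
        average = total / weight_sum
        # Push each day away from its local average, then apply the floor.
        sharpened.append(max(0.0, series[i] + gain * (series[i] - average)))
    return sharpened
```

A flat series is left unchanged, while an isolated peak becomes taller and narrower, with the days either side clipped to zero by the floor.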

The starting distribution for time to death was artificially reduced. But this was for stage 1 to get an approximate starting point. The whole of stage 1 can be skipped, but it just takes a lot longer to run when R starts at 1 for all days.

The numbers generated are for people infected who then die. If mortality was 100% then this would also reflect total number of infections. I don’t try to predict mortality rate, except to estimate when the spread started (late January if mortality rate is between 0.5% and 2%).

I haven’t made any attempt to differentiate the effect on R between herd immunity and changes in public behavior.

The method I used is far from perfect. I’ll admit, I bodged this together over a weekend, and it isn’t pretty. But I do see how a ridiculously short infection period can generate a set of realistic-looking results that match the pattern of actual recorded deaths for Covid-19. I can’t think of any other way I, or nature, could do that.
