How To Count Coronavirus Deaths — Why Einstein Was Right.

An In-Depth Analysis Of The Coronavirus Death Statistics From England And Wales

Applied Data Science
16 min read · Apr 27, 2020

Many of the things you can count, don’t count. Many of the things you can’t count, really count — Albert Einstein

I started writing this article in the first few weeks after coronavirus broke out in the UK. The newspapers had started reporting on the growth of the virus across the country and log scales were all the rage. We had just gone into lockdown to prevent more people from catching and dying from the virus.

Every day, the number of daily deaths of patients who have tested positive for coronavirus in hospitals is released at 2pm to the public. It is the core metric that is used to measure the relative success of our strategy and benchmark ourselves against other countries.

In this post, we’ll explore whether the coronavirus deaths metric in isolation is enough to explain the true impact of the epidemic. If not, what should we count instead?

Note that this article is not a forecast of the epidemic over the long-term, nor an opinion piece on any particular strategy. The aim is to give you all the information you need to form your own opinions. We’ll work from first principles, building up different ideas as we go.

An example

There are tragically approximately 1,800 road deaths every year in the UK. If a popular new car is released onto the market, we wouldn’t be surprised if deaths involving this car began to rise. It didn’t exist previously and now it does, so it makes sense that it should be the ‘cause’ of potentially hundreds of deaths.

The overall number of road deaths would remain roughly in line with previous years, give or take the overall trend, because deaths in the new car are replacing deaths in other cars that are now not on the road as much.

When analysing the impact of new causes of death, we must therefore bear the following principle in mind.

A rise in deaths mentioning one particular cause is not, on its own, reason to be alarmed.

This raises the question: what should we be counting instead?

Excess deaths

We would have reason to worry if the introduction of a new car was causing excess deaths on our roads — perhaps because it is poorly designed from a safety perspective.

In this case, we would see the rise in deaths related to the new car outstripping the reduction in deaths from other cars.

If the car is as safe as other cars on the road, this shouldn’t happen. If it does, we know there is something seriously wrong with the new car.

We can approach our analysis of the coronavirus epidemic in exactly the same way. We need to establish to what extent the virus is causing excess deaths and, to do that, we need to see whether the appearance of the disease coincides with an overall increase in the total number of deaths from all causes.

Deaths from all causes registered (reg) in 2020 (red) and deaths with COVID-19 (coronavirus) mentioned on the death certificate registered in 2020 (salmon) compared with deaths registered in the previous 10 years (grey). Data until the week ending 10th April 2020 sourced from the Office for National Statistics. Chart sourced from https://adsp.ai/demos/coronavirus-england/

The chart above demonstrates that we undoubtedly have a spike in overall deaths from all causes in England and Wales, which correlates strongly with the rise in deaths with coronavirus mentioned on the death certificate. This much is undeniable.

This is solid ground on which to start a proper analysis of the coronavirus death statistics. The ingredient that puts everything into context is the overall number of deaths, from all causes.

Five big questions

In the UK, the Office for National Statistics (ONS) provides overall deaths data and coronavirus deaths data on a weekly basis. Armed with this information, we can start a detailed analysis of the coronavirus deaths data from England and Wales — we’ll tackle it from five different angles:

1. Who is coronavirus killing?

2. What’s the link between total deaths and coronavirus deaths?

3. Are we at the peak yet?

4. How unusual is the death spike?

5. What happens next?

Throughout this analysis we will say ‘coronavirus death’ to mean a death where COVID-19 was mentioned on the death certificate. It may have been the primary cause of death, a contributing factor or not a factor at all. What’s reported is the number of people that tested positive for the disease at time of death.

Who is coronavirus killing?

The ONS breaks the data down along two dimensions that we can analyse in more detail:

  • Gender and age group (e.g. Male 45–64)
  • Region (e.g. London, West Midlands)

In particular, the natural question to ask is: what does the death spike look like for each group independently?

Well, let’s take a look (note that the y-axis scales are different for each subplot, for clarity).

Deaths from all causes registered in 2020 (red) and deaths with COVID-19 (coronavirus) mentioned on the death certificate registered in 2020 (salmon) compared with deaths registered in the previous 10 years (grey), for different demographic groups. Data until the week ending 10th April 2020 sourced from the Office for National Statistics.

There are a few major points to note here:

The overall number of deaths of people aged 44 and younger has not been significantly affected by the coronavirus epidemic.

For age groups above 45 years old, the latest number of overall weekly deaths has surpassed the highest recorded in the last 10 years, except for females over 85, where the record is still week 2 of 2015.

It is widely acknowledged that coronavirus is a disease that is particularly deadly for the elderly. Conversely, the data indicates that there are approximately 37 million people under 45 in the UK (56% of the population), who are at no more risk of dying now than they were before the virus hit.

For age groups above 45 years old, the overall death spike for males is more pronounced than that for females.

Coronavirus is a virus that has been measurably shown to affect men more than women and this is backed up by the ONS data.

Every region except the South West, Wales and Yorkshire & The Humber has recorded a new weekly high for deaths registered in a single week.

London has been the worst hit region by a considerable margin, with the 2,832 deaths registered in week 15 of 2020 eclipsing the previous high of 1,549 registered in week 2 of 2015.

Even though London is the region of England with the youngest population on average, it has already seen an enormous deviation from the normal number of weekly deaths. Recent research suggests that it is not necessarily the urban density that is causing the greater impact of the virus, but instead the fact that it is a travel hub that attracts many thousands of visitors from all over the world and therefore the virus is more likely to have got in early undetected and had time to spread.

Notice how, particularly for London, the shape of the coronavirus spike is very similar to the spike in total deaths. The strength of this relationship is what we shall explore next.

What’s the link between total deaths and coronavirus deaths?

By looking at the demographic breakdowns, it is clear that there is a correlation between the increase in coronavirus deaths and the increase in overall deaths.

However, as we saw with the car example in the first section of this article, it was not guaranteed that this would be the case. If coronavirus only affected patients that would have died from a different cause in the week anyway, then we wouldn’t have seen an overall increase in deaths.

This idea is worthy of further exploration. Specifically, how closely are the following two metrics connected:

The change in weekly coronavirus deaths (e.g. 100 more than last week)

The change in weekly deaths from all causes (e.g. 150 more than last week)

We can answer this question by drawing a scatter plot of these two metrics for a given pair of weeks, where each point is a demographic grouping (e.g. females aged 75–84) or place of death (e.g. hospital, care home, home). Below, we plot the change from weeks 13 to 14.

The change in weekly death registrations for deaths where COVID-19 (coronavirus) was mentioned on the death certificate against the change in weekly deaths from all causes, for different groups from week 13 to week 14, 2020. Log-scale on both axes. Data sourced from the ONS.

Firstly, let’s talk about the diagonal reference line. If a point lies on this line, that means the increase in coronavirus deaths perfectly matches the increase in total deaths. So if 200 more coronavirus deaths were registered this week than last week, then the total deaths figure would also be 200 larger than last week.
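The comparison behind this chart can be sketched in a few lines of Python. The counts below are made up purely for illustration (they are not ONS values), and `weekly_change` and `position_vs_reference` are hypothetical helper names rather than anything from the original analysis.

```python
def weekly_change(previous_week, current_week):
    """Week-on-week change in a death count."""
    return current_week - previous_week


def position_vs_reference(covid_change, total_change):
    """Say where a point sits relative to the diagonal reference line."""
    if total_change > covid_change:
        return "above"  # total deaths rose by more than coronavirus deaths
    if total_change < covid_change:
        return "below"  # total deaths rose by less than coronavirus deaths
    return "on"         # the two increases match exactly


# Illustrative (made-up) counts for one group: (week 13, week 14)
covid_deaths = (500, 700)
all_cause_deaths = (2000, 2300)

covid_change = weekly_change(*covid_deaths)      # 200
total_change = weekly_change(*all_cause_deaths)  # 300
print(position_vs_reference(covid_change, total_change))  # above
```

In this made-up case, total deaths rose by 100 more than coronavirus deaths, so the point would sit above the reference line.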

Curiously, all points lie above the reference line. This means that for all groupings, the total number of weekly deaths increased by more than the number of coronavirus registered deaths. How can this be?

There are two possible explanations. One idea put forward is that the lockdown has caused more people to die of causes other than coronavirus, because they are delaying seeking help at hospital for fear of catching the virus. This is supported in part by the Emergency Department Syndromic Surveillance System (EDSSS), which tracks the number of people going to emergency departments each day across a network of hospitals in England.

Source: Emergency Department Syndromic Surveillance System Bulletin April 22nd

The second explanation is that we may have undercounted the number of coronavirus deaths. That is, people have died where a coronavirus diagnosis has been missed and therefore it is not on the death certificate. This is supported by the fact that in hospitals, where a coronavirus diagnosis is more likely to take place, overall deaths increased by approximately the same amount as coronavirus deaths from week 13 to week 14. In homes and care homes, where a coronavirus diagnosis is more likely to be missed, the overall increase in deaths is a lot higher than would be expected given the number of reported coronavirus deaths.

In fact, it has been calculated in several studies and articles that care home deaths may account for 50% of the total number of deaths caused by the virus.

If we now look at the same chart but showing the change from week 14 to week 15, we see a different story.

The change in weekly death registrations for deaths where COVID-19 (coronavirus) was mentioned on the death certificate against the change in weekly deaths from all causes, for different groups from week 14 to week 15, 2020. Log-scale on both axes. The dotted lines represent the change in position from the previous week (weeks 13 to 14, 2020). Data sourced from the ONS.

Home and care home deaths remain above the line, which indicates a continued underreporting of coronavirus deaths in these locations. However, most points have moved directly downwards. That is, the increase in overall deaths has decelerated, whilst the increase in coronavirus deaths has decelerated by a smaller amount or remained roughly the same.

This is odd, because if there is a linear relationship between changes in coronavirus deaths and overall deaths as we would expect, then points should move directly towards or away from the origin (the point 0,0). Where this isn’t the case, we can tentatively conclude that the start or end point for the line traced by the point may be in the wrong place, because it’s been measured wrongly.
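One way to make this check concrete is sketched below. It assumes the relationship is strictly proportional (change in total deaths = k × change in coronavirus deaths), in which case both positions of a point lie on the same straight line through the origin. The function name and the example coordinates are hypothetical, and the check works on the raw changes rather than on their log-scaled positions on the chart.

```python
import math


def moves_along_ray_through_origin(start, end, rel_tol=1e-9):
    """Return True if both points lie on one straight line through (0, 0).

    Each point is (change in coronavirus deaths, change in total deaths).
    Two points share a line through the origin when x1 * y2 == x2 * y1.
    """
    x1, y1 = start
    x2, y2 = end
    return math.isclose(x1 * y2, x2 * y1, rel_tol=rel_tol)


# Made-up example: both changes halve, so the point moves towards the origin.
print(moves_along_ray_through_origin((200, 300), (100, 150)))  # True

# Made-up example: only the total-deaths change falls (a move straight down).
print(moves_along_ray_through_origin((200, 300), (200, 150)))  # False
```

The second case is the pattern described above: a move straight down is not consistent with a fixed proportional relationship.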

We can trust that the overall death count is accurate and that a positive coronavirus test from a deceased patient is unlikely to be wrong. Therefore we only have one degree of freedom with which to move points to fit the expected direction of movement towards the origin — a large underreporting of deaths where coronavirus was present at time of death, so that points are shifted to the right on the week 13–14 chart.

Note that if this is true, it does not mean that more people have died than have been reported. Nor does it necessarily mean that more people have died due to coronavirus than have been reported. It only means that the presence of coronavirus in the deceased may be significantly greater than we have measured so far, particularly for deaths at home and in care homes, but also in hospitals.

Understanding prevalence of the disease in society is critical — mass testing of the population is currently being started in Germany and results from this and other reports will help us to estimate how the virus will react once lockdowns begin to lift.

The fact that we are living through the evolution of the epidemic in real time, with new data being generated on a weekly basis, makes detailed analysis of this kind difficult and reliant on assumptions. However, by continuing to pick apart the data in this way, we will gain ever deeper insight into the true nature of the virus and where we are currently failing to identify the true tally of its victims.

Are we at the peak yet?

So far, we’ve looked backwards at what has already happened. Now let’s look forwards at what might happen in the immediate short term.

From week 14 (week ending April 3rd) to week 15 (week ending April 10th), the reported number of coronavirus death registrations in England and Wales nearly doubled, from 3,475 to 6,213. Will it double again next week? Or are we already at the peak?
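As a quick check on "nearly doubled", the week-on-week growth can be worked out directly from the two figures quoted above. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
week_14_deaths = 3475  # coronavirus death registrations, week ending April 3rd
week_15_deaths = 6213  # coronavirus death registrations, week ending April 10th

increase = week_15_deaths - week_14_deaths
growth_factor = week_15_deaths / week_14_deaths
percent_increase = 100 * increase / week_14_deaths

print(increase)                   # 2738
print(f"{growth_factor:.2f}x")    # 1.79x
print(f"{percent_increase:.1f}%") # 78.8%
```

A growth factor of about 1.79 is short of a full doubling (2.0), which is why the text says "nearly".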

To answer this question, we can use the daily death statistics released by NHS England, which show that the overall peak by date of death appears to be between 6th and 10th April for hospitals in England. This may change as new data is announced over the coming days.

5-day moving average of daily deaths from hospitals in England where the patient tested positive for COVID-19 (coronavirus). Data sourced from NHS England.
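A 5-day moving average like the one in this chart can be sketched as follows. The article does not say whether the window is trailing or centred, so the trailing window here is an assumption, and the daily counts are made up for illustration.

```python
def moving_average(values, window=5):
    """Trailing moving average: one value per full window of days."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]  # the most recent `window` days
        averages.append(sum(chunk) / window)
    return averages


# Made-up daily death counts, not NHS England figures
daily_deaths = [10, 20, 30, 40, 50, 60, 70]
print(moving_average(daily_deaths))  # [30.0, 40.0, 50.0]
```

Smoothing in this way reduces day-to-day noise, which makes a peak easier to spot by eye.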

It is worth noting that NHS England only counts deaths in hospitals in England, whilst the ONS counts all deaths in England and Wales, including those in the community. By drawing conclusions from this chart, we are therefore assuming that the shape of the epidemic in hospitals is similar to that in the country as a whole.

However, this may not be the case, particularly in care homes, where the only data currently available is that provided by the ONS, at a weekly cadence and lagged because the deaths take time to register and the data take time to process.

UPDATE: New data released on 28th April shows that care home deaths do indeed lag behind hospital deaths. Death registrations from all causes in care homes increased 48% week on week, compared to 10% in hospitals (from the week ending April 10th to the week ending April 17th).

How unusual is the death spike?

We’ll finish by turning our full attention to the overall spike in deaths from all causes and attempt to answer the following question: how unusual is the increase, both in terms of the average number of deaths at this time of year and historical spikes in deaths?

Certainly, the short-term increase in comparison to the usual number of deaths registered at this time of the year is striking. The average number of deaths registered in week 15 over the last 5 years is 10,520. This year, 18,516 deaths were registered — an excess of 7,996.

If we calculate these excess deaths for every week of the year for the last 10 years, we get the following table (excess deaths is defined as the total deaths in the week minus the average deaths in the same week over the previous 5 years):

Excess deaths for each week of the last 10 years in England and Wales. Data sourced from the ONS. Red = more deaths than average, green = fewer deaths than average.
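The definition of excess deaths used for this table can be written as a short function. In the sketch below, the five yearly values are placeholders chosen only so that their mean equals the 10,520 quoted in the text. The real week 15 figures for each year are not given in this article, and `excess_deaths` is a hypothetical helper name.

```python
from statistics import mean


def excess_deaths(deaths_this_week, same_week_previous_years):
    """Deaths this week minus the average for the same week in prior years."""
    baseline = mean(same_week_previous_years)
    return deaths_this_week - baseline


# Placeholder values (NOT real ONS data) whose mean is the quoted 10,520
week_15_previous_5_years = [10000, 10500, 11000, 10200, 10900]
week_15_2020 = 18516  # quoted in the text

print(excess_deaths(week_15_2020, week_15_previous_5_years))  # 7996
```

Applying the same calculation to every week of every year would give the values that are coloured red or green in the table.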

What’s clear is that weekly death figures are autoregressive: there tend to be long periods of ‘red’ (more deaths than average) followed by long periods of ‘green’ (fewer deaths than average). This is particularly evident in the winter weeks, where seasonal flu can either be mild (such as in 2014 and 2019), so we have a long ‘green’ patch, or harsh (such as in 2015 or 2018), so we see several continuous weeks of ‘red’.
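One simple way to quantify this kind of persistence is the lag-1 autocorrelation of the weekly excess deaths series. The sketch below is an illustration rather than part of the original analysis, and both series are made up: one has long runs of 'red' followed by 'green', the other flips sign every week.

```python
def lag1_autocorrelation(series):
    """Correlation between each value and the value one step later."""
    n = len(series)
    average = sum(series) / n
    denominator = sum((x - average) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series has no variation")
    numerator = sum(
        (series[i] - average) * (series[i + 1] - average)
        for i in range(n - 1)
    )
    return numerator / denominator


# Made-up weekly excess deaths: a long 'red' run, then a long 'green' run
long_runs = [500, 400, 300, 200, -200, -300, -400, -500]
# Made-up series that alternates between 'red' and 'green' every week
alternating = [500, -500, 500, -500, 500, -500, 500, -500]

print(lag1_autocorrelation(long_runs) > 0)    # True
print(lag1_autocorrelation(alternating) < 0)  # True
```

A clearly positive value means that an above-average week tends to be followed by another above-average week, which is the pattern described above.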

In fact, the start of 2020 has been particularly mild — the first 12 weeks of 2020 have seen 4,822 fewer people die compared to the 5 year average.

This lack of recent deaths is compounded by the fact that 2019 only saw 1,984 excess deaths across the year, compared with 35,741 in 2015, 21,699 in 2016, 22,308 in 2017 and 21,189 in 2018.

We should expect some increase in the number of deaths over time as the population is growing by approximately 0.6% every year and is ageing. A metric called the age-standardised mortality rate (ASMR) is used to account for this: it calculates the mortality rate (deaths per 100,000 population) given a standardised population across years.
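To illustrate the idea, here is a minimal sketch of direct age standardisation, which is a common way of computing such a rate. Whether it matches the ONS calculation in every detail is an assumption. The two age bands, the counts and the standard population weights are all made up, and `age_standardised_rate` is a hypothetical name.

```python
def age_standardised_rate(deaths, population, standard_population):
    """Deaths per 100,000, weighting each age band by a fixed standard population."""
    total_standard = sum(standard_population.values())
    rate = 0.0
    for band, standard_count in standard_population.items():
        age_specific_rate = deaths[band] / population[band]
        rate += age_specific_rate * standard_count / total_standard
    return rate * 100_000


# Made-up inputs for a single year (not ONS data)
deaths = {"0-64": 100, "65+": 900}
population = {"0-64": 100_000, "65+": 20_000}
# Made-up standard population used as fixed weights across all years
standard_population = {"0-64": 80_000, "65+": 20_000}

asmr = age_standardised_rate(deaths, population, standard_population)
print(round(asmr, 1))  # 980.0
```

Because the weights stay the same from year to year, a change in this rate reflects a change in age-specific mortality rather than a change in the size or age mix of the population.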

The following chart shows the ASMR for England and Wales from 2001 to 2018:

Source: ONS

We can see that the ASMR for 2016–2018 remains roughly constant — so the 21,000–22,000 excess deaths in these years is approximately what we should expect, after taking into account the population increase and ageing.

This means that there were approximately 20,000 fewer excess deaths in 2019 than would be expected given a constant ASMR. Combined with the additional 4,822 from the start of 2020, this gives nearly 25,000 fewer deaths over the last 64 weeks than might have been expected.
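The arithmetic behind these round numbers can be checked with the figures quoted above. The sketch assumes that the 2016–2018 mean is a fair stand-in for the "expected" yearly excess, which is a simplification of the constant-ASMR argument.

```python
excess_2016_to_2018 = [21699, 22308, 21189]  # quoted yearly excess deaths
excess_2019 = 1984                           # quoted excess deaths for 2019
deficit_early_2020 = 4822                    # fewer deaths, weeks 1-12 of 2020

# Assumption: the typical yearly excess is the 2016-2018 average
typical_excess = sum(excess_2016_to_2018) / len(excess_2016_to_2018)
shortfall_2019 = typical_excess - excess_2019
combined_shortfall = shortfall_2019 + deficit_early_2020

print(round(typical_excess))      # 21732
print(round(shortfall_2019))      # 19748
print(round(combined_shortfall))  # 24570
```

These results line up with the "approximately 20,000" and "nearly 25,000" figures in the text.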

We should not use this recent reduction in deaths as an excuse or justification for the short-term impact of coronavirus on the overall death toll. Instead, what is clear is that when passing judgement on the significance of the spike, the timescale over which you perform your analysis matters greatly.

As we are in the middle of the crisis, short-term analysis is rightly the focus of attention — the immediate increase in total deaths highlights the very real additional stress placed on the NHS and care homes. However, when we zoom out and look back on this period of history in years to come, what will the 2020 overall death toll and ASMR look like?

It all depends on what happens next.

What happens next?

In this section, we’ll imagine three different futures for how the rest of 2020 could play out. For each, we’ll analyse the key patterns in the data that define the scenario, the conditions that need to be met for the scenario to come true and the overall potential outcomes. At this stage it is almost impossible to forecast which scenario will be closest to the truth, so we will treat all three with equal scrutiny.

Scenario 1: Multiple Peaks

This scenario is the most feared. The premise is simple — we don’t yet have a vaccine and the population as a whole hasn’t built up immunity. So once lockdown is lifted, the virus just comes back — again and again and again, until we develop a vaccine.

Key data patterns to look for

  • Many spikes in overall deaths throughout the year that correlate strongly with the spikes in coronavirus deaths.
  • There may still be a dampening effect throughout the year as lockdowns are used earlier to control the virus.

Conditions under which this scenario is more likely

  • A vaccine has not yet been developed
  • People do not gain immunity to the virus or at least, immunity is very short-term
  • Not enough of the population has caught the virus for herd immunity to take effect
  • Locking down the population is the only way to suppress the virus

Outcomes

The statistics will show an abnormally large increase in total deaths across the year, and there will be a continued global effort to suppress the virus until a vaccine is found.

2020 will be viewed as the year where lockdowns are the norm and the population lives in fear of the virus — especially the most vulnerable in society.

Scenario 2: Normality Resumes, At A Cost

Once the current first peak has subsided, what if the second peak never happens, even after lockdown is lifted?

Key data patterns to look for

  • Lockdown is lifted and the number of overall deaths does not begin to rise again.
  • The number of total deaths each week returns to a level that is comparable to previous years

Conditions under which this scenario is more likely

  • A far greater proportion of the population has caught the virus than is currently estimated and are immune in the medium-term. Therefore the virus finds it more difficult to spread.
  • Full lockdown is not necessary to control the virus — it can be controlled through increased social distancing, without the need for a full shutdown of the economy, as per the Swedish approach.
  • The majority of victims of the virus would not have died in the year from other causes, so the death rate for the remainder of the year is in line with the average.

Outcomes

The statistics will still show a large increase in total deaths across the year.

We will look back on the virus as a severe but ultimately short-lived disease that took many lives before their time. However, we will question the necessity of the lockdown, given that other countries that have not taken this course have ended up in similar positions, without the economic damage inevitably caused by the shutdown.

Scenario 3: Rebound

Once the current first peak has subsided, we see comparatively fewer deaths for the rest of the year, so that the overall yearly death toll is on a similar scale to previous years.

Key data patterns to look for

  • Lockdown is lifted and the number of overall deaths does not begin to rise again.
  • Fewer deaths on average across the remainder of the year

Conditions under which this scenario is more likely

  • The same as scenario 2, except that excess deaths are not as pronounced when viewed at a yearly level.

Outcomes

The statistics will show that the ASMR for the year is not drastically different from previous years.

Questions will be asked about the scale of the response, given the overall small movement in ASMR. This scenario will pose ethical questions about how we value life and sociological questions on the optimal response to the spread of a life-threatening communicable disease.

Other scenarios

Of course, the true reality may end up being some combination of all three of the above, or something else entirely. The only thing that is certain is that nobody knows what will happen for sure. We can only hope that our world leaders are brave enough to adapt to new information as it comes in and take the right decisions at the right time based on the right information.

Summary

In this article, we have explored the many ways that coronavirus death statistics can be interpreted and analysed. Most importantly, we have seen that the overall death toll is a crucial ingredient for contextual interpretation. Analysing coronavirus death statistics in isolation is never enough.

We started this blog with a quote from Albert Einstein about the importance of counting the right things. The pandemic death toll is sadly a perfect case in point. We can’t count care home coronavirus deaths very well, but they really matter.

Whatever conclusions you draw from this article or your own analysis of the data, be sure to keep an open mind to all possibilities. The global lockdown has taken us into uncharted territory and it is therefore more important than ever that where possible, we validate our beliefs by doing our own counting rather than relying on others to do it for us.

I’ve always said to myself that if a little pocket calculator can do it why shouldn’t I? — Roald Dahl, Matilda

Stay safe, and thanks for reading.

Applied Data Science Partners is a London based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website. Follow us on LinkedIn for more AI and data science stories!

Applied Data Science

Author of the Generative Deep Learning book :: Founding Partner of Applied Data Science Partners