Explorations with Data Science

Virus Super Spreaders: Heterogeneity Turned Deadly

What the Pandemic can Teach us About the Human Mind

Dieter Bingemann

Published in Spectroscopy & Data Science, May 24, 2021

With vaccinations advancing and infection rates finally on a downward trend in many parts of the world, we might be allowed a look back towards the beginning of the pandemic, especially the aspects that instilled fear in us all: Why was the spread of COVID so hard to contain? Why did the infection numbers explode almost uncontrollably in some places — but not in others?

The Struggle with Complex Systems

Humans like to find simple reasons for their observations, an essential outcome of the Enlightenment period and the basis of the scientific method. Sometimes, though, the underlying mechanisms are rather complex and the outcome is nonlinear. That is when the human mind starts to struggle. Seeing patterns where there are none, we turn to alternative hypotheses.

While Occam’s razor tells us to pick the simpler of two alternative explanations, it does not state that the best explanation has to be simple. Many natural or social systems are actually intrinsically complex, which can lead to unexpected outcomes.

A Tiny Twist with a Huge Effect for an Unlucky Community

Here we explore the initial local evolution of the pandemic with a simple model of independent communities. The main twist in the model is the assumption of a minor tail in the distribution of the number of new infections per infected person (so-called ‘super spreaders’). These might be people who just happen to generate more aerosol particles because of the way their airways are formed, or who were in a poorly ventilated place while they were infectious.

Just by adding this minor tail to the distribution we generate a very nonlinear response in an otherwise homogeneous simulation. We generate pictures that are very reminiscent of the ‘hot spots’ seen early in the pandemic. While experts struggled to explain why there, why then, why so many, most of us were quick to find the culprits in the affected communities. In the simulation, though, the growth of such a hot spot, as it turns out, is simply pure bad luck.

Infection Spread Simulation Model

We model the spread of a single infection in time in a number of independent communities that do not have any contact with each other. This ignores any spatial spread of a disease, which is essential for the development of a pandemic, and only focuses on the evolution in time. While simplistic, this model still captures the generation of ‘hot spots’ in the simplest possible fashion. To understand complex behavior, in agreement with Occam’s razor, we simplify the model until we only keep the essential bits.

The model simulation is coded in Python and available as an interactive app for exploration on streamlit.io at:

https://share.streamlit.io/dbingema/superspreader/main

The model advances the infection numbers in each community from week to week, with each infectious person infecting between zero and some number of other people during this week, while turning non-infectious themselves.

New Infections Caused by a Person

The number of people infected by a single infectious person varies from person to person: a good number of infectious people do not infect anybody, most infect only one other person, and only a small number infect a significant number of others.

Probability for each number of new infections caused by a single infectious person

The average number of infections caused by an infectious person is given by R. A value larger than 1 leads to exponential growth (on average), while a value smaller than 1 leads to a decay of the overall number of infections. For the pictured distribution, and for this simulation, the average infection rate R is 1.
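To make the role of R concrete, here is a minimal sketch that computes R as the mean of a discrete distribution and the expected case count after a number of weekly generations. The probability table is a hypothetical stand-in chosen so that its mean is 1; it is not the article's log-normal distribution, and the function names are illustrative.

```python
# Hypothetical offspring distribution (NOT the article's log-normal values):
# probability of causing k new infections, with a small super-spreader tail.
offspring_pmf = {0: 0.45, 1: 0.50, 10: 0.05}


def mean_infections(pmf):
    """Average number of new infections per infectious person: R = sum of k * p(k)."""
    return sum(k * p for k, p in pmf.items())


def expected_cases(initial_cases, r, weeks):
    """Expected case count after `weeks` weekly generations: N0 * R**weeks."""
    return initial_cases * r ** weeks


R = mean_infections(offspring_pmf)
print(f"R = {R:.2f}")

# Small deviations of R from 1 compound quickly over 30 weekly generations.
for r in (0.9, 1.0, 1.1):
    print(f"R = {r}: expected cases after 30 weeks = {expected_cases(400, r, 30):.1f}")
```

Note that this only describes the average over many communities; it says nothing about how unevenly the cases are spread between them, which is the point of the simulation.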

The number of infections from a single infectious person, as pictured, is simulated with a log-normal distribution:

Log-Normal Equation used to simulate new infections per infectious person

In this equation, the position of the maximum of the distribution is given by x0, the width of the distribution by the upper case delta (∆), and the asymmetry (the extent of the tail) by the parameter b, which can range from negative values (tailing left) to positive values (tailing right). An asymmetry of zero describes a symmetric, Gaussian peak. The example parameter values used for the distribution pictured above are given above the image.

In Python, using the imported module math for the natural log and the exponential function, this equation could be expressed as:
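A minimal sketch of such a function follows. It assumes the equation has the log-normal peak shape commonly used in spectroscopy, f(x) = exp(-ln 2 · [ln(1 + 2b(x − x0)/∆) / b]²), which matches the description of x0, ∆ and b in the text. The function name, the zero value outside the defined range, and the explicit Gaussian branch for b = 0 are assumptions, not taken from the original code.

```python
import math


def log_normal(x, x0, delta, b):
    """Asymmetric log-normal peak (assumed form, see lead-in).

    x0    : position of the maximum
    delta : width parameter
    b     : asymmetry; b > 0 tails right, b < 0 tails left, b = 0 is Gaussian
    """
    if b == 0:
        # Limit of the log-normal form for vanishing asymmetry:
        # a Gaussian with full width at half maximum equal to delta.
        return math.exp(-4.0 * math.log(2.0) * ((x - x0) / delta) ** 2)
    arg = 1.0 + 2.0 * b * (x - x0) / delta
    if arg <= 0.0:
        # The logarithm is undefined here; the peak is taken to be zero.
        return 0.0
    return math.exp(-math.log(2.0) * (math.log(arg) / b) ** 2)


# Illustrative parameter values only (not the ones used for the article's figure).
for k in range(6):
    print(k, round(log_normal(k, x0=1.0, delta=2.0, b=0.5), 4))
```

The function returns an unnormalized peak height of 1 at x0; to use it as a probability for integer infection counts, the values still have to be divided by their sum.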

To find a random (integer) number of infections for a given person, we work backwards from a uniform random number provided by Python, via the inverse cumulative distribution function, to its integer argument:
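One way to sketch this inverse-CDF sampling is shown below. It assumes the log-normal peak shape common in spectroscopy for the distribution, truncates the number of infections at a hypothetical maximum of 50, and uses illustrative parameter values; the function names are likewise assumptions rather than the original code.

```python
import math
import random

import numpy as np


def log_normal(x, x0, delta, b):
    """Assumed log-normal peak shape (b must be nonzero); zero where undefined."""
    arg = 1.0 + 2.0 * b * (x - x0) / delta
    if arg <= 0.0:
        return 0.0
    return math.exp(-math.log(2.0) * (math.log(arg) / b) ** 2)


def build_cdf(x0, delta, b, max_infections=50):
    """Cumulative probabilities for 0..max_infections new infections."""
    weights = np.array([log_normal(k, x0, delta, b) for k in range(max_infections + 1)])
    cdf = np.cumsum(weights / weights.sum())
    cdf[-1] = 1.0  # guard against floating-point round-off in the last entry
    return cdf


def draw_infections(cdf):
    """Map a uniform random number in [0, 1) to the integer k with cdf[k-1] <= u < cdf[k]."""
    u = random.random()
    return int(np.searchsorted(cdf, u, side="right"))


random.seed(1)
cdf = build_cdf(x0=1.0, delta=2.0, b=0.5)  # illustrative values only
draws = [draw_infections(cdf) for _ in range(10_000)]
print("counts for 0..5 infections:", np.bincount(draws)[:6])
print("sample mean:", np.mean(draws))
```

The sample mean printed at the end is the empirical R for the chosen parameters; the illustrative values here are not tuned to give R = 1.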

Here we used the imported modules numpy as np and random.

Each infection period lasts one week, and each infected person can infect a certain number of other people in the neighborhood during this week. This number of new infections (for a single infected person) is drawn from the (integer) log-normal distribution shown above. After that week the person is no longer infectious. A week is therefore a single simulation period.

With infectionLocations holding the community of each individual infected person, and infectionsByLocation the number of infections per community, we can propagate the infection evolution from week to week as follows:
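A minimal sketch of this weekly step is given here, keeping the names infectionLocations and infectionsByLocation from the text. Everything else is an assumption: the DataFrame layout with one row per infected person and columns x and y, the function name propagate_week, and the stand-in sampler draw_infections, which uses a simple hypothetical three-value distribution instead of the log-normal one.

```python
import random

import numpy as np
import pandas as pd


def draw_infections():
    """Stand-in sampler (hypothetical values): mostly 0 or 1, rarely 10."""
    return random.choices([0, 1, 10], weights=[0.45, 0.50, 0.05])[0]


def propagate_week(infectionLocations):
    """Advance one week: every infectious person is replaced by the people
    they infect, who belong to the same community (same x, y)."""
    n_people = len(infectionLocations)
    new_counts = np.array([draw_infections() for _ in range(n_people)], dtype=int)
    # Repeat each person's row once per new infection; a count of zero drops the row.
    source_rows = np.repeat(np.arange(n_people), new_counts)
    next_locations = infectionLocations.iloc[source_rows].reset_index(drop=True)
    # Number of infections per community; communities without infections are absent.
    infectionsByLocation = next_locations.groupby(["x", "y"]).size()
    return next_locations, infectionsByLocation


random.seed(0)
# Week 1: one infectious person in each community of a 20x20 grid.
grid = [(x, y) for x in range(20) for y in range(20)]
infectionLocations = pd.DataFrame(grid, columns=["x", "y"])

for week in range(2, 5):
    infectionLocations, infectionsByLocation = propagate_week(infectionLocations)
    print(f"week {week}: {len(infectionLocations)} infections "
          f"in {len(infectionsByLocation)} communities")
```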

Propagation of infection numbers week to week

With pandas imported as pd, we identify each community by its x, y coordinates to simplify the graphical display; a linear list would work equally well.

Infection Simulation

The model simulates the number of infections in 400 independent, non-interacting communities over the course of a few weeks. This ‘20x20 grid’ approach only serves to visualize the distribution of results. The simulation assumes that there is no exchange between these communities, for example through travel. Once an infection has died out in a community, it will not come back.

While clearly an oversimplification of the real spread of an infection, it captures the main factor we want to explore here: the effect of ‘super spreaders’, the seemingly minor tail in the distribution of the number of new infections per infectious person.

We start with a single infection at each location, each here visualized by a single point, offset randomly ever so slightly for a more pleasing presentation.

Initial Distribution of Infections on a 20x20 grid, each infected person represented by a single dot

Infection Evolution

For each infected person we now randomly pick the number of people they infect in their community for the following week (while turning non-infectious themselves) from the (integer) log-normal distribution described above. For week 2, we could then, for example, find the following infection distribution:

Distribution of Infections by Community for Week 2

Already, the infection has disappeared in some communities, while we notice a slight increase in the number of infections for other communities.

If we continue the simulation over 30 cycles, we notice that in many communities the infection dies out, but also that in a few communities very large infection clusters form, so-called ‘hot spots’. These infections do not ‘transfer’ from one community to another, since the simulated communities do not interact.

Simulation of the infection number evolution in each community over 30 weeks.

This appearance of a few ‘hot spots’ is not very different from the dynamics seen during the initial period of the pandemic. Meanwhile, the average infection spread (averaged across all 400 communities simulated), as measured by the infection rate R, remains very close to 1.0 throughout this entire simulation, and the number of total infections remains about constant.
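The qualitative behavior described here can be explored with a compact sketch that tracks only the number of infections per community. It uses a hypothetical three-value distribution with mean 1 (most people infect nobody or one person, a few infect ten) as a stand-in for the article's log-normal distribution; the variable names and the random seed are likewise assumptions, and the exact numbers printed depend on the seed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical offspring distribution with mean 1 and a small super-spreader tail.
values = np.array([0, 1, 10])
probs = np.array([0.45, 0.50, 0.05])

n_communities, n_weeks = 400, 30
counts = np.ones(n_communities, dtype=int)  # week 1: one infection per community
history = [counts.copy()]

for _ in range(n_weeks - 1):
    # One entry per infectious person, labeled with that person's community.
    community_of_person = np.repeat(np.arange(n_communities), counts)
    # Number of new infections caused by each person.
    new_per_person = rng.choice(values, size=community_of_person.size, p=probs)
    # Sum the new infections within each community.
    counts = np.bincount(
        community_of_person, weights=new_per_person, minlength=n_communities
    ).astype(int)
    history.append(counts.copy())

totals = np.array([week.sum() for week in history])
print("communities without infections:", int(np.sum(counts == 0)))
print("largest cluster:", int(counts.max()))
print("total infections, first and last week:", totals[0], totals[-1])
```

Because every community draws from the same distribution and starts with one infection, any difference between communities in the output is due to chance alone.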

Who is responsible for the hot spots?

So, did these ‘hot spot’ communities do anything wrong? After all, most other communities fared really well, the infection died out, and only for a few communities do we notice this increase of infection numbers in the simulation. While blaming the community might be the obvious conclusion, it could not be further from the truth.

This simulation is based on 400 identical communities. While propagating a random model, all locations face the exact same infection probability distribution and started off with the exact same initial condition of just one infection per community.

However, given the small probability for super spreaders, the slight tail in the distribution of the number of people infected per infectious person, the system turns highly nonlinear. Because of the low number of initial infections, the infection can disappear locally; because of the small chance for highly infectious people, it can also increase rapidly.

To answer the lead-in question: nobody was responsible for the hot spots. None of the hot spot communities in our simulation did anything wrong. They were just unlucky. A single super spreader caused their explosion of infections. And the super spreaders themselves were also just unlucky: their airways just produce more aerosols, or they were at a poorly ventilated place while unknowingly infectious. They were in the wrong spot at the wrong time. The luck of the draw dealt them a large number of infected people.

Linear Thinking in a Nonlinear World

It is human nature to try to look for a reason for bad luck, a culprit to blame. As humans we do not seem to be willing to accept chance. Humans are also excellent at seeing patterns, our brains being trained in this fashion from the day we are born. We are so good at seeing patterns that often we see them where there are none. Five random points have a decent chance of falling close to a line; although they are random, we immediately jump to a conclusion and clearly see a trend.
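The claim about five random points can be checked with a small experiment. The sketch below draws five points with independent, uniformly random coordinates many times and counts how often they show a strong linear correlation; the threshold of 0.8 for ‘close to a line’ and the number of trials are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_points, threshold = 10_000, 5, 0.8

strong = 0
for _ in range(n_trials):
    # x and y are independent, so any apparent trend is pure chance.
    x = rng.random(n_points)
    y = rng.random(n_points)
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    if abs(r) > threshold:
        strong += 1

fraction = strong / n_trials
print(f"fraction of trials with |r| > {threshold}: {fraction:.3f}")
```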

Our world is complex and linear thinking with simple cause and effect relations will not explain it. While understandable, this tendency to search for reasons in nonlinear systems driven by chance can lead to unjustified blame and equally unjustified self-doubt.

We owe compassion to our affected communities and should stop blaming the victims. Not just in simulations.