Crash Course in Epidemics

Jordan Crosby
Published in Analytics Vidhya
7 min read · Apr 21, 2020

How do experts forecast how a virus spreads through society?

You have probably seen many different predictions on how the Coronavirus will play out. From scenarios with millions dead and the collapse of modern society, to everything going back to normal in a few months, predictions seem to be all over the place.

Many of these predictions come out of prestigious universities and government organizations, and the methodology is either kept secret or expressed in a complicated way.

In reality, these predictions are made from a few simple and intuitive observations. Modeling the spread of a virus is easy, and something that anyone can understand.

The first building block to understanding how a virus spreads is knowing what the Reproductive Number of a virus is.

The Reproductive Number is the average number of individuals that one infected person will infect during the course of the illness.

If the Reproductive Number is 2.0, and each infected person passes the virus on within a single day, the number of infected people doubles every day.

This is known as Exponential Growth. Starting from one infected person on Day 0, the number of Infected (I) on each Day (t) is:

$$I(t) = 2^t$$

[Figure: Exponential Growth]

Day 0: 1 person is infected

Day 5: 32 people are infected

Day 10: 1,024 people are infected

Day 20: 1,048,576 people are infected

This escalates very quickly. By Day 30, over 1 billion people are infected (2^30 = 1,073,741,824).

We can use this equation as the first building block of a model that forecasts how a virus spreads.
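A minimal Python sketch of this doubling, assuming (as the numbers above do) one generation of infections per day and a single infected person on Day 0. The function name `exponential_growth` is hypothetical, not taken from the author's code.

```python
def exponential_growth(days, reproductive_number=2.0, initial_infected=1):
    """Infected count I(t) = I0 * R**t for t = 0..days.

    Assumes each infected person passes the virus on to
    `reproductive_number` new people within a single day.
    """
    return [initial_infected * reproductive_number ** t for t in range(days + 1)]


infected = exponential_growth(30)
for day in (0, 5, 10, 20, 30):
    print(f"Day {day}: {int(infected[day]):,} infected")
```

This prints 1, 32, 1,024, 1,048,576 and 1,073,741,824 infected for Days 0, 5, 10, 20 and 30.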

The next building block is the Recovery Time of a virus.

The Recovery Time is the expected number of days it takes for an individual to recover from the virus.

Including Recovery Time in our model means the Reproductive Number is spread out over the course of the illness.

Think of it this way: a sick person doesn’t infect everyone in one afternoon. They are contagious for the duration of the illness and can spread the virus each day until they recover. If we assume they are destined to spread the virus to 2 people, these infections would be spread out evenly over the time it takes to recover.

Let’s say a virus has a Recovery Time of 4 days and a Reproductive Number of 4.0. From these two properties, we can find the Daily Reproductive Number by dividing the Reproductive Number by the Recovery Time.

Under these circumstances, this would be 4.0 / 4 = 1 infection per day.

Similarly, a virus has a Daily Recovery Number, which is 1 divided by the Recovery Time.

At the individual level, this is more easily understood as the percentage of healing your body does each day. If the Recovery Time is 4 days, your body recovers 25% each day.
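Both daily numbers follow directly from the Reproductive Number and the Recovery Time. A minimal sketch, assuming infections and recovery are spread evenly across the illness; `daily_rates` is a hypothetical helper name. The second call uses the virus defined in the example further down (Reproductive Number 2.0, Recovery Time 5 days).

```python
def daily_rates(reproductive_number, recovery_days):
    """Return (daily reproductive number, daily recovery number).

    Spreads the Reproductive Number evenly over the Recovery Time, and
    treats recovery as a fixed fraction (1 / Recovery Time) per day.
    """
    daily_reproductive = reproductive_number / recovery_days
    daily_recovery = 1 / recovery_days
    return daily_reproductive, daily_recovery


print(daily_rates(4.0, 4))  # (1.0, 0.25)
print(daily_rates(2.0, 5))  # (0.4, 0.2)
```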

With a Recovery Time included, we would expect the number of Recovered to track the number Infected, delayed by the time it takes to recover.

[Figure: Infections and Recoveries]

The next component to understanding how the model works is to know how the Daily Reproductive Number and Daily Recovery Number work day to day.

This is more intuitively understood at a macro level where many people are infected. When dealing with many infected people, think of these numbers as percentages.

The Daily Reproductive Number is the % of Infected Individuals that successfully infect one person each day.

The Daily Recovery Number is the % of Infected Individuals that recover from the illness each day.

Let’s define the properties of a virus, and then see how this would escalate over a few days:

  • Reproductive Number = 2.0
  • Recovery Time = 5 Days
  • Daily Reproductive Number = 2/5 = 40%
  • Daily Recovery Number = 1/5 = 20%

Day One: We start with 100 people infected.

Day Two: Of the 100 people infected, we would expect 40% to infect a new person (+40 Infected) and 20% to recover (-20 Infected), for a total of 120 Infected.

Day Three: Of the 120 people infected, we would expect 40% to infect a new person (+48 Infected) and 20% to recover (-24 Infected), for a total of 144 Infected.
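The Day One to Day Three arithmetic can be reproduced with a short loop. This is a minimal sketch under the same simplification as the example: it ignores who is left to infect. `simulate_infected` is a hypothetical name.

```python
def simulate_infected(initial_infected, days, daily_reproductive, daily_recovery):
    """Infected count for each day, ignoring who is left to infect.

    Each day, a `daily_reproductive` fraction of the infected each infect
    one new person, and a `daily_recovery` fraction of the infected recover.
    """
    counts = [float(initial_infected)]
    for _ in range(days - 1):
        infected = counts[-1]
        new_infections = daily_reproductive * infected
        new_recoveries = daily_recovery * infected
        counts.append(infected + new_infections - new_recoveries)
    return counts


for day, infected in enumerate(simulate_infected(100, 3, 0.4, 0.2), start=1):
    print(f"Day {day}: {infected:.0f} infected")
```

This prints 100, 120 and 144 infected for Days 1 to 3. Because the Daily Reproductive Number (40%) exceeds the Daily Recovery Number (20%), the count grows by a net 20% per day, so this version still grows exponentially.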

Visualized, the growth looks like this:

Next, we need to think about the way our bodies recover from infections and how people in society interact.

This is the most important assumption behind the model:

When a person recovers from a virus, they build antibodies that prevent them from being infected again.

For this reason, a virus can only spread when an infected person comes into contact with an individual who has not yet been exposed to the virus.

For modeling purposes, we will call these individuals Susceptible (S).

This effectively splits our Population (N) into three categories.

  • Susceptible Individuals (S)
  • Infected Individuals (I)
  • Recovered Individuals (R)

At every Day (t), this holds true:

$$S(t) + I(t) + R(t) = N$$

As time goes on, people go from Susceptible to Infected to Recovered.

With a fixed population, this means the number of individuals who are susceptible to infection decreases over time.

Think of a virus as a fire, and the Susceptible Population as the firewood.

Once the firewood runs out, the fire is extinguished. The virus cannot spread if it doesn’t have anyone to infect.

Think back to our example, where 40% of the Infected Individuals successfully infected a person each day. With 100 people infected, this is essentially 40 people each infecting one new person.

Consider these 40 infections as interactions.

Possible Interactions:

  1. Infected Person interacts with Recovered Person (unsuccessful)
  2. Infected Person interacts with Infected Person (unsuccessful)
  3. Infected Person interacts with Susceptible Person (successful interaction)

Of those 40 interactions, some may fall into the first 2 unsuccessful categories. A recovered person cannot get the virus, and an infected person can’t get the virus again.

Using this, we can define the % of Infectious Interactions on each day as the percentage of Susceptible People left in the population, S / N.

Think of it as an infected person talking to a random person each day. The probability of this person being susceptible is the proportion of susceptible people left.

For example, if only 10% of the population remains susceptible we would now only expect 10% of the interactions to be successful.

Now our model of infections changes. As the number of susceptible people decreases, the rate of new infections each day also decreases.

[Figure: Infections over time]
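The scaling by the share of Susceptible people fits in a single function. A minimal sketch; `new_infections` is a hypothetical helper, and the population of 10,000 (with 1,000 still susceptible) is an illustrative assumption chosen to reproduce the "10% remains susceptible" example.

```python
def new_infections(infected, susceptible, population, daily_reproductive):
    """Expected new infections in one day.

    Infected people make `daily_reproductive * infected` contacts; only the
    fraction that land on a Susceptible person (susceptible / population)
    produce a new infection.
    """
    contacts = daily_reproductive * infected
    return contacts * susceptible / population


# 100 infected, 40% daily reproductive number, 10% of 10,000 people susceptible
print(new_infections(100, 1_000, 10_000, 0.4))  # 4.0
```

With everyone still susceptible the same call would give all 40 infections; with only 10% susceptible, just 4 of the 40 interactions succeed.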

Using these properties, let’s build a timeline:

At the start of an epidemic, the percentage of Susceptible Individuals in the population is very high. Most of the interactions from the Infected are with susceptible individuals, so the rate of infection is high. At this point, the number of Recovered Individuals is very low, as few people have been exposed.

In the middle of an epidemic, the percentage of Susceptible Individuals in the population starts to drop. Soon enough, the number of newly recovered individuals each day becomes greater than the number of newly infected, and the number Infected starts to decrease.

At the end of an epidemic, the Number Infected falls toward 0, and the Number of Recovered (R) and the Number of Susceptible (S) level off. Most of the population has had the virus and recovered, but not everyone: once too few Susceptible people remain, infected people recover before passing the virus on, and the fire burns out with some firewood left over.
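The turning point in the middle of an epidemic can be made precise. Writing β for the Daily Reproductive Number, γ for the Daily Recovery Number, R_0 for the Reproductive Number and D for the Recovery Time (these symbols are our own labels, not the article's), the day-to-day change in the number Infected under the rules above is:

$$I_{t+1} - I_t = \beta I_t \frac{S_t}{N} - \gamma I_t, \qquad \beta = \frac{R_0}{D}, \quad \gamma = \frac{1}{D}$$

For I_t > 0, this is negative, so the number Infected starts to fall, exactly when

$$\frac{S_t}{N} < \frac{\gamma}{\beta} = \frac{1}{R_0}$$

So for a virus with a Reproductive Number of 2.0, the number Infected peaks around the point where half of the population is still susceptible.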

If you want to see more of the math behind this model, please read my other article that goes into it in more detail here

Let’s define a population and a virus, and run a real simulation:

  • Total Population of 10,000
  • 100 are Infected at the start
  • Reproductive Number = 2.0
  • Recovery Time = 10 Days
[Figure: Complete SIR Model]

If you want to play around with these models, you can find the necessary code on my GitHub.

This is what epidemiologists call the SIR Model, one of the standard models they use to predict the spread of a virus over time.
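A minimal discrete-time sketch of this simulation in Python, stepping one day at a time with the rules described above. It is a hypothetical reimplementation, not the code from the author's GitHub. The parameters match the example (a population of 10,000, 100 initially infected, a Reproductive Number of 2.0, a Recovery Time of 10 days); the 200-day horizon is our own choice.

```python
def simulate_sir(population, initial_infected, reproductive_number, recovery_days, days):
    """Discrete-time SIR model, stepped one day at a time.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    daily_reproductive = reproductive_number / recovery_days  # beta
    daily_recovery = 1 / recovery_days                        # gamma

    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # Only contacts with Susceptible people (fraction s / N) cause infections.
        new_infections = daily_reproductive * i * s / population
        new_recoveries = daily_recovery * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(10_000, 100, 2.0, 10, days=200)
peak_day, (_, peak_infected, _) = max(enumerate(history), key=lambda row: row[1][1])
s_end, i_end, r_end = history[-1]
print(f"Peak: {peak_infected:,.0f} infected on day {peak_day}")
print(f"Day 200: S={s_end:,.0f}  I={i_end:,.0f}  R={r_end:,.0f}")
```

With these settings, the Infected count rises to a single peak and then falls toward zero, while S levels off at roughly a fifth of the population rather than at zero, and S + I + R stays equal to N throughout. Stepping in whole days is a simplification; the SIR model is usually written as differential equations and solved with a numerical integrator.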

In 2002 and 2003, the world saw the outbreak of SARS, a disease caused by a new coronavirus.

This is how it played out:

Look familiar?

Forecasting the spread of a virus is easy. Knowing the average Reproductive Number of a virus, its typical Recovery Time, and the Population Size, we can make detailed forecasts of what will happen as a virus moves through society.

For the impact of societal changes, I wrote a second piece on how personal hygiene, mandated isolation, and treatment/vaccines affect the SIR model. You can read it here.

Thanks for reading!


Interested in automation, data, and epidemiology. Learn more about me at jcros.me