Covid19 transmission forecast in Italy — a python tutorial for SIR model

Andrea Castiglioni
Analytics Vidhya
Published in
5 min readMar 12, 2020

In the previous part we saw the data analysis of covid-19 diffusion in Italy for the first 12 days.

In this part we will implement a better algorithm to fit the data, the so called SIR model. The SIR model is one of the simplest compartmental models, and many models are derivations of this basic form. The model consists of three compartments: S for the number of susceptible, I for the number of infectious, and R for the number recovered (or immune) individuals. This model is reasonably predictive for infectious diseases which are transmitted from human to human, and where recovery confers lasting resistance, such as measles, mumps and rubella.

The basics of SIR model are the three states and the rates between them:

SIR Model.

For COVID-19 the diffusion medium is Airborne droplet and experts extimated an R0 of 1.4–3.9. The basic reproduction number R0 of an infection can be thought of as the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection.

SIR model

A simple mathematical description of the spread of a disease in a population is the so-called SIR model, which divides the (fixed) population of N individuals into three “compartments” which may vary as a function of time, t:

  • S(t) are those susceptible but not yet infected with the disease;
  • I(t) is the number of infectious individuals;
  • R(t) are those individuals who have recovered from the disease and now have immunity to it. The SIR model describes the change in the population of each of these compartments in terms of two parameters, β and γ. β describes the effective contact rate of the disease: an infected individual comes into contact with βN other individuals per unit time (of which the fraction that are susceptible to contracting the disease is S/N). γ is the mean recovery rate: that is, 1/γ is the mean period of time during which an infected individual can pass it on.

The differential equations describing this model were first derived by Kermack and McKendrick [Proc. R. Soc. A, 115, 772 (1927)]:

set of equations for SIR model to be implemented in python

We will try to first fit the model to our data, then to extrapolate what happens without countermeasures. Finally we will try to simulate a case in which there’s a strong reduction in the reproduction number.

The situation

At the time of speaking, Covid-19 already hit some 12k people in Italy, most of them from Lombardy.

Total number of cases for each category of Covid-19 in Italy at 12/03/2020.
Diffusion of Covid-19 in Italy at 12/03/2019

The code

we import the usual standard libraries plus one from scipy to solve our differential equation.

import numpy as np
import random
import pandas as pd
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates

The mathematical model is then given for example as exercise here:

# Total population, N.
N = 10e6
# Initial number of infected and recovered individuals, I0 and R0.
I0, R0 = 1, 0
# Everyone else, S0, is susceptible to infection initially.
S0 = N - I0 - R0
# Contact rate, beta, and mean recovery rate, gamma, (in 1/days).
beta, gamma = 0.35, 1./10
# A grid of time points (in days)
t = np.linspace(0, 90, 91)
# The SIR model differential equations.
def deriv(y, t, N, beta, gamma):
S, I, R = y
dSdt = -beta * S * I / N
dIdt = beta * S * I / N - gamma * I
dRdt = gamma * I
return dSdt, dIdt, dRdt
# Initial conditions vector
y0 = S0, I0, R0
# Integrate the SIR equations over the time grid, t.
ret = odeint(deriv, y0, t, args=(N, beta, gamma))
S, I, R = ret.T

So for an initial population of 10 millions, 1 infected person, a beta factor of 0.35 and a gamma of 0.1 (meaning a R0 of 3.5) the disease takes around 120 days to fully outbreak and disappear.

Best fit to Italy case

But what is the real situation?

For what concerns the recovery rate, we can estimate it from the available data: in the plot below we see the trends in log scale between the total cases and the recovered ones. In the limited time range considered they can be thought of parallel lines. The distance between them is approximatively 9 days.

  • γ is then 1/9
  • β we can use literature data about R0 and fit it to the real data
  • N is the Italian population for our case: 60 millions people
  • N0 is the starting point of our public data: 220 people

Below we can see the best fit of the SIR model to the experimental data: a β of 0.37 seems a good fit to the overall range. From the log plot on the right it seems like in the first days the spread acted faster while on the second period it slowed down.

We can address this behaviour to the countermeasures put in place by the government to contain the infection.

What would be the virus propagation without any isolation effect with those parameters? According to the SIR model we should expect a peak around 60 days with about 3 Million people infected at the same time. This scenario poses serious risks to our ability to take care of people who needs intensive care (at the moment of writing about 8% of them)

However, if we are able to half the diffusion ability of the virus, we can spread its diffusion longer in time.

How to change it without a vaccine? Well, isolate infective people, avoid contacts are all well known systems which works pretty well since the beginning of time!
What happens if the situation changes abrutly by halving R0 and isolate as much as possible?
Well as we can see below the initial orange trend become the green one and the whole outbreak becomes clearly more manegeable!

--

--