Covid-19: Corona virus analysis in Italy with python and spread prediction

Andrea Castiglioni
8 min readMar 11, 2020

--

We are all concerned about the recent outbreak of Covid-19. Italy’s government took several measures to limit the propagation of the virus within its population. We will analyze the data available from Italy government. Github is available here.

We will first look at the data available in the dataset then we will try to ask ourselves if the restriction measures are having an effect over total cases. We will also try explore with some models how the likely future will be.

The data is given in csv format. Each row rapresent 1 day of observation and in the columns we have several informations. Each new observation is updated at 18 CET (GMT+1). As usual we import the standard libraries and connect to our data:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import matplotlib.dates as mdates
import numpy as np
url = ‘https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv'
df = pd.read_csv(url,error_bad_lines=False)

As the dataset is in italian, I renamed the columns:

hospitalized_w_s = people who got hospitalized with symptoms
IC = people hospitalized who needs Intensive Care
tot_hosp = sum of the previous two columns
home_isolation = people infected not needing care
tot_act_positives = sum of tot_hops and home_isolation
new_act_positives = people found positives from the previous day
dismissed = people recovered from the virus
dead = people dead with the virus
total_cases = sum of tot_act_positives + dismissed + dead
tampons = people who got tested for the virus

head of the dataset.

We also include some extra variables:

df['data'] = pd.to_datetime(df['data'])
# Ratio of positives over tampons
df['NC_R'] = df['new_act_positives']/df['tampons']
# Ratio of new positives over total positives
df['NP_R'] = df['new_act_positives']/df['tot_act_positives']
# Ratio of Itensive care over total cases
df['IC_R'] = df['IC']/df['tot_act_positives']
# Ratio of hospitalized over total cases
df['Hosp_R'] = df['tot_hosp']/df['tot_act_positives']
# Ratio of dead people over total cases
df['DR'] = df['dead']/df['tot_act_positives']

The basic format for graphs will be like this:

fig, ax = plt.subplots(figsize=(10,6))
sns.lineplot(x='data',y='total_cases', data=df, label='Total cases', ax=ax, lw=2)
sns.lineplot(x='data',y='tot_act_positives', data=df, label='Total act. positives', ax=ax, lw=2)
sns.lineplot(x='data',y='hospitalized_w_s', data=df, label='Hospitalized', ax=ax, lw=2)
sns.lineplot(x='data',y='IC', data=df, label='Intensive Care', ax=ax, lw=2)
plt.ylabel('Units', fontsize=14)
plt.xlabel('Date', fontsize=14)
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m.%d'))
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.legend(fontsize=12)
plt.title('Covid-19 trendlines in Italy from 25/02/2020', fontsize=16)

In the graph below we observe the main trendlines for the virus spread in Italy since 25/02. The blue line represent the total positives cases (found). The orange line is the actual total positives present in the territory. The green line is the number of people who shows the symptoms and are hospitalized. The red one is the number of people who needs Intensive Care in the hospitals.

Number of people found positive to the virus, hospitalized and hospitalized in Intensive Care.

As we can see from the graph, there is an exponential trend in the total cases. In the subsequent analysis we will consider the total actual positives. What is important to understand, however, is that a person is positive only if he/she has been tested; the people that person interacted with in the previous days are of major concern for authorities to limit the spread of the disease. The oldest and working solution is isolation.

From the plot below we can see that as the time passes, both recovered and unfortunately dead people are increasing following the general trend, as expected.

Number of people dead and recovered from the virus.

From the plot below we can check the fraction of Intensive care and dead people with Corona virus. The ratio of Intensive care is around 8% while dead to actual total positive is increasing around 6%. Note however that this might be a misleading ratio because the meaningful parameter here should be the total number of infected people which is unknown today as it seems that most people have or had the virus in the asymptomatic or with very light symptoms.

Ratio of people in intensive care and dead over total cases

What is the mortality rate of the new coronavirus?

As pointed out in the original article from the guardian: we are calculating the dead with corona virus over the total positives. However those are only the officially confirmed cases. There are many more mild cases that do not get to hospital and are not being counted, which would bring the mortality rate significantly down.

Do we have evidence of the measures adopted by the government?

We are also interested in the measures adopted by the government: is there a change in the criteria to test new people? Are we doing the same number of tampons each day as the disease spread? Are we finding more or less positives as the time goes on?

As for the number of daily positives over tampons, excluding the first day of observation we see that there has been a slight positive trend from 27/02 to 10/03.
From this graph alone is hard to tell what is happening: doctors might test patients with clear symptoms of corona virus to save resources and time so, as the time goes on, we should expect this trend to increase.

Ratio of people daily found positive to the virus over daily tampons done.

In the meanwhile the number of tampons done in Italy in this period is continuosly increasing. The trend is linear with time up to 03.09 and then shows an increase. These numbers highlight the strong effort put in place by the national health system in monitoring the population.

Progressive number of tampons each day.

Let’s check now the ratio of new positives over total positives, as the line has spikes, we also implement a rolling mean average with a window of 5.

rolling = df['NP_R'].rolling(window=5)
rolling_mean = rolling.mean()
Ratio of new positives over total positives.

As we can see from the graph above, the trend is negative: as the time passes we are finding less positive people over the total amount of people with corona-virus.

Model

A virus will typically spread exponentially at first if there is no immunization available. Each infected person can infect multiple new people. SARS (Severe Acute Respiratory Syndrome) and Ebola are two such viruses whose impact to affected areas can be devastating. Knowing the rate at which they typically spread is important when you are trying to contain and treat an outbreak.

We can try model our infection_rate as an exponential which is nothing but the compound interest formula. This is a gross approximation.

N_t0 = df[df.data==df.data.min()]['tot_act_positives'].min()
N_tD = df[df.data==df.data.max()]['tot_act_positives'].min()
D = (df.data.max()-df.data.min()).days
#IR = infection rate, R_0
IR = np.exp(np.log(N_tD/N_t0)/D)
print(IR)
> 1.2756

In the graph below I show the real data compared with a simulation with SIR model with β=0.36. This simple model well describe the data up to 03.09. On the right the same data is shown in log y scale. We can see that, compared to a fixed β of 0.36, the first data seems to increase faster while the second part shows a slow down.

Total actual positives in Italy versus simulations with 2 values for β

Some newspapers estimated that the total available places in intensive care is around 5 thousands. We want to understand what happens with the current spread if we don’t take counter-measures.

In the graph below we monitor three cases: the actual trend with R0 = 1.29, with a blu line; the trend of the last 5 days, green line, with R0 = 1.25, and the trend for the last 3 days with R0 = 1.20.
The orange, red and brown curves are the calculated number of people who needs intensive care. The black line is the total available places.

As we can see from the intersection of the orange curve with the black one: if the government took no measures we reach the saturation of IC places by 17/03. However, with some measures this is postponed to 22/03.

Lowering the spread of the disease is of paramount importance, hence the need for strong measures by the government. By slowing down the rate of infection the national health system will be able to cope with the availability of places in intensive care which is fundamental both for people with corona virus related problems, and for ordinary people with severe illness.

Are the measures working?

In the last graph below, we show the calculation of R0 in the last periods, taking into account a span of 5 days for the formula.

IRdf = df[['data','tot_act_positives']].copy()
IRdf['tot_day5'] = IRdf['tot_act_positives'].shift(-5).shift(5)
IRdf['tot_day10'] = IRdf['tot_act_positives'].shift(-10).shift(10)
IRdf['IR_5D'] = np.exp(np.log(IRdf['tot_day5']/IRdf['tot_act_positives'].shift(5))/5)
IRdf['IR_10D'] = np.exp(np.log(IRdf['tot_day10']/IRdf['tot_act_positives'].shift(10))/10)

As we can see from both the trends, R0 is lowering and we expect to see the effect of the last urgent measures taken by the government to take place in the upcoming days.

Optimize Function with scipy.optimize

We saw how to model our epidemic with a simple exponential. Now we want to find the best fit parameters (N0 and R) for our equation with a best fit.

To do so, we import the package from scipy optimize import curve_fit, we define the function we want to optimize and we get as output the coefficients of our parameters:

Fit optimization for simple exponential formula

As we can see the best fit with free parameters N0 and R gives us two values: the order of values is the one given as argument in the function so R and n0.

From the best fit we see that the number of initial patients has been increased from the official of 220 units to 577.

--

--