How to visualize the spread of COVID-19 in Italy

Alessandro Paticchio
Polimi Data Scientists
4 min read · Apr 7, 2020
Photo by James Yarema

Introduction

February 21st, 2020. 16 coronavirus cases were discovered in Lombardia, one of the crucial crossroads of Italy, a region where thousands of people arrive and depart every day. The day after, the number of infected people had risen to 60, and in the following days the first deaths were sadly reported. In those same days, cases started appearing in Veneto too. On the 23rd, hoping to contain the spread of the disease, the government declared part of the province of Lodi and the town of Vo’ Euganeo (in Veneto) a zona rossa (red zone). Unfortunately, it was too late: the virus had already started traveling all around Italy (…and the rest of the world). But how? Where? How fast? To answer these questions, it helps to visualize the evolution of the spread.

In the following, I will outline an intuitive way to describe and display how the situation evolved in Italy, first illustrating the tools that I used and then showing how to take advantage of them. In particular, in this article we will see:

  1. What choropleth maps and GeoJSON are
  2. Useful resources you can leverage to explore the case
  3. How to generate a choropleth map to visualize the virus spread

Choropleth Maps and GeoJSON

Choropleth maps are maps composed of polygons, where the color of each polygon expresses a quantity. The polygons usually represent real geographic areas, which makes these visualizations extremely intuitive when you want to describe, for instance, the demographic distribution of a population.
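To make the idea of "color expresses a quantity" concrete, here is a minimal sketch of how a number can be turned into a color. It assumes a plain linear fade from white to red, which is a simplification and not the actual color scale used by any plotting library; the function name `value_to_hex` and the case counts are made up for illustration.

```python
def value_to_hex(value, vmin, vmax):
    """Map a number in [vmin, vmax] to a hex colour between white and pure red."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp so that out-of-range values take the colour of the nearest bound.
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    # White is (255, 255, 255) and red is (255, 0, 0): only green and blue fade.
    fade = round(255 * (1 - t))
    return f"#ff{fade:02x}{fade:02x}"


# Hypothetical case counts for three areas, invented for illustration.
cases = {"A": 0, "B": 50, "C": 100}
colors = {area: value_to_hex(n, 0, 100) for area, n in cases.items()}
print(colors)
```

The lower and upper bounds play the same role as a color range on a real map: everything at or above the upper bound gets the darkest color.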

To generate these maps, however, you need the structure of the polygons. This can be described with GeoJSON, an open standard format designed for representing simple geographical features, along with their non-spatial attributes.

# Example of GeoJSON
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
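The example above describes a single point. For a choropleth map we need polygons instead, usually grouped in a FeatureCollection. Here is a minimal sketch of that structure built in Python; the square is a made-up shape and the property names `code` and `name` are hypothetical, chosen only for illustration.

```python
import json

# A made-up square, not a real administrative boundary.
# GeoJSON positions are [longitude, latitude], and a polygon ring must
# end on the same position it starts with.
square = [[9.0, 45.0], [10.0, 45.0], [10.0, 46.0], [9.0, 46.0], [9.0, 45.0]]

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [square]},
            # "code" and "name" are hypothetical property names.
            "properties": {"code": 1, "name": "Example area"},
        }
    ],
}

# Round-trip through text to confirm the structure is plain JSON.
text = json.dumps(feature_collection)
parsed = json.loads(text)

for feature in parsed["features"]:
    ring = feature["geometry"]["coordinates"][0]
    print(feature["properties"]["name"], "closed ring:", ring[0] == ring[-1])
```

The properties of each feature are what lets us link a polygon to a row of our data, as we will see later.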

Given these two main concepts, we will now see how to easily exploit them to visualize the spread of COVID-19 in Italy.

Visualization made easy with Plotly

Plotly is a Python library that provides online graphing, analytics, and statistics tools for individuals and collaboration. Among its other features, it allows you to build your own choropleth map easily (documentation here) and with a few lines of code. In particular, we will visualize the spread at the province level. Let’s see how to do it in our specific case.

The data I am using are made publicly available by the Italian Protezione Civile, the department that coordinates the population’s defense and protection, in this GitHub repository, updated on a daily basis.

# Import libraries and read data 
from urllib.request import urlopen
import pandas as pd
import plotly.express as px
import json
# Read Data
df = pd.read_csv("https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv")

Let’s see the dataset structure.

df.head()
Result of df.head()

The columns we need are just:

  1. ‘data’, which will allow us to see how the virus spreads over time.
  2. ‘codice_provincia’, the code of the province.
  3. ‘totale_casi’, the total number of cases, each day.

Furthermore, we see there are a few rows that have the value “In fase di definizione/aggiornamento” (“being defined/updated”) in the column “denominazione_provincia”. Since they cannot be placed on the map, we will drop them.

# Drop the columns we do not need
# (errors='ignore' skips any listed column that is missing from the file)
df = df.drop(columns=['stato', 'codice_regione', 'denominazione_regione', 'sigla_provincia', 'lat', 'long', 'note_it', 'note_en'], errors='ignore')
# Drop the rows that are not assigned to a province
df = df[df['denominazione_provincia'] != 'In fase di definizione/aggiornamento']
# Rename the columns used in the map
df = df.rename(columns={'codice_provincia': 'Province', 'totale_casi': 'Total Cases'})
# Parse the timestamps (letting pandas infer the format) and keep only the day as a string
df['Date'] = pd.to_datetime(df['data']).dt.strftime('%Y-%m-%d')
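If you want to see what these cleaning steps do without downloading the full file, here is a sketch on a tiny hand-made dataframe. The rows and numbers are invented for illustration and are not real figures, and the timestamp layout in the `data` column is my assumption about how the file looks.

```python
import pandas as pd

# Toy rows in the shape of the columns used in this article; the values
# are invented for illustration and are not real figures.
toy = pd.DataFrame(
    {
        "data": ["2020-03-01T18:00:00", "2020-03-01T18:00:00", "2020-03-02T18:00:00"],
        "codice_provincia": [15, 999, 15],
        "denominazione_provincia": [
            "Milano",
            "In fase di definizione/aggiornamento",
            "Milano",
        ],
        "totale_casi": [10, 3, 25],
    }
)

# Drop the placeholder rows that cannot be placed on the map.
toy = toy[toy["denominazione_provincia"] != "In fase di definizione/aggiornamento"]

# Rename the columns and keep only the calendar day as a string.
toy = toy.rename(columns={"codice_provincia": "Province", "totale_casi": "Total Cases"})
toy["Date"] = pd.to_datetime(toy["data"]).dt.strftime("%Y-%m-%d")

print(toy[["Date", "Province", "Total Cases"]])
```

Only the two rows that belong to a province survive, each with a plain day string that can serve as an animation frame.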

The GeoJSON that we will use is kindly provided by this GitHub repository and can be quickly downloaded or linked in the generation of the map, which we are ready to create!

# Build the animated choropleth: one frame per date, one polygon per province
fig = px.choropleth(
    df,
    geojson='https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_provinces.geojson',
    locations='Province',
    color='Total Cases',
    color_continuous_scale='Reds',
    featureidkey='properties.prov_istat_code_num',
    animation_frame='Date',
    range_color=(0, max(df['Total Cases'])),
)
# Remove the margins and zoom the map onto the plotted provinces
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_geos(fitbounds="locations", visible=False)
# Save the result as a standalone HTML page
fig.write_html('try.html')

As you can see, this simple code just maps a few function parameters to the desired dataframe columns. Notice that we set the parameter featureidkey to the key prov_istat_code_num in the properties field of our GeoJSON, so that each value of the “Province” column is matched to the corresponding polygon. Furthermore, we set range_color between 0 and the maximum of the column “Total Cases”, but this can be adapted to your needs. Finally, the last command saves the result in an HTML page that, when opened, will look like this:
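Since the map depends on the values of the locations column matching the ids found through featureidkey, it can be useful to check the match before plotting. Below is a minimal sketch of such a check in plain Python. The helper names `feature_ids` and `unmatched_locations` are hypothetical (they are not part of Plotly), and the sample GeoJSON is a tiny stand-in with made-up codes and no geometries.

```python
def feature_ids(geojson, featureidkey):
    """Collect the value found at a dotted key such as 'properties.code' in every feature."""
    ids = set()
    for feature in geojson["features"]:
        value = feature
        for part in featureidkey.split("."):
            value = value[part]
        ids.add(value)
    return ids


def unmatched_locations(geojson, featureidkey, locations):
    """Return the location values that have no feature with the same id."""
    return sorted(set(locations) - feature_ids(geojson, featureidkey))


# Tiny stand-in for the provinces file: geometries omitted, codes made up.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"prov_istat_code_num": 1}, "geometry": None},
        {"type": "Feature", "properties": {"prov_istat_code_num": 2}, "geometry": None},
    ],
}

missing = unmatched_locations(sample_geojson, "properties.prov_istat_code_num", [1, 2, 3])
print(missing)
```

In this sketch the comparison is exact, so a code stored as the number 1 in one place and as the string "1" in the other would be reported as unmatched. It is worth checking that both sides use the same type.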

Animation of the spread of COVID-19 in Italy, click Play to visualize

As expected, the most critical regions are in Northern Italy. However, the southern regions are also turning redder.

This is just a basic visualization of what is happening in Italy. With the data at your disposal, you can do your own analysis and try to answer a lot of other questions: in which region are people recovering fastest? Which region has the highest jump in the number of infected? Which region has the highest number of tests relative to its total population?
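As a starting point for the question about the highest jump, here is a rough sketch that turns cumulative totals into daily increases and picks the largest one. It works on a tiny invented dataframe with two hypothetical provinces, using the column names from this article; the same idea should carry over to regional data, but I have not shown that here.

```python
import pandas as pd

# Invented cumulative totals for two hypothetical provinces.
toy = pd.DataFrame(
    {
        "Date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
        "Province": [1, 1, 1, 2, 2, 2],
        "Total Cases": [10, 25, 30, 5, 6, 40],
    }
)

# Sort first so that diff() compares each day with the previous one.
toy = toy.sort_values(["Province", "Date"])
# The first day of each province has no previous value, so diff() gives NaN there.
toy["New Cases"] = toy.groupby("Province")["Total Cases"].diff()

# Row with the largest single-day increase.
largest = toy.loc[toy["New Cases"].idxmax()]
print(largest["Province"], largest["Date"], largest["New Cases"])
```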

This blog post is published by the PoliMi Data Scientists association. We are a community of students of Politecnico di Milano that organizes events and writes resources on Data Science and Machine Learning topics.

If you have suggestions or want to get in touch with us, you can write to us on our Facebook page.

Alessandro Paticchio
Polimi Data Scientists

ML Engineer @ Casavo | Graduate @ Polimi | Former Research Fellow @ Harvard. Former Vice President of Polimi Data Scientists.