How to visualize the spread of COVID-19 in Italy

Photo by James Yarema

Introduction

February 21st, 2020. Sixteen coronavirus cases were discovered in Lombardia, one of the crucial crossroads of Italy, a region where thousands of people arrive and depart every day. The day after, the number of infected people rose to 60, and in the following days the first deaths were sadly reported. In the same days, cases started appearing in Veneto too. On the 23rd, hoping to contain the spread of the disease, the government declared part of the province of Lodi and the town of Vo’ Euganeo (in Veneto) a zona rossa (red zone). Unfortunately, it was already too late: the virus had started traveling all around Italy (…and the rest of the world). But how? Where? How fast? To answer these questions, it is useful to visualize the evolution of the spread.

In the following, I will outline an intuitive way to describe and display how the epidemic evolved in Italy, first illustrating the tools I used and then showing how to take advantage of them. In particular, in this article we will see:

  1. What choropleth maps and GeoJSON are

Choropleth Maps and GeoJSON

Choropleth maps are maps composed of polygons, where the color of each polygon encodes a quantity. The polygons usually represent real geographic areas, which makes these visualizations extremely intuitive when you want to describe, for instance, the demographic distribution of a population.
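To make the idea of "color expresses a quantity" concrete, here is a minimal sketch that maps a value to a color by linear interpolation between two RGB endpoints. It is a simplified two-color ramp, not the color scale Plotly actually uses; the function name value_to_color, the endpoint colors and the toy case counts are assumptions made only for illustration.

```python
def value_to_color(value, vmin, vmax, low=(255, 245, 240), high=(103, 0, 13)):
    """Linearly interpolate an RGB color for `value` within [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp the position to [0, 1] so out-of-range values get the end colors.
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


# Toy quantities for three hypothetical areas (not real data).
cases = {"A": 0, "B": 500, "C": 1000}
colors = {name: value_to_color(n, 0, 1000) for name, n in cases.items()}
```

Area A gets the light endpoint, area C the dark one, and area B a shade in between. Any value above the chosen maximum is drawn with the darkest color, which is why the choice of range matters.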

To generate these maps, however, you need the structure of the polygons. This can be described with GeoJSON, an open standard format designed for representing simple geographical features, along with their non-spatial attributes.

# Example of GeoJSON
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
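The example above describes a single point, whereas a choropleth needs polygons. As a rough sketch, the snippet below parses a tiny FeatureCollection with two square polygons and lists what each feature contains. The property names area_code and area_name and the coordinates are made up for illustration and do not come from any real dataset.

```python
import json

# A hand-written FeatureCollection with two unit squares (toy data).
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
      },
      "properties": {"area_code": 1, "area_name": "Square A"}
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]
      },
      "properties": {"area_code": 2, "area_name": "Square B"}
    }
  ]
}
"""

collection = json.loads(geojson_text)

# For each feature: its identifier, geometry type and number of ring vertices.
summary = [
    (
        feature["properties"]["area_code"],
        feature["geometry"]["type"],
        len(feature["geometry"]["coordinates"][0]),
    )
    for feature in collection["features"]
]
```

Each polygon ring repeats its first vertex at the end to close the shape, and the properties object carries the non-spatial attributes that a map can later be joined on.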

Given these two main concepts, we will now see how to easily exploit them to visualize the spread of COVID-19 in Italy.

Visualization made easy with Plotly

Plotly is a Python library that provides online graphing, analytics, and statistics tools for individuals and teams. Among other features, it lets you build your own choropleth map (documentation here) in a few lines of code. In particular, we will visualize the spread at the province level. Let’s see how to do it in our specific case.

The data I am using are made publicly available by the Italian Protezione Civile, the department that coordinates the population’s defense and protection, in this GitHub repository, which is updated on a daily basis.

# Import libraries
import pandas as pd
import plotly.express as px

# Read the province-level data published by the Protezione Civile
df = pd.read_csv("https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-province/dpc-covid19-ita-province.csv")

Let’s see the dataset structure.

df.head()
Result of df.head()

The columns we need are just:

  1. ‘data’, which will allow us to see how the virus spreads over time.

Furthermore, we see there are a few rows that have the value “In fase di definizione/aggiornamento” in the column “denominazione_provincia”. Since these rows do not correspond to a province on the map, we will drop them.

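# Drop the columns we do not need
# (errors='ignore' skips any listed column that is absent from the CSV,
#  in case the published schema differs from the one assumed here)
df = df.drop(columns=['stato', 'codice_regione', 'denominazione_regione', 'sigla_provincia', 'lat', 'long', 'note_it', 'note_en'], errors='ignore')
# Drop the rows that are not assigned to a province
df = df[df['denominazione_provincia'] != 'In fase di definizione/aggiornamento']
# Rename the columns used in the map
df = df.rename(columns={'codice_provincia': 'Province', 'totale_casi': 'Total Cases'})
# Keep only the day as a 'YYYY-MM-DD' string, giving one animation frame per day
# (the format is inferred because 'data' may also include a time of day)
df['Date'] = pd.to_datetime(df['data']).dt.strftime('%Y-%m-%d')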

The GeoJSON that we will use is kindly provided by this GitHub repository and can be downloaded or linked directly when generating the map, which we are now ready to create!

# Province boundaries, linked directly from the repository
geojson_url = 'https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_provinces.geojson'
fig = px.choropleth(df, geojson=geojson_url, locations='Province', color='Total Cases',
                    color_continuous_scale='Reds', featureidkey='properties.prov_istat_code_num',
                    animation_frame='Date', range_color=(0, max(df['Total Cases'])))
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_geos(fitbounds="locations", visible=False)
fig.write_html('try.html')

As you can see, this simple code just maps a few function parameters to the desired dataframe columns. Notice that we set the parameter featureidkey to properties.prov_istat_code_num, the key in the properties field of each GeoJSON feature that holds the province code, so that it can be matched against the “Province” column. Furthermore, we set range_color between 0 and the maximum of the column “Total Cases”, but this can be adapted to your needs. Finally, the last command saves the result in an HTML page that, when opened, will look like this:

Animation of the spread of COVID-19 in Italy, click Play to visualize

As expected, the most critical areas are in Northern Italy. However, the southern regions are also becoming redder.
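Coming back to the featureidkey parameter used when building the map: areas are only colored if the values in the locations column actually occur at that key in the GeoJSON features. A quick way to check this before plotting could look like the sketch below. The helpers feature_ids and unmatched_locations are hypothetical (they are not part of Plotly), and the two features are toy stand-ins for the real province file.

```python
def feature_ids(geojson, featureidkey):
    """Collect the value found at a dotted key path in every feature."""
    ids = set()
    for feature in geojson["features"]:
        value = feature
        for part in featureidkey.split("."):
            value = value[part]
        ids.add(value)
    return ids


def unmatched_locations(locations, geojson, featureidkey):
    """Return the locations that have no matching feature id, sorted."""
    known = feature_ids(geojson, featureidkey)
    return sorted(set(locations) - known)


# Toy GeoJSON: geometry is omitted (null) because only the ids matter here.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"prov_istat_code_num": 15}},
        {"type": "Feature", "geometry": None,
         "properties": {"prov_istat_code_num": 98}},
    ],
}

missing = unmatched_locations([15, 98, 999], toy_geojson,
                              "properties.prov_istat_code_num")
# Codes stored as text do not match codes stored as integers.
missing_as_text = unmatched_locations(["15"], toy_geojson,
                                      "properties.prov_istat_code_num")
```

An empty result means every row can be placed on the map. A non-empty one usually points to leftover placeholder rows or to a type mismatch between the dataframe column and the GeoJSON property.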

This is just a basic visualization of what is happening in Italy. With the data at your disposal, you can do your own analysis and try to answer many other questions: in which region are people recovering fastest? Which region had the highest jump in the number of infected? Which region has performed the most tests relative to its total population?
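As a starting point for the "highest jump" question, one possible approach is to turn the cumulative totals into daily increases and look for the largest one. The sketch below does this at province level on a small made-up table that reuses the column names from this article; the numbers are invented and the "New Cases" column is an assumption introduced here.

```python
import pandas as pd

# Toy cumulative totals for two provinces over three days (not real data).
toy = pd.DataFrame({
    "Province": [15, 15, 15, 98, 98, 98],
    "Date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "Total Cases": [100, 150, 400, 50, 60, 65],
})

# Sort so that each province's rows are in chronological order.
toy = toy.sort_values(["Province", "Date"])

# Daily increase = difference between consecutive cumulative totals,
# computed separately for each province (the first day has no previous value).
toy["New Cases"] = toy.groupby("Province")["Total Cases"].diff()

# Row with the largest single-day increase.
largest = toy.loc[toy["New Cases"].idxmax()]
```

The same pattern should carry over to the real dataframe after the cleaning steps, and the "New Cases" column could be passed as the color of the map instead of the cumulative total.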

This blog post is published by the PoliMi Data Scientists association. We are a community of students of Politecnico di Milano that organizes events and writes resources on Data Science and Machine Learning topics.

If you have suggestions or you want to get in contact with us, you can write to us on our Facebook page.

Alessandro Paticchio

Written by

Computer Science and Engineering Student @ Polimi | Research Fellow @ Harvard. Former Vice President of Polimi Data Scientists.

Polimi Data Scientists

This page’s aim is to create an environment for data science students, enthusiasts, and alumni from Politecnico di Milano. This will be a place of culture, experiences and ideas exchange, related to data science fields. Feel free to ask and contribute to the community.
