Making a map of COVID-19 incidence in Switzerland using ggplot2 and sf

Giulia Ruggeri
EPFL Extension School
6 min read · Oct 27, 2020


In recent years, creating beautiful maps in R has become fairly simple, thanks to the {sf} package. In this article we are going to visualise the spatial distribution of COVID-19 incidence over a 14-day period in Switzerland by creating a thematic map, also known as a choropleth map. Previously, we explored how to visualise and animate COVID-19 time series data using {ggplot2} and {gganimate}.

This time, we are going to rely on {sf} and {ggplot2} as our main tools.

{sf}, which stands for simple features, is the go-to library for spatial vector data, that is, data that describes geographical geometries as a series of points specified by their longitude and latitude coordinates. It lets us import, manipulate and plot geographical shapes while handling the data in a table-like format, just like a data.frame. Such a relief!

In this little exercise, we use {readxl} to import the Excel file downloaded from the Swiss Federal Office of Public Health website. {rcartocolor} is an R library of nice-looking colour scales developed for cartography. As David Letterman would say, {tidyverse} needs no introduction.

Let’s start by loading the data with read_excel(), where we can specify the exact name of the sheet we want to load, how many lines to skip and how many lines to keep overall. We have one row per canton and a row for the title, which means that we need to keep only 27 rows (there are 26 Swiss cantons).

We also clean up the column names a bit using clean_names() from the {janitor} package, and then use transmute() to rename the columns we want and drop the others.
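As a rough sketch of this import step: the raw column names (`Kanton`, `Inzidenz`), the helper names and the `path`, `sheet` and `skip` arguments below are all assumptions, since the workbook layout is not shown here. Only `n_max = 27` comes from the text above. The cleaning step is demonstrated on a small in-memory tibble so that it can run without the Excel file.

```r
library(janitor)
library(dplyr)

# Assumed raw column names ("Kanton", "Inzidenz"); the real sheet may differ.
tidy_incidence <- function(raw) {
  raw %>%
    clean_names() %>%  # e.g. "Kanton" becomes "kanton"
    transmute(canton = kanton, incidence = inzidenz)  # rename and drop the rest
}

# Hypothetical wrapper: path, sheet and skip are placeholders to fill in.
load_incidence <- function(path, sheet, skip) {
  readxl::read_excel(path, sheet = sheet, skip = skip, n_max = 27) %>%
    tidy_incidence()
}

# In-memory stand-in for the imported sheet
raw_demo <- tibble(
  Kanton = c("AG", "AI"),
  Inzidenz = c(45.1, 99.1),
  Quelle = "demo"  # extra column that transmute() drops
)
covid_demo <- tidy_incidence(raw_demo)
head(covid_demo)
```

Because transmute() keeps only the columns it names, renaming and dropping happen in a single step.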

## # A tibble: 6 x 2
## canton incidence
## <chr> <dbl>
## 1 AG 45.1
## 2 AI 99.1
## 3 AR 92.3
## 4 BE 73
## 5 BL 33
## 6 BS 47.7

So far, this has been simple data import and manipulation.

We now have one variable that contains the canton codes, and one variable that contains, for each canton, the COVID-19 incidence per 100'000 people over the last 14 days.

We are ready to load the shapefiles.

Wait, what are shapefiles?

Shapefiles are the files that contain the geographical shapes we want to plot. We want to plot the cantonal data, so we need to get the Swiss cantons’ shapes, which can be downloaded from here.

A shapefile is actually a set of files that contain different pieces of geographic information (e.g. the projection). One of these files has the .shp extension, and this is the one we are going to load.

Note that all the other files need to be in the same folder as the .shp file.

We can now load two shapefiles: one that contains the shapes of the canton borders and one that contains the shapes of the major lakes of Switzerland.

Let’s load them and see what they look like.
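A minimal sketch of reading a shapefile with st_read(). The paths to the Swiss files are not given here, so this uses the North Carolina shapefile that ships with {sf} purely as a stand-in; `shp_path` and `demo_shapes` are illustrative names.

```r
library(sf)

# Stand-in shapefile bundled with {sf}; not the Swiss data.
shp_path <- system.file("shape/nc.shp", package = "sf")

# The companion files (.dbf, .prj, .shx) sit next to the .shp file
list.files(dirname(shp_path), pattern = "^nc\\.")

# Point st_read() at the .shp file; the other files are picked up automatically
demo_shapes <- st_read(shp_path, quiet = TRUE)
class(demo_shapes)
```

For the Swiss data, the same call would point at the cantons and lakes .shp files, with the results assigned to swiss_cantons and swiss_lakes.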

class(swiss_cantons)
## [1] "sf" "data.frame"

swiss_cantons and swiss_lakes are stored as sf data.frames, so we can manipulate them just like tibbles (or data.frames). This is possible because geometries are stored in a very tidy way: as a nested variable (a list-column), usually called geometry. This will be your only special variable; the others (called attributes) are just normal variables. For instance, each canton has its name and code associated with the geometry that describes it.

head(swiss_cantons)
## Simple feature collection with 6 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 546871 ymin: 130593 xmax: 768722 ymax:
## projected CRS: CH1903 / LV03
## KT NAME KURZ geometry
## 1 17 St. Gallen SG MULTIPOLYGON (((738559 1968...
## 2 12 Basel-Stadt BS MULTIPOLYGON (((608728 2681...
## 3 7 Nidwalden NW MULTIPOLYGON (((671030 1822...
## 4 2 Bern BE MULTIPOLYGON (((572954 1936...
## 5 14 Schaffhausen SH MULTIPOLYGON (((684561 2726...
## 6 10 Fribourg FR MULTIPOLYGON (((584435 1976...
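To illustrate how an sf object behaves like an ordinary table, here is a small sketch on toy data. The `square()` helper, the canton codes and the names are made up for illustration; only the column names KURZ and NAME mirror the output above.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

# Toy stand-in for swiss_cantons
toy_cantons <- st_sf(
  KURZ = c("AA", "BB", "CC"),
  NAME = c("Canton A", "Canton B", "Canton C"),
  geometry = st_sfc(square(0, 0), square(1, 0), square(2, 0))
)

# dplyr verbs work as usual, and the geometry column stays attached
bb_only <- toy_cantons %>% filter(KURZ == "BB")

# The geometry column can be pulled out, or dropped to get a plain table
toy_geometry <- st_geometry(toy_cantons)
attributes_only <- st_drop_geometry(toy_cantons)
```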

The advantage of using {sf} as our main tool to deal with these data types? We can use {ggplot2} to plot them!

And, just like with any {ggplot2} chart, we can reason layer by layer when building a map. Let's now add the Swiss lakes on top of the canton shapes and use theme_void() to remove the background and the axes. In this step, we can also make the cantons transparent by setting the fill argument to NA, and give the lakes a light teal fill. geom_sf() works just like any other geom_ function: no alarms and no surprises here.
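The layering described above could be sketched as follows, using toy squares in place of the real canton and lake shapes. The `square()` helper, the toy objects and the exact teal hex code are assumptions.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

# Toy stand-ins for swiss_cantons and swiss_lakes
toy_cantons <- st_sf(
  KURZ = c("AA", "BB"),
  geometry = st_sfc(square(0, 0), square(1, 0))
)
toy_lakes <- st_sf(
  NAME = "Toy lake",
  geometry = st_sfc(square(0.8, 0.4, 0.3))
)

base_map <- ggplot() +
  geom_sf(data = toy_cantons, fill = NA, colour = "grey40") +  # transparent cantons
  geom_sf(data = toy_lakes, fill = "#D1EEEA", colour = NA) +   # light teal lakes
  theme_void()                                                 # no background or axes

base_map
```

Layers are drawn in the order they are added, so the lakes end up on top of the cantons.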

Now, how do we colour each canton by the magnitude of COVID-19 incidence per 100'000 people?

We just need to join the covid_incidence tibble and the swiss_cantons table, using the canton code as the joining variable. This will allow us to map the incidence variable to the fill aesthetic and create a choropleth map, i.e. a thematic map.
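A sketch of the join on toy data. It assumes the canton code lives in a KURZ column on the spatial side (as in the head() output above) and in canton on the incidence side; the toy codes, values and the `square()` helper are made up.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

toy_cantons <- st_sf(
  KURZ = c("AA", "BB"),
  geometry = st_sfc(square(0, 0), square(1, 0))
)
toy_incidence <- tibble(canton = c("AA", "BB"), incidence = c(45.1, 99.1))

# Keep the sf object on the left so the result is still an sf object
cantons_incidence <- toy_cantons %>%
  left_join(toy_incidence, by = c("KURZ" = "canton"))
```

Putting the plain tibble on the left instead would return a tibble and lose the sf class, which geom_sf() relies on.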

To make our map look pretty, instead of using a numerical variable we divide the incidence into categories, so that readers can easily see which category each canton falls into.

This is a typical practice for choropleth maps, and it can be done in different ways. In this case we opt for a brute-force approach and do it manually.
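One way to do this manually is with case_when(). The break points and labels below are purely illustrative assumptions, not the ones used for the final map; the incidence values are the six rows printed earlier.

```r
library(dplyr)

covid_incidence <- tibble(
  canton = c("AG", "AI", "AR", "BE", "BL", "BS"),
  incidence = c(45.1, 99.1, 92.3, 73, 33, 47.7)
)

# Assumed category labels, listed in the order they should appear in the legend
cat_levels <- c("< 50", "50 to 75", "75 or more")

covid_incidence <- covid_incidence %>%
  mutate(
    incidence_cat = case_when(
      incidence < 50 ~ "< 50",
      incidence < 75 ~ "50 to 75",
      TRUE ~ "75 or more"
    ),
    # A factor with explicit levels keeps the categories in a sensible order
    incidence_cat = factor(incidence_cat, levels = cat_levels)
  )
```

Base R's cut() would be a more compact alternative; spelling out the conditions keeps the labels fully under our control.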

Now we can map the fill colour to the incidence_cat variable and create the first choropleth map.
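A minimal sketch of this mapping on toy data. The palette choice ("BurgYl" from {rcartocolor}), the category labels and the `square()` helper are assumptions.

```r
library(sf)
library(ggplot2)
library(rcartocolor)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

cat_levels <- c("< 50", "50 to 75", "75 or more")

# Toy stand-in for the joined cantons and incidence table
toy_map_data <- st_sf(
  KURZ = c("AA", "BB", "CC"),
  incidence_cat = factor(cat_levels, levels = cat_levels),
  geometry = st_sfc(square(0, 0), square(1, 0), square(2, 0))
)

choropleth <- ggplot(toy_map_data) +
  geom_sf(aes(fill = incidence_cat), colour = "white") +  # one fill per category
  scale_fill_carto_d(palette = "BurgYl") +                # discrete CARTO palette
  theme_void()

choropleth
```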

And we have our first choropleth map, built using only {sf} and {ggplot2}. Let’s add some finishing touches: we are not happy with how the legend looks, and we can change it using guide_legend().
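As an illustration of legend tweaks with guide_legend(), the sketch below sets a legend title, lays the keys out on a single row and moves the legend below the map. These particular choices, like the toy data and the `square()` helper, are assumptions.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

cat_levels <- c("< 50", "50 to 75", "75 or more")
toy_map_data <- st_sf(
  incidence_cat = factor(cat_levels, levels = cat_levels),
  geometry = st_sfc(square(0, 0), square(1, 0), square(2, 0))
)

map_with_legend <- ggplot(toy_map_data) +
  geom_sf(aes(fill = incidence_cat), colour = "white") +
  # guide_legend() controls the legend for the fill aesthetic
  guides(fill = guide_legend(title = "Cases per 100'000", nrow = 1)) +
  theme_void() +
  theme(legend.position = "bottom")

map_with_legend
```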

Now we can add a title, a subtitle and labels on top of each canton. We will use {ggrepel} to make sure the labels do not overlap with each other, and we will also use {ggtext} so we can use markdown syntax in our title and subtitle.
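A sketch of the labelling step on toy data. The title and subtitle text, the label size and the `square()` helper are assumptions. The labels are placed with stat = "sf_coordinates", which computes a point inside each shape for {ggrepel} to work with.

```r
library(sf)
library(ggplot2)
library(ggrepel)
library(ggtext)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

toy_cantons <- st_sf(
  KURZ = c("AA", "BB", "CC"),
  geometry = st_sfc(square(0, 0), square(1, 0), square(2, 0))
)

labelled_map <- ggplot(toy_cantons) +
  geom_sf(fill = "grey90", colour = "white") +
  # One label per shape, nudged apart by ggrepel if they would collide
  geom_text_repel(
    aes(label = KURZ, geometry = geometry),
    stat = "sf_coordinates",
    size = 3
  ) +
  labs(
    title = "**COVID-19** incidence by canton",
    subtitle = "Cases per 100'000 people over 14 days (*toy data*)"
  ) +
  theme_void() +
  # element_markdown() renders the markdown in the title and subtitle
  theme(
    plot.title = element_markdown(),
    plot.subtitle = element_markdown()
  )

labelled_map
```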

We now have the code to create a choropleth map and we have seen how to build it step by step using {ggplot2}. With a little bit of customisation we have a static map that we can save in different formats and share.
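Saving the map could look like this, assuming ggsave() is used. The file names, dimensions and the toy plot are placeholders, and the files are written to a temporary folder so the sketch leaves nothing behind.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a square polygon with lower-left corner (x, y)
square <- function(x, y, s = 1) {
  st_polygon(list(cbind(x + c(0, s, s, 0, 0), y + c(0, 0, s, s, 0))))
}

toy_map <- ggplot(st_sf(geometry = st_sfc(square(0, 0), square(1, 0)))) +
  geom_sf(fill = "grey90", colour = "white") +
  theme_void()

out_png <- file.path(tempdir(), "toy_map.png")
out_pdf <- file.path(tempdir(), "toy_map.pdf")

# The output format is inferred from the file extension
ggsave(out_png, plot = toy_map, width = 16, height = 10, units = "cm", dpi = 300)
ggsave(out_pdf, plot = toy_map, width = 16, height = 10, units = "cm")
```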

If you are interested in how to deal with geographical data, one of the best freely available resources is the book Geocomputation with R. Its authors rely heavily on other packages for plotting thematic maps, notably {tmap}, which is also worth exploring. If, however, you want to use packages such as {ggtext} to customise your plots, {ggplot2} is the library you want to rely on, especially if you are already used to working with it.

I hope you enjoyed this article and stay tuned for more examples on how to build maps in R.

Originally published at https://github.com.


Giulia Ruggeri
EPFL Extension School

Senior Data Science Educator at the EPFL Extension School, with a background in air pollution and public health.