Use R and gganimate to make an animated map of European students and their year abroad.

Johannes Müller
May 30, 2019

Last weekend 400 million people were asked to elect a new European Parliament. For this occasion I wondered whether there is a way to use data visualisation to create an emotional picture of what the EU means to me and many of my friends. I came up with this: a map that follows 1000 randomly chosen Erasmus students on their path from their home university to their host university in a different country.

I structured this blog post in three parts that follow the journey I went through to create this visualisation: 1) find and preprocess the data, 2) make a static visualisation of the connections on a map, 3) add animation.

1. The data

First of all, I needed individual-level data on where European students go for their Erasmus semester. The EU used to publish anonymized datasets of all Erasmus students. Unfortunately, the last available wave is from 2014, but it is something we can work with. The dataset can be downloaded here.

The dataset has 272,497 observations and the following relevant features:

Unfortunately, there are no city names for the sending and hosting universities in the dataset, and of course no latlon variables. We only have the Erasmus IDs of the universities, so I had to find another dataset with the Erasmus IDs, which I found here. This dataset gave me the exact address of each university, which allowed me to geocode the data. To do so, I pass city and country to the geocode function from the ggmap package.

library(readxl)
library(ggplot2)
library(maps)
library(rgeos)
library(maptools)
library(ggmap)
library(geosphere)
library(dplyr)
library(gganimate)
# load data
data_raw <- read_excel("Student_Mobility_2013-14.xlsx")  # all students
data_institutions <- read_excel("accredited_heis_within_the_erasmus_programme_15112017xlsx.xlsx")
# geocode institutions
data_institutions$lon <- rep(NA, nrow(data_institutions))
data_institutions$lat <- rep(NA, nrow(data_institutions))
for(i in seq_len(nrow(data_institutions))){
  geo <- geocode(paste0(data_institutions$City[i], ", ", data_institutions$Country[i]), "latlon", source = "dsk")
  # store plain numbers rather than one-column data frames
  data_institutions$lon[i] <- geo$lon[1]
  data_institutions$lat[i] <- geo$lat[1]
}
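Many institutions share a city, so a row-by-row loop asks the geocoder for the same place again and again. Here is a minimal sketch of one way to reduce the number of requests: geocode each distinct "City, Country" string once and map the results back with match(). The toy table and the lookup_coords() function are hypothetical stand-ins (the stub replaces the real geocode() call so that the example runs offline), and the coordinates are rough illustrative values.

```r
# Toy stand-in for data_institutions (hypothetical rows)
institutions <- data.frame(
  City    = c("Berlin", "Paris", "Berlin"),
  Country = c("Germany", "France", "Germany"),
  stringsAsFactors = FALSE
)

# Hypothetical offline stub standing in for the real geocoder;
# returns one row of lon/lat per query string
lookup_coords <- function(query) {
  known <- data.frame(
    query = c("Berlin, Germany", "Paris, France"),
    lon   = c(13.40, 2.35),
    lat   = c(52.52, 48.86),
    stringsAsFactors = FALSE
  )
  known[match(query, known$query), c("lon", "lat")]
}

# Build the query string once per row, then geocode only the distinct ones
institutions$query <- paste0(institutions$City, ", ", institutions$Country)
unique_queries <- unique(institutions$query)
coords <- lookup_coords(unique_queries)

# Map the results back to every institution
idx <- match(institutions$query, unique_queries)
institutions$lon <- coords$lon[idx]
institutions$lat <- coords$lat[idx]
```

In this toy case three institutions need only two lookups; with thousands of universities the saving could be considerably larger.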

Then I had to tidy up the data a bit, create a dyadic dataset with all connections between universities, and merge in the latlon information (it ain’t pretty but it works).

data <- data_raw %>%
  group_by(SendingPartnerErasmusID, HostingPartnerErasmusID) %>%
  summarise(SendingCountry = first(SendingCountry),
            SendingPartnerName = first(SendingPartnerName),
            HostingPartnerName = first(HostingPartnerName),
            HostingPartnerCity = first(HostingPartnerCity),
            total_number = n(),  # number of students on this connection
            lon1 = NA, lat1 = NA, lon2 = NA, lat2 = NA) %>%
  ungroup()  # drop the grouping that summarise() leaves behind
# merge latlon to data

for(i in 1:nrow(data)){
  if(data$SendingPartnerErasmusID[i] %in% data_institutions$`Erasmus Code`){ # check if sending city is geocoded
    sending_index <- which(data_institutions$`Erasmus Code` == data$SendingPartnerErasmusID[i])
    data$lon1[i] <- unlist(data_institutions$lon[sending_index])
    data$lat1[i] <- unlist(data_institutions$lat[sending_index])
  }
  if(data$HostingPartnerErasmusID[i] %in% data_institutions$`Erasmus Code`){ # check if hosting city is geocoded
    hosting_index <- which(data_institutions$`Erasmus Code` == data$HostingPartnerErasmusID[i])
    data$lon2[i] <- unlist(data_institutions$lon[hosting_index])
    data$lat2[i] <- unlist(data_institutions$lat[hosting_index])
  }
}
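The same lookup can be written without a loop. The following is a minimal sketch on toy data, assuming each Erasmus code appears at most once in the institutions table; the data frames and column names (institutions, dyads, code, sending, hosting) are hypothetical stand-ins for the real ones.

```r
# Toy stand-ins for the real tables (hypothetical codes, rough coordinates)
institutions <- data.frame(
  code = c("A", "B", "C"),
  lon  = c(13.4, 2.35, -3.7),
  lat  = c(52.5, 48.9, 40.4),
  stringsAsFactors = FALSE
)
dyads <- data.frame(
  sending = c("A", "B", "X"),   # "X" has no geocoded institution
  hosting = c("B", "C", "A"),
  stringsAsFactors = FALSE
)

# match() returns the row of the first matching code, or NA if there is none
send_idx <- match(dyads$sending, institutions$code)
host_idx <- match(dyads$hosting, institutions$code)

# Indexing with NA yields NA, so unmatched universities stay missing
dyads$lon1 <- institutions$lon[send_idx]
dyads$lat1 <- institutions$lat[send_idx]
dyads$lon2 <- institutions$lon[host_idx]
dyads$lat2 <- institutions$lat[host_idx]
```

Because match() only ever returns the first hit, this version also sidesteps the case where a code is listed more than once, which would give which() several indices for a single row.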

This leaves us with a beautiful dyadic, geocoded dataset:

2. Make the static visualisation

Now it’s time to get started on the visualisation. The inspiration, as well as most of the code, comes from this beautiful tutorial. First, we need a map of Europe as the background for our visualisation. For this, I use the world map data from the maps package, loaded through ggplot2’s map_data function, and then restrict the view to Europe with coord_cartesian(xlim = c(-25,45), ylim = c(32,70)).

worldmap <- map_data("world")
wrld <- c(geom_polygon(aes(long, lat, group = group),
                       size = 0.1,
                       colour = "#090D2A",
                       fill = "#090D2A", alpha = 0.8, data = worldmap))

ggplot() +
  wrld +
  theme(panel.background =
          element_rect(fill = '#00001C', colour = '#00001C'),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  coord_cartesian(xlim = c(-25,45), ylim = c(32,70)) +
  theme(axis.line = element_blank(), axis.text.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())

Now that we have the background, we can work on the lines that connect universities.

As the picture gets quite crowded if you plot all 250,000+ individual connections, I decided to select 1000 connections at random and only include those where the sending country is in the European Union (last checked: May 2019).

countries_to_plot <- c("LI", "MT", "LU", "HR", "EE", "SI", "LV", "BG","IE", "DK", "SK", "LT", "SE", "GR", "HU","RO", "AT", "FI", "PT", "CZ", "NL", "BE", "GB", "PL", "IT", "FR", "ES", "DE")
# keep only connections with complete coordinates and a sending country from the list
data_plot <- data %>%
  filter(!is.na(lat1),
         !is.na(lat2),
         !is.na(lon1),
         !is.na(lon2),
         SendingCountry %in% countries_to_plot)
# draw 1000 connections at random
data_plot <- data_plot[sample(nrow(data_plot), 1000), ]
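Because sample() draws different rows on every run, the map changes each time the script is executed. If you want a reproducible picture, one option is to fix the random seed first. This is a minimal sketch on toy data; the connections data frame is a hypothetical stand-in for data_plot, and the seed value 2019 is arbitrary.

```r
# Toy stand-in for the filtered connections (hypothetical values)
connections <- data.frame(id = 1:50, total_number = rep(1:5, 10))

n_wanted <- 10          # the post uses 1000
set.seed(2019)          # arbitrary seed; any fixed integer works

# Guard against asking for more rows than exist, which would make sample() fail
n_draw <- min(n_wanted, nrow(connections))
picked <- connections[sample(nrow(connections), n_draw), ]

# Re-seeding reproduces exactly the same draw
set.seed(2019)
picked_again <- connections[sample(nrow(connections), n_draw), ]
```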

Now, to draw the lines, I use gcIntermediate from the geosphere package to compute intermediate points between the two universities (this code comes directly from the tutorial).

# helper: turn the SpatialLines object into a data frame (ldply comes from the plyr package)
fortify.SpatialLinesDataFrame <- function(model, data, ...){
  plyr::ldply(model@lines, fortify)
}
# calculate routes for each row (15 intermediate points plus start and end)
routes <- gcIntermediate(data_plot[,c('lon1', 'lat1')], data_plot[,c('lon2', 'lat2')], 15, breakAtDateLine = FALSE, addStartEnd = TRUE, sp = TRUE)
# fortify to dataframe
fortifiedroutes <- fortify.SpatialLinesDataFrame(routes)
# merge to form circles
routes_count <- data.frame('count' = data_plot$total_number, 'id' = 1:nrow(data_plot), 'Countries' = data_plot$SendingCountry)
greatcircles <- merge(fortifiedroutes, routes_count, all.x = TRUE, by = 'id')
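To get a feeling for what the intermediate points are, here is a minimal base-R sketch of great-circle interpolation on a sphere (spherical linear interpolation). It is not the geosphere implementation, which may differ in detail; great_circle_points() is a hypothetical helper written for this illustration, and its output columns simply mirror the long, lat and order columns used in the plot.

```r
# Interpolate n_inner points (plus both endpoints) on the great circle
# between two lon/lat positions given in degrees, treating Earth as a sphere
great_circle_points <- function(lon1, lat1, lon2, lat2, n_inner = 15) {
  to_rad <- pi / 180
  # Convert a lon/lat position to a 3D unit vector
  to_xyz <- function(lon, lat) {
    c(cos(lat * to_rad) * cos(lon * to_rad),
      cos(lat * to_rad) * sin(lon * to_rad),
      sin(lat * to_rad))
  }
  a <- to_xyz(lon1, lat1)
  b <- to_xyz(lon2, lat2)
  # Angle between the two vectors (clamped against rounding error)
  omega <- acos(max(-1, min(1, sum(a * b))))
  if (omega < 1e-12 || omega > pi - 1e-12) {
    stop("endpoints coincide or are antipodal")
  }
  # Fractions along the path, including 0 (start) and 1 (end)
  f <- seq(0, 1, length.out = n_inner + 2)
  # Weights of the start and end vectors at each fraction
  w_a <- sin((1 - f) * omega) / sin(omega)
  w_b <- sin(f * omega) / sin(omega)
  xyz <- outer(w_a, a) + outer(w_b, b)
  # Convert back to degrees
  data.frame(
    long  = atan2(xyz[, 2], xyz[, 1]) / to_rad,
    lat   = asin(pmax(-1, pmin(1, xyz[, 3]))) / to_rad,
    order = seq_along(f)
  )
}

# Example: rough coordinates for Berlin and Madrid (illustrative values)
route <- great_circle_points(13.40, 52.52, -3.70, 40.42)
```

With 15 inner points plus start and end, each route consists of 17 rows, numbered 1 to 17 in the order column.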

Before plotting the data, I wanted to add different shades of yellow for different countries to make it a bit more interesting.

colors_new <- c("#FFFFFF", "#FEFFF6","#FEFFED","#FEFFE4","#FEFFDB","#FEFFD2",
"#FEFFC9", "#FDFFC0","#FDFFB7","#FDFFAF", "#FDFFA6","#FDFF9D","#FDFF94","#FDFF8B","#FCFF82", "#FCFF79","#FCFF70","#FCFF67","#FCFF5F","#FCFF56",
"#FCFF4D","#FBFF44","#FBFF3B","#FBFF32","#FBFF29","#FBFF20",
"#FBFF17","#FBFF0F")

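A ramp like this does not have to be typed by hand. As a sketch, base R's colorRampPalette() can interpolate between the first and last colour of the list; the result should be very similar to the vector above, though not necessarily identical in every digit. The names shades, codes and named_shades are made up for this example.

```r
# Interpolate 28 colours between white and the strongest yellow in the list
shades <- colorRampPalette(c("#FFFFFF", "#FBFF0F"))(28)

# Optional: name the colours so each country code gets a fixed shade
codes <- c("DE", "FR", "ES")            # shortened, illustrative list
named_shades <- setNames(shades[seq_along(codes)], codes)
```

Passing a named vector to scale_colour_manual(values = ...) makes ggplot2 match colours to countries by name instead of by position.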
And then we can put the map created above together with the plotted lines of our connections:

plot <-
  ggplot() +
  wrld +
  geom_line(aes(long, lat, group = id, color = Countries), alpha = 0.3,
            size = 0.01, data = greatcircles) +
  theme(axis.line = element_blank(), axis.text.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()) +
  scale_colour_manual(values = colors_new) +
  theme(panel.background = element_rect(fill = '#00001C',
                                        colour = '#00001C'),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  coord_cartesian(xlim = c(-20,35), ylim = c(35,70)) +
  theme(legend.position = "none")

3. Animation

Now for the last part: the animation. This part is surprisingly easy to implement, thanks to the great gganimate package.

All that is needed is to add one line to our graph:

anim <- plot +
  transition_reveal(greatcircles$order)

It is straightforward in this case because the gcIntermediate function interpolates points between the sending and hosting locations. In the final data.frame, the order of the points along each route can be found in greatcircles$order. We can now use this variable to tell the transition_reveal function from gganimate in which order it should reveal the individual points.
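Conceptually, revealing along order means that at any moment only the points whose order value has already been reached are drawn. The following toy sketch illustrates that idea only; it is not how gganimate works internally, and the routes data frame and the visible_at() helper are hypothetical.

```r
# Toy stand-in for greatcircles: two routes with three points each
routes <- data.frame(
  id    = rep(1:2, each = 3),
  order = rep(1:3, times = 2),
  long  = c(0, 5, 10, 20, 15, 10),
  lat   = c(50, 52, 54, 40, 42, 44)
)

# Rows that would be visible once the reveal has progressed to `t`
visible_at <- function(data, t) data[data$order <= t, ]

visible_at(routes, 1)   # only the starting point of each route
visible_at(routes, 2)   # the first two points (one segment) of each route
```

Since every route uses the same order values, all routes grow at the same pace.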

There is only one problem remaining: Now all the lines are plotted at the same time, making a crowded plot even messier. That’s why I decided to reveal the lines by country. I achieve this by adding a delay for each country:

# add delay for all countries
add_delay <- 0
for(i in countries_to_plot){
  greatcircles$order[greatcircles$Countries == i] <-
    greatcircles$order[greatcircles$Countries == i] + add_delay
  add_delay <- add_delay + 10
}
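The same delay can be computed in one step from each country's position in the list. This is a minimal sketch on toy data; countries and circles are shortened, hypothetical stand-ins for countries_to_plot and greatcircles.

```r
# Shortened, illustrative versions of the objects used above
countries <- c("LI", "MT", "LU")
circles <- data.frame(
  Countries = c("MT", "LI", "LU", "MT"),
  order     = c(1, 2, 1, 3),
  stringsAsFactors = FALSE
)

step <- 10  # offset added to `order` per country, as in the loop

# Position of each row's country in the list: 1 for "LI", 2 for "MT", ...
position <- match(circles$Countries, countries)

# The first country gets no delay, the second 10, the third 20, ...
circles$order_delayed <- circles$order + (position - 1) * step
```

Writing the result to a new column keeps the original order intact, so running the code twice does not stack the delays the way repeated in-place updates would. With 17 points per route and 28 countries, the largest delayed value would be 17 + 27 * 10 = 287.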

Et voilà: we can now run the plotting and animation lines above again and then save the result as a GIF.

animate(anim, fps = 20, width = 1024, height = 951, nframes = 640, end_pause = 40)
anim_save("europe.gif")

Wrap-Up

The animation and the plot itself are not the most informative data visualisations. However, with the code you can think about all sorts of different variations (which are more informative): Where do students from Germany/France/etc. go to? Is there a reciprocity between certain universities? What are the most popular cities/countries?
