Step-by-Step Choropleth Map in R: A case of mapping Nepal
TL;DR
If you are in a hurry and want to quickly get the map running, then the above gist helps. But if you want to tweak here and there, there’s no shortcuts than to understand and run the code step-by-step. The following is the process of getting the above map but slowly.
Setup
First install the necessary packages if you don’t have them already; and load the packages
library(rgdal)
library(ggplot2)
library(dplyr)
Loading data
Clone the Nepal shapefiles from Github and run the following codes to read shapefiles and convert to dataframe (using fortify). Read how it works here.
nepal.adm3.shp <- readOGR(dsn="./NepalMaps/baselayers/NPL_adm", layer="NPL_adm3", stringsAsFactors = FALSE)
nepal.adm3.shp.df <- fortify(nepal.adm3.shp, region = "NAME_3")
Preparing raw plots
Now you have data to do the plotting using ggplot2. First save the plot to the map variable for reuse.
map <- ggplot(data = nepal.adm3.shp.df, aes(x = long, y = lat, group = group))
Lets plot the path first using
map + geom_path()
Lets do the polygon plotting now. Remove coord_fixed(..)
and guides(..)
and see what it does to your plot
map +
geom_polygon(aes(fill = id)) +
coord_fixed(1.3) +
guides(fill = FALSE)
See the outputs from the above two commands.
Preparing data for Choropleth
Now that we have prepared map, lets plot the choropleth map. The shapefiles contains districts names which might be different from what you have in your external data files. For ease, we will now prepare districts data from the shapefiles data frame and write data to that file for plotting.
nepal.adm3.shp.df %>%
distinct(id) %>%
write.csv("districts.csv", row.names = FALSE)
I used the Human Poverty Index data from OpenNepal portal. I filtered the data for HPI, and ensured that the districts names match to the names present in the shapefiles in newly created districts.csv. I renamed the columns of dataframe, containing district name and HPI.
hpi.data <- read.csv("districts.csv")
colnames(hpi.data) <- c("id","HPI")
Now existing shapefiles dataframe and HPI data containing HPI index and districts are merged using district-name (which is identified by column name id
)
nepal.adm3.shp.df <- merge(nepal.adm3.shp.df, hpi.data, by ="id")
Drawing Choropleth map
Since dataframe nepal.adm3.shp.df
is now updated (with extra HPI
column), lets re-create the map plot variable again.
map <- ggplot(data = nepal.adm3.shp.df, aes(x = long, y = lat, group = group))
Run the following code to create polygon map, using HPI
to fill the polygons.
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
coord_fixed(1.3)
Now we get good looking choropleth map, with HPI values represented by the intensity of colors.
However there are still issues, lets change the gradient so that the high value of HPI (40+) is represented by dark colors and low value (20-) by light colors.
Changing the gradient scale
Add scale_fill_gradient(..)
and give high and low colors. Play with the colors to see how the colors are changed in the map.
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3)
Changing Legend Title
Now change the legend title from default to something you want.
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
guides(fill=guide_colorbar(title="HP Index"))
Moving the Legend inside chart
There’s ample space inside the chart, why don’t we move the legend inside the chart.
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
guides(fill=guide_colorbar(title="HP Index")) +
theme(legend.justification=c(0,0), legend.position=c(0,0))
Creating clean map
The background grid and lat-long axis are not necessary for the map. Lets remove them.
theme_bare <- theme(
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.text=element_text(size=7),
legend.title=element_text(size=8),
panel.background = element_blank(),
panel.border = element_rect(colour = "gray", fill=NA, size=0.5)
)map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,0), legend.position=c(0,0)) +
theme_bare
Adding labels for each district polygon
Now we need to show districts names in the map. Centroids calculation is bit tricky, but it works if you follow this code from stackoverflow.
centroids <- setNames(do.call("rbind.data.frame", by(nepal.adm3.shp.df, nepal.adm3.shp.df$group, function(x) {Polygon(x[c('long', 'lat')])@labpt})), c('long', 'lat')) centroids$label <- nepal.adm3.shp.df$id[match(rownames(centroids), nepal.adm3.shp.df$group)]map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,0), legend.position=c(0,0)) +
with(centroids, annotate(geom="text", x = long, y = lat, label=label, size=2)) +
theme_bare
Showing labels for selected districts only
The map looks cluttered. I am interested in districts which passes certain criteria. Lets show the names for districts for which HPI is greater than 40.
centroids.selected <- centroids[centroids$label %in% (hpi.data[hpi.data$HPI>40,]$id),]
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,0), legend.position=c(0,0)) +
with(centroids.selected, annotate(geom="text", x = long, y = lat, label=label, size=2)) +
theme_bare
Adding title to the map
This is the easiest of all. ggtitle(..)
does the job.
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
ggtitle("Human Poverty Index Map") +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,-0.1), legend.position=c(0.05,0)) +
with(centroids.selected, annotate(geom="text", x = long, y = lat, label=label, size=2)) +
theme_bare
Again if you are in a hurry and you quickly scrolled down here without reading TL;DR and you just need to get the choropleth map, get the shapefiles and run the gist from here.