Step-by-Step Choropleth Map in R: A case of mapping Nepal

Anjesh Tuladhar
5 min readApr 30, 2017

--

TL;DR

  • Download the Nepal shapefiles from Github
  • Run the following gist
  • Get the following map
Final Choropleth map

If you are in a hurry and want to quickly get the map running, then the above gist helps. But if you want to tweak here and there, there’s no shortcuts than to understand and run the code step-by-step. The following is the process of getting the above map but slowly.

Setup

First install the necessary packages if you don’t have them already; and load the packages

library(rgdal)
library(ggplot2)
library(dplyr)

Loading data

Clone the Nepal shapefiles from Github and run the following codes to read shapefiles and convert to dataframe (using fortify). Read how it works here.

nepal.adm3.shp <- readOGR(dsn="./NepalMaps/baselayers/NPL_adm", layer="NPL_adm3", stringsAsFactors = FALSE)
nepal.adm3.shp.df <- fortify(nepal.adm3.shp, region = "NAME_3")

Preparing raw plots

Now you have data to do the plotting using ggplot2. First save the plot to the map variable for reuse.

map <- ggplot(data = nepal.adm3.shp.df, aes(x = long, y = lat, group = group))

Lets plot the path first using

map + geom_path()

Lets do the polygon plotting now. Remove coord_fixed(..) and guides(..) and see what it does to your plot

map + 
geom_polygon(aes(fill = id)) +
coord_fixed(1.3) +
guides(fill = FALSE)

See the outputs from the above two commands.

geom_path( ) vs geom_polygon( )

Preparing data for Choropleth

Now that we have prepared map, lets plot the choropleth map. The shapefiles contains districts names which might be different from what you have in your external data files. For ease, we will now prepare districts data from the shapefiles data frame and write data to that file for plotting.

nepal.adm3.shp.df %>%
distinct(id) %>%
write.csv("districts.csv", row.names = FALSE)

I used the Human Poverty Index data from OpenNepal portal. I filtered the data for HPI, and ensured that the districts names match to the names present in the shapefiles in newly created districts.csv. I renamed the columns of dataframe, containing district name and HPI.

hpi.data <- read.csv("districts.csv")
colnames(hpi.data) <- c("id","HPI")

Now existing shapefiles dataframe and HPI data containing HPI index and districts are merged using district-name (which is identified by column name id)

nepal.adm3.shp.df <- merge(nepal.adm3.shp.df, hpi.data, by ="id")
Showing dataframe merge process

Drawing Choropleth map

Since dataframe nepal.adm3.shp.df is now updated (with extra HPI column), lets re-create the map plot variable again.

map <- ggplot(data = nepal.adm3.shp.df, aes(x = long, y = lat, group = group))

Run the following code to create polygon map, using HPI to fill the polygons.

map + 
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
coord_fixed(1.3)

Now we get good looking choropleth map, with HPI values represented by the intensity of colors.

Plotting simple choropleth map

However there are still issues, lets change the gradient so that the high value of HPI (40+) is represented by dark colors and low value (20-) by light colors.

Changing the gradient scale

Add scale_fill_gradient(..) and give high and low colors. Play with the colors to see how the colors are changed in the map.

map + 
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3)
using scale_fill_gradient

Changing Legend Title

Now change the legend title from default to something you want.

map + 
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
guides(fill=guide_colorbar(title="HP Index"))
Changing default legend label HPI to HP Index

Moving the Legend inside chart

There’s ample space inside the chart, why don’t we move the legend inside the chart.

map + 
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
guides(fill=guide_colorbar(title="HP Index")) +
theme(legend.justification=c(0,0), legend.position=c(0,0))
Legend moved inside chart area

Creating clean map

The background grid and lat-long axis are not necessary for the map. Lets remove them.

theme_bare <- theme(
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.text=element_text(size=7),
legend.title=element_text(size=8),
panel.background = element_blank(),
panel.border = element_rect(colour = "gray", fill=NA, size=0.5)
)
map +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,0), legend.position=c(0,0)) +
theme_bare
Creating clean map using theme(..)

Adding labels for each district polygon

Now we need to show districts names in the map. Centroids calculation is bit tricky, but it works if you follow this code from stackoverflow.

centroids <- setNames(do.call("rbind.data.frame", by(nepal.adm3.shp.df, nepal.adm3.shp.df$group, function(x) {Polygon(x[c('long', 'lat')])@labpt})), c('long', 'lat')) centroids$label <- nepal.adm3.shp.df$id[match(rownames(centroids), nepal.adm3.shp.df$group)]map + 
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,0), legend.position=c(0,0)) +
with(centroids, annotate(geom="text", x = long, y = lat, label=label, size=2)) +
theme_bare
Showing districts names in the center of each district polygon

Showing labels for selected districts only

The map looks cluttered. I am interested in districts which passes certain criteria. Lets show the names for districts for which HPI is greater than 40.

centroids.selected <- centroids[centroids$label %in% (hpi.data[hpi.data$HPI>40,]$id),]
m
ap +
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,0), legend.position=c(0,0)) +
with(centroids.selected, annotate(geom="text", x = long, y = lat, label=label, size=2)) +
theme_bare
Selective districts names display

Adding title to the map

This is the easiest of all. ggtitle(..) does the job.

map + 
geom_polygon(aes(fill = HPI), color = 'gray', size = 0.1) +
ggtitle("Human Poverty Index Map") +
guides(fill=guide_colorbar(title="HP Index")) +
scale_fill_gradient(high = "#e34a33", low = "#fee8c8", guide = "colorbar") +
coord_fixed(1.3) +
theme(legend.justification=c(0,-0.1), legend.position=c(0.05,0)) +
with(centroids.selected, annotate(geom="text", x = long, y = lat, label=label, size=2)) +
theme_bare
Showing map title.

Again if you are in a hurry and you quickly scrolled down here without reading TL;DR and you just need to get the choropleth map, get the shapefiles and run the gist from here.

--

--