Description
This article takes us through an example of plotting Canada census population data by Forward Sortation Area (FSA). A FSA is the first 3-digit of postal codes. Refer to FSA Description for further information.
We need 3 sets of info for generating these plots:
- Forward Sortation Area Shape File — Stats Canada Shapefile (2016)
- Forward Sortation Area Population Stats — Stats Canada Population Stats (2016)
- LISTING OF FORWARD SORTATION AREA CODES from Canada Post — FSA List — Canada Post
You will notice that the shapefile and population data is from 2016. The most recent data available from Stats Canada at Forward Sortation Area level is from 2016.
Import Necessary Libraries
if (!require(pacman)) install.packages(“pacman”) library(pacman) p_load(data.table, leaflet, rgdal, sp, htmltools, pdftools, tidyverse, scales)
Using the p_load
function from the pacman
library to load packages installs the libaries if they are not from CRAN repository.
The leaflet
package plots the interactive map. rgdal
is required to read the .shp file into R environment. sp
converts data.frame or data.table into a spatial polygons data frame which is required by leaflet to create the plots.
Will introduce other packages when they are called into action.
Import the ShapeFile
## Download the zip file if (!”tFile” %in% list.files("Data/"))
{
download.file(“https://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/lfsa000b16a_e.zip", “Data/tFile”)
} ## Extract the .shp file if (!”lfsa000b16a_e.shp” %in% list.files("Data/"))
{
unzip(“Data/tFile”)
} ## Load Shape File into R
shpDF <- readOGR(dsn=”Data/lfsa000b16a_e.shp”)
We check if the .zip file from Stats Canada containing .shp file is already downloaded. If not, we download the file and unzip the .shp file.
Once unzipped, the .shp file is read into R environment using the readOGR
function from rgdal
package.
Load Population Data
## Import .csv file using “fread”
popDT <- fread(“Data/T1201EN.CSV”, nrows = 1642) ## We drop the frist row → this corresponds to Total Canada
## and the fourth column → this column doesn’t have any data
popDT <- popDT[-1,-4] ## Give easier to manage column names
colnames(popDT) <- c(“FSA”, “GEOName”, “Province”, “Population”, “Dwellings”, “DwellingsOccupied”)
The population data (“T1201EN.CSV”) is sourced from Statistics Canada, Catalogue no. 98–402-X2016001.
Import FSA Names from Canada Post PDF
## Download the pdf file
if (!”fsa-list-april-2022.pdf” %in% list.files("Data/"))
{
download.file(“https://www.canadapost-postescanada.ca/cpc/doc/en/support/kb/nps/fsa-list-april-2022.pdf", “Data/fsa-list-april-2022.pdf”)
} ## Read the PDF file
pdfPageList <- pdf_text(“Data/fsa-list-april-2022.pdf”) %>% str_split(“\n”) ## The first page doesn’t contain any FSAs so we skip it
## The top 11 lines are headers
## The last two lines are footers
for(i in 2:15)
{
pdfPageList[[i]] <- pdfPageList[[i]][11:(length(pdfPageList[[i]]) — 2)]
}
rm(i) ## Convert the List to Table
myFunc <- function(x)
{
## This function converts list to data.table
return(data.table(x))
} pdfPageTable <- do.call(“rbind”,lapply(pdfPageList[2:15], myFunc)) rm(pdfPageList) ## Split text into individual columns
### If there are 4 consecutive white space, we consider that as a separator
pdfPageTable[,paste0(“c”, seq(1, 4)):=tstrsplit(x, “\\s{4, }”)] ### Drop the original column
pdfPageTable <- pdfPageTable[,!”x”,with=F] ## Extract FSA codes and FSA Names
myFunc2 <- function(x, DT)
{
# For each column passed to this function
# First 3 characters are considered as FSA code
# The text starting from 4 character onwards is considered FSA Name
# The FSA codes and Names are stored as a data.table and returned DT2 <- DT[,x,with=F]
DT2 <- DT2[!is.na(get(x)) & !(get(x) == “SUCC BUREAU-CHEF”) & !get(x)==””,]
DT2$FSA <- substr(trimws(DT2[[x]]), 1, 3)
DT2$FSAName <- trimws(substr(trimws(DT2[[x]]), 4, nchar(DT2[[x]])))
DT2 <- DT2[,!x,with=F] return(DT2)
} modTable <- do.call(“rbind”, lapply(colnames(pdfPageTable), myFunc2, DT = pdfPageTable))
rm(pdfPageTable)
A person reading the map should be able to meaningfully identify a Forward Sortation Area on the map. The 3 digit FSA codes in the shapefile and the population data file don’t help with this.
Fortunately, Canada Post has supplied names to each 3 digit FSA code in the “fsa-list-april-2022.pdf”. The tables are read into R using the pdftools
package. As the columns are not aligned, some additionally processing is required to get the FSA codes and names in a tabular format.
Merge Population Data with Shape File and Canada Post FSA names
## Merge the Shape File with Population Data
shpPopDT <- merge(
x = shpDF[shpDF$PRNAME == “Ontario”,],
y = popDT,
by.x = “CFSAUID”,
by.y = “FSA”,
all.x = T
) ## Delete the shpDF to reduce memory space
rm(shpDF, popDT)
gc()## Merge the FSA Names
shpPopDT <- merge(
x = shpPopDT,
y = modTable,
by.x = "CFSAUID",
by.y = "FSA",
all.x = T
)
rm(modTable) ## Convert the Merged File into a Shape File format
shpPopDT <- spTransform(shpPopDT, CRS("+init=epsg:4326")) ## Store a copy as a data.table to create labels
popDT <- data.table(shpPopDT@data)
popDT$Population <- comma(popDT$Population)
popDT$Dwellings <- comma(popDT$Dwellings)
We bring all the information together by merging the datasets.
Points to note:
- We are keeping data for “Ontario” province only. This is done to reduce the file size while publishing the document to Quarto. However, the shiny app (link towards the end of this article) has all provinces.
- The merged data frame has to be converted to spatial polygons data frame for leaflet to be able to plot the map. This is achieved using the
spTransform
function from thesp
package.
Plot an Interactive Map Using Leaflet
## labels
labs <- lapply(seq(nrow(popDT)), function(i)
{
paste0( ‘<p><b>’, popDT[i, “FSAName”], ‘</b></p><p>’, “<i>Population: </i>”, popDT[i, “Population”],’</p><p>’, “<i>Dwellings: </i>”, popDT[i, “Dwellings”], ‘</p>’ )
}) ## Palette
pal <- colorNumeric(
palette = “Reds”,
domain = shpPopDT$Population) ## Plotting the map leaflet(shpPopDT) %>%
addTiles() %>%
setView(-84.81, 49.74, zoom = 5) %>%
addPolygons( data = shpPopDT,
fillColor = ~pal(Population), weight = 2, opacity = 1,
color = “white”, dashArray = “3”, fillOpacity = 0.7,
highlightOptions = highlightOptions( weight = 5,
color = “#666”, dashArray = “”, fillOpacity = 0.7,
bringToFront = TRUE), label = lapply(labs, HTML),
labelOptions = labelOptions(
style = list(“font-weight” = “normal”,
padding = “3px 8px”),
textsize = “15px”, direction = “auto”) ) %>%
addLegend(pal = pal, values = ~Population,
opacity = 0.7, title = “Population”, position = “bottomleft”)
The labels are formatted using html codes and are rendered using the HTML
function from the htmltools
package.
I have used “Red” color palette. There are other palettes available to use. Refer to https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf for available options.
With few additions to the above, you can also create a Shiny App — https://ajaypratham.shinyapps.io/interactivefsageomapping.
Data files and associated codes, including the one to create Shiny App, can be accessed on Github — https://github.com/ajayrao1983/CanadaGeoMapping.
Hope you found this article helpful and informative. Please leave your thoughts, comments, feedbacks.