Spatial conversions: from addresses to geoJSONs in R
This is the fourth in a series of posts charting the design choices, open source tools and analytical workflows that the [Trafford Data Lab](https://twitter.com/trafforddatalab) are adopting.
The Trafford Data Lab supports decision-making in Trafford, a local authority in Greater Manchester, by revealing patterns in data through visualisation. It is committed to publishing open data and using open source tools to encourage a transparent and reproducible analytical workflow.
From addresses to geoJSON in R
Trafford Council is running one of six pilots in the EU-funded OpenGovIntelligence project, which aims to improve public services through Linked Open Statistical Data and co-creation. The Trafford Pilot aims to tackle worklessness within Greater Manchester by providing decision makers with relevant information in the form of interactive data visualisations.
High densities of betting shops have been associated with areas of deprivation, economic inactivity and low income (Wardle et al., 2014), so we have produced a dataset of betting shop locations within Greater Manchester for the pilot, in a format that is ready to visualise.
GeoJSON in a GitHub repo
The dataset consists of a geoJSON file stored in a folder in a GitHub repository. GitHub renders geoJSON files as maps, so as soon as the file is in your repo, opening it on GitHub displays a map of its geometries. GitHub also allows the map to be customised and embedded in HTML pages; read this article in GitHub Help for details.
Load the packages
We need to load the following packages:
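library(tidyverse) # dplyr, tidyr and readr for data wrangling
library(readxl)    # read_xlsx() to read the Excel register
library(ggmap)     # geocode() to geocode the addresses
library(sf)        # simple features: st_as_sf(), st_set_crs(), st_write()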
Load the data
The information about the location of the betting shops came from the Gambling Commission’s public register of gambling premises. The file covers premises across the whole country, so I filtered it to keep only the betting shop premises with licences granted within Greater Manchester.
I downloaded the .xlsx file and read it from a local directory with read_xlsx() from the readxl package.
df <- read_xlsx("data/Premises-licence-database-extract.xlsx") # Gambling Commission register extract
# the ten Greater Manchester local authorities
gm_authorities <- c("Trafford Metropolitan Borough","Bolton Metropolitan Borough Council","Bury Council","Manchester City Council","Oldham Metropolitan Borough Council","Rochdale Borough Council","Salford City Council","Stockport Metropolitan Borough Council","Tameside Metropolitan Borough Council","Wigan Metropolitan Borough Council")
df <- filter(df, Local_Authority_Name %in% gm_authorities)
# keep only betting shops whose licence has been granted
df <- filter(df, Activity == "Betting Shop", LicenceStatus == "Grant")
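To see what the two filter() calls keep, here is a minimal sketch on a small made-up data frame. Only the column names come from the code above; the rows, the "Leeds City Council" authority, the "Bingo" activity and the "Surrendered" status are invented for illustration, and the authority list is shortened to two names.

```r
library(dplyr)

# Made-up rows mimicking the register's columns (all values are hypothetical)
premises <- data.frame(
  Local_Authority_Name = c("Trafford Metropolitan Borough",
                           "Salford City Council",
                           "Leeds City Council",
                           "Trafford Metropolitan Borough"),
  Activity      = c("Betting Shop", "Betting Shop", "Betting Shop", "Bingo"),
  LicenceStatus = c("Grant", "Surrendered", "Grant", "Grant"),
  stringsAsFactors = FALSE
)

# Two of the ten Greater Manchester authorities, for brevity
gm_authorities <- c("Trafford Metropolitan Borough", "Salford City Council")

gm_shops <- premises %>%
  # first keep premises inside Greater Manchester
  filter(Local_Authority_Name %in% gm_authorities) %>%
  # then keep only betting shops with a granted licence
  filter(Activity == "Betting Shop", LicenceStatus == "Grant")

gm_shops
```

Only the first row survives: the Salford shop fails the licence test, the Leeds shop fails the location test and the second Trafford row is not a betting shop.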
To create a single address variable that includes the postcode, I use unite() from the tidyr package to merge the address and postcode variables, with a space (" ") as the separator.
df <- unite(df, address, premises_Address2, premises_Postcode, sep = " ")
Select and rename the relevant variables.
df <- select(df, name = premises_Address1, address)
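The unite() and select() steps can be checked on a single made-up row. This is a sketch: the shop name, street and postcode below are hypothetical, and only the column names come from the code above.

```r
library(dplyr)
library(tidyr)

# One made-up premises row (values are hypothetical)
premises <- data.frame(
  premises_Address1 = "Example Bookmakers",
  premises_Address2 = "1 High Street, Sale",
  premises_Postcode = "M33 1AA",
  stringsAsFactors = FALSE
)

shops <- premises %>%
  # paste street address and postcode together with a space;
  # unite() drops the two source columns by default (remove = TRUE)
  unite(address, premises_Address2, premises_Postcode, sep = " ") %>%
  # keep the premises name (renamed to `name`) and the combined address
  select(name = premises_Address1, address)

shops$address
```

One thing to watch: if a postcode is missing, unite() pastes the literal text "NA" into the address unless na.rm = TRUE is set, which would give the geocoder a misleading string.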
Geocode the addresses
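Using the addresses and geocode() from the ggmap package, we get the latitude and longitude of each location on the Earth’s surface. By default geocode() queries the Google Maps geocoding service, and recent versions of ggmap require a Google API key, registered with register_google(), before it will return results.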
locations <- df$address
geocodes <- geocode(locations)
geocode() returns a data frame with lon and lat variables holding the longitude and latitude, with one row per address in the same order as the input, so the geocodes can be bound to the dataframe by column.
df <- cbind(df, geocodes)
Because there are more than 450 betting shop premises within Greater Manchester, a single pass of geocode() left several addresses with NA values for lon and lat. I put the successfully geocoded addresses into a separate dataframe and then filtered and geocoded the addresses that still needed processing.
df1 <- na.omit(df)                    # rows already geocoded
dfToGeocode <- filter(df, is.na(lon)) # rows still to geocode
locations <- dfToGeocode$address
geocodes <- geocode(locations)
Before binding the new geocodes by column, I drop the old lon and lat variables, which contain only NA values.
df2 <- subset(dfToGeocode, select = -c(lon,lat))
df2 <- cbind(df2, geocodes)
and then stack the two dataframes by row with rbind().
bettingshops <- rbind(df1, df2)
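Repeating this filter, geocode and bind cycle by hand can be wrapped in a loop. The sketch below is a hypothetical helper, geocode_with_retries(), not code from the original workflow. It assumes the geocoding function behaves like ggmap::geocode(): it takes a character vector of addresses and returns lon and lat columns, one row per address, in input order. Because coordinates are filled in place, the rows also keep their original order, unlike the rbind() approach.

```r
# Hypothetical helper: re-geocode only the rows whose lon or lat is still NA,
# for up to max_tries passes, filling the coordinates in place.
# geocode_fn is assumed to behave like ggmap::geocode(): a character vector
# of addresses in, a data frame with lon and lat columns out (same order).
geocode_with_retries <- function(df, geocode_fn, max_tries = 3, pause = 0) {
  if (!"lon" %in% names(df)) df$lon <- NA_real_
  if (!"lat" %in% names(df)) df$lat <- NA_real_
  for (i in seq_len(max_tries)) {
    todo <- which(is.na(df$lon) | is.na(df$lat))
    if (length(todo) == 0) break                 # everything is geocoded
    if (i > 1 && pause > 0) Sys.sleep(pause)     # optional wait between passes
    result <- geocode_fn(as.character(df$address[todo]))
    df$lon[todo] <- result$lon
    df$lat[todo] <- result$lat
  }
  df
}

# Usage with ggmap (recent versions need a key set with register_google()):
# bettingshops <- geocode_with_retries(df, ggmap::geocode, max_tries = 5)
```

Passing the geocoder in as an argument also makes the loop easy to try out with a stand-in function before spending real API requests.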
I repeated this process a few times until most of the addresses were geocoded successfully. The remaining addresses failed mainly because they had been recorded incorrectly, so I wrote the dataframe to a CSV file using write_csv() and completed their coordinates ‘manually’, looking up the longitude and latitude in Google Maps.
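A variation on this manual step is to export only the rows that still lack coordinates and read them back once they have been filled in. The two helpers below, export_missing() and import_completed(), are hypothetical, and they assume the non-coordinate columns are text, as name and address are here.

```r
library(dplyr)
library(readr)

# Hypothetical helper: write the rows still missing lon/lat to a CSV file
# so the coordinates can be filled in by hand; returns the row count.
export_missing <- function(df, path) {
  missing <- filter(df, is.na(lon) | is.na(lat))
  write_csv(missing, path)
  invisible(nrow(missing))
}

# Hypothetical helper: read the hand-completed CSV back and combine it
# with the rows that were already geocoded.
import_completed <- function(df, path) {
  completed <- read_csv(path, col_types = cols(.default = col_character(),
                                               lon = col_double(),
                                               lat = col_double()))
  if (anyNA(completed$lon) || anyNA(completed$lat)) {
    stop("Some rows in ", path, " still have no coordinates")
  }
  bind_rows(filter(df, !is.na(lon), !is.na(lat)), completed)
}
```

The check in import_completed() stops early if any row was missed, rather than letting an incomplete point slip into the geoJSON file.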
Create the geoJSON file
With all the geocodes in place, we can use the sf package to assign a CRS (Coordinate Reference System) to the data and write the geoJSON file.
So firstly, we convert the data frame to an sf object.
bettingshops <- st_as_sf(bettingshops, coords = c("lon", "lat"))
Then we assign a CRS; in this case I am using WGS84, whose EPSG code is 4326.
bettingshops <- st_set_crs(bettingshops, 4326)
And finally write the geoJSON file.
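Writing the file can be done with sf's st_write(). The sketch below uses two made-up shops and a temporary output file; in the real workflow the destination would be a path such as "bettingshops.geojson" (a hypothetical name). It also sets the CRS directly in st_as_sf() with crs = 4326, which has the same effect as the separate st_set_crs() call.

```r
library(sf)

# Two made-up shops with WGS84 coordinates (values are hypothetical)
shops <- data.frame(
  name = c("Shop A", "Shop B"),
  lon  = c(-2.35, -2.24),
  lat  = c(53.42, 53.48),
  stringsAsFactors = FALSE
)

# Convert to an sf object of points and assign WGS84 (EPSG:4326) in one step
shops_sf <- st_as_sf(shops, coords = c("lon", "lat"), crs = 4326)

# Write GeoJSON; sf picks the GeoJSON driver from the .geojson extension.
# A temporary file stands in for a real path like "bettingshops.geojson".
out_file <- tempfile(fileext = ".geojson")
st_write(shops_sf, out_file, quiet = TRUE)

# Read the file back to check the round trip
check <- st_read(out_file, quiet = TRUE)
```

If the output file already exists, st_write() may refuse to overwrite it; passing delete_dsn = TRUE deletes the existing file first.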
The map with the betting shop locations can be found on GitHub here.
Written by Iris García Ríos, Trafford Data Lab
Wardle, H., Keily, R., Astbury, G. and Reith, G., 2014. ‘Risky places?’: Mapping gambling machine density and socio-economic deprivation. Journal of Gambling Studies, 30(1), pp. 201–212. Available online via: http://eprints.gla.ac.uk/73285/7/73285.pdf
Wickham, H., 2017. tidyverse: Easily Install and Load ‘Tidyverse’ Packages. R package version 1.1.1. https://CRAN.R-project.org/package=tidyverse
Kahle, D. and Wickham, H., 2013. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), pp. 144–161. http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
Pebesma, E. sf: Simple Features for R. R package. https://cran.r-project.org/web/packages/sf/sf.pdf