Spatial conversions: from addresses to geoJSON in R

This is the fourth in a series of posts charting the design choices, open source tools and analytical workflows that the [Trafford Data Lab](https://twitter.com/trafforddatalab) are adopting.

The Trafford Data Lab supports decision-making in Trafford, a local authority in Greater Manchester, by revealing patterns in data through visualisation. It is committed to publishing open data and using open source tools to encourage a transparent and reproducible analytical workflow.


From addresses to geoJSON in R

Trafford Council is running one of six pilots in the EU funded OpenGovIntelligence project which aims to improve public services through Linked Open Statistical Data and co-creation. The Trafford Pilot aims to tackle worklessness within Greater Manchester by providing decision makers with relevant information in the form of interactive data visualisations.

High densities of betting shops have been associated with areas of deprivation, economic inactivity and low income (Wardle et al., 2014), so for the pilot we produced a dataset of betting shop locations within Greater Manchester in a format that is ready to visualise.

Using R, I will explain how I went from a list of addresses to a geoJSON file with the ggmap and sf packages.

GeoJSON in a GitHub repo

The dataset is a geoJSON file stored in a folder of a GitHub repository. GitHub renders geoJSON files as interactive maps, so as soon as you push the file to your repo a map of your geometries will be displayed. GitHub also allows these maps to be customised and embedded in HTML pages; read this article in GitHub Help for details.

Load the packages

We need to load the following packages:

library(tidyverse) # includes dplyr, tidyr and readr
library(readxl)    # read Excel files
library(ggmap)     # geocoding
library(sf)        # simple features for spatial data

Load the data

The information about the location of the betting shops came from the Gambling Commission’s public register of gambling premises. The file covers the whole of the UK, so I filtered for betting shop premises with licences granted within Greater Manchester.

I downloaded the .xlsx file and read it from a local directory with read_xlsx() from the readxl package.

df <- read_xlsx("data/Premises-licence-database-extract.xlsx")
df <- filter(df, Local_Authority_Name %in% c(
  "Trafford Metropolitan Borough",
  "Bolton Metropolitan Borough Council",
  "Bury Council",
  "Manchester City Council",
  "Oldham Metropolitan Borough Council",
  "Rochdale Borough Council",
  "Salford City Council",
  "Stockport Metropolitan Borough Council",
  "Tameside Metropolitan Borough Council",
  "Wigan Metropolitan Borough Council"))
df <- filter(df, Activity == "Betting Shop", LicenceStatus == "Grant")

To create a single variable containing the full address including the postcode, I use unite() to merge the address and postcode variables with a space as the separator.

df <- unite(df, address, premises_Address2, premises_Postcode, sep = " ")

Select and rename the relevant variables.

df <- select(df, name = premises_Address1, address)

Geocode the addresses

With the addresses in place, geocode() from the ggmap package returns the latitude and longitude of each location on the Earth’s surface.

locations <- df$address
geocodes <- geocode(locations)
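Note: I used the version of ggmap current at the time. More recent releases (2.7 and later) require a Google Maps API key to be registered before geocode() will work. A minimal sketch, with a placeholder key:

```r
# Recent ggmap releases require a registered Google Maps API key
# "YOUR_API_KEY" is a placeholder -- substitute your own key
register_google(key = "YOUR_API_KEY")
geocodes <- geocode(locations)
```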

geocode() outputs lon and lat variables with the longitude and latitude. The geocodes are then bound to the dataframe by column.

df <- cbind(df, geocodes)

There are more than 450 betting shop premises within Greater Manchester, and geocode() returned NA values for the lon and lat of several of them. I filtered the successfully geocoded addresses into a separate dataframe, then filtered and re-geocoded the addresses that still needed processing.

df1 <- na.omit(df)                    # already geocoded
dfToGeocode <- filter(df, is.na(lon)) # still to be geocoded
locations <- dfToGeocode$address
geocodes <- geocode(locations)

Before binding the geocodes by column, I remove the lon and lat variables with NA values.

df2 <- subset(dfToGeocode, select = -c(lon,lat))
df2 <- cbind(df2, geocodes)

and then join the dataframes.

bettingshops <- rbind(df1, df2)

I followed this process a few times before most of the addresses were geocoded successfully. For the addresses that could not be geocoded, mainly due to incorrectly recorded addresses, I used Google Maps to find the longitude and latitude. I wrote the dataframe to a CSV file using write_csv() and completed the geocodes ‘manually’.

write_csv(bettingshops, "bettingshops_gm.csv")
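Once the remaining geocodes have been filled in by hand, the completed file needs to be read back in before converting to geoJSON. A minimal sketch, assuming the CSV was completed in place:

```r
# Read the manually completed geocodes back in
bettingshops <- read_csv("bettingshops_gm.csv")

# Sanity check: every row should now have coordinates
stopifnot(!any(is.na(bettingshops$lon)), !any(is.na(bettingshops$lat)))
```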

Create the geoJSON file

With all the geocodes in place, we can use the sf library to assign a CRS (Coordinate Reference System) and write the geoJSON file.

So firstly, we convert the data frame to an sf object.

bettingshops <- st_as_sf(bettingshops, coords = c("lon", "lat"))

Then we assign a CRS; in this case I am using WGS84, whose identifier is EPSG:4326.

bettingshops <- st_set_crs(bettingshops, 4326)
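As a side note, the conversion and the CRS assignment can also be done in a single call, since st_as_sf() accepts a crs argument:

```r
# Equivalent one-liner: convert to sf and assign WGS84 at the same time
bettingshops <- st_as_sf(bettingshops, coords = c("lon", "lat"), crs = 4326)
```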

And finally write the geoJSON file.

st_write(bettingshops, "bettingshops_gm.geojson")

The betting shops visualised on GitHub

The map with the betting shop locations can be found on GitHub here.

In a future post we will present the Shiny app we are developing for the OpenGovIntelligence project.


Written by Iris García Ríos, Trafford Data Lab


References

Wardle, H., Keily, R., Astbury, G. and Reith, G. (2014). ‘Risky places?’: Mapping gambling machine density and socio-economic deprivation. Journal of Gambling Studies, 30(1), pp. 201–212. Available online via: http://eprints.gla.ac.uk/73285/7/73285.pdf

Wickham, H. (2017). tidyverse: Easily Install and Load ‘Tidyverse’ Packages. R package version 1.1.1. https://CRAN.R-project.org/package=tidyverse

Kahle, D. and Wickham, H. (2013). ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), pp. 144–161. http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf

Pebesma, E. sf: Simple Features for R. https://cran.r-project.org/web/packages/sf/sf.pdf
