Spatial Data Analysis With Hexagonal Grids

sid dhuri
Published in The Startup
4 min read · Jul 14, 2020

To analyze large spatial data sets, we need to partition geographic areas into identifiable grid cells. For example, we may divide a large region into smaller units for indexing purposes, or slice a geographic area into subunits over which we want to summarize a spatial variable.

Such grids are usually composed of equilateral triangles, squares, or hexagons, as these are the only three regular polygons that can tessellate, i.e. cover an area by repeated use of a single shape without gaps or overlaps.

Three types of grids commonly used for mapping geographical data

The square grid is the most commonly used shape in spatial analysis and thematic mapping. However, a square grid laid over a projected map inherits the distortions of the projection. On the Mercator projection, for example, Greenland appears about the same size as Africa, when in reality Africa’s area is about 14 times greater.

The Mercator projection distorts areas, which carries over to a square grid drawn on it

What we need are cells of equal size at any given resolution, regardless of where they are on the globe.

Of the three regular polygons that tessellate, the hexagon is the closest in shape to a circle, which makes it an effective choice for the regular tessellation of a geographic area.
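
One way to make "closest to a circle" concrete is the isoperimetric quotient, 4πA/P², which equals 1 for a circle and gets smaller as a shape becomes less compact. The base R sketch below computes it for the three regular polygons that tessellate. The function name `compactness` is illustrative and does not come from any package.

```r
# Isoperimetric quotient Q = 4 * pi * A / P^2 of a regular polygon with n sides.
# For a regular n-gon this simplifies to pi / (n * tan(pi / n)); a circle scores 1.
compactness <- function(n) {
  pi / (n * tan(pi / n))
}

shapes <- c(triangle = 3, square = 4, hexagon = 6)
scores <- sapply(shapes, compactness)
print(round(scores, 3))
```

The hexagon scores highest of the three, at about 0.91, against about 0.79 for the square and 0.60 for the triangle.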

For businesses such as ride-sharing apps, whose services rely on accurate mapping of geographical areas, it is critical to choose a grid that minimizes distortion and the quantization error introduced when users move through a city.

Uber’s hexagonal grid

Uber’s business relies heavily on accurately mapping geographic areas to offer its services. Users request rides, the app locates the nearest drivers, and couriers deliver food to people staying at home, among other things.

Uber’s plans for air travel, such as Uber Elevate, Uber Air, and Uber Copter, would take riders over long distances and will also need accurate real-time pricing over those distances.

Uber uses spatial analysis to better understand and optimize the marketplace for users. For example, it can identify areas where demand is higher than supply and adjust pricing in response, or identify users in close proximity who have requested the UberPool service.

Such analysis at the finest granularity, the exact location where an event happens, is very difficult and expensive. Analysis on areas, such as neighborhoods within a city, is much more practical.

Hexagonal grid used to bin events into cells

Uber uses a grid system to bin events into hexagonal cells: all data points that fall within a cell are grouped together. Surge pricing, for example, can then be determined from the demand in a given cell.
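
To illustrate the binning idea, here is a minimal sketch in base R that assigns points on a flat plane to pointy-top hexagonal cells and counts the events in each cell. It is not Uber's implementation and not the H3 algorithm, which indexes cells on the globe. The function `hex_bin`, the axial (q, r) cell coordinates, and the sample events are all assumptions made for illustration.

```r
# Assign planar points (x, y) to pointy-top hexagonal cells.
# `size` is the distance from a hexagon's centre to its corners.
# Returns integer axial coordinates (q, r) identifying each cell.
hex_bin <- function(x, y, size) {
  # Fractional axial coordinates of each point
  q <- (sqrt(3) / 3 * x - y / 3) / size
  r <- (2 / 3 * y) / size
  s <- -q - r  # third cube coordinate; q + r + s is always 0

  # Round each coordinate, then repair the one with the largest rounding
  # error so that the rounded coordinates still sum to zero
  rq <- round(q); rr <- round(r); rs <- round(s)
  dq <- abs(rq - q); dr <- abs(rr - r); ds <- abs(rs - s)
  fix_q <- dq > dr & dq > ds
  fix_r <- !fix_q & dr > ds
  rq[fix_q] <- -rr[fix_q] - rs[fix_q]
  rr[fix_r] <- -rq[fix_r] - rs[fix_r]

  data.frame(q = as.integer(rq), r = as.integer(rr))
}

# Hypothetical ride requests in a local planar coordinate system (e.g. km)
events <- data.frame(x = c(0.1, -0.2, 1.7, 0.8), y = c(0.1, 0.3, 0.1, 1.5))
cells <- hex_bin(events$x, events$y, size = 1)

# Demand per cell: number of events that fall in each hexagon
demand <- table(cell = paste(cells$q, cells$r, sep = ","))
print(demand)
```

Once every event carries a cell identifier, per-cell summaries such as demand, supply, or their ratio become ordinary group-by operations.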

There are some benefits of using hexagonal cells:

  • Hexagons reduce sampling bias due to edge effects of the grid shape, because the hexagon is the most circular polygon that can tessellate to form an evenly spaced grid.
  • Hexagons are preferable when your analysis includes aspects of connectivity or movement paths.
  • Finding neighbors is more straightforward with a hexagon grid. Since the edge or length of contact is the same on each side, the centroid of each neighbor is equidistant.
  • Since the distance between centroids is the same in all six directions with hexagons, a distance band of one cell spacing reaches all six neighbors. With a fishnet (square) grid the same band reaches only the four edge neighbors, because the four diagonal neighbors are farther away, so each feature has more neighbors included in its calculations on a hexagonal grid.

Distances from a cell to its neighbors are equal in all directions in a hexagonal grid
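
The equal-distance property can be checked numerically. The base R sketch below places a cell at the origin and computes the centroid distances to its neighbors on a hexagonal grid and on a square (fishnet) grid. The layout is an assumption chosen for illustration: pointy-top hexagons with a centre-to-corner distance of 1, and unit squares.

```r
# Axial offsets (q, r) of the six neighbours of a hexagonal cell
hex_offsets <- data.frame(q = c(1, 1, 0, -1, -1, 0),
                          r = c(0, -1, -1, 0, 1, 1))

# Centre of a pointy-top hexagon with centre-to-corner distance `size`
size <- 1
hex_x <- size * sqrt(3) * (hex_offsets$q + hex_offsets$r / 2)
hex_y <- size * 1.5 * hex_offsets$r
hex_dist <- sqrt(hex_x^2 + hex_y^2)

# The eight neighbours of a unit square cell: four share an edge, four only a corner
sq_offsets <- expand.grid(dx = -1:1, dy = -1:1)
sq_offsets <- sq_offsets[!(sq_offsets$dx == 0 & sq_offsets$dy == 0), ]
sq_dist <- sqrt(sq_offsets$dx^2 + sq_offsets$dy^2)

print(round(hex_dist, 3))       # one distance, repeated six times
print(sort(round(sq_dist, 3)))  # two distinct distances
```

On the square grid the four diagonal neighbors are √2 times farther away than the four edge neighbors, whereas all six hexagonal neighbors sit at the same distance.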

Uber has open-sourced its hexagonal mapping library, H3. The library itself is written in C and has bindings for other languages, including JavaScript. You can find H3 on GitHub.

Let’s see how we can use hexagonal mapping in R.

Overlaying hexagonal grid on Great Britain map

The following sample code creates a hexagonal grid over a geographical area.

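# Load required packages
library(dplyr)
library(tidyr)
library(sp)
library(raster)
library(rgeos)
library(rgbif)
library(viridis)
library(gridExtra)
library(rasterVis)
library(rgdal)

# Load the country outline from GADM (country code "GB", level 0 = national boundary)
study_area <- getData("GADM", country = "GB", level = 0, path = tempdir())

# Split the multipart outline into individual polygons and keep only the geometry
study_area <- study_area %>%
  disaggregate %>%
  geometry

# Keep the polygon with the largest area, i.e. the main island
study_area <- sapply(study_area@polygons, slot, "area") %>%
  {which(. == max(.))} %>%
  study_area[.]

# Plot the study area; the label position is given in longitude/latitude
# and was moved so that it falls inside the plotted extent
plot(study_area, col = "grey50", bg = "light blue", axes = TRUE)
text(0.5, 57.5, "Study Area:\nGB")

# Sample hexagon centres 0.5 degrees apart, then build a hexagon around each centre
hex_points <- spsample(study_area, type = "hexagonal", cellsize = 0.5)
hex_grid <- HexPoints2SpatialPolygons(hex_points, dx = 0.5)

# Overlay the centres and the hexagonal grid on the study area
plot(study_area, col = "grey50", bg = "light blue", axes = TRUE)
plot(hex_points, col = "black", pch = 20, cex = 0.5, add = TRUE)
plot(hex_grid, border = "orange", add = TRUE)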
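
To see what a hexagonal layout with a given spacing looks like without any spatial packages, the following base R sketch generates hexagon centres over a bounding box. It illustrates the geometry only and is not the algorithm `spsample` uses. The function `hex_centres` and the bounding box values are hypothetical.

```r
# Generate hexagon centres over a bounding box.
# `dx` is the spacing between neighbouring centres. Rows are dx * sqrt(3) / 2
# apart, and every second row is shifted sideways by dx / 2.
hex_centres <- function(xmin, xmax, ymin, ymax, dx) {
  dy <- dx * sqrt(3) / 2
  ys <- seq(ymin, ymax, by = dy)
  rows <- lapply(seq_along(ys), function(i) {
    shift <- if (i %% 2 == 0) dx / 2 else 0
    xs <- seq(xmin + shift, xmax, by = dx)
    data.frame(x = xs, y = ys[i])
  })
  do.call(rbind, rows)
}

centres <- hex_centres(xmin = 0, xmax = 4, ymin = 0, ymax = 4, dx = 0.5)

# Distance from every centre to its nearest neighbour
d <- as.matrix(dist(centres))
diag(d) <- Inf
nearest <- apply(d, 1, min)
print(range(nearest))
```

The row spacing of dx · √3 / 2 combined with the half-step shift is what places every centre exactly `dx` away from its nearest neighbors, both within a row and across adjacent rows.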

I am a data scientist by trade. I love to write about data science, marketing, and economics. I founded Orox.ai, a marketing AI, analytics, and automation platform.