H3 Spatial Grid Support in Apache Sedona

Mo Sarwat
Oct 16, 2023

Uber H3 is a technique for modeling and indexing geospatial data. H3 grids the earth's surface by projecting it onto an icosahedron and tiling it with hexagons and pentagons. Apache Sedona is a spatial computing engine that enables developers to easily process spatial data at any scale within modern cluster computing systems such as Apache Spark and Apache Flink. Sedona developers can express their spatial data processing tasks in Spatial SQL, Spatial Python, or Spatial R. Internally, Sedona provides spatial data loading, indexing, partitioning, and query processing and optimization functionality that enables users to efficiently analyze spatial data at any scale. In this article, we explain how developers can use H3 in Apache Sedona.

First, Why H3 for Spatial Data?

H3 is one of many ways to model and index spatial data. It has several advantages over other grid systems; the following two are the most important:
1. Neighbor traversal: H3 implements highly efficient algorithms for tasks such as finding the neighbors within a specific distance.
2. Path finding: H3 can find the shortest path from cell A to cell B.

Both are useful for approximate statistics such as hotspot calculation and for finding nearest neighbors.

In fact, companies like Uber use H3 mainly for ride planning. H3 on its own does not capture exact geometric relationships between shapes, but the approximation is good enough for the Uber ride planning use case. That is because a hexagonal grid reflects distance well: the centroid of a cell is the same distance from the centroids of all of its neighbors.
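The equal-distance property can be checked on a flat grid. The sketch below is only an illustration in planar axial coordinates, not H3 itself (H3 cells lie on a sphere and are only approximately regular); the function names are hypothetical.

```python
import math

# Axial (q, r) offsets of the six neighbors of a hexagon (pointy-top layout).
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# Offsets of the eight neighbors of a square cell (edge and corner neighbors).
SQUARE_NEIGHBORS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def hex_center(q, r, size=1.0):
    """Planar centroid of the hexagon at axial coordinates (q, r)."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def neighbor_distances_hex():
    """Distinct centroid distances from a hexagon to its six neighbors."""
    origin = hex_center(0, 0)
    return sorted(
        {round(math.dist(origin, hex_center(q, r)), 9) for q, r in HEX_NEIGHBORS}
    )


def neighbor_distances_square():
    """Distinct centroid distances from a square cell to its eight neighbors."""
    return sorted({round(math.hypot(dx, dy), 9) for dx, dy in SQUARE_NEIGHBORS})


print("hexagon:", neighbor_distances_hex())
print("square: ", neighbor_distances_square())
```

The hexagon has a single neighbor distance, while the square grid has two (edge neighbors and diagonal neighbors), which is why counting steps between hexagonal cells is a reasonable stand-in for distance.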

Another benefit is that spatial distortion is largely minimized. H3 hexagonal cells keep a similar shape everywhere, while the shape of rectangular cells changes drastically with the coordinates due to the nature of the Mercator projection. This makes H3 more suitable for visualization, which is the other use case for H3 at Uber.
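To give a sense of the size of the Mercator effect, here is a small sketch that assumes the spherical Mercator projection, whose linear scale factor at latitude φ is 1 / cos(φ). The function name is hypothetical and the latitudes are arbitrary sample values.

```python
import math


def mercator_scale(lat_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0.0, 47.5, 60.0):
    k = mercator_scale(lat)
    # Area is stretched by the square of the linear factor.
    print(f"lat {lat:5.1f}: linear scale x{k:.2f}, area scale x{k * k:.2f}")
```

At 60 degrees latitude the linear factor is 2, so a rectangle of fixed size on the map covers only a quarter of the ground area it would cover at the equator.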

Comprehensive support for Uber H3 in Apache Sedona

Apache Sedona 1.5.0 now provides the following functions related to Uber H3:

  • ST_H3CellIDs(geom: geometry, level: Int, fullCover: Boolean)
  • ST_H3CellDistance(cell1: Long, cell2: Long)
  • ST_H3KRing(cell: Long, k: Int, exactRing: Boolean)
  • ST_H3ToGeom(cells: Array[Long])
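ST_H3CellDistance measures distance as a number of cells rather than in meters or degrees. As an illustration of what a grid distance means, the sketch below computes it on a flat hexagonal grid in axial (q, r) coordinates. It is not how Sedona or H3 computes it; hex_distance and the coordinate pairs are hypothetical.

```python
def hex_distance(a, b):
    """Number of cell-to-cell steps between two hexagons given as axial (q, r) pairs."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    # In axial coordinates the third cube coordinate is implied: s = -q - r.
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


print(hex_distance((0, 0), (3, -1)))
print(hex_distance((0, 0), (2, 2)))
```

Adjacent cells are at distance 1, and every extra step to a neighboring cell adds 1, which is the kind of count a cell-distance function returns.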

Let’s use the Seattle road network dataset (from OSM) as an example.

Seattle Road Network

Create H3 cell IDs for geometries

You can create H3 cell IDs using ST_H3CellIDs as follows.

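# sedona is the Sedona-enabled Spark session; PATH_PREFIX points at the data directory
roadDf = sedona.read.format("csv").option("header", "true").load(PATH_PREFIX + "data/OSM2015_roads.csv")
roadDf = roadDf.selectExpr(
    "monotonically_increasing_id() as id",
    "ST_GeomFromWkt(geometry) as road", "`attr#1` as attr",
    # Boundaries of the level-10 cells that cover each road, as a single geometry
    "ST_H3ToGeom(ST_H3CellIDs(ST_GeomFromWkt(geometry), 10, false)) as roadH3Geom",
    # The level-10 cell IDs themselves, as an array of longs
    "ST_H3CellIDs(ST_GeomFromWkt(geometry), 10, false) as roadH3"
)
roadDf.show(10)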

+---+--------------------+--------------------+--------------------+--------------------+
| id| road| attr| roadH3Geom| roadH3|
+---+--------------------+--------------------+--------------------+--------------------+
| 0|LINESTRING (-122....|[tiger:county#Kin...|MULTIPOLYGON (((-...|[6222150966219243...|
| 1|LINESTRING (-122....| [service#alley|MULTIPOLYGON (((-...|[6222150966281830...|
| 2|LINESTRING (-122....| [surface#asphalt|MULTIPOLYGON (((-...|[6222150966517759...|
| 3|LINESTRING (-122....|[tiger:county#Kin...|MULTIPOLYGON (((-...|[6222150966243164...|
| 4|LINESTRING (-122....|[tiger:county#Kin...|MULTIPOLYGON (((-...|[6222150966286090...|
| 5|LINESTRING (-122....|[tiger:county#Kin...|MULTIPOLYGON (((-...|[6222150966439772...|
| 6|LINESTRING (-122....|[tiger:county#Kin...|MULTIPOLYGON (((-...|[6222150966722559...|
| 7|LINESTRING (-122....| [highway#service]|MULTIPOLYGON (((-...|[622215096642338815]|
| 8|LINESTRING (-122....|[tiger:county#Kin...|MULTIPOLYGON (((-...|[6222150966830039...|
| 9|LINESTRING (-122....| [service#alley|MULTIPOLYGON (((-...|[6222150959224913...|
+---+--------------------+--------------------+--------------------+--------------------+
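The cell IDs in the roadH3 column are 64-bit integers that encode the resolution. The sketch below decodes the one ID that is printed in full above (row 7), based on the H3 index bit layout as I understand it: 1 reserved bit, 4 mode bits, 3 reserved bits, 4 resolution bits, 7 base-cell bits, then 3 bits per resolution digit. The helper names are hypothetical and not part of Sedona.

```python
def h3_mode(cell_id):
    """Index mode stored in bits 59-62 (1 means a cell index)."""
    return (cell_id >> 59) & 0xF


def h3_resolution(cell_id):
    """Resolution (0-15) stored in bits 52-55."""
    return (cell_id >> 52) & 0xF


def h3_base_cell(cell_id):
    """Base cell number stored in bits 45-51."""
    return (cell_id >> 45) & 0x7F


cell_id = 622215096642338815  # roadH3 value of row 7 in the output above
print(hex(cell_id))
print("mode:", h3_mode(cell_id))
print("resolution:", h3_resolution(cell_id))
print("base cell:", h3_base_cell(cell_id))
```

The decoded resolution is 10, which matches the level passed to ST_H3CellIDs in the query.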

Visualize H3 cells using GeoPandas Plot

You can use ST_H3ToGeom to generate the boundaries of H3 cells given their IDs. To demonstrate, we plot a subset of these cells together with their roads using GeoPandas:

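import geopandas as gpd

# Bring a small sample to the driver; plotting the full dataset would be too slow
pandasDf = roadDf.limit(100).toPandas()
roadGpd = gpd.GeoDataFrame(pandasDf, geometry="road")
h3Gpd = gpd.GeoDataFrame(pandasDf, geometry="roadH3Geom")
# Draw the H3 cells first, then overlay the roads on the same axes
ax = h3Gpd.plot(edgecolor='gray', linewidth=1.0)
ax.set_xlabel('Longitude (degrees)')
ax.set_ylabel('Latitude (degrees)')
roadGpd.plot(ax=ax, edgecolor='black', linewidth=1.0)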

Join geometries by H3

You can join two datasets on their H3 cell IDs. You can also build a ring of cells around a geometry using ST_H3KRing and find matches using the ring. The following query sets exactRing to true, so it returns the roads that pass through the cells exactly 10 cells away from ST_POINT(-122.390, 47.54717658413222); with exactRing set to false, it would return all roads within 10 cells of the point:

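from pyspark.sql.functions import explode, expr

# pointDf is assumed to be an existing single-row DataFrame; the point itself is a literal in the expression
ringCells = pointDf.selectExpr("ST_H3KRing(ST_H3CellIDs(ST_POINT(-122.390, 47.54717658413222), 10, false)[0], 10, true) as cells")
# Explode both cell arrays to one row per cell, join on equal cell IDs, then keep each road once
ringCells.select(explode("cells").alias("cell")).alias("ring").join(
    roadDf.select("id", "road", explode("roadH3").alias("cell"), "roadH3Geom").alias("roads"),
    expr("ring.cell = roads.cell")
).dropDuplicates(["id"])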
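The logic of this join does not depend on Spark. The plain-Python sketch below mirrors the same three steps (explode, match on equal cell IDs, drop duplicates) on a tiny in-memory example. The function name and the small integer cell IDs are made up for illustration and are not real H3 indexes.

```python
from collections import defaultdict


def join_by_cell(ring_cells, roads):
    """Return the sorted IDs of roads that share at least one cell with ring_cells.

    roads maps a road ID to the list of cell IDs covering that road.
    """
    # "Explode" the roads: build an index from each cell to the roads touching it.
    cell_to_roads = defaultdict(set)
    for road_id, cells in roads.items():
        for cell in cells:
            cell_to_roads[cell].add(road_id)

    # Equi-join on cell ID; the set plays the role of dropDuplicates(["id"]).
    matched = set()
    for cell in ring_cells:
        matched |= cell_to_roads.get(cell, set())
    return sorted(matched)


# Hypothetical data: three roads and a "ring" made of two cells.
roads = {0: [101, 102], 1: [102, 103], 2: [200]}
print(join_by_cell([102, 103], roads))
```

Road 0 and road 1 both touch cell 102, so both match, and road 1 is reported once even though it matches on two cells.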

We can visualize the ring of cells created by ST_H3KRing(ST_H3CellIDs(ST_POINT(-122.390, 47.54717658413222), 10, false)[0], 10, true) as follows:

Alternatively, we can fill in all cells inside the ring by setting exactRing to false, as in ST_H3KRing(ST_H3CellIDs(ST_POINT(-122.390, 47.54717658413222), 10, false)[0], 10, false):
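The difference between the hollow ring and the filled disk can be sketched on a flat hexagonal grid in axial coordinates. This is an illustration under that assumption, not Sedona's implementation: the function names are hypothetical, and on the real H3 grid the counts can differ near one of the pentagon cells.

```python
def grid_disk(center, k):
    """All axial (q, r) cells within k steps of center, including center itself."""
    cq, cr = center
    cells = []
    for dq in range(-k, k + 1):
        # The bounds keep |dq|, |dr| and |dq + dr| all <= k.
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.append((cq + dq, cr + dr))
    return cells


def k_ring(center, k, exact_ring):
    """Filled disk when exact_ring is False, hollow ring at distance k when True."""
    disk = grid_disk(center, k)
    if not exact_ring or k == 0:
        return disk
    inner = set(grid_disk(center, k - 1))
    return [cell for cell in disk if cell not in inner]


print("filled disk, k=10:", len(k_ring((0, 0), 10, False)), "cells")
print("hollow ring, k=10:", len(k_ring((0, 0), 10, True)), "cells")
```

On a flat grid the filled disk holds 1 + 3k(k + 1) cells and the hollow ring 6k cells, so for k = 10 the join against the filled disk probes 331 cells instead of 60.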

Conclusion:

H3 can be used to index spatial data by partitioning the earth's surface into hexagons. Apache Sedona is a large-scale spatial data processing engine. Sedona 1.5.0 now supports H3: developers can create H3 grids, attach H3 grids to geometries, and perform spatial operations on geometries using H3 in Apache Sedona. For more details, visit:

Apache Sedona GitHub repo: https://github.com/apache/sedona

Apache Sedona Website: https://sedona.apache.org/1.5.0/

H3 docs in Apache Sedona: https://sedona.apache.org/1.5.0/api/sql/Function/#st_h3celldistance

Apache Sedona 1.5.0 Release Notes: https://sedona.apache.org/1.5.0/setup/release-notes/

Mo Sarwat

Mo is the founder of Wherobots.ai, CS Prof at Arizona State University, & the creator of Apache Sedona (a scalable system for processing big geospatial data)