Accelerating Geographic Information Systems (GIS) Data Science with RAPIDS cuSpatial and GPUs

Thomson Comer
RAPIDS AI
Published in
6 min readJul 21, 2020

The economic impact of Geographic Information Systems (GIS) has doubled since 2013. Civic data, pandemic information, and autonomous vehicle information are creating vast databases in need of understanding. Our goal at NVIDIA is to bring GPU-accelerated solutions to the most often used aspects of GIS data science. For the last nine months, we’ve been working on a GIS project called RAPIDS cuSpatial to bring massive performance to these kinds of problems.

cuSpatial is an efficient C++ library accelerated on GPUs using NVIDIA CUDA and RAPIDS cuDF, an open-source GPU DataFrame library that includes Python bindings for use by the data science community. cuSpatial provides significant GPU-acceleration to common spatial and spatiotemporal operations such as point-in-polygon tests, distances between trajectories, and trajectory clustering. Speed-ups in comparison with CPU-based implementations range from 500x to 10,000x depending on the operation.

A few more important use cases of GIS processing include epidemiology and contact tracing, autonomous vehicles, civil planning, and research. During the COVID-19 pandemic, substantial new data sets are being generated based on location, position tracking, and contact lists. Producing important epidemiological information from these data sets depends upon many of the functions that cuSpatial provides, such as proximal distance clustering, trajectory grouping, geofencing, and point in polygon. Scientists and physicians are providing epidemiological GIS data to GPU researchers, producing meaningful insights with a critically faster turnaround.

Figure 1: New York State Epidemiology

Latest cuSpatial Features

Previously we reduced the clustering time of over 10,000 trajectories with the release of cuSpatial. We’ve been hard at work, accelerating new problems and refining old ones. This blog introduces the addition of cubic spline fitting to trajectories and other curves, quadtree indexing, and refinements and updates to cuSpatial Python APIs.

Spatial Distance and Inclusion Testing

cuSpatial builds on top of the RAPIDS cuDF data frame library. Put your data into cudf.DataFrame or cudf.Series to load it onto the GPU before calling cuSpatial functions. The following functions demonstrate some of the most common use cases of cuSpatial and the performance gains you can expect.

  • Point in Polygon Testing
  • Haversine Distance
  • Directed Hausdorff Distance
  • Cubic Splines for Trajectory Interpolation​
  • Quadtree Indexing

Point in Polygon Testing

With cuSpatial 0.14, performance gains of up to 10,000x over CPU processing can be found in the point_in_polygon function, which can test up to 30 billion points per second for inclusion in multiple polygons on an NVIDIA GV100 GPU. Future performance metrics apply to this hardware.

The easiest way to load polygon data for point_in_polygon is to use cuspatial.read_polygon_shapefile. Given an input shapefile containing only polygon features, the resulting tuple poly_offsets, poly_ring_offsets, poly_points perfectly matches the input requirements of point_in_polygon. See this demo notebook for more details.

cuspatial.point_in_polygon(points_x, points_y,
poly_offsets,poly_ring_offsets,
poly_points_x,poly_points_y)

The return value of point_in_polygon is a cudf.DataFrame with a column per polygon. Each column contains a Boolean mask of each input coordinate pair from points_coords representing whether or not that coordinate is inside the corresponding polygon.

Haversine Distance

cuSpatial can compute great circle distances on enormous datasets with full parallelism, which enables a straightforward calculation of haversine distance between datasets of 10 billion lat/lon coordinate pairs per second using a single GPU. This is limited only by GPU memory. Or increase throughput arbitrarily using dask-cudf.

Figure 2: Haversine distance
cuspatial.haversine_distance(
cudf.Series([0, 0, 0]), #longitude p0
cudf.Series([0, 0, 0]), #latitude p0
cudf.Series([45, 46, 47]), #longitude p1
cudf.Series([45, 46, 47]), #latitude P1
)

Directed Hausdorff Distance

Hausdorff distance is the maximum minimum distance between two spaces. It is an essential metric in path similarity, image processing, and computer graphics. Hausdorff Distance and Symmetric Segment Path-Distance (SSPD) are useful for comparing the similarity of sets of trajectories and geometries. cuSpatial’s implementation of this algorithm enables path comparisons on previously intractable set sizes.

Figure 3: Hausdorff Distance
cuspatial.directed_hausdorff_distance(
cudf.Series([0, 0, 0, 1, 1, 1, 1]), # points x
cudf.Series([1, 1, 1, 2, 2, 2, 2]), # points y
cudf.Series([3, 4]), # size of each set
)

Cubic splines for trajectory interpolation​

cuSpatial accelerates cubic spline interpolation over large sets of splines with extremely high performance on the GPU. This can be used to interpolate cubic splines based on four control point curves that pass through each start and end control point, through the two interior control points, and maintain a smooth second derivative. It is particularly useful for computing smooth vehicular trajectories for prediction, position estimation, and autonomous vehicle control.

Figure 4: Uninterpolated curve samples
Trajectory t = {{0, 3}, {1, 2}, {2, 3}, {3, 4}, {4, 3}}​
Figure 5: Interpolated curve from Figure 4 samples.

Here’s some example code that interpolates two trajectories of length 5 using cuSpatial. This code extends easily out to 1 million curves with a minimal increase in runtime. The input values soa_t and soa_x represent the source sample points in Figure 4. The final result smoothed is the smooth curve for both curves after interpolating the input values sequence_to_interpolate.

import cudf
import cupy as cp
import cuspatial
length = 5
number = 2
soa_t = cudf.Series(cp.tile(cp.arange(length), number))
soa_x = cudf.Series(cp.tile(cp.random.random(length), number))
cuspatial_splines = cuspatial.interpolate.CubicSpline(soa_t, soa_x,
size=length)
sequence_to_interpolate = cudf.Series(cp.tile(cp.linspace(
0, length, 10), number))
curve_ids = cudf.Series(cp.repeat(cp.arange(number), 10))
smoothed = cuspatial_splines(sequence_to_interpolate, curve_ids)

In the future, we aim to extend support to mixed splines of arbitrary length.

Quadtree Indexing

cuSpatial is in the process of building support for high-performance quadtree spatial indexing. cuSpatial 0.14 adds support for building the quadtree structure on position data, and future releases will use this data structure for spatial indexing and spatial refinement for large numbers of points. This will support points that may lie inside multiple polygons, and it substantially improves performance over cuSpatial’s current point_in_polygon function.

Figure 6: Dividing groups of coordinates into quadrants.

cuSpatial’s quadtree spatial indexing uses a three-phase approach:

  1. Compute all non-empty quadrants at the finest level​.
  2. Construct the quadtree by removing non-qualified nodes​.
  3. Perform polygon or coordinate indexing into the quadtree in log(n) time.

Spatial indexing with a quadtree allows point-in-polygon testing to be performed only on point/polygon pairs that fall within the same bounding boxes, providing much higher efficiency for spatial queries.

Figure 7: Quadtree-accelerated point-in-polygon

See: Efficient Quadtree Construction for Indexing Large-Scale Point Data on GPUs: Bottom-Up vs. Top-Down

Conclusion

cuSpatial accelerates spatial distance and similarity functions, indexing, and trajectory smoothing. Combined with dataframe join methods provided by cuDF, forms of spatial indexing like range and grid search are trivial. If these methods are part of your workflow, using cuSpatial will give immediate results. GPU acceleration can turn day-long runtimes into minutes and hours into seconds.

Stay tuned as cuSpatial gets faster, easier to use, and more relevant for computing solutions in complex spatial datasets. We’re already in the works on SSPD distance functions, even faster indexing, and seamless I/O compatibility with geoPandas.

Try out the notebooks and demos, and get started using cuSpatial today. If you’d like to request a new feature, please file an issue or hit us up at RAPIDS-GoAI today.

The cuSpatial Team

There are numerous engineers part of the RAPIDS community contributing to the cuSpatial project. Below are some of the top contributors. Thank you all for your hard work and dedication.

  • Thomson Comer, Sr. System Software Engineer for RAPIDS, NVIDIA
  • Dr. Jianting Zhang, Associate Professor in GIS and Computer Science, City College of New York. Visiting faculty member, NVIDIA
  • Christopher Harris, Sr. System Software Engineer for RAPIDS, NVIDIA
  • Paul Taylor, Sr. System Software Engineer for RAPIDS, NVIDIA
  • Mark Harris, Distinguished Engineer, NVIDIA
  • Keith Kraus, Sr. System Software Engineering Manager for RAPIDS, NVIDIA
  • Sujit Biswas, Principal Engineer for IVA, NVIDIA
  • Dr. Milind Naphade, CTO for IVA and AI Cities, NVIDIA

--

--