cuSpatial Makes Progress Toward a Full-Featured Spatial Analytics Library

Michael Wang
RAPIDS AI
Dec 1, 2022

by Michael Wang, Thomson Comer, Mark Harris, Ben Jarmak

Geographic Information Systems and mapping are problem spaces that include some of the world’s most important and exciting technologies. Spatial analytics has powered big breakthroughs, and, like the rest of RAPIDS, cuSpatial was built to accelerate the work of practitioners.

Development of the cuSpatial geospatial data analytics library has accelerated over the past few RAPIDS releases. We have added a lot to release 22.10, and we are excited to share it with you. Improvements include new and faster APIs; progress toward implementing some of the ST_* primitives familiar to the GIS community; fully rewritten documentation for users and developers; and a more organized roadmap, which is now visible to all.

Better Documentation

cuSpatial’s documentation has been significantly improved in the 22.10 release. The Python documentation has received a major rewrite, with a number of new sections, including a new Python User Guide that provides examples of using all cuSpatial functions. The API Reference section now organizes all Python APIs by functional category. The new Python Developer Guide provides detailed coverage of library design, how to set up a development environment, and how to contribute to cuSpatial. There is also a new C++ Developer Guide, which covers design and developer guidelines for the cuSpatial C++ and CUDA layer.

Loads of New Features and More User Friendly

GeoSeries Refactoring and GeoArrow implementation

Significant refactoring of the GeoSeries class makes its internals more streamlined and robust. GeoSeries now fully models Apache Arrow’s union array format, which provides structure-of-arrays storage for mixed geometry types while giving GPU algorithms efficient access to the data.

cuSpatial now stores each of the sub-geometry types in the union array in GeoArrow format, an extension data type based on Apache Arrow’s variable-size list layout. As of this writing, cuSpatial is the first implementation of GeoArrow on the GPU.
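
To make the layout concrete, here is a minimal pure-Python sketch of a dense union over two child arrays, with the linestring child stored in a variable-size list layout. The buffer names, the interleaved x/y storage, and the `get_geometry` helper are assumptions made for illustration; they are not cuSpatial's internal names.

```python
POINT, LINESTRING = 0, 1

# Child array 0: points as interleaved x, y values -> (0, 0) and (1, 1).
points_xy = [0.0, 0.0, 1.0, 1.0]

# Child array 1: linestrings in a variable-size list layout.
# Linestring k owns vertices lines_offsets[k] up to (not including) lines_offsets[k + 1].
lines_xy = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
lines_offsets = [0, 3, 5]

# Dense union: each row stores a type id and an index into that type's child array.
type_ids = [POINT, LINESTRING, POINT, LINESTRING]
child_offsets = [0, 0, 1, 1]


def get_geometry(row):
    """Return (kind, vertices) for one row of the mixed-geometry column."""
    k = child_offsets[row]
    if type_ids[row] == POINT:
        return "Point", [(points_xy[2 * k], points_xy[2 * k + 1])]
    start, end = lines_offsets[k], lines_offsets[k + 1]
    return "LineString", [
        (lines_xy[2 * v], lines_xy[2 * v + 1]) for v in range(start, end)
    ]


for row in range(len(type_ids)):
    print(row, *get_geometry(row))
```

Because every buffer is a flat array, a GPU kernel can locate the vertices of any row with a handful of index lookups and no pointer chasing.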

GeoSeries is also now the common class for data I/O and computation APIs, which significantly improves the cuSpatial user experience. In older APIs, users needed to manually specify offset arrays that mark where each input geometry begins and ends, for example:

def pairwise_linestring_distance(offsets1, xs1, ys1, offsets2, xs2, ys2):
    ...

GeoSeries carries the offset information inside the object, so newer APIs only require users to pass GeoSeries objects as input:

def pairwise_point_linestring_distance(
    points: GeoSeries, linestrings: GeoSeries
):
    ...

Seamless interoperability with GeoPandas

Importing and exporting data to and from cuSpatial is now easier and faster. Specifically, users can call cuspatial.from_geopandas to convert data from GeoPandas and use GeoSeries.to_geopandas to convert data back. This round trip lets users move the performance-critical sections of their workflow to the GPU without handling data formats manually, as Figure 1 shows.
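
A round trip might look like the following sketch. It uses `cuspatial.from_geopandas` and `GeoSeries.to_geopandas` as named above, together with `pairwise_point_linestring_distance`. The wrapper function name is hypothetical, the result is assumed to be a cuDF Series, and running it assumes a CUDA-capable GPU with cuSpatial 22.10 or later and GeoPandas installed. The import sits inside the function only so that the module can be loaded on machines without a GPU.

```python
def gpu_point_linestring_distance(host_points, host_lines):
    """Hypothetical helper: both arguments are geopandas.GeoSeries of equal length."""
    import cuspatial  # requires a CUDA-capable GPU and a RAPIDS install

    # Host -> device: GeoPandas geometries become cuSpatial GeoSeries.
    gpu_points = cuspatial.from_geopandas(host_points)
    gpu_lines = cuspatial.from_geopandas(host_lines)

    # The performance-critical step runs on the GPU.
    distances = cuspatial.pairwise_point_linestring_distance(gpu_points, gpu_lines)

    # Device -> host: distances (assumed to be a cuDF Series) and geometries
    # come back as pandas / GeoPandas objects.
    return distances.to_pandas(), gpu_points.to_geopandas()
```

Everything before and after the call can stay in ordinary GeoPandas code.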

Figure 1: cuSpatial / GeoPandas Workflow Acceleration Model

Faster File I/O

In cuSpatial 22.10, cuspatial.read_polygon_shapefile now returns a GeoSeries object. Users who previously loaded shapefiles with GeoPandas can skip that step and use read_polygon_shapefile for fast I/O directly to the GPU. If you need direct file I/O for other data types, please file a cuSpatial feature request issue.

ST_* Routines Support

Section 7.2 of the Simple Feature Access standard defines a set of SQL routines, many of which should be familiar to GIS users: ST_Distance, ST_Intersects, ST_Difference, and others. Recent releases of cuSpatial have moved toward supporting these operations on the GPU.

ST_Distance

cuSpatial now provides partial support for computing the shortest Euclidean distance between geometries, analogous to ST_Distance in PostGIS. Figure 2 shows the supported geometry combinations as of the 22.10 release.

Figure 2: Feature matrix for ST_Distance

Accelerated distance computation currently covers “pure” geometry arrays, in which every row has the same geometry type. Support for inputs that mix geometry types within a series is on cuSpatial’s roadmap.

Miscellaneous

cuSpatial 22.10 introduces a function to accelerate computation of nearest points between pairs of points and linestrings, analogous to shapely.ops.nearest_points but parallelized on the GPU over many rows. A new memory_usage API computes the GPU memory consumed by a GeoSeries. GeoSeries slicing is also better supported and now accepts non-contiguous indices as input. Overall, the cuSpatial APIs have received a significant update, and we can’t cover every change here. Interested readers should visit the new user guide and explore the library themselves.
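
Selecting rows at non-contiguous indices from an offsets-based layout amounts to a gather that rebuilds the offsets. The following pure-Python sketch illustrates the idea for linestrings; the function name and the flat `xs`/`ys` buffers are assumptions for illustration and do not describe cuSpatial's GPU implementation.

```python
def take_linestrings(offsets, xs, ys, indices):
    """Gather the linestrings at `indices`, in order, into a new compact layout."""
    new_offsets, new_xs, new_ys = [0], [], []
    for i in indices:
        start, end = offsets[i], offsets[i + 1]
        new_xs.extend(xs[start:end])
        new_ys.extend(ys[start:end])
        # Each new offset is the running vertex count after this geometry.
        new_offsets.append(len(new_xs))
    return new_offsets, new_xs, new_ys


# Three linestrings with 2, 3, and 2 vertices.
offsets = [0, 2, 5, 7]
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 1, 1, 2, 2]

# Select rows 2 and 0, skipping row 1.
print(take_linestrings(offsets, xs, ys, [2, 0]))
```

The output is again a valid offsets-plus-coordinates layout, so the selected rows can be passed to any routine that accepts the original.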

Improved C++ Developer Experience

Header-only C++ API

C++ developers may be interested to learn that libcuspatial is undergoing an internal refactoring to provide a cuDF-independent, header-only API that contains all algorithm implementations in cuSpatial. This header-only API is a generic, iterator-based interface that improves modularity and flexibility, with a style inspired by the STL and Thrust. It separates the accelerated algorithm implementations from the details of the data container, allowing developers to apply the algorithms to the container types of their choice. C++ developers can now compile their libraries against cuSpatial functions without depending on libcudf, which reduces the footprint of their development environment.
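
The header-only API itself is C++, but the design idea, algorithms written against iterators rather than a specific container, can be illustrated in a few lines of Python. The function below is a hypothetical analogue and not part of cuSpatial: it computes pairwise haversine distances over any two iterables of (longitude, latitude) pairs in degrees, with an assumed mean Earth radius of 6371 km.

```python
import math


def pairwise_haversine(a_lonlat, b_lonlat, radius_km=6371.0):
    """Yield great-circle distances for paired (lon, lat) points in degrees.

    Accepts any iterables, so the caller chooses the container.
    """
    for (lon1, lat1), (lon2, lat2) in zip(a_lonlat, b_lonlat):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = phi2 - phi1
        dlam = math.radians(lon2 - lon1)
        h = (
            math.sin(dphi / 2) ** 2
            + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
        )
        yield 2 * radius_km * math.asin(math.sqrt(h))


# Same algorithm, two different "containers":
as_tuples = [(0.0, 0.0), (10.0, 45.0)]       # list of (lon, lat) tuples
as_columns = zip([90.0, 10.0], [0.0, 45.0])  # separate lon and lat columns
print(list(pairwise_haversine(as_tuples, as_columns)))
```

The algorithm never asks how its inputs are stored, which is the same separation the iterator-based C++ interface provides for GPU data.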

The existing libcudf column-based C++ API is now a layer on top of the header-only API, as Figure 3 shows. The refactoring is about 60% complete and will be finished sometime in 2023.

Figure 3: The cuSpatial Software stack showing the new header-only C++ API

Try cuSpatial Today!

Our passion is empowering data scientists to do their best work in the most fluid and fast way possible. Please try out these improvements to cuSpatial and tell us what you think! Are you a cuSpatial user with a cool feature in mind? File a GitHub issue and track it via our now-public project board! Or are you a GIS enthusiast who would like to write state-of-the-art accelerated algorithms? Contribute to cuSpatial by following the developer guide! Chat with us on Twitter @rapidsai. And if you create something truly great, please tell us.


Michael Yh Wang is a software engineer on the NVIDIA RAPIDS team. He currently contributes his engineering skills to cuDF, cuSpatial, and Numba.