Query cuSpatial about your (Spatial) Relationship

Michael Wang
RAPIDS AI
Published in
5 min readDec 16, 2022

Authors: Thomson Comer, Michael Wang and Mark Harris

cuSpatial 22.12 focused on progress toward four primary goals: distance queries, spatial relationship queries, linestring intersection, and low-level C++ refactoring and iteration support.

Spatial Relationship Queries

cuSpatial 22.12 takes steps towards rich support of spatial relationship queries between geometry features. This release includes the new GeoSeries.contains_properly() method for polygons, which checks whether each polygon in the GeoSeries strictly contains the corresponding feature in another GeoSeries, as Figure 1 demonstrates.

Figure 1: contains_properly() is stricter than contains(), as it excludes the boundary of shapes.

Polygon A properly contains point B if and only if B intersects the interior of A and not the boundary of A.

>>> from shapely.geometry import Polygon, Point
>>> import cuspatial
>>> polygons = cuspatial.GeoSeries([
Polygon(
[[-0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
),
Polygon(
[[-0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
),
Polygon(
[[-0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
),
])
>>> points = cuspatial.GeoSeries([Point(-1, -1), Point(0, 0), Point(1, 1)])
>>> print(polygons.contains_properly(points))
>>> print(points.within(polygons))
0 False
1 True
2 False
dtype: bool
0 False
1 True
2 False
dtype: bool

The next releases of cuSpatial will expand spatial relationship queries with the goal of supporting all spatial relationships in the standard DE-9IM model.

Enhanced Support for Distance Queries

cuSpatial 22.12 expands support for distance queries, including the new pairwise_point_distance function for computing the distance between pairs of points or the nearest distance between multipoints.

>>> from shapely.geometry import *
>>> import cuspatial
>>> s1 = cuspatial.GeoSeries([
Point(2.0, 2.0),
Point(0.0, 0.0),
Point(3.0, 1.5)])
>>> s2 = cuspatial.GeoSeries([
Point(-1.0, -0.5),
Point(-4.1, 1.8),
Point(0.0, 0.0)])
>>> cuspatial.pairwise_point_distance(s1, s2)
0 3.905125
1 4.477723
2 3.354102
dtype: float64

The below shows the comparison between throughput of this function versus geopandas.geoseries.distance function.

Figure 2. Throughput comparison between cuSpatial and GeoPandas for point-point distance.

Note the axes in Figure 2 use log scales. When input size is small, cuSpatial is constrained by kernel launch overhead and exhibits lower throughput than the CPU-side solution. GeoPandas saturates the CPU and throughput peaks at about 1000 points,, while cuSpatial throughput continues to scale. The speedup measured for 1e6 points is 140x, and we expect the speedup to increase with more points.

The distance functions pairwise_point_distance, pairwise_linestring_distance and pairwise_point_linestring_distance now all accept GeoSeries as input and support computing euclidean distance between pairs of multigeometries (e.g. multilinestrings and multipoints).

>>> s3 = cuspatial.GeoSeries([
LineString([Point(0.0, 0.0), Point(1.0, 1.0)]),
LineString([Point(-1.0, -1.5), Point(-2.0, 0.0)])])
>>> s4 = cuspatial.GeoSeries([p
LineString([Point(0.0, 1.0), Point(1.0, 0.0)]),
MultiLineString([
LineString([(0.5, 0.5), (1.0, 1.0)]),
LineString([(0.8, 0.0), (1.3, 0.75)])])])
>>> cuspatial.pairwise_linestring_distance(s3, s4)
0 0.000000
1 2.329741
dtype: float64

Improved GeoSeries operations

Any subset of features in a cuSpatial GeoSeries can now be sliced and rearranged before calling internal geometry functions. The new slicing ensures that the underlying data maintains the GeoArrow structure for subsequent queries. .loc gains the same functionality but accepts arbitrary index types. GeoSeries also gains the ability to contain null rows as well as index alignment for binary and other operations.

>>> mixed_geoseries = cuspatial.GeoSeries([
Polygon(
[[-0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
),
Polygon(
[[-0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
),
Polygon(
[[-0.5, -0.5], [-0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
),
Point(-1, -1), Point(0, 0), Point(1, 1)
])
>>> print(
mixed_geoseries[0:3].contains_properly(mixed_geoseries[3:6], align=False)
)
0 False
1 True
2 False
dtype: bool

Abstraction of Geometries as C++20 Ranges

In C++20, the range concept abstracts iterables with a .begin() and .end() interface. While cuSpatial currently only requires C++17, “Range-like” objects are being introduced to abstract geometry arrays for much easier access in device kernels.

Recall that as of cuSpatial 22.10 cuSpatial implements the GeoArrow specification. Any geometry objects conforming to the GeoArrow specification can utilize these range objects in GPU kernels.

Range objects are analogous to std::span and the device_span in libcudf, with the exception that range objects are not limited to memory-backed iterators. A range can also be constructed from a “fancy iterator” such as thrust::counting_iterator, for example. The below gives a simple example to show a range object’s versatility.

In GeoArrow, a 2d linestring object is of type List<FixedSizeList<double>[2]>, and a multilinestring is List<List<FixedSizeList<double>[2]>>, with an additional level of offset array. With a range object, one can represent a 2d linestring object with a multilinestring object by setting the top level offset range with a range of counting iterators. This has many advantages. First, since a range of counting iterators is lazily evaluated, it does not cost extra storage. Second, allowing multilinestring to represent linestring reduces the API surface area of cuSpatial acceleration kernels by half.

    // Making a multilinestring_range object from multilinestring buffers
auto multilinestring = cuspatial::multilinestring_range(
geometry_offset.begin(),
geometry_offset.end(),
part_offset.begin(),
part_offset.end(),
coordinates.begin(),
coordinates.end()
);

// Making a multilinestring_range object from linestring buffers
auto geom_begin = thrust::counting_iterator(0);
auto linestring = cuspatial::multilinestring_range(
geom_begin,
geom_begin + num_geoms + 1,
geometry_offset.end(),
part_offset.begin(),
part_offset.end(),
coordinates.begin(),
coordinates.end()
);

With range objects, accessing geometry objects in the device kernel is much easier. The example below shows a kernel finding the closest point between a point and all other points in the same multipoint.

template <typename MultiPointRange, typename OutputIt>
void __global__ find_closest_point_in_multipoint(MultiPointRange multipoint,
OutputIt output) {
for (auto pidx = threadIdx.x + blockIdx.x * blockDim.x;
pidx < multipoints.num_points(); pidx += gridDim.x * blockDim.x) {
// part_idx_from_point_idx traverses upward in the hierarchy for parent node Id
auto part_idx = multipoints.part_idx_from_point_idx(pidx);
// supports array-access operator
auto point = multipoints[part_idx][point_idx];
cuspatial::vec_2d<float> nearest;
float nearest_dist = std::numeric_limits<float>::max();
// supports iterating the multi-geometry with range-based for loop
for (auto other_point : multipoints[part_idx]) {
if (other_point == point) continue;
if (float dist = distance(point, other_point); dist < nearest_dist) {
nearest_dist = dist;
nearest = other_point;
}
}
output[pidx] = nearest;
}
}

Introduction of range-like objects has just begun in libcuspatial, with support for multipoint and multilinestring, while multipolygon range support is planned for the near future.

Other improvements and fixes

cuSpatial 22.12 includes many other improvements, including header-only refactoring off trajectory APIs; low-level primitives in support of the upcoming linestring intersection API; and miscellaneous improvements and bug fixes. In total, we closed 22 issues with this release.

Try cuSpatial Today

Our passion is empowering data scientists to do their best work in the most fluid and fast way possible. Please try out these improvements to cuSpatial and tell us what you think! Are you a cuSpatial user, and have a cool feature in mind? File a GitHub issue and track it via our now public project board! Or you might be a GIS enthusiast who would also like to write state of the art accelerated algorithms? Contribute to cuSpatial by following the developer guide! Chat with us on Twitter @rapidsai. And if you create something truly great, please tell us.

--

--

Michael Wang
RAPIDS AI

Michael Yh Wang is a software engineer in Nvidia Rapids. He currently contributes his engineering skills towards cuDF, cuSpatial and Numba.