RAPIDS 23.04 is out! We are excited about this release, which includes major updates to cuSpatial, continued improvements to graph neural network functionality in our graph libraries, performance improvements (and benchmarking!) in cuDF, and a number of enhancements and new functionality in RAFT (our accelerated primitives library), many of them aimed at vector search.
Without further ado, let’s get to the updates!
cuDF 23.04 introduces the ability to enable copy-on-write, an option that can improve performance and memory usage by reducing the number of unnecessary copies. Copy-on-write also further improves cuDF’s consistency with pandas, as pandas has optionally supported copy-on-write since version 1.5.0.
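To give a feel for what copy-on-write means in practice, here is a minimal CPU-only sketch in plain Python. It is a toy model, not cuDF's implementation: the CowColumn class and its methods are hypothetical names invented for illustration. In cuDF itself the feature is, as far as we know, switched on with cudf.set_option("copy_on_write", True) or the CUDF_COPY_ON_WRITE environment variable; check the cuDF documentation for the authoritative spelling.

```python
class _Storage:
    """Shared backing store with a simple reference count (hypothetical helper)."""

    def __init__(self, values):
        self.values = list(values)
        self.refs = 1


class CowColumn:
    """Toy column whose copies share storage until one of them is written to."""

    def __init__(self, values):
        self._storage = _Storage(values)

    def copy(self):
        # A "copy" only bumps the reference count; no data is duplicated yet.
        other = CowColumn.__new__(CowColumn)
        other._storage = self._storage
        self._storage.refs += 1
        return other

    def __getitem__(self, index):
        return self._storage.values[index]

    def __setitem__(self, index, value):
        # The first write to shared storage triggers the real copy.
        if self._storage.refs > 1:
            self._storage.refs -= 1
            self._storage = _Storage(self._storage.values)
        self._storage.values[index] = value

    def shares_memory_with(self, other):
        return self._storage is other._storage


original = CowColumn([1, 2, 3])
duplicate = original.copy()
shared_before_write = original.shares_memory_with(duplicate)

duplicate[0] = 99  # the write forces the duplicate onto its own storage
shared_after_write = original.shares_memory_with(duplicate)
```

The copy is free until the write, and the write never leaks back into the original. That combination is where the memory savings and the pandas-consistent semantics come from.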
With the release of 23.04, cuSpatial continues barreling towards complete support for spatial relationship predicates and for Cartesian distances between any pair of geometries. There is too much to cover in full, so here are the highlights; check out the upcoming blog about cuSpatial 23.04 for the rest!
- GeoSeries Standardization: All cuSpatial APIs now accept GeoSeries objects instead of requiring differing inputs like specific x/y coordinates or polygon offsets.
- Linestring intersection APIs: Parallel computation of intersections between pairs of linestrings. The C++ implementation was added in 23.02, and 23.04 adds the Python API.
- geom_equals Python spatial predicate: Quickly find equality of geometries in Python.
- Pairwise point-polygon distance: Take advantage of the parallelization within cuSpatial when finding the distance between points and polygons.
- Zip codes notebook: Learn which zip (postal) codes in California have the most stop signs using some blazingly quick cuSpatial operations.
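For readers who want to see what a pairwise point-polygon distance actually computes, here is a small CPU-only sketch in plain Python. It is not cuSpatial code: the function names are hypothetical, polygons are simplified to a single ring without holes, and we assume the common convention that a point inside the polygon has distance zero. cuSpatial evaluates many such pairs in parallel on the GPU; this loop handles them one at a time.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2D tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_ring(p, ring):
    """Even-odd ray-casting test; ring is a list of vertices (not closed)."""
    px, py = p
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def point_polygon_distance(p, ring):
    """Zero if p is inside the ring, else the distance to the nearest edge."""
    if point_in_ring(p, ring):
        return 0.0
    n = len(ring)
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % n]) for i in range(n))


def pairwise_distance_sketch(points, polygons):
    """Pair the i-th point with the i-th polygon (hypothetical helper)."""
    return [point_polygon_distance(p, poly) for p, poly in zip(points, polygons)]


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
distances = pairwise_distance_sketch(
    [(0.5, 0.5), (2.0, 0.5), (2.0, 2.0)],
    [unit_square, unit_square, unit_square],
)
```

Each pair is independent of every other pair, which is why this kind of workload maps so naturally onto GPU parallelism.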
cuGraph continues to focus on making Graph Neural Networks (GNNs) faster and more scalable. This release includes performance improvements to Uniform Neighborhood Sampling, a key building block for many GNNs. We’ve also added better end-to-end examples of using the cugraph-pyg and cugraph-dgl plugins for the popular PyTorch Geometric and DGL GNN libraries. These new examples include demonstrations that leverage multiple trainers.
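If uniform neighborhood sampling is new to you, the idea fits in a few lines. The sketch below is a CPU-only illustration in plain Python, not the cuGraph implementation: the function name, the dictionary-based adjacency list, and the choice to sample without replacement are all assumptions made for this example. Starting from a set of seed vertices, each hop keeps at most a fixed number (the fanout) of uniformly chosen neighbors per vertex, and the sampled destinations become the next hop's frontier.

```python
import random


def uniform_neighbor_sample_sketch(adjacency, seeds, fanouts, rng):
    """Return sampled (src, dst) edges, one hop per entry in fanouts."""
    sampled_edges = []
    frontier = list(seeds)
    for fanout in fanouts:
        next_frontier = set()
        for src in frontier:
            neighbors = adjacency.get(src, [])
            # Keep every neighbor when a vertex has fewer than `fanout` of them.
            k = min(fanout, len(neighbors))
            for dst in rng.sample(neighbors, k):
                sampled_edges.append((src, dst))
                next_frontier.add(dst)
        # Sorting keeps the traversal order reproducible for a given seed.
        frontier = sorted(next_frontier)
    return sampled_edges


# Small undirected toy graph stored as an adjacency list.
adjacency = {
    0: [1, 2, 3, 4],
    1: [0, 5],
    2: [0, 5, 6],
    3: [0],
    4: [0, 6],
    5: [1, 2],
    6: [2, 4],
}

edges = uniform_neighbor_sample_sketch(
    adjacency, seeds=[0], fanouts=[2, 2], rng=random.Random(0)
)
```

A GNN trainer then runs message passing over only these sampled edges instead of the full neighborhood, which keeps each mini-batch at a bounded size.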
We have not forgotten traditional graph analysis: we added a new multi-GPU version of induced subgraph. Multi-node multi-GPU Leiden took its first step forward with the inclusion of the C/C++ code (the Python API will be added in the next release). Additionally, multi-node multi-GPU vertex betweenness centrality was added in the C/C++ code, with the Python API also coming in the next release.
Continuing to support critical algorithms for vector search libraries, RAFT 23.04 now includes experimental C++ support for CAGRA, our new graph-based approximate nearest neighbor (ANN) algorithm for accelerating nearest neighbor lookup. While GPUs have traditionally excelled only when looking up neighbors for a large number of data points at a time, CAGRA is highly optimized to accelerate lookup even for a single or only a few data points at a time. You can find the new CAGRA API in
While continuing to expand support for accelerated ANN algorithms, RAFT’s Python APIs now support brute-force KNN along with the IVF-Flat ANN algorithm. As with RAFT’s other Python APIs, these accept any array object that supports __cuda_array_interface__, including CuPy, Numba, cuDF, and PyTorch. CuPy can also be used to interoperate with libraries that support only DLPack, such as JAX and TensorFlow.
We are very excited to announce our new sparse matrix API vocabulary types, which provide the same rich user experience as mdspan/mdarray while enabling efficient computation on sparse data structures, such as the compressed sparse row (CSR) and coordinate (COO) formats, right in C++. These types also decouple the matrix structure itself from the element values, enabling a flexible range of different combinations that will eventually support more optimized types like doubly-compressed CSR (DCSR) and even higher-order sparse tensor formats.
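To illustrate what decoupling the structure from the values buys you, here is a rough Python analogy. It is not the RAFT C++ API: CsrStructure and csr_matvec are hypothetical names used only for this sketch. The sparsity pattern lives in one object, and any number of value arrays can be paired with it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CsrStructure:
    """Sparsity pattern only: which entries exist, not what they hold."""

    n_rows: int
    n_cols: int
    indptr: List[int]   # row i owns positions indptr[i]:indptr[i + 1]
    indices: List[int]  # column index of each stored entry


def csr_matvec(structure, values, x):
    """Multiply a CSR matrix (structure + values) by a dense vector x."""
    result = [0.0] * structure.n_rows
    for row in range(structure.n_rows):
        for pos in range(structure.indptr[row], structure.indptr[row + 1]):
            result[row] += values[pos] * x[structure.indices[pos]]
    return result


# 3x3 pattern with stored entries at (0,0), (0,2), (1,1), (2,0), (2,2).
pattern = CsrStructure(n_rows=3, n_cols=3, indptr=[0, 2, 3, 5], indices=[0, 2, 1, 0, 2])

# Two different value arrays reuse the same structure.
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
ones = [1.0] * len(pattern.indices)

weighted_row_sums = csr_matvec(pattern, weights, [1.0, 1.0, 1.0])
entries_per_row = csr_matvec(pattern, ones, [1.0, 1.0, 1.0])
```

Because the pattern is stored once, swapping in a new set of values (or a different element type) does not require rebuilding or copying the index arrays.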
Finally, a new K-means API in include/raft/cluster/kmeans.cuh runs several iterations of k-means over a dataset and finds the best K using the Calinski-Harabasz Index, minimizing the per-cluster inertia.
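For context, the Calinski-Harabasz Index is the ratio of between-cluster dispersion to within-cluster dispersion (the inertia), each normalized by its degrees of freedom; higher is better. The plain-Python sketch below scores two hand-written candidate labelings and keeps the better K. It is an illustration only, not the RAFT code: the function name is hypothetical, and the labelings stand in for what repeated k-means runs would produce.

```python
def calinski_harabasz(points, labels):
    """CH = (between / (k - 1)) / (within / (n - k)); needs 1 < k < n."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0
    within = 0.0  # this is the total inertia
    for members in clusters.values():
        centroid = mean(members)
        between += len(members) * sq_dist(centroid, overall)
        within += sum(sq_dist(p, centroid) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated blobs of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical labelings standing in for k-means results at K=2 and K=3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: calinski_harabasz(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

On this toy data the two-cluster labeling scores higher, matching the intuition that splitting one tight blob in two buys little extra separation.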
We are proud of the updates that 23.04 brings, and hope that you will give it a try! RAPIDS 23.04 is available now: