RAPIDS Release 23.02

Paul Mahler
RAPIDS AI
Mar 9, 2023

First Release of the Year

RAPIDS Release 23.02 is live. We are most excited to highlight several quality-of-life improvements. We have a new web presence for RAPIDS; we believe this will make learning, using, and contributing to RAPIDS easier and more enjoyable than ever before. We added comprehensive installation instructions and improved user guides. There is a new guide on all the ways RAPIDS makes it possible to visualize your data. User-defined functions (UDFs) have been greatly enhanced in cuDF, and cuDF offers improved memory interoperability with PyTorch. Some cuML models can now fall back to CPU implementations, a step forward for everyone who switches between CPU and GPU systems. Below we outline what has been improved in each individual package.

cuDF

We have changed how UDFs used in groupby-apply are compiled. Previously, users had to compose built-in reduction operations to get good performance, which is a chore. RAPIDS 23.02 adds a new Numba/JIT approach: the UDF is compiled to a single GPU kernel that operates on all groups in parallel. This provides performance similar to the non-UDF approach, but with the ergonomics users expect from Python UDFs. The table below shows execution times of a UDF (Student's t-test) on a fixed dataset as it is divided into an increasing number of groups. The final column gives you a good idea of where each approach fits in your workflows: somewhere between 100 and 1,000 groups, it becomes worthwhile to switch approaches.

[Table: cuDF speed-ups for groupby-apply UDFs across group counts]

We’ve also added a new PyTorch allocator in RMM. This lets PyTorch and RAPIDS share the same memory pool, enabling faster and more memory-efficient pipelines. Read more about it here.
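Here is a minimal sketch of how the shared pool can be wired up. The allocator's import location has moved between RMM releases, and routing PyTorch through a pluggable allocator requires a recent PyTorch, so treat the details below as assumptions to verify against your installed versions:

import rmm
import torch

# In RMM 23.02 the allocator is exposed as rmm.rmm_torch_allocator;
# newer releases move it to rmm.allocators.torch.rmm_torch_allocator.
from rmm import rmm_torch_allocator

# Use a single RMM memory pool for RAPIDS allocations.
rmm.reinitialize(pool_allocator=True)

# Route PyTorch's CUDA allocations through the same pool
# (requires PyTorch 2.0+ for change_current_allocator).
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)

From here on, cuDF and PyTorch draw from one shared pool instead of competing for GPU memory.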

Available in RAPIDS 23.02, the new groupby-apply feature compiles a user’s per-group function to a single GPU kernel executing on the whole dataframe. Performance is on par with built-in cuDF groupby reductions, and significantly faster than both pandas and the legacy group-by-group implementation in cuDF when there are many groups.
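Here is a minimal sketch of the new engine in action, using a toy dataframe in place of a real workload (the engine="jit" keyword on groupby-apply is the mechanism we are assuming here):

import cudf

df = cudf.DataFrame({"key": [0, 0, 1, 1], "val": [1.0, 2.0, 3.0, 4.0]})

# A per-group UDF written in plain Python; with engine="jit" it is
# compiled to a single GPU kernel that processes all groups in parallel.
def group_range(group):
    return group["val"].max() - group["val"].min()

result = df.groupby("key").apply(group_range, engine="jit")
print(result)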

cuML

We understand data science, so we know a lot of it will initially be done on a laptop without a RAPIDS-compatible GPU. It would be nice to start on your laptop and then run the same code on a GPU. That is now possible! A new experimental package, called cuml-cpu, is available in our conda channels for systems without a GPU:

conda install -c rapidsai cuml-cpu

Importantly, this package is imported in exactly the same manner as the regular cuML conda package, so no code changes are needed. The following code runs on systems both with and without a GPU:

import cuml

# X, y, and X_new can be any supported array-like inputs
# (e.g., NumPy arrays on a CPU-only system).
lin_reg = cuml.LinearRegression()
lin_reg.fit(X, y)
predictions = lin_reg.predict(X_new)

This new package can be installed on Linux, and on Windows via WSL2, for systems without GPUs, with further OS support coming soon. For systems with a GPU, the package is not even needed: the current cuML package already contains the experimental CPU capabilities! These capabilities also allow you to move models between systems.

For example:

import pickle
from cuml import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train_reg, y_train_reg)
pickle.dump(lin_reg, open("lin_reg.pkl", "wb"))

This produces a typical pickle file, lin_reg.pkl, containing the trained LinearRegression model. With the new capabilities, you can now load this model on a CPU-only system:

import pickle
from cuml.common.device_selection import using_device_type

recovered_lin_reg = pickle.load(open("lin_reg.pkl", "rb"))

with using_device_type('cpu'):
    predictions = recovered_lin_reg.predict(X_test_reg)

On a system with a GPU, execution can be controlled in a fine-grained manner via a new context manager, making it easy to test both code paths:

from cuml.common.device_selection import using_device_type
from cuml.manifold import UMAP

umap_model = UMAP()

# will run in CPU mode, using the UMAP library
with using_device_type('cpu'):
    umap_model.fit(X_train_blobs)

# will run in GPU mode, using the cuML implementation
with using_device_type('gpu'):
    transformed = umap_model.transform(X_test_blobs)

We’re beginning with support for HDBSCAN (partial), UMAP (partial), linear models, logistic regression, PCA, and truncated SVD (tSVD). We will expand the set of supported estimators, functions, and parameters over the next few releases.

Additionally, we added new capabilities to our manifold and clustering methods: T-SNE and UMAP now have improved support for precomputed KNN graphs and precomputed pairwise distances, as sketched below.
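For example, you can build a KNN graph once and hand it to UMAP rather than having it recomputed internally. A minimal sketch, assuming the knn_graph argument to fit_transform and a random toy dataset:

import cupy as cp
from cuml.manifold import UMAP
from cuml.neighbors import NearestNeighbors

X = cp.random.random((1000, 50), dtype=cp.float32)

# Compute the KNN graph once...
nn = NearestNeighbors(n_neighbors=15)
nn.fit(X)
knn_graph = nn.kneighbors_graph(X, mode="distance")

# ...then reuse it, skipping the KNN computation inside UMAP.
umap_model = UMAP(n_neighbors=15)
embedding = umap_model.fit_transform(X, knn_graph=knn_graph)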

RAFT

RAFT was part of the original RAPIDS vision: reusable computational patterns for machine learning and data analytics. We want researchers and developers to feel like they are in a lab with all their reagents handy. In our case, these are computationally optimized building blocks for data science.

We have added Hierarchical (or Balanced) k-means to the methods available in RAFT. This algorithm is one of the reasons RAFT’s new inverted file index (IVF) variants of its approximate nearest neighbors algorithms are so fast.

We are also excited to release a k-selection C++ API in the raft::matrix namespace. Given a matrix of distances between data points, the k-selection algorithm selects the k smallest distances and returns them along with the indices of the corresponding points. The new API abstracts several GPU implementations of k-selection behind a single interface, choosing the best algorithm based on the shape of the data and the number of candidates requested.

We have adopted the mdspan standard from C++23. This was done in an effort to make interacting with multidimensional data in C++ feel similar to working with NumPy or CuPy ndarrays in Python. Additionally, mdspan objects can now be serialized and deserialized in a way that mirrors NumPy’s serialization format: all of RAFT’s multi-dimensional objects with row- or column-major layouts can be written to a std::ostream and read from a std::istream.

Deployment, Simplified

If you have used RAPIDS, you may have noticed there hasn’t been one central place to answer your questions about getting it up and running. Now there is: a beautiful new resource here. Let us know what you think!

What’s Next

Graph Neural Networks (GNNs) are poised to make major advances in how we understand some of the most important data. If this is the first you’re hearing about GNNs, check out this thorough introduction and take the course from NVIDIA. cuGraph continued laying the foundation for supporting GNN training at any scale. This included improving neighborhood-sampling performance and adding a new bulk sampling feature to the cugraph-pyg package, soon to come to cugraph-dgl as well. GraphSAGE and GAT support was added to the cugraph-dgl package. Additionally, a new feature store class was added to support tensor attributes on vertices used with GNNs.

Conclusion

We’ve done a lot of great work so far in 2023 beyond what’s in this release! We showed you how to improve the speed of your BERTopic workflows with or without a GPU. Dask code now runs where you would like it, CPU or GPU, via a configurable backend. We showed how imbalanced-learn and cuML can give you substantially faster resampling. We discussed accelerated JSON processing with cuDF. And we described how one of the best tools for data science, XGBoost, now automatically handles categorical data for the user. The team is very excited for what we’ve got cooking for GTC in March (check it out; it’s free to register). We’ve also got a blockbuster year ahead! Follow us on Twitter @rapidsai for the latest news and content, and reach out to us on GitHub to let us know what’s working or not working for you, or what else you’d like to see.

The RAPIDS team consistently works with the open-source community to understand and address emerging needs. If you’re an open-source maintainer interested in bringing GPU acceleration to your project, please reach out on GitHub or Twitter. The RAPIDS team would love to learn how potential new algorithms or toolkits would impact your work.
