RAPIDS Release 22.02

Bringing more integration points, expanding functionality, and improving performance.

Sophie Watson
RAPIDS AI
Feb 9, 2022

We are pleased to announce that RAPIDS 22.02 was released last week. This release adds new capabilities to the RAPIDS libraries and improves speed and scaling through changes made behind the scenes. Whether you’re working with RAPIDS directly or building on top of the libraries, you will benefit from the changes in this release.

Below, we highlight some of the changes made in 22.02 including:

  • Expanded support for model explainability with SHAP values
  • New functionality including additional methods for grouping data in RAPIDS cuDF
  • More integrations with other platforms, including support for RAPIDS cuXfilter in Google Colab

New Capabilities

The following updates have been made to broaden the functionality of RAPIDS and provide users with more ways to use RAPIDS in their daily work.

Further Expansion of cuML Explainer Module

Being able to explain the decisions made by machine learning models is an increasingly important requirement for many real-world use cases. Building on experimental features introduced in the previous release, we are pleased to add more support for model explainability, with the ability to compute SHAP values for a wider range of models. You can now compute SHAP values for XGBoost models, LightGBM models, and Random Forests with categorical variables from both scikit-learn and cuML.
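For intuition about what a SHAP value measures, the following is a minimal pure-Python sketch that computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over every feature ordering. The function toy_model, the baseline, and the input are made up for illustration. This is not how cuML or the SHAP library compute values internally; they rely on much more efficient algorithms.

```python
from itertools import permutations


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature "absent"
        prev = model(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its real value
            new = model(current)
            phi[i] += new - prev      # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # [3.5, 2.0, 1.5]

# The attributions add up to the gap between the prediction and the baseline.
print(sum(phi), toy_model(x) - toy_model(baseline))  # 7.0 7.0
```

The interaction term x[0] * x[2] contributes 3.0 to the prediction, and the two features involved split it evenly, which is why feature 0 receives 3.5 rather than 2.0.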

RAPIDS DataFrame Library (cuDF)

Two additional groupby methods were added to the RAPIDS cuDF library in the 22.02 release. You can now compute the Pearson correlation coefficient between DataFrame columns within each group using the .corr() method, and use .transform() to apply an aggregation to each group and broadcast the result back to every row of that group, so the output has the same length as the input.

Transform:

import cudf
df = cudf.DataFrame({'a': [2, 1, 1, 2, 2], 'b': [1, 2, 3, 4, 5]})
df.groupby('a').transform('max')
   b
0  5
1  3
2  3
3  5
4  5

Pearson correlation coefficient:

import cudf
gdf = cudf.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
    "val3": [4, 5, 6, 1, 2, 9, 8, 5, 1]})
gdf
  id  val1  val2  val3
0  a     5     4     4
1  a     4     5     5
2  a     6     6     6
3  b     4     1     1
4  b     8     2     2
5  b     7     9     9
6  c     4     8     8
7  c     5     5     5
8  c     2     1     1
gdf.groupby("id").corr(method="pearson")
             val1      val2      val3
id
a  val1  1.000000  0.500000  0.500000
   val2  0.500000  1.000000  1.000000
   val3  0.500000  1.000000  1.000000
b  val1  1.000000  0.385727  0.385727
   val2  0.385727  1.000000  1.000000
   val3  0.385727  1.000000  1.000000
c  val1  1.000000  0.714575  0.714575
   val2  0.714575  1.000000  1.000000
   val3  0.714575  1.000000  1.000000
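
To make the per-group numbers above easier to check by hand, here is a small CPU-only sketch that recomputes the val1/val2 correlation for each id using only the Python standard library. The helper pearson is written for this illustration and is not part of cuDF.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


rows = {
    "id":   ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
}

# Collect each group's values, then correlate val1 with val2 per group.
groups = {}
for key, v1, v2 in zip(rows["id"], rows["val1"], rows["val2"]):
    pair = groups.setdefault(key, ([], []))
    pair[0].append(v1)
    pair[1].append(v2)

result = {key: round(pearson(v1, v2), 6) for key, (v1, v2) in groups.items()}
print(result)
```

For group a, for example, both columns have mean 5, the summed product of deviations is 1, and each sum of squared deviations is 2, giving 1 / sqrt(2 * 2) = 0.5.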

Decimal 128 type in cuDF

RAPIDS 22.02 introduces more support for decimals: you can now use 128-bit decimals in cuDF. The wider type allows higher precision in decimal operations, which benefits workloads that need exact fixed-point arithmetic on large values, such as those in the financial and e-commerce domains.

import cudf
s = cudf.Series([1, 2, 3, 4], dtype=cudf.Decimal128Dtype(scale=5, precision=6))
s
0    1.00000
1    2.00000
2    3.00000
3    4.00000
dtype: decimal128
s+s
0    2.00000
1    4.00000
2    6.00000
3    8.00000
dtype: decimal128
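
As a rough illustration of why the storage width matters, the sketch below treats a fixed-point decimal as a signed integer plus a scale and counts how many decimal digits always fit in 32, 64, and 128 bits. It uses only the Python standard library; the helper max_decimal_digits is hypothetical and is not a cuDF function.

```python
from decimal import Decimal


def max_decimal_digits(bits):
    """Largest d such that every d-digit integer fits in a signed `bits`-bit integer."""
    largest = 2 ** (bits - 1) - 1
    # The largest value has k digits but is not all nines, so only k - 1 digits are safe.
    return len(str(largest)) - 1


digits = {bits: max_decimal_digits(bits) for bits in (32, 64, 128)}
print(digits)  # {32: 9, 64: 18, 128: 38}

# A fixed-point value is an integer plus a scale:
# 1.00000 with scale=5 is stored as the integer 100000.
stored, scale = 100000, 5
value = Decimal(stored).scaleb(-scale)
print(value)  # 1.00000
```

Going from 64 to 128 bits roughly doubles the number of usable digits, from 18 to 38, which leaves far more headroom for intermediate results in sums and products.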

Dask-SQL [Experimental]

Building on the experimental GPU support for Dask-SQL via RAPIDS, which was introduced in RAPIDS 21.12, this latest release adds support for multi-GPU training and inference for cuML and XGBoost models directly within SQL statements. This release also introduces basic support for Dask’s read filtering in CREATE TABLE WITH statements, which you can find out more about in this blog. We continue to expand Dask-SQL’s grammar and improve performance and efficiency behind the scenes.

GPUDirect Storage integration

One major time sink when running workloads on GPUs is I/O. With the 22.02 release of RAPIDS, GPUDirect Storage (GDS) is enabled by default. GDS provides a direct memory access path between GPU memory and storage, which avoids a bounce buffer in CPU memory and speeds up I/O operations.

cuxfilter support on SageMaker Studio Lab and Google Colab

Cuxfilter is a RAPIDS framework that connects visualizations to GPU-accelerated cross-filtering, so you can quickly and easily explore large datasets. Over the last few months, extensive work has been undertaken to refactor the cuxfilter backend to use HoloViews. As a result, we are excited to announce that cuxfilter is now supported on SageMaker Studio Lab and Google Colab, making it easier than ever to try out in the cloud.

Performance Enhancements

These updates have been made behind the scenes to improve the performance of workloads running on RAPIDS. You don’t need to change the way you work to see these benefits!

cuGraph scaling improvements

The RAPIDS cuGraph library enables you to create and execute GPU-accelerated graph algorithms. In release 22.02 we improved memory usage, allowing larger graphs to be loaded. We also enhanced the graph primitives, which improved the performance of BFS, and we have been working hard on improving and validating the scalability of cuGraph. The recent changes have been tested on 512 GPUs, and larger tests are in the works. Look out for an upcoming blog for more details on the scalability and performance of cuGraph.

Enabled Apache Calcite CBO rules in Dask-SQL

We continue to increase the functionality of the experimental GPU support for Dask-SQL via RAPIDS in this 22.02 release. Cost-based optimization (CBO) uses table statistics to determine the most efficient execution plan for a given query. The Apache Calcite SQL interface now uses cost-based optimization rules to query a Dask DataFrame. You can change the `statistics` parameter of a query to determine the statistics used during the optimization.
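
To illustrate the general idea of cost-based optimization, the toy sketch below picks a join order from row-count statistics alone. The table names, row counts, selectivity, and cost formula are all made up for this example; it does not reflect Apache Calcite's actual rules or cost model, nor the Dask-SQL API.

```python
from itertools import permutations

# Hypothetical table statistics (row counts) and an assumed join selectivity.
row_counts = {"orders": 1_000_000, "customers": 50_000, "regions": 20}
SELECTIVITY = 1e-5  # assumed fraction of row pairs that survive each join


def plan_cost(order):
    """Estimated cost of a left-deep plan: total rows across intermediate results."""
    rows = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * row_counts[table] * SELECTIVITY
        cost += rows
    return cost


# Enumerate every join order and keep the cheapest according to the estimate.
best = min(permutations(row_counts), key=plan_cost)
print(best, plan_cost(best))
```

With these numbers, the cheapest plans join the two small tables first and bring in the large orders table last, because that keeps the intermediate result small. Changing the statistics changes which plan wins, which is the reason an optimizer benefits from accurate table statistics.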

Improved performance of join and scan operations in cuDF

During routine benchmarking of cuDF, we found that join operations with a high match rate on equality conditions resulted in poor performance. To improve support for these cases, we implemented a new mixed join kernel that operates on both equality and inequality conditions. The new kernel also leverages abstract syntax trees (ASTs) and new column copying patterns. With these improvements, we observed a 2x speedup in join operations in 3 TB distributed workloads.
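
For readers unfamiliar with the term, a mixed join combines an equality condition with an additional arbitrary predicate. The sketch below shows the idea on the CPU with plain Python: hash one side on the equality key, then apply the inequality predicate only to rows that share a key. The function mixed_join and the sample data are hypothetical and say nothing about how the cuDF kernel is implemented.

```python
from collections import defaultdict


def mixed_join(left, right, eq_key, predicate):
    """Inner join on left[eq_key] == right[eq_key] AND predicate(lrow, rrow)."""
    # Build a hash index over the right side for the equality condition.
    index = defaultdict(list)
    for rrow in right:
        index[rrow[eq_key]].append(rrow)

    out = []
    for lrow in left:
        # Only rows with a matching key are tested against the inequality.
        for rrow in index.get(lrow[eq_key], []):
            if predicate(lrow, rrow):
                out.append((lrow, rrow))
    return out


trades = [{"sym": "A", "t": 5}, {"sym": "A", "t": 9}, {"sym": "B", "t": 4}]
quotes = [{"sym": "A", "t": 3}, {"sym": "A", "t": 7}, {"sym": "B", "t": 6}]

# Match each trade with every earlier quote for the same symbol.
pairs = mixed_join(trades, quotes, "sym", lambda lrow, rrow: rrow["t"] < lrow["t"])
print([(lrow["t"], rrow["t"]) for lrow, rrow in pairs])  # [(5, 3), (9, 3), (9, 7)]
```

When many rows share a key, the equality step alone yields a large candidate set, so evaluating the inequality in the same pass avoids materializing pairs that would be filtered out afterwards.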

Additionally, we found that operations using groupby::scan could be made more efficient by taking advantage of presorted data. By introducing new patterns in the groupby sort functors, we realized a 1.6x speedup for presorted scan operations.
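
The benefit of presorted input can be seen in a small CPU sketch: when rows with the same key are already adjacent, a grouped cumulative sum needs only one pass and no sort or hash table. The function name grouped_cumsum_presorted is made up for this illustration and is not part of cuDF or libcudf.

```python
def grouped_cumsum_presorted(keys, values):
    """Per-group running sum, assuming rows with equal keys are already adjacent."""
    out = []
    running = 0
    prev = object()  # sentinel that never equals a real key
    for key, value in zip(keys, values):
        if key != prev:
            # A new key marks a group boundary: restart the running total.
            running = 0
            prev = key
        running += value
        out.append(running)
    return out


keys = ["a", "a", "a", "b", "b", "c"]
values = [1, 2, 3, 10, 20, 5]
scanned = grouped_cumsum_presorted(keys, values)
print(scanned)  # [1, 3, 6, 10, 30, 5]
```

If the keys were not grouped together, the function would silently restart the sum at every key change, so the presorted assumption has to hold for the result to be correct.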

Conclusion

With this 22.02 release of RAPIDS we continue to focus on improving our ecosystem integrations and expanding functionality, whilst striving to keep the user experience at the forefront of our mission.

Thank you to all of the RAPIDS community for the continued feedback, pull requests and discussions that help us to deliver you an even better RAPIDS experience.

NVIDIA GTC is taking place online from March 21–24 and is a great forum to hear from both RAPIDS users and developers directly. Make sure you check out these sessions to find out how RAPIDS is being used across industries and be the first to hear about our upcoming plans.

As always, find us on GitHub, follow us on Twitter, and check out our documentation and getting started resources.
