RAPIDS Release 21.12

Strengthening the GPU analytics foundation, improving ecosystem compatibility, and bringing new functionality and integrations.

Nick Becker
RAPIDS AI
7 min read · Dec 16, 2021


We’re excited to announce the release of RAPIDS 21.12. By expanding and strengthening the compatibility of RAPIDS within the GPU analytics ecosystem, this release provides the tools and integrations to improve developer productivity, whether you’re working directly with or building on top of RAPIDS libraries. Based on community feedback and feature requests, this release also includes new functionality, providing you with more possibilities to accelerate workflows with GPUs.

Below, we highlight some of the changes made in 21.12, including:

  • Improved ease of use through minor version compatibility in CUDA 11+ and new integrations within the Python ecosystem
  • Support for new data types to accelerate a wide range of workloads
  • New functionality across many RAPIDS libraries

Strengthening the Foundation

The following updates have been made to strengthen the foundation on which RAPIDS is built, allowing us to deliver a faster and more robust experience with the RAPIDS libraries.

NVIDIA CUDA Minor Version Compatibility

RAPIDS 21.12 packages are built with the latest 11.5 versions of NVCC and the NVIDIA CUDA Runtime, and are distributed for use with any CUDA driver >=450.80.02 via CUDA minor version compatibility. This means you no longer need to update your CUDA driver to use binaries built against newer CUDA 11.x releases, as long as your driver is at least 450.80.02.

Previously, a library built against CUDA 11.y required the driver associated with 11.y. This led to libraries across the GPU PyData ecosystem requiring different CUDA versions based on their build constraints, often making them incompatible with one another. Support for minor version compatibility allows RAPIDS libraries built on CUDA 11.x to coexist with other libraries that require a different CUDA 11.x version, as long as the minimum driver version is satisfied.

Among many other benefits, users can now conda install RAPIDS libraries with deep learning frameworks like PyTorch, even though they’re built with different minor CUDA versions (we’ll discuss this more below).

For more information about CUDA compatibility, please see the documentation.

Bringing New Types into the Ecosystem

Decimal128 in libcudf

In collaboration with the CUDA team, we’ve implemented and brought support for 128-bit decimal types to libcudf. Built on the new 128-bit integer type in CUDA 11.5+, Decimal128 enables higher precision analytics for large-scale, correctness-critical operations on decimal type data which are common in financial, retail, and e-commerce domains.
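
As a quick illustration at the Python layer (a minimal sketch, assuming your build of cuDF exposes the new Decimal128Dtype; the values here are arbitrary):

import cudf
from decimal import Decimal

# 38 digits of precision with 2 digits after the decimal point
dtype = cudf.Decimal128Dtype(precision=38, scale=2)

# 22 significant digits: too wide for a 64-bit decimal, fine for Decimal128
prices = cudf.Series([Decimal("12345678901234567890.12"), Decimal("0.88")]).astype(dtype)
print(prices.sum())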

Starting in the 21.12 release, the Spark-RAPIDS plugin will automatically use Decimal128 when needed for operations on decimal types, consistent with existing Apache Spark workloads on the CPU.

For folks interested in learning more, stay tuned for a standalone blog on the Decimal128 implementation and fixed-point data types.

Categorical Variables in XGBoost

XGBoost added initial, experimental support for categorical data in GPU-based training back in version 1.3. As of the current release, 1.5.1, categorical support extends across most of the library, including the DMatrix constructors, prediction, SHAP values, feature importance, and more.

If your data includes categorical types, pass enable_categorical=True to your DMatrix constructor:

import xgboost as xgb

Xy = xgb.DMatrix(X, y, enable_categorical=True)
booster = xgb.train({"tree_method": "gpu_hist"}, Xy)
preds = booster.predict(Xy)  # predict on categorical data
shap_values = booster.predict(Xy, pred_interactions=True)  # SHAP interaction values with categorical data
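
Here, X and y can be any XGBoost-compatible inputs whose categorical columns use a categorical dtype. For instance, a minimal cuDF sketch (the column names and values are illustrative):

import cudf

X = cudf.DataFrame({
    "color": cudf.Series(["red", "green", "red", "blue"], dtype="category"),  # categorical feature
    "size": [1.0, 2.0, 3.0, 4.0],                                            # numeric feature
})
y = cudf.Series([0.5, 1.5, 0.25, 2.0])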

Support for categorical type data has been one of the most requested XGBoost features from the community, and we’re excited to support DMLC in helping bring it to fruition. We’re currently helping bring this feature to XGBoost on CPUs.

New Features

RAPIDS 21.12 introduces new features and functionality across the RAPIDS ecosystem:

RAPIDS DataFrame Library (cuDF)

New functionality in cuDF increases support for time-series analytics and significantly improves the user-defined function (UDF) interface to match pandas semantics.

Time Series Updates

cudf.Grouper

Historically, cuDF did not provide a simple way to group time series data based on time increments — if your data was recorded hourly and you wanted to aggregate it by day, you had to write your own code to do so. cuDF 21.12 adds the ability to group time series data by specific increments, using the cudf.Grouper() type.
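
For example, take a small DataFrame of timestamped values (the construction below is a sketch, included so the snippet that follows is reproducible):

import cudf

df = cudf.DataFrame({
    "ts": cudf.to_datetime([
        "2000-01-01 00:00:02", "2000-01-01 00:00:07",
        "2000-01-01 00:00:02", "2000-01-01 00:00:15",
        "2000-01-01 00:00:05", "2000-01-01 00:00:09",
    ]),
    "value": [1, 2, 3, 4, 5, 6],
})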

df
                   ts  value
0 2000-01-01 00:00:02      1
1 2000-01-01 00:00:07      2
2 2000-01-01 00:00:02      3
3 2000-01-01 00:00:15      4
4 2000-01-01 00:00:05      5
5 2000-01-01 00:00:09      6

grouper = cudf.Grouper(key="ts", freq="4s")
df.groupby(grouper).mean()
                     value
ts
2000-01-01 00:00:00    2.0
2000-01-01 00:00:04    3.5
2000-01-01 00:00:08    6.0
2000-01-01 00:00:12    4.0

Resampling

This release also makes it possible to downsample and upsample a time series, interpolate between existing records, and aggregate resampled results using the .resample() function.

Aggregate every three minutes

import cudf

index = cudf.date_range(start="2001-01-01", periods=10, freq="1T")  # one-minute frequency
sr = cudf.Series(range(10), index=index)
sr.resample("3T").sum()
2001-01-01 00:00:00     3
2001-01-01 00:03:00    12
2001-01-01 00:06:00    21
2001-01-01 00:09:00     9
dtype: int64

Upsample from one minute to 30 seconds

sr.resample("30s").asfreq()[:5]
2001-01-01 00:00:00 0
2001-01-01 00:00:30 <NA>
2001-01-01 00:01:00 1
2001-01-01 00:01:30 <NA>
2001-01-01 00:02:00 2
dtype: int64

Groupby.diff()

Finally, the new Groupby.diff() functionality makes it simple to compute the difference between consecutive recorded values for each group in a time series. As this was one of the most requested features from the Kaggle community, we’re looking forward to seeing how it helps accelerate feature engineering.

import cudf

df = cudf.DataFrame({
    "key": [0, 0, 0, 1, 1, 1],
    "val": [1, 4, 9, 1, -3, 5],
})
df.groupby("key").diff()
    val
0  <NA>
1     3
2     5
3  <NA>
4    -4
5     8

New User-Defined Function Interface

Sometimes, we can’t easily express our logic in terms of columnar operations. In these circumstances, we often write user-defined functions (UDFs) that operate element-wise on a column.

In the past, UDFs in cuDF required expressing the function with complex iteration semantics. This was challenging to use with null values and required a complicated DataFrame.apply_rows interface.

In this release, we’ve significantly improved the UDF experience. Like pandas, cuDF now supports the DataFrame.apply interface, provides a row abstraction, enables null-aware semantics with cudf.NA, and pushes toward syntax parity with the CPU PyData ecosystem.

The end result is that you can now express UDFs like the following:

import cudf

df = cudf.DataFrame({
    "a": [1, None, -3, 4],
    "b": [10, 2, 2, 4],
    "c": [0, 1, 0, 4],
})

def custom_add(row):
    if row["a"] > 0:
        return row["a"] + row["b"]
    elif row["a"] is cudf.NA:
        return 99
    else:
        return row["a"]

df["out"] = df.apply(custom_add, axis=1)
df.head()
      a   b  c  out
0     1  10  0   11
1  <NA>   2  1   99
2    -3   2  0   -3
3     4   4  4    8

To learn more, visit the DataFrame.apply documentation.

RAPIDS Machine Learning Library (cuML)

The increased support for time series data doesn’t stop with updates to cuDF — we’ve added new functionality and stability improvements to the ARIMA model in cuML. cuML now also provides an optimized Linear SVM model and built-in support for GPUTreeSHAP with tree models.

Increased ARIMA Functionality and Stability

cuML introduces the ability to account for exogenous variables when working with ARIMA models. This feature was a common request from the community, and its inclusion brings cuML’s ARIMA closer to parameter parity with statsmodels.
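
As a minimal sketch of the new interface (the data here is synthetic, and the exog keyword follows the 21.12 API; note that forecasting ahead requires you to supply future values of the exogenous variables):

import cupy as cp
from cuml.tsa.arima import ARIMA

# hypothetical data: 100 observations and one exogenous regressor
y = cp.random.normal(size=(100, 1)).cumsum(axis=0)
exog = cp.random.normal(size=(100, 1))

model = ARIMA(y, order=(1, 1, 1), exog=exog)
model.fit()

# future exogenous values are needed to forecast the next 10 steps
fc = model.forecast(10, exog=cp.random.normal(size=(10, 1)))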

Learn more in the ARIMA documentation.

High-performance Linear SVM

Although linear SVM models could previously be trained in cuML, the algorithm wasn’t optimized and sometimes took longer to train than equivalent CPU implementations. RAPIDS 21.12 adds optimized LinearSVC and LinearSVR estimators that rely on Quasi-Newton solvers to speed up training.

import cuml

X, y = cuml.datasets.make_regression()
clf = cuml.svm.LinearSVR()
clf.fit(X, y)

Expansion of cuML Explainer Module [Experimental]

As model explainability continues to grow in importance for real-world machine learning use cases, we’re pleased to add the GPUTreeShap explainer to cuML as an experimental feature in this release. This experimental support allows you to obtain estimated SHAP values when using XGBoost or RandomForestRegressor models.

import cuml
from cuml.experimental.explainer.tree_shap import TreeExplainer

X, y = cuml.datasets.make_regression()
clf = cuml.ensemble.RandomForestRegressor()
clf.fit(X, y)
explainer = TreeExplainer(model=clf)
explainer.shap_values(X[:3])
array([[ -11.055127 ,   69.24626  ],
       [  -1.3223906, -129.21461  ],
       [   3.0042486,  -89.169876 ]], dtype=float32)

Ecosystem Integrations

RAPIDS and Deep Learning Frameworks

As noted above, support for CUDA minor version compatibility allows us to more cleanly interact with other GPU PyData libraries. One of the most common requests we’ve received from the community has been to improve our compatibility with PyTorch, which is currently on CUDA 11.3 (for which RAPIDS does not provide conda packages).

With the 21.12 release, as long as your CUDA driver version is high enough, you can now install cuDF (and other RAPIDS libraries) alongside PyTorch with the following command:

conda create -n rapids-torch -c rapidsai -c nvidia -c pytorch -c conda-forge cudf=21.12 cudatoolkit=11.3 pytorch torchvision torchaudio

For those who prefer working with pre-built containers, cuDF, cuML, cuGraph, XGBoost, and Dask are now available out of the box in the NVIDIA-optimized PyTorch and TensorFlow Deep Learning Framework containers on NGC, the NVIDIA AI software hub and container registry.

Dask-SQL [Experimental]

We’re excited to announce experimental GPU support for Dask-SQL via RAPIDS.

Dask-SQL is a Dask community project that provides a distributed SQL engine for the Dask ecosystem. By connecting an Apache Calcite SQL interface to the underlying Dask DataFrame API, Dask-SQL lets users write SQL workflows while also taking advantage of the full flexibility of the Dask ecosystem. Because it’s built on PyData tools, users can smoothly mix SQL, DataFrame operations, UDFs, and even machine learning algorithms in the same workflow.
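
As a short sketch of what this looks like in practice (the table and column names here are illustrative):

import cudf
import dask_cudf
from dask_sql import Context

# build a GPU-backed Dask DataFrame and register it as a SQL table
gdf = cudf.DataFrame({"id": [0, 1, 0, 1], "val": [1.0, 2.0, 3.0, 4.0]})
ddf = dask_cudf.from_cudf(gdf, npartitions=2)

c = Context()
c.create_table("my_table", ddf)

# the query runs on the GPU via cuDF; the result is a Dask DataFrame
result = c.sql("SELECT id, SUM(val) AS total FROM my_table GROUP BY id")
print(result.compute())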

To get started with Dask-SQL and RAPIDS, you can use the default install command from the release selector on our website. For example:

conda create -n rapids -c rapidsai -c nvidia -c conda-forge rapids=21.12 python=3.8 cudatoolkit=11.4 dask-sql

To learn more about Dask-SQL, please visit the documentation or watch John Zedlewski’s State of RAPIDS GTC talk, where Ben Zaitlen and Randy Gelhausen provide an overview and demonstration of what GPU-accelerated Dask-SQL enables.

Conclusion

As 2021 draws to a close, we want to take a moment to thank the RAPIDS community for all the feedback, pull requests, and discussions around how we can work together to bring you an even better RAPIDS experience. In 2022, we’ll continue to focus on improving our ecosystem integrations and expanding functionality, putting user experience at the forefront of our mission.

As always, find us on GitHub, follow us on Twitter, and check out our documentation and getting started resources. We’re excited to have you join us, and we’re looking forward to another great year of RAPIDS.
