RAPIDS 0.14 Release: S.C.A.L.I.N.G

Supporting the Community by Accelerating Learning and Insight on NVIDIA GPUs

Josh Patterson
RAPIDS AI
Jun 23, 2020


The small things add up. It was time for some organization and cleanup, so the RAPIDS 0.14 release focuses on quality-of-life improvements. In 0.14, we concentrated on the details that generally get put on the back burner during major releases — improving documentation, pushing down bug counts, removing legacy code, adding more performance and regression testing in CI/CD, resolving issues, and adding a few new features in cuML and cuGraph. That said, we still managed to pull off a few amazing achievements as well.

Before continuing, let me first say, like everyone else in the community, we are all deeply impacted by the global struggle to stay safe, healthy, and live together equally. Join us and our community in whatever capacity you can — donate, volunteer, protest, tweet, code, educate. The time is now. As part of the open-source community, we are held accountable to speak up and do what’s right.

RAPIDS Everywhere

Exactly as it sounds — our goal is to make RAPIDS as usable and performant as possible wherever data science is done. We will continue working with more open source projects to further democratize acceleration and efficiency in data science. The 0.14 release includes many new additions and extensions to the RAPIDS ecosystem that are worth showcasing.

The RAPIDS team works closely with major cloud providers and open source hyperparameter optimization (HPO) solutions (such as Ray) to provide code samples so you can get started with HPO in minutes on the cloud of your choice. See the example code here.
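To make the idea concrete, here is a minimal, cloud-agnostic sketch of random-search HPO over a cuML model. It is an illustration only: the dataset and parameter ranges are made up, and it scores on the training set to stay short. The linked cloud and Ray examples show complete, managed workflows.

    # Hedged sketch: random-search HPO over a cuML random forest.
    # The parameter grid is hypothetical; real HPO should use a held-out
    # validation set and the cloud/Ray examples linked above.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import ParameterSampler
    from cuml.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
    X, y = X.astype('float32'), y.astype('int32')

    param_space = {'n_estimators': [100, 200, 400], 'max_depth': [8, 12, 16]}

    best_score, best_params = -1.0, None
    for params in ParameterSampler(param_space, n_iter=5, random_state=0):
        clf = RandomForestClassifier(**params)
        clf.fit(X, y)
        score = clf.score(X, y)  # training accuracy, to keep the sketch short
        if score > best_score:
            best_score, best_params = score, params

    print(best_params, best_score)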

Working with Plotly, DataShader, and the RAPIDS viz stack, we added a new demo to explore gigabyte datasets within Plotly Dash, all accelerated by RAPIDS and GPUs. Read about the collaboration in Plotly’s blog, how we built the demo in our RAPIDS Medium blog, and check out the RAPIDS Plotly community webpage for more information on how to get the code and build demos like this yourself.

Our colleagues at NVIDIA released nvTabular — a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte-scale datasets used to train deep learning-based recommender systems. It provides a high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS cuDF library. See how it works here. Speaking of recommenders, the NVIDIA RAPIDS.ai team also placed first on the most recent ACM Recsys Challenge using RAPIDS! Congrats to them, and we look forward to their paper on the winning solution.
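nvTabular builds on cuDF under the hood. As a rough illustration of the kind of GPU preprocessing it accelerates (this is plain cuDF with hypothetical column names, not the nvTabular API itself, which wraps such steps in higher-level workflow operators):

    import cudf

    # Hypothetical interactions table; in practice this would be read from
    # terabyte-scale Parquet files.
    df = cudf.DataFrame({
        'user_id': ['u1', 'u2', 'u1', 'u3'],
        'item_id': ['i9', 'i9', 'i4', 'i2'],
        'price':   [10.0, 12.5, 3.0, 7.25],
    })

    # "Categorify": map string categories to contiguous integer codes
    for col in ['user_id', 'item_id']:
        df[col] = df[col].astype('category').cat.codes

    # Normalize a continuous feature to zero mean / unit variance
    df['price'] = (df['price'] - df['price'].mean()) / df['price'].std()

    print(df)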

There are a few more updates from the community to note. BlazingSQL now supports out-of-core query execution, which enables queries to operate on datasets dramatically larger than available GPU memory. And UCX-Py adds InfiniBand support as well as Multi-Node Multi-GPU NVLink and InfiniBand tests.
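For context, a typical BlazingSQL session looks roughly like the sketch below (the table name and file path are hypothetical); with out-of-core execution, a query like this can now run even when the scanned data does not fit in GPU memory.

    from blazingsql import BlazingContext

    bc = BlazingContext()

    # Register a Parquet dataset as a SQL table (path is hypothetical)
    bc.create_table('transactions', '/data/transactions/*.parquet')

    # With out-of-core execution, the working set can exceed GPU memory
    result = bc.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM transactions
        GROUP BY customer_id
    """)
    print(result.head())  # bc.sql returns a cuDF DataFrame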

Documentation and Paying Down Tech Debt

One of the primary focuses of this release has been improving the experience for end-users of RAPIDS. To that end, we have focused on resolving open issues, paying down tech debt, and improving documentation across the board.

  • cuDF closed 563 issues during this release, over 300 of which were user-reported bugs.
  • cuML closed 206 issues (85 of which were bugs).
  • cuGraph closed 65 issues (17 of which were bugs).

Documentation was upgraded throughout, with a focus on:

  1. resolving any incorrect or out of date documentation,
  2. consistency across the libraries, and
  3. general improvements.

We think the improvements will make it easier for anyone to pick up and be effective with RAPIDS — we hope you agree! As always, documentation can be found at https://docs.rapids.ai/.

Because we are moving fast (and the whole point of RAPIDS is to provide blistering speed), we want to ensure we do not introduce performance regressions into our codebase. To that end, we have strengthened our internal CI/CD systems to include additional performance and regression testing. Many thanks to our ops team for pushing this through!

Changes to notebooks-contrib

Another significant change in this release relates to notebooks-contrib, which houses community notebooks. Many of these notebooks are no longer actively used by the community, and maintaining them takes quite a bit of focus from RAPIDS engineering. In the spirit of fewer, better-quality notebooks, we are planning to deprecate official support for them. Fear not, they are not going anywhere — they will be moved into a separate repo where users can still view, maintain, and fork them if needed. The only change is that the RAPIDS engineering team will no longer focus on keeping them up to date. Keep an eye out for more details on this soon.

RAPIDS Library Updates — Lightning Round

Here’s a quick summary of what we accomplished in this release:

Dataframes: cuDF and Libcudf

  • cuDF drops libcudf legacy code, improves API documentation, and adds bug fixes as well as performance and memory-usage optimizations
  • cuDF issues labeled “bug” closed: 312
  • cuDF issues labeled “doc” closed: 42
  • cuDF issues labeled “feature request” closed: 176
  • Libcudf issues closed: 179
  • Libcudf issues labeled “bug” closed: 90
  • Libcudf issues labeled “tech debt” closed: 23
  • Libcudf issues labeled “doc” closed: 8
  • Libcudf issues labeled “feature request” closed: 77

Machine Learning: cuML

  • Consistent logging interface throughout both C++ and Python layers of the codebase
  • Improved documentation and example notebooks
  • Configurable data type API
  • Core primitives from cuML moved into RAFT (the RAPIDS Analytic Framework Toolset)
  • Expanded multi-node, multi-GPU (MNMG) support for ElasticNet and Lasso linear models
  • XGBoost launched a major new release, version 1.1, which is included in the RAPIDS 0.14 packages
  • Sample weights for k-means (see the sketch after this list), many more GPU-accelerated metrics, and sample data generators
  • Over 140 pull requests merged for smaller improvements or bug fixes
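As a small example of the new k-means sample weights, here is a minimal sketch. It assumes the sample_weight keyword mirrors scikit-learn's, and uses cuML's sample data generator mentioned above; the weights are uniform purely for illustration.

    import cupy as cp
    from cuml.cluster import KMeans
    from cuml.datasets import make_blobs

    # Synthetic data generated directly on the GPU
    X, _ = make_blobs(n_samples=10_000, n_features=16, centers=5, random_state=0)

    # Per-sample weights; uniform here, but any non-negative weights work
    w = cp.ones(X.shape[0], dtype=cp.float32)

    km = KMeans(n_clusters=5, random_state=0)
    km.fit(X, sample_weight=w)
    print(km.cluster_centers_)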

Graph Analytics: cuGraph

  • Added the ForceAtlas2 graph layout algorithm (see the sketch after this list)
  • A new version of Vertex Betweenness Centrality
  • Refactored and fixed issues with Louvain. It now returns results comparable to NetworkX and other implementations
  • Fixed issue in PageRank that was causing it to crash
  • Fixed BFS to operate on directed graphs
  • Cleaned up and refactored libcugraph
  • cuGraph closed 105 issues
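Below is a minimal sketch of the new ForceAtlas2 layout and betweenness centrality, using a tiny hypothetical edge list; real workloads would load much larger graphs from files.

    import cudf
    import cugraph

    # Tiny hypothetical edge list
    edges = cudf.DataFrame({'src': [0, 1, 2, 2], 'dst': [1, 2, 0, 3]})

    G = cugraph.Graph()
    G.from_cudf_edgelist(edges, source='src', destination='dst')

    # ForceAtlas2 layout: returns x/y positions per vertex
    pos = cugraph.force_atlas2(G, max_iter=500)

    # Vertex betweenness centrality
    bc = cugraph.betweenness_centrality(G)

    print(pos.head())
    print(bc.head())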

Data Visualization: cuXfilter

  • Added Dask-cuDF multi-GPU support for all charts (see the sketch after this list). Find more details in our docs.
  • Pushed four other major improvements, including making dashboard querying stateless, optimizations to groupby queries, and axis autoscaling.
  • Pushed eight bug fixes, including minor aggregation improvements to datatiles.
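Roughly, a cuXfilter dashboard is built from a cuDF (or now a dask_cudf) DataFrame plus a list of charts. The sketch below uses hypothetical columns and is only meant to show the shape of the API.

    import cudf
    import cuxfilter

    # Hypothetical data; pass a dask_cudf DataFrame instead to spread the
    # dashboard across multiple GPUs
    gdf = cudf.DataFrame({
        'value':    list(range(1000)),
        'category': [i % 5 for i in range(1000)],
    })

    cux_df = cuxfilter.DataFrame.from_dataframe(gdf)
    charts = [
        cuxfilter.charts.bar('category'),
        cuxfilter.charts.bar('value'),
    ]
    dashboard = cux_df.dashboard(charts)
    dashboard.show()  # serves the interactive, cross-filtered dashboard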

Signal Processing: cuSignal

  • Code reorganization and clean up with a focus on improved documentation and readability
  • Introduction of a second-order sections filter (sosfilt) — a parallel and numerically stable alternative to the linear filter (lfilter); see the sketch after this list
  • Added the ability to precompile custom CUDA kernels to eliminate JIT compile overhead
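Here is a short sketch of sosfilt, assuming the SOS coefficients are designed on the CPU with SciPy and the signal lives on the GPU as a CuPy array; consult the cuSignal docs for the exact dtype and section-count constraints.

    import cupy as cp
    import cusignal
    from scipy import signal

    # Design second-order sections on the CPU with SciPy, move them to the GPU
    sos = cp.asarray(signal.butter(4, 0.125, output='sos'))

    # Hypothetical noisy signal, generated directly on the GPU
    x = cp.random.randn(1_000_000)

    # Numerically stable filtering on the GPU
    y = cusignal.sosfilt(sos, x)
    print(y[:5])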

Cyber Security: CLX and cyBERT

The Wrap Up

In RAPIDS 0.14, we refined docs, continued to work with the community on integrations, pushed down bug counts, expanded C++ examples, and added more tests to CI/CD. In 0.15, we will add more cuStreamz functionality, cyBERT 2.0, and more. We will focus on stability at scale, hardening features, and preparing for 1.0.

As always, we want to thank all of you for using RAPIDS and contributing to our ever-growing GPU-accelerated ecosystem. Please check out our latest release deck or join in on the conversation on Slack or GitHub.
