RAPIDS Release 0.17: The Gift that Keeps on Accelerating

Josh Patterson · RAPIDS AI · Dec 14, 2020

We just celebrated Thanksgiving in the US, and I’m thankful for the perseverance of the RAPIDS team. Many team members had second and third jobs this year: caring for children, supporting partners, helping family, coping with never leaving the house. Despite all that, we persist. Beyond the improvements to the RAPIDS suite of tools covered in this post, our Medium page summarizes everything else the team has done since the last release:

  • Advancing AutoML with TPOT integration,
  • Improving workflows,
  • Building time-series forecasting for the GPU,
  • Three big steps in streaming, and
  • Refining end-to-end GPU data science.

I’m also thankful for everyone outside the team who has joined us on this journey. More people than ever are downloading and using RAPIDS. When we started RAPIDS, this was the dream: a set of tools that made the lives of data engineers and data scientists better.

This blog, like every release blog, celebrates and announces the great strides RAPIDS has made. I’m not going to lie, this has been a hard year for everyone. We take joy in our work, and the team has accomplished a lot, but some days, just putting one foot in front of the other is a big win. One thing I am certain about in the coming year is that the RAPIDS team will continue to innovate, improve, iterate, and progress. As many of us look forward to the December holidays, let me recap the early gifts we’re giving this season in RAPIDS Release 0.17.

RAPIDS Core Libraries Update

RAPIDS Data Frames: cuDF

A major new feature in the cuDF library is support for Decimal types (32-bit and 64-bit fixed-point) in cuDF Python, backed by efficient libcudf CUDA C++ implementations of many algorithms, including binary operations, rounding, casting, clamping, and several unary operations. There is more coming for Decimal types in upcoming releases, including reduction and groupby aggregation support.
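
As a rough illustration of what this looks like from Python, here is a minimal sketch, assuming a cuDF build where a Series can be constructed from Python Decimal objects with a Decimal64Dtype; the values, precision, and scale below are made up for the example:

```python
from decimal import Decimal

import cudf

# Two 64-bit fixed-point columns with 2 digits after the decimal point
# (precision/scale values are illustrative).
a = cudf.Series([Decimal("1.10"), Decimal("2.25"), Decimal("3.50")],
                dtype=cudf.Decimal64Dtype(precision=9, scale=2))
b = cudf.Series([Decimal("0.90"), Decimal("0.75"), Decimal("1.50")],
                dtype=cudf.Decimal64Dtype(precision=9, scale=2))

# Binary ops, rounding, and casting are backed by libcudf fixed-point kernels.
total = a + b
rounded = total.round(1)
as_float = total.astype("float64")
print(total)
```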

Support for nested types continues to progress in libcudf, with support for concatenating struct columns and scattering lists-of-structs columns. This release also added many under-the-hood improvements and smaller features, as well as fixes for 59 bugs. I encourage you to check out the changelog to see everything that’s been done.

RAPIDS Machine Learning: cuML and XGBoost

cuML and XGBoost added a wide range of improvements, with a particular focus on memory efficiency and model explainability. cuML now supports sparse input data for kNN and UMAP models, in addition to the previously added sparse support in PCA and Naive Bayes. These features are critical for natural language processing models, and they are just the beginning, with sparsity support for many more models coming soon. XGBoost continues to handle larger data sizes by expanding its distributed Dask API with key features like early stopping rules and a flexible callback API.
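
Here is a minimal sketch of the sparse kNN path, assuming a brute-force NearestNeighbors model that accepts a CuPy CSR matrix directly; the random data is purely illustrative:

```python
import cupyx.scipy.sparse as cusparse
from cuml.neighbors import NearestNeighbors

# Random sparse GPU matrix: 10,000 samples x 5,000 features, ~1% non-zeros
X = cusparse.random(10_000, 5_000, density=0.01, format="csr", dtype="float32")

# Fit brute-force kNN directly on the sparse input and query 5 nearest neighbors
nn = NearestNeighbors(n_neighbors=5)
nn.fit(X)
distances, indices = nn.kneighbors(X)
```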

XGBoost has also integrated the first GPU-accelerated version of the TreeSHAP algorithm for model explainability. It can accelerate prediction explanations by 10x-100x or more, and is also being integrated upstream into the popular SHAP explainability package. For models beyond trees, cuML has added experimental Permutation and Kernel SHAP explainers. These can be used to explain the output of virtually any model and can take advantage of the huge prediction speedups from cuML models.
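
As a sketch of the model-agnostic explainers, here is roughly how Kernel SHAP can be pointed at any prediction function; note that in the 0.17 release the explainers may live under cuml.experimental.explainer rather than cuml.explainer, and the toy data is made up:

```python
import cupy as cp
from cuml.ensemble import RandomForestRegressor
from cuml.explainer import KernelExplainer  # cuml.experimental.explainer in 0.17

# Toy regression data, purely for illustration
X = cp.random.random((500, 10)).astype(cp.float32)
y = (2 * X[:, 0] + X[:, 1]).astype(cp.float32)

model = RandomForestRegressor()
model.fit(X, y)

# Kernel SHAP only needs a callable prediction function and background data
explainer = KernelExplainer(model=model.predict, data=X[:100])
shap_values = explainer.shap_values(X[:10])
```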

Figures 1-3. cal housing-med benchmarks: V100 vs. 40 CPU cores; 8x V100; 40 CPU cores.

Additionally, cuML added an experimental release of LARS (Least-Angle Regression), a feature inspired by user requests. Similarly, we added the long-requested GPU acceleration for multi-node multi-GPU logistic regression in dask-glm. Please continue filing feature requests if there are more models you’d like to see!
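
If you want to try the experimental LARS solver, the rough shape of the API is below; this sketch assumes the estimator is exposed under cuml.experimental.linear_model and follows the usual scikit-learn-style fit/predict conventions, and the data and parameter values are illustrative:

```python
import cupy as cp
from cuml.experimental.linear_model import Lars

# Synthetic dense data, purely for illustration
X = cp.random.random((1_000, 50)).astype(cp.float32)
w = cp.random.random(50).astype(cp.float32)
y = X @ w

# Least-Angle Regression, limited here to 10 non-zero coefficients
lars = Lars(n_nonzero_coefs=10)
lars.fit(X, y)
preds = lars.predict(X)
```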

We continued improving the internals of cuML as well, with 44 bug fix PRs, an overhaul of the Base estimator classes to improve consistency in input and output types, and a large refactoring of CUDA primitives to move more of them out to our low-level RAFT library.

RAPIDS Graph Analytics: cuGraph

cuGraph continued to focus on compatibility and interoperability with other libraries and frameworks, including returning results in a format that matches the input data type. This release added support for accepting SciPy and CuPy sparse matrix objects as input to the Weakly Connected Components, Strongly Connected Components, Single Source Shortest Path, and Breadth First Search algorithms. We improved the NetworkX support started in the last release and added better support for pandas and NumPy, including new generic functions for adding edge lists.
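
As a sketch of the new interoperability, the snippet below assumes BFS can consume a SciPy CSR adjacency matrix directly, as described above; the tiny graph is made up, and the return format follows the input type:

```python
import numpy as np
from scipy.sparse import csr_matrix
import cugraph

# Tiny illustrative directed graph as a SciPy CSR adjacency matrix
row = np.array([0, 0, 1, 2])
col = np.array([1, 2, 3, 3])
val = np.ones(4, dtype=np.float32)
adj = csr_matrix((val, (row, col)), shape=(4, 4))

# Breadth First Search from vertex 0, straight from the sparse matrix
result = cugraph.bfs(adj, start=0)
```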

We like to include new graph algorithms each release, and this release includes two. The first is minimum spanning tree (MST), which also supports maximum spanning trees and minimum/maximum spanning forests. The second is the classic Hungarian algorithm for solving assignment problems. This release also extended Katz centrality to support multi-node, multi-GPU processing, enabling it to scale to huge datasets.
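
A minimal MST sketch, assuming the standard cuGraph edge-list workflow; the edge list and weights below are made up:

```python
import cudf
import cugraph

# Illustrative weighted, undirected edge list
edges = cudf.DataFrame({
    "src": [0, 0, 1, 1, 2],
    "dst": [1, 2, 2, 3, 3],
    "weight": [1.0, 4.0, 2.0, 7.0, 3.0],
})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="weight")

# New in this release: compute the minimum spanning tree of G
mst = cugraph.minimum_spanning_tree(G)
```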

RAPIDS cuXfilter and Visualization

cuXfilter added datetime support and other general improvements, and it is now able to visualize large graphs via Datashader. You can see some great graph examples in our extensive JupyterCon 2020 viz tutorial notebook.

We are also happy to share the fantastic new linked brushing capability, accelerated by cuDF, in Plotly Dash via HoloViews. Read more about it on the Plotly Medium blog and check out the Dash documentation.

RAPIDS Memory Manager (RMM)

RMM has new stream wrapper classes on the C++ side to improve the type safety of streams and take steps toward improving stream semantics across RAPIDS. Python stream wrappers are coming in the next release. RMM also added a new tracking resource adaptor, which will be helpful in detecting memory leaks. With this release, RMM now has Python documentation online for the first time. And last, but definitely not least, we have published a detailed NVIDIA Developer Blog post about RMM.

Cyber Log Accelerator (CLX)

For this release, CLX has made a number of quality-of-life enhancements and updates to multiple workflows, modules, and notebooks. cyBERT, which lets you parse unstructured logs without the need for regex, now supports ELECTRA models in addition to BERT models. A new module for periodicity detection was added, and the DNS extractor was updated to fix a few bugs. Phishing detection is now a CLX module, and you can view an example notebook to see how to use a BERT model for phishing detection on your own emails. If you’re looking to get started with Streamz or cuStreamz, CLX provides multiple example workflows and a starting Docker image for you to try. There’s even a notebook demonstrating how to use FIL+cuStreamz for your inference needs. The CLX documentation has been expanded and updated, and don’t forget about the cyBERT 2.0 blog that’s out now.
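
For a feel of the cyBERT workflow, here is a rough sketch; the class and method names are assumptions based on the CLX documentation, and the model/config paths and log line are placeholders:

```python
import cudf
from clx.analytics.cybert import Cybert  # name assumed from the CLX docs

# Placeholder paths to a fine-tuned log-parsing model and its label config
MODEL_PATH = "/path/to/cybert_model.pth"
CONFIG_PATH = "/path/to/cybert_config.json"

cybert = Cybert()
cybert.load_model(MODEL_PATH, CONFIG_PATH)

# Parse raw, unstructured log lines without writing a single regex
raw_logs = cudf.Series([
    "Nov 10 12:01:02 host sshd[123]: Failed password for invalid user admin",
])
parsed_df, confidence_df = cybert.inference(raw_logs)
```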

Dask

For this release, we’ve added support for launching Dask + RAPIDS on AWS, GCP, and Azure using raw VM instances. This enables access to large GPU instances not currently supported in managed cloud services. We’ve also added novel communication and spilling improvements in Dask-CUDA: workers can now optionally communicate data that was previously spilled from the GPU without first deserializing it back onto the GPU. While this is still an experimental feature, we have seen significant performance improvements when it is enabled.
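
Launching a RAPIDS-enabled Dask cluster on raw AWS VMs looks roughly like the sketch below; the instance type, image tag, and worker count are illustrative, so check the Dask Cloud Provider documentation for current options:

```python
from dask.distributed import Client
from dask_cloudprovider.aws import EC2Cluster

# Spin up GPU VMs directly, with Dask-CUDA workers inside the RAPIDS container
cluster = EC2Cluster(
    instance_type="g4dn.xlarge",  # illustrative GPU instance type
    docker_image="rapidsai/rapidsai:0.17-cuda11.0-runtime-ubuntu18.04-py3.8",
    worker_class="dask_cuda.CUDAWorker",
    n_workers=2,
)
client = Client(cluster)
```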

BlazingSQL

This release brings a lot of new features for users as well as many under-the-hood improvements. Several new SQL string functions are now supported, including REPLACE, TRIM, and UPPER. Users can also now create tables directly from compressed text files, from directories that follow the Hive partitioning structure, and from text files that are individually larger than GPU memory.
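
A quick sketch of the kind of query this enables; the table name, file path, and columns are made up for the example:

```python
from blazingsql import BlazingContext

bc = BlazingContext()

# Tables can now be created directly from compressed text files or
# Hive-partitioned directories (the path here is a placeholder)
bc.create_table("orders", "/data/orders/*.csv.gz")

# New string functions such as REPLACE, TRIM, and UPPER; results come back as cuDF
result = bc.sql("""
    SELECT UPPER(TRIM(customer_name)) AS customer,
           REPLACE(status, '_', ' ') AS clean_status
    FROM orders
""")
```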

After months of work, the new communication layer is now merged. Users can expect improved distributed performance over TCP, and they can begin experimenting with UCX support, which enables shuffles over NVLink and InfiniBand.

RAPIDS Community

As I announced in the last blog, we’ve finished our first podcast. We’re really excited about this new way to interact with the community, and we would really appreciate your feedback! We plan to make this a bi-weekly series. We’ll be interviewing Bartley Richardson and Rachel Allen about cyBERT, GPU cyber data science, and NLP on the GPU.

Wrapping Up

Data science is important, and our passion for it animates everything we do. But data science is for people. This year has reminded us just how important our relationships and communities are. As the year winds down, I hope all of you can find rest and community with friends and family, even if only via Zoom. I appreciate every one of you that has joined our community, and I wish you a peaceful holiday season.
