Summer was Cancelled, RAPIDS 0.15 is Not

See the latest features and functionality included in the RAPIDS 0.15 Software Release

Josh Patterson
RAPIDS AI
6 min read · Sep 4, 2020

Like you, the RAPIDS team has had to adjust to new realities about work, home, and family during this unusual and challenging summer. We look to the autumn with uncertainty about the wider world, but with hope for continued improvement. We all continue to walk forward. This brief blog covers some of the work the GPU data community has been doing and touches on highlights of the RAPIDS 0.15 release.

Heading to New Places

Today, thinking of everything that has been built on RAPIDS, given all the early challenges, it is amazing to see the evolution from when RAPIDS was just an idea. Not going to lie to you: it wasn’t all fun. But this technological evolution emphasizes that when we rise to face the challenges in our way, together, the final results can be better than what we dreamed. RAPIDS’ success has been, is, and will be because of the great community contributing to it. We’ve published or been mentioned in over thirteen blogs since the last release that demonstrate how RAPIDS is maturing and being used in new ways every day — including hyperparameter optimization, natural language processing, and ensemble learning. RAPIDS is being employed in fields we didn’t even think of when we first started, like weather analysis, accelerating geographic data science, and cheminformatics. Using RAPIDS for single-cell genomics lets us analyze 1 million cells in only 11 minutes (Figure 1). This is amazing work, and it’s not everything. RAPIDS is also being used for recommender systems, cybersecurity workflows, and crushing big data benchmarks. If you’re into seismic facies analysis, RAPIDS was shown to accelerate this workflow by up to 258x.

Figure 1: Analyzing 1 million cells in 11 minutes with RAPIDS

Release Details and Core Library Updates

The release of CUDA 11.0 was a huge step forward for the general-purpose GPU computing world. RAPIDS 0.15 supports CUDA 11.0 and is sunsetting CUDA 10.0 in this release. This was a big effort from the RAPIDS team, and I’m really glad we got it over the finish line for this release.

RAPIDS cuDF

Version 0.15 was a big release for cuDF, adding 80 new features and 129 improvements. I can’t do them all justice here, but I’m excited about Apache Kafka support in cuStreamz and the many new and improved ways to work with strings. cuDF 0.15 includes dozens of improvements and new APIs for string columns and text processing, including a new high-performance subword tokenizer. Work also began in 0.15 to support a number of new data types: unsigned integer columns and time duration columns are now supported, and progress has been made on decimal fixed-point columns and nested list and struct columns, with full support landing in upcoming releases.
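To give a feel for what a subword tokenizer does, here is a minimal CPU-only sketch of greedy, longest-match-first subword tokenization (the WordPiece-style idea behind tokenizers like cuDF’s). The vocabulary and function names below are illustrative, not the cuDF API.

```python
# Minimal sketch of WordPiece-style subword tokenization:
# repeatedly match the longest vocabulary piece from the current position.

def subword_tokenize(word, vocab):
    """Greedy longest-match-first subword tokenization."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # shrink the candidate and try again
        if piece is None:
            return ["[UNK]"]  # no subword matched at this position
        tokens.append(piece)
        start = end
    return tokens

vocab = {"token", "##ize", "##r", "un", "##related"}
print(subword_tokenize("tokenizer", vocab))  # ['token', '##ize', '##r']
```

The real cuDF tokenizer runs this kind of matching massively in parallel on the GPU over entire string columns, rather than one word at a time.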

RAPIDS cuML

Likewise, the cuML team accomplished a lot in this release. As a former economist, I’m excited to see that we have initial support for auto-ARIMA, and with the introduction of multi-node, multi-GPU KNN, I’m looking forward to seeing this amazing technique unleashed on incredibly huge data sets. Building on new functionality from cuDF, the cuML library also added many of the most important NLP preprocessors, including TF-IDF and HashingVectorizer, making it easy to build complete, on-GPU pipelines for text and ML. These pair nicely with the new TargetEncoder preprocessing approach, which brings the most recent developments in categorical variable encoding to everyone. (Stay tuned for a blog on this as well.)

As part of the RAPIDS team’s work with the broader ML community, XGBoost has added a number of new GPU-friendly features as well. XGBoost’s Dask API now supports zero-copy load from GPU arrays, while both single-GPU and distributed workloads will benefit from optimizations for wide datasets. The RAPIDS 0.15 XGBoost packages also include experimental support for RMM (the RAPIDS Memory Manager), allowing it to share memory pools with other RAPIDS workloads. Finally, we announced the deprecation of the old dask-xgboost package. This package still ships with 0.15, but will be removed in a future release, as the native XGBoost Dask API supersedes it.

RAPIDS cuGraph

The cuGraph team focused on a range of areas in this release. The first was improving community detection analytics with a refactoring of the Louvain algorithm to provide better results and be more consistent with other frameworks. That refactoring also improved ECG. Additionally, the team is happy to announce the initial release of the Leiden algorithm. Beyond community detection, the team also implemented the Edge Betweenness Centrality algorithm and wrapped the HITS algorithm from the Gunrock package.
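Louvain, Leiden, and ECG all revolve around maximizing modularity, a score of how well a community assignment explains a graph’s edges. As a CPU-only illustration (the graph and partition below are made up, and this brute-force version is for clarity, not speed):

```python
# Modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * [c_i == c_j]
# for an undirected, unweighted graph given as an edge list.

def modularity(edges, community):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    q = 0.0
    nodes = list(degree)
    for i in nodes:
        for j in nodes:
            # adjacency entry for the ordered pair (i, j)
            a_ij = sum(1 for u, v in edges if (u, v) in ((i, j), (j, i)))
            if community[i] == community[j]:
                q += a_ij - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)

# Two triangles joined by one bridge edge; group each triangle together.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, community), 3))  # 0.357
```

Algorithms like Louvain greedily move vertices between communities to increase this score; Leiden refines that process to avoid badly connected communities.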

The next big area of focus for the cuGraph team was improving the scaling of PageRank, Personal PageRank, BFS, and the underlying infrastructure so that the algorithms scale to multiple nodes and multiple GPUs (MNMG). Those algorithms still have a 32-bit limit on the number of vertices. Correcting that limitation was the last major focus of this release, with the creation of a new set of primitives and better data partitioning. The benefit of that work will be seen in the 0.16 release.
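For reference, here is a CPU-only power-iteration sketch of the PageRank computation that cuGraph scales out across nodes and GPUs. The toy graph and damping value are illustrative; MNMG PageRank partitions this same iteration across many devices.

```python
# Power-iteration PageRank on a dict of node -> out-links.

def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if not outs:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in outs:
                    new[u] += damping * rank[v] / len(outs)
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print({k: round(v, 3) for k, v in sorted(ranks.items())})
```

The ranks always sum to 1; "c" scores highest here because it receives links from both "a" and "b".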

RAPIDS Memory Manager (RMM)

The RAPIDS Memory Manager (RMM) continues to improve and integrate more fully with all parts of the RAPIDS ecosystem. RMM 0.15 deprecates the CNMeM library previously used to provide a memory pool, and replaces it with its own `pool_memory_resource` which provides better pool growth heuristics and CUDA per-thread default stream support with less frequent synchronization than CNMeM. Further improvements to RMM `pool_memory_resource` and other RMM memory resources are coming in 0.16. With this release, I’m ecstatic to say RAPIDS, Numba, CuPy, and XGBoost now use RMM and support pluggable memory interfaces. I believe this trend will continue and more libraries will adopt pluggable memory interfaces to improve interoperability in the Python ecosystem.
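To see why a pool resource helps, here is a toy CPU-only sketch of the pattern: grab one large slab up front, then hand out slices by bumping an offset instead of calling the system allocator (or, in RMM’s case, `cudaMalloc`) for every request. The class name and doubling heuristic below are illustrative, not RMM’s actual implementation.

```python
# Toy bump-allocator pool: sub-allocate from one slab, growing when exhausted.

class ToyPool:
    def __init__(self, initial_size=1024):
        self.capacity = initial_size
        self.offset = 0          # next free byte in the slab
        self.grow_count = 0      # how many times the pool expanded

    def allocate(self, nbytes):
        while self.offset + nbytes > self.capacity:
            self.capacity *= 2   # simple doubling growth heuristic
            self.grow_count += 1
        start = self.offset
        self.offset += nbytes
        return start             # "pointer" = offset into the slab

pool = ToyPool(initial_size=1024)
ptrs = [pool.allocate(300) for _ in range(5)]  # 1500 bytes requested in total
print(pool.capacity, pool.grow_count)  # 2048 1
```

Five allocations cost only one expensive growth event; the same amortization is what makes pooled device memory so much faster than per-allocation `cudaMalloc` calls, and pluggable interfaces let CuPy, Numba, and XGBoost all draw from one such pool.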

Dask-cuDF

In 0.15, we improved Dask-cuDF performance for reading Parquet files and writing metadata. We also contributed improvements developed for Dask-cuDF back into Dask core, specifically a hash-based shuffling implementation, and improved the serialization/deserialization schemes within cuDF to bypass copying frames during spilling when memory is limited. Looking ahead, 0.16 will continue pushing on I/O improvements, not only in Parquet reading/writing/filtering but also extending these efforts to ORC files.
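The idea behind hash-based shuffling can be sketched in a few lines of CPU-only Python: each row goes to the partition chosen by hashing its key, so all rows sharing a key land in the same partition, ready for a groupby or join. The row format and partition count here are illustrative, not the Dask implementation.

```python
# Hash-partition rows by a key column: same key -> same partition.

def hash_shuffle(rows, key, npartitions):
    partitions = [[] for _ in range(npartitions)]
    for row in rows:
        p = hash(row[key]) % npartitions
        partitions[p].append(row)
    return partitions

rows = [{"id": "a", "x": 1}, {"id": "b", "x": 2},
        {"id": "a", "x": 3}, {"id": "c", "x": 4}]
parts = hash_shuffle(rows, "id", 2)

# Every row with id == "a" lands in exactly one partition.
a_parts = {i for i, p in enumerate(parts) for r in p if r["id"] == "a"}
print(len(a_parts))  # 1
```

In a distributed setting this property is what lets each worker finish its groupby or join locally, without an expensive global sort.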

BlazingSQL

Our friends at BlazingSQL have made many stability and performance improvements, and are making great strides on out-of-core operations. Two-thirds of the queries we currently use to benchmark GPU data science can now run successfully at the 10-terabyte scale using only a single V100. This means you can do more analysis with less hardware, and it has been a great quality-of-life improvement for the development community. You no longer have to wait for an out-of-memory error with BlazingSQL; your code will just run. The BlazingSQL team is working on this every day, and like all of us, they really appreciate you filing issues on GitHub when you find something to improve.
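Out-of-core execution boils down to a simple pattern, sketched here in CPU-only Python with made-up sizes: process the data in chunks that fit in (GPU) memory and combine partial results, instead of loading everything at once and failing with an out-of-memory error.

```python
# Out-of-core aggregation: stream chunks that fit "in memory", combine partials.

def out_of_core_sum(values, chunk_size):
    total = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]  # only this chunk is resident
        total += sum(chunk)                       # partial aggregate per chunk
    return total

data = list(range(1, 101))        # stand-in for a table far larger than memory
print(out_of_core_sum(data, 8))   # 5050, same answer as summing all at once
```

The engineering work is in making operators like joins and sorts decomposable this way; the payoff is that query size is bounded by disk, not by a single GPU’s memory.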

Conclusion

As we move closer to digital GTC this October, we look forward to presenting even more great features and improvements. At this Fall’s GTC, we will host many exciting talks about RAPIDS performance, where it’s going, and how people are using it to improve data science.

We’re exceedingly grateful for our community, your support, and the continued opportunity to improve and expand GPU data science. We’d love for you to be a part of this. No matter if you’re in the cloud or working on your local machine, we want your experience to be great. Please review our docs on RAPIDS, join us on Slack or Github, and let us know where we can improve or how we’re improving your data science performance. We’re walking forward, together.
