RAPIDS Release 21.06

RAPIDS Versioning for Continuous Improvement

Paul Mahler
RAPIDS AI
4 min read · Jun 17, 2021

Astute readers will immediately notice that our last release blog, published in April, covered changes in version 0.19. The year has been moving fast, and as summer begins and life gets a little bit back to normal for many of us, you could be excused for thinking you had missed two or more releases. You haven’t: one of the big changes in this release is the transition to CalVer (calendar-based versioning) for the RAPIDS ecosystem, which makes this release 21.06, with our next release in August becoming 21.08. We are always striving to improve RAPIDS. Rather than a very long march toward “1.0,” this change not only simplifies how users identify our releases, it also reflects our philosophy of continuous improvement and progress, not “perfection.”
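
If you want to check which CalVer release you have installed, a minimal sketch (assuming cuDF is present) is:

```python
import cudf

# Under CalVer the version string reads YY.MM.patch,
# e.g. "21.06.00" for this June 2021 release.
print(cudf.__version__)
```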

We were pleased by all the great RAPIDS-related talks at Dask Summit 2021. RAPIDS has relied on Dask for multi-GPU and multi-node workloads since we launched, and we were excited to talk with the broader Dask community about RAPIDS and Dask integrations. With three talks, four tutorials, and one workshop, everything from basic applications to building advanced tools with RAPIDS and Dask was on display. More great things are on the way.

RAPIDS Core Libraries Updates

RAPIDS cuDF (DataFrames)

We have been hard at work so you can work less, improving and expanding the tools available for groupby aggregations, lists, strings, and the CSV and ORC writers. Groupby aggregations are now easier and more comprehensive with the addition of `shift`, `replace_nulls`, and multiple cumulative operations. Outputting your finished work has been simplified with the addition of `Decimal` data type support in the Python CSV and ORC writer functions. And you can manipulate your lists more easily with the newly added `join_list_elements`, `getitem`, and `concatenate_list_elements` list functions.
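
Below is a minimal sketch of the new groupby and writer features, assuming a 21.06 cuDF build; the data, column names, and decimal precision are illustrative, and the method names mirror the pandas-style API:

```python
import cudf

df = cudf.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [1, None, 3, 4],
})
gb = df.groupby("key")

# Shift values down one row within each group.
shifted = gb.shift(1)

# Replace nulls within each group, here with a forward fill.
filled = gb.fillna(method="ffill")

# Cumulative operations now run per group as well.
running = gb.cumsum()

# Decimal columns can now round-trip through the CSV and ORC writers;
# the cast below assumes decimal dtypes are available in your build.
df["price"] = cudf.Series([100, 250, 375, 420]).astype(
    cudf.Decimal64Dtype(precision=9, scale=2)
)
df.to_csv("out.csv", index=False)
```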

RAPIDS cuML (Machine Learning)

cuML continues to grow and integrate with multiple libraries. For this release, we worked hard to introduce a new backend that lets you serve Forest Inference Library (FIL) models through the NVIDIA Triton Inference Server.
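
On the FIL side, loading a trained model for GPU inference looks roughly like the sketch below; the model path and array shapes are placeholders, and the Triton wiring itself is configured separately in the server’s model repository:

```python
import numpy as np
from cuml import ForestInference

# Load a pre-trained XGBoost model into FIL for GPU inference.
# "model.xgb" is a placeholder path; output_class=True returns
# class labels rather than raw scores.
fil = ForestInference.load(
    "model.xgb", model_type="xgboost", output_class=True
)

# Illustrative batch; the column count must match the model's features.
X = np.random.random((1000, 16)).astype(np.float32)
preds = fil.predict(X)
```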

We’re also introducing some amazing new algorithm updates:

  1. HDBSCAN, a density-based clustering algorithm that has been rapidly gaining popularity, now has a GPU-accelerated implementation, improving our clustering toolkit for better understanding your data (see the sketch after this list).
  2. The new backend for Random Forest algorithms is now the default for both classification and regression. It provides significantly better performance and accuracy than before.
  3. t-SNE is now accelerated with the magic of Fast Fourier Transforms in the new FIt-SNE.
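
For instance, here is a minimal HDBSCAN sketch on random data, assuming a 21.06+ cuML build; the array shape and min_cluster_size value are illustrative, and the estimator follows the familiar scikit-learn fit pattern:

```python
import numpy as np
from cuml.cluster import HDBSCAN

# Illustrative data; any (n_samples, n_features) float array works.
X = np.random.random((500, 8)).astype(np.float32)

clusterer = HDBSCAN(min_cluster_size=10)
clusterer.fit(X)

print(clusterer.labels_[:10])  # -1 marks points treated as noise
```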

Finally, our preprocessing capabilities continue to grow, with new `ColumnTransformer` and `FunctionTransformer` transformers.
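
Here is a small sketch of those preprocessing pieces, with the caveat that the import paths are assumptions and may sit under an experimental namespace depending on your cuML version:

```python
import cupy as cp
# NOTE: import paths are assumptions; in some releases these live
# under cuml.experimental.preprocessing instead.
from cuml.compose import ColumnTransformer
from cuml.preprocessing import FunctionTransformer, StandardScaler

X = cp.random.random((100, 3), dtype=cp.float32)

# Standard-scale column 0; log-transform columns 1 and 2.
ct = ColumnTransformer([
    ("scale", StandardScaler(), [0]),
    ("log", FunctionTransformer(cp.log1p), [1, 2]),
])
X_out = ct.fit_transform(X)
```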

RAPIDS cuGraph (Graph Analytics)

cuGraph has been focused on making things better and making sure algorithms scale. In this release we updated Weakly Connected Components (WCC), which is great for breaking a graph into smaller components, to scale across multiple GPUs and multiple nodes. We also worked to ensure that PageRank, Katz, WCC, Louvain, BFS, and SSSP all scale to large graphs processed on large clusters, and kicked off a benchmarking effort that should be completed in the next release. Additionally, we added graph batching functionality to libcugraph that allows a small graph to be replicated across multiple GPUs. Lastly, we now have multi-column vertex ID support for all algorithms.
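
A single-GPU sketch of the WCC call on a toy edge list follows; the column names are illustrative, and the multi-GPU path layers Dask on top of the same pattern:

```python
import cudf
import cugraph

# Toy edge list with two components: {0, 1, 2} and {3, 4}.
edges = cudf.DataFrame({
    "src": [0, 1, 1, 3],
    "dst": [1, 2, 0, 4],
})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# One row per vertex, labeled with its component.
components = cugraph.weakly_connected_components(G)
print(components.head())
```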

Node-RAPIDS

Did you know that there are RAPIDS bindings for Node.js? As other data science tools like TensorFlow have become available to Node developers, it was a natural step to bring accelerated compute to Node as well. You can find more information here, and if you haven’t already, check out this presentation to learn more.

Community and General Updates

The RAPIDSFire podcast has new episodes! Paul Mahler sits down with Jim Scott, Head of Developer Relations for Data Science at NVIDIA, to talk about the data science of whiskey, the data science of fitness, and more. The episode is available here, or wherever you get your podcasts. Remember, if there’s something you’d like to hear about, or someone you think would make a great guest, let us know by tweeting us at @RAPIDSai.

In Google Colab news, it’s another release with another new install process, but this time the changes bring some new benefits! We’ve refactored the Colab install code, making it much easier for the community to understand and modify, and added BlazingSQL install options. We are also moving the codebase to the RAPIDS community org alongside a revamped Notebooks Community Contrib! Thank you to all the community members who helped, and we look forward to your future contributions!

Don’t forget to subscribe to the RAPIDS YouTube channel. If you have RAPIDS-related videos that you’d like to share, please open an issue on Notebooks Community Contrib and we’ll add them to our playlists.

Conclusion

As summer kicks off in the Northern Hemisphere, we will continue to build out more features to make your data science work easier, more fluid, and more accurate. Tell us how you’re using RAPIDS on Twitter or via our Google Group. As always, you can file a feature request on GitHub — this project is by and for the data science community. We here on the RAPIDS team wish you an enjoyable summer, and we’ll talk to you next time.
