More Progress for Big Impact in the Latest RAPIDS Release
RAPIDS 0.19 Released at GTC Spring 2021
The NVIDIA GPU Technology Conference (GTC) is always a special occasion for the RAPIDS team. Not only does it spark memories of RAPIDS’ debut at GTC Munich 2018, but it also marks an opportunity to reflect on its growth since then. In that spirit, the team presents the “State of RAPIDS” every GTC to highlight the growth of the project and community as well as discuss coming improvements and future direction.
Delivering the GTC 2021 iteration of the State of RAPIDS last week, Keith Kraus and John Zedlewski showcased the continued maturation of the project, the expansion of the community, and the growth of a RAPIDS-powered ecosystem. Speaking to these themes, the presentation covered:
- Community and engagement milestones like achieving over 100k monthly downloads
- Updates and growth of fundamental RAPIDS core libraries (cuDF, cuML, and cuGraph)
- Expansion into new domains such as explainability, AutoML, and accelerated Node.js
- New initiatives including integrations with NVIDIA Triton Inference Server and NVIDIA Morpheus
As GTC 2021 came to a close, contributors followed close behind with the newest RAPIDS release. With a focus on performance, usability, and accessibility, RAPIDS 0.19 reflects the incremental improvements that yield the larger progress highlighted by each GTC.
New in RAPIDS 0.19
RAPIDS 0.19 brings updates, new features, and improvements across the libraries. At a high level, all libraries now support CUDA 11.2. CUDA 11.2 brings CUDA Enhanced Compatibility, which makes forward compatibility with future CUDA 11.x releases much simpler. Additional updates include:
cuDF
cuDF now supports nested types such as lists and structs. The team released a blog post providing more details. cuDF also supports 32-bit and 64-bit fixed-point Decimal data types. In addition, cuDF has expanded its GroupBy functionality. For a detailed look into cuDF 0.19 updates, check out the changelog.
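To illustrate why fixed-point Decimal types matter, here is a minimal CPU-only sketch in plain Python. It does not use cuDF or its API; the `FixedPoint` class and its field names are hypothetical and only show the general idea of storing an integer together with a scale so that values such as currency amounts add exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedPoint:
    """Hypothetical illustration type: value == unscaled * 10 ** -scale."""
    unscaled: int  # integer payload (think of a 32- or 64-bit integer)
    scale: int     # number of digits kept after the decimal point

    @classmethod
    def from_string(cls, text: str, scale: int) -> "FixedPoint":
        # Parse a decimal literal without ever passing through binary floats.
        sign = -1 if text.startswith("-") else 1
        whole, _, frac = text.lstrip("+-").partition(".")
        if len(frac) > scale:
            raise ValueError("more fractional digits than the scale allows")
        frac = frac.ljust(scale, "0")
        return cls(sign * int((whole or "0") + frac), scale)

    def __add__(self, other: "FixedPoint") -> "FixedPoint":
        if self.scale != other.scale:
            raise ValueError("this sketch only adds values with equal scale")
        # Integer addition is exact, so no rounding error accumulates.
        return FixedPoint(self.unscaled + other.unscaled, self.scale)

    def __str__(self) -> str:
        sign = "-" if self.unscaled < 0 else ""
        digits = str(abs(self.unscaled)).rjust(self.scale + 1, "0")
        if self.scale == 0:
            return sign + digits
        return f"{sign}{digits[:-self.scale]}.{digits[-self.scale:]}"


a = FixedPoint.from_string("0.10", scale=2)
b = FixedPoint.from_string("0.20", scale=2)
print(a + b)             # 0.30
print(0.1 + 0.2 == 0.3)  # False: binary floats cannot represent 0.1 exactly
```

The integer payload is what makes the arithmetic exact; the scale only affects how the value is parsed and printed.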
cuML
cuML now offers scikit-learn-compatible preprocessing entirely on the GPU. This was formerly an experimental feature but has graduated to production quality. Similarly, general-purpose SHAP explainability is now ready for broad use. It accelerates the generation of prediction explanations for any cuML or scikit-learn model.
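For readers new to SHAP, the explanations it produces are based on Shapley values. The sketch below computes them exactly by brute force in plain Python for a tiny model. It is not the cuML API and not how a GPU explainer is implemented; `shapley_values`, `model`, and the all-zero baseline are hypothetical names and choices, and replacing "absent" features with a single baseline row is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one row `x` (illustrative, exponential cost).

    A feature that is absent from a coalition takes its value from `baseline`.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features taken from x.
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(row):
    # Hypothetical stand-in model: one linear term plus one interaction term.
    return 2.0 * row[0] + row[1] * row[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)  # approximately [2.0, 3.0, 3.0]
```

The attributions sum to `model(x) - model(baseline)`, and the interaction term is split evenly between the two features involved. The brute-force loop visits every coalition, which is why practical explainers rely on sampling and hardware acceleration.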
Additionally, the library added Single Linkage Hierarchical Clustering and improved the default performance and accuracy of Random Forest classification models. New features were also added to the Logistic Regression, Approximate Nearest Neighbors, and Forest Inference Library algorithms. For a detailed look into cuML 0.19 updates, check out the changelog.
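As a rough picture of what single linkage hierarchical clustering does, here is a naive plain-Python sketch. It is not cuML's implementation or API; the function name `single_linkage` and its parameters are assumptions made for illustration, and the approach (sorting all point pairs and merging with a union-find structure) is chosen for brevity rather than speed.

```python
from itertools import combinations
from math import dist


def single_linkage(points, n_clusters):
    """Naive single-linkage clustering: repeatedly merge the two clusters
    whose closest pair of points is nearest, until `n_clusters` remain."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visiting point pairs by increasing distance means the next pair that
    # spans two clusters is always the closest pair between any two clusters.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda p: dist(points[p[0]], points[p[1]]))
    remaining = len(points)
    for a, b in pairs:
        if remaining <= n_clusters:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters
            remaining -= 1

    # Relabel cluster representatives as 0, 1, 2, ... in order of appearance.
    labels, ids = [], {}
    for i in range(len(points)):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(single_linkage(points, n_clusters=3))  # [0, 0, 0, 1, 1, 2]
```

Sorting every pair costs quadratic memory and time, so this only works for small inputs; it is meant to show the merge rule, not a scalable algorithm.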
cuGraph
cuGraph now provides the Random Walk algorithm and a Recursive Matrix (R-MAT) graph data generator. The team also enhanced the existing graph primitives, graph partitioning scheme, multi-seed Egonet, and multi-node multi-GPU Louvain algorithm, improving algorithm performance. cuGraph now provides up to 8x speedup for most traversal algorithms. For a detailed look into cuGraph 0.19 updates, check out the changelog.
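For context on the two additions, the following plain-Python sketch shows the general idea behind a Recursive Matrix generator and a simple random walk. It does not call cuGraph and does not reflect its API; the function names, parameters, and the default quadrant probabilities (0.57, 0.19, 0.19, with the remainder 0.05) are assumptions chosen for illustration.

```python
import random


def rmat_edges(scale, n_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate directed edges over 2**scale vertices with the R-MAT recursion.

    Each edge picks one quadrant of the adjacency matrix per level with
    probabilities a, b, c and d = 1 - a - b - c, which appends one bit to
    the source and destination vertex ids at every level.
    """
    rng = random.Random(seed)
    edges = []
    for _ in range(n_edges):
        src = dst = 0
        for _level in range(scale):
            r = rng.random()
            # Quadrants c and d are the lower half (source bit set).
            src_bit = 1 if r >= a + b else 0
            # Quadrants b and d are the right half (destination bit set).
            dst_bit = 1 if (a <= r < a + b or r >= a + b + c) else 0
            src, dst = (src << 1) | src_bit, (dst << 1) | dst_bit
        edges.append((src, dst))
    return edges


def random_walk(edges, start, max_steps, seed=0):
    """Walk from `start`, picking an outgoing edge uniformly at random."""
    rng = random.Random(seed)
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, []).append(dst)
    path = [start]
    for _ in range(max_steps):
        options = neighbors.get(path[-1])
        if not options:  # dead end: no outgoing edges
            break
        path.append(rng.choice(options))
    return path


edges = rmat_edges(scale=4, n_edges=64)
path = random_walk(edges, start=edges[0][0], max_steps=5)
print(len(edges), path)
```

Because quadrant `a` is favored at every level, low-numbered vertices receive many more edges than the rest, which gives the skewed degree distribution that makes R-MAT graphs useful as benchmark inputs.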
RAPIDS + Dask
UCX has enhanced its code and documentation to improve performance and ease of use, respectively. Dask-CUDA updated its features to provide improved handling of memory spilling. Dask-CUDA also added capabilities for spill logging and RMM logging.
CLX
CLX now provides functions to detect sensitive data, detect crypto mining in GFN, and analyze host workflows and provide feedback. It now also supports inference of cyBERT models on ARM. For a detailed look into CLX 0.19 updates, check out the changelog.
A Flourishing RAPIDS-Accelerated Ecosystem
Over the last two and a half years, RAPIDS has fostered a constantly growing ecosystem of GPU-accelerated libraries and tools. GTC 2021 showcased that continued growth through a wide variety of exciting RAPIDS-accelerated solutions.
One of the most exciting technical previews announced at GTC last week was Node-RAPIDS. Just as RAPIDS has accelerated the PyData and Spark communities with GPUs, Node-RAPIDS aims to extend that power to the JavaScript Node.js community. Node-RAPIDS opens up a wealth of possibilities for web developers traditionally bottlenecked by the power of a personal computer and a browser. It also makes embedding data-driven operations into web applications simpler thanks to CUDA bindings shared with the RAPIDS data science libraries. To learn more about Node-RAPIDS, check out Allan Enemark’s GTC 2021 presentation.
RAPIDS is also expanding further into the cybersecurity realm. Due to its power, usability, and flexibility, CLX has become a major component in NVIDIA’s new cybersecurity offering, Morpheus. NVIDIA Morpheus makes the creation and operationalization of cybersecurity models easier for organizations faced with the challenging task of defending against cyber threats. Splunk is now integrating CLX, Morpheus, and Triton into its products as well. Listen to Bartley Richardson’s talk about Morpheus to learn more.
As RAPIDS has garnered more users, more focus has been placed on making it easier to use. A major push for boosting usability has been improving RAPIDS accessibility in the cloud. GTC 2021 highlighted RAPIDS integrations into cloud machine learning platforms like Amazon SageMaker and Google Cloud AI Platform, making it easier than ever to get started in the cloud. RAPIDS 0.19 also makes it easier to deploy on cloud-native tools like Google Kubernetes Engine.
AWS has also integrated RAPIDS cuML and GPU-accelerated XGBoost into its open source AutoML library, AutoGluon. With these integrations, AutoGluon is able to increase performance by 25x, making high-performance AutoML accessible to a broader audience. For more information, check out this GTC presentation from Nick Erickson at AWS.
Beyond Node-RAPIDS, we added a new library, cuCIM, for accelerating n-dimensional image processing and I/O. While the library is new, it’s shown awesome performance. It provides an accessible interface similar to scikit-image, allowing researchers and data scientists to rapidly port existing CPU-based code to the GPU. A special thanks to Quansight and the NVIDIA Clara team for collaborating to create cuCIM. Check out their recent blog posts on cuCIM:
- Quansight Blog: RAPIDS cuCIM: porting scikit-image code to the GPU
- NVIDIA DevBlog: Accelerating Scikit-Image API with cuCIM: n-Dimensional Image Processing and I/O on GPUs
Exciting Applications of RAPIDS
The most rewarding part of GTC is seeing the creative, groundbreaking ways that users and enterprises are using RAPIDS to solve challenging problems. It was amazing to see more and more talks showing off RAPIDS usage at GTC 2021. Here are some exciting examples:
- NASA is using RAPIDS to accelerate a plethora of scientific data science use cases.
- Walmart, a long-time adopter and contributor, uses RAPIDS to speed up processes across its business. Walmart is also using NVTabular to process large datasets for training neural networks.
- NVIDIA highlighted ETL, Feature Engineering, and Model Development best practices for retail forecasting. A detailed blog post and example notebook can be found here.
- Best Buy discussed how it’s using NVIDIA Morpheus to identify potential network issues.
- Volkswagen uses RAPIDS to improve connected-car data pipeline performance by 100x.
To see many more unique and exciting applications of RAPIDS, check out the full GTC 2021 catalog here. Don’t worry, it’s free to register.
Accelerated Community Engagement
The RAPIDS community is growing. RAPIDS downloads have grown to over 100K each month in 2021. We also just hit the 10K mark for Twitter followers. To say we’re thankful for the community support is an understatement.
We’re also continuing to bolster resources for the RAPIDS community. Colab notebooks are back for both stable and nightly releases, making it easy to check out new features. Tom Drabas of BlazingSQL also whipped up a slew of RAPIDS tutorials and cheat sheets for newcomers to get better acquainted with the libraries. We’re also supporting the NVIDIA Developer Forums for users to ask their NVIDIA-centric RAPIDS questions.
Now that GTC 2021 is over, we’re turning our attention to the 2021 Dask Distributed Summit, where we’re hosting a tutorial and a workshop. The Dask Summit is shaping up to be an awesome one. Come check out our talks along with the other amazing presentations on the schedule.
Wrap-Up
While it’s been great to see RAPIDS’ success at GTC 2021, we now turn our attention to the next release. There are about three more releases between us and GTC Fall 2021, and we’re already getting excited to see what new RAPIDS updates, features, and improvements yield awesome showcases and talks.
As always, find us on GitHub, follow us on Twitter, and check out our documentation and getting started resources. We’re excited to have you join us, and we’re looking forward to another great year of RAPIDS.