RAPIDS 0.8: Same Community, New Freedoms

Published in RAPIDS AI, Jul 19, 2019

RAPIDS released 0.8 a few weeks back. And afterwards, like most Americans, we took off for the 4th of July holiday. Over that break, I reflected on the purpose of RAPIDS. Speed is great, building a strong community is awesome, but the true power of RAPIDS is in the enablement of everyone to do more than ever before, with less. RAPIDS allows us to imagine what’s possible, freeing us from the compute constraints of the past. As Bill Groves, CDO of Walmart, said in a fireside chat with me last week, “Math and science hasn’t changed in 30 to 50 years. What has changed is the technology that enables it, so that we have opportunities that we didn’t have until recently.” Essentially, we’re affording people the freedom to reimagine what’s possible, to test new ideas, to tackle new challenges in their pursuit of happiness.

Balancing Speed

There are two speeds every developer is keenly aware of: the speed of execution and the speed of writing code. RAPIDS has found a way to balance both. Fast computation is great, but if an application takes weeks to develop and weeks to modify, the more user-friendly choice makes sense even if it is slower at run time. Conversely, the easiest-to-use code in the world is of very little use if the computation takes weeks or the hardware required to solve the problem becomes prohibitively expensive. This was the challenge prior to RAPIDS. GPUs were great, but inaccessible to many. Traditional CPU software was easier to use, but challenged by the limits of scale-out.

RAPIDS is the next step in the evolution of data science. GPU acceleration gives you speed, and traditional PyData APIs give you the ease of use and development to put code in production faster.

More Tools and More Interoperability with cuML and PyTorch

cuML added support for Logistic Regression, building on our new quasi-Newton solver methods. Combined with the existing ElasticNet, Ridge, Lasso, and OLS models, these add up to give cuML broad coverage of the most common linear models.

cuML 0.8 includes the first version of our new Random Forest module. (See the example notebook here.) This is still in early preview stages and has known limitations (for example, it only supports classification and does not provide the full performance of our upcoming version), but it is a great chance for users to try out the upcoming APIs and see the future of GPU support for tree-based methods.

The cuML experience is more than just fast algorithms, so we are continuously improving the core user experience and interfaces. cuML started rolling out support for the new input_utils module to ensure that all models provide consistent, transparent support for cuDF, NumPy, and any Python array that supports the __cuda_array_interface__ standard (e.g. CuPy arrays). We followed up on a popular user request and added support for pickling to almost all models, and we are continuing to expand pickling support for the 0.9 release.
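To make the array-interface idea concrete, here is a minimal sketch of how a helper could tell device arrays from host arrays by looking for the __cuda_array_interface__ attribute. This is not cuML's input_utils code: classify_input and MockDeviceArray are hypothetical names invented for illustration, and the mock owns no GPU memory, so the sketch runs on any machine.

```python
class MockDeviceArray:
    """Hypothetical stand-in for a GPU array such as a CuPy array.

    It owns no GPU memory; the pointer below is a placeholder so that
    this sketch runs without a GPU.
    """

    def __init__(self, shape, typestr="<f4"):
        self._shape = tuple(shape)
        self._typestr = typestr

    @property
    def __cuda_array_interface__(self):
        # The protocol is a plain dict describing the device buffer.
        return {
            "shape": self._shape,
            "typestr": self._typestr,   # "<f4" means little-endian float32
            "data": (0, False),         # (device pointer, read_only); fake pointer
            "version": 2,
        }


def classify_input(obj):
    """Return (kind, shape), checking for device arrays before host arrays."""
    cuda_iface = getattr(obj, "__cuda_array_interface__", None)
    if cuda_iface is not None:
        return "device", tuple(cuda_iface["shape"])
    host_iface = getattr(obj, "__array_interface__", None)  # NumPy-style arrays
    if host_iface is not None:
        return "host", tuple(host_iface["shape"])
    raise TypeError(f"unsupported input type: {type(obj).__name__}")


kind, shape = classify_input(MockDeviceArray((3, 2)))
```

Because the check relies only on the attribute, any library that exposes the protocol can be accepted without the consumer importing that library.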

Finally, RAPIDS cuDF isn’t only for batch processing, stream analytics, and preparing data for cuML, XGBoost, and cuGraph; it’s also great for Deep Learning. RAPIDS shows an 8.5x speedup in preprocessing of tabular data to get it ready for a deep learning model by doing categorical encoding, median normalization, and null value filling on the GPU using RAPIDS instead of Pandas. This is done by using cuIO to read the data in from Parquet, preprocessing with cuDF, using DLPack to convert to a PyTorch tensor with zero-copy, and then copying the tensor back to CPU for the data loading. I know this is a mouthful, but the goal is to continue to simplify this while increasing end-to-end performance. That last step is actually optional, and if you can fit the entire dataset in GPU memory the speedup is 9.6x. There will be a longer blog on this in the coming weeks and even more functionality in the near future.
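For readers new to these preprocessing steps, here is a small CPU-only sketch of two of them, categorical encoding and null filling. It is plain Python rather than cuDF, the function names are hypothetical, and filling nulls with the column median is an assumption made for illustration; the benchmark described above may use a different fill value.

```python
import statistics


def encode_categories(values):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def fill_nulls_with_median(values):
    """Replace None entries with the median of the non-null entries."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


colors = ["red", "blue", "red", "green"]
amounts = [3.0, None, 7.0, 5.0]

color_codes, mapping = encode_categories(colors)
filled = fill_nulls_with_median(amounts)
```

On the GPU the same logic runs column-wise over the whole table at once, which is where the speedup comes from.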

Tackle New Challenges with cuDF and Dask

We work closely with you, the data science community, when we decide what we’re going to build next. That’s why we’re excited to tell you about everything we’ve added to cuDF.

As a former economist, I’m excited to announce we have Series.rolling() and DataFrame.rolling() methods, making windowed aggregations with RAPIDS very simple. We’ve added Series.where() to make conditionally replacing values a one-liner. We’ve also added more classic Pandas functionality like Series.loc and a full set of element-wise arithmetic operators. We also made huge performance improvements to MultiIndex-related operations such as .loc and .iloc, as well as optimizations and improved compatibility in joins and groupbys.
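If you haven’t used these two methods before, the sketch below spells out in plain Python what a rolling mean and a where() compute. It assumes the default Pandas semantics (a trailing window that yields NaN until it is full, and where() keeping values whose condition is True). The helper names are hypothetical and this is not how cuDF implements them.

```python
import math


def rolling_mean(values, window):
    """Mean over a trailing window; NaN until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def where(values, cond, other):
    """Keep each value whose condition is True; use `other` elsewhere."""
    return [v if c else other for v, c in zip(values, cond)]


prices = [10.0, 12.0, 11.0, 15.0, 14.0]

# Three-period moving average of the series.
smoothed = rolling_mean(prices, window=3)

# Cap prices at 12.0: keep values that are already <= 12.0.
capped = where(prices, [p <= 12.0 for p in prices], 12.0)
```

With a Series the two calls collapse to one-liners, roughly series.rolling(3).mean() and series.where(series <= 12.0, 12.0) in Pandas terms.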

We are proud to announce that a Java API has been contributed to cuDF. The Java API uses the same GPU-accelerated implementations in libcudf under the hood as the Python API and is the start of bringing the RAPIDS ecosystem to the JVM.

You may not have noticed, but there’s a lot of talk about the time and effort spent in building Pandas API compatibility for cuDF. The obvious reason behind this is that users are comfortable and efficient in working with the Pandas API. The not-as-obvious reason behind this is that it allows RAPIDS to build on top of the existing ecosystem to provide extended functionality like distributed out-of-core execution using Dask or stream processing using Streamz.

Currently, we have a small Dask-cuDF library for functionality where we can’t follow the same code path that Dask uses for Pandas. As we continue to add functionality to cuDF and continue to make our API more and more compatible with Pandas, the Dask-cuDF codebase will be whittled down to mostly just unit tests to ensure things work as expected with cuDF. We are currently doing the same thing with the Streamz library, and it’s showing extremely promising results. There will be more information on streaming analytics with RAPIDS in a few weeks.

Improving Connections with cuGraph

For the 0.8 release, cuGraph continued to expand the list of available algorithms, but also took a step back to evaluate the code base, APIs, and overall user experience. The initial goal of cuGraph was to simply get algorithms out as fast as possible, and we have been relatively successful at that. However, with release 0.8 and moving forwards, the goal is shifting to making the graph experience more seamless with the rest of RAPIDS, more consistent across analytics, and more accessible to users wanting to create custom analytics. For example, the SSSP and BFS analytics now return a DataFrame with the same column names. That simple change makes post-processing of the data more consistent.

For release 0.8, cuGraph includes two new single-GPU analytics: Weakly Connected Components (Strongly Connected Components is coming in 0.9) and Personalized PageRank. It also adds utility features for computing degree (in, out, and total) and for renumbering vertices. Finally, SSSP has been updated to return the predecessor of each vertex so that paths can be recreated.
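To show why returning the predecessor matters, here is a minimal CPU sketch of single-source shortest paths that records each vertex’s predecessor and then walks those links backwards to rebuild a path. It uses Dijkstra’s algorithm with a binary heap for clarity, which is not necessarily the approach cuGraph takes on the GPU. The function names and the -1 sentinel for the source are choices made for this example.

```python
import heapq


def sssp(edges, source):
    """Dijkstra over directed edges (src, dst, weight) with weight >= 0.

    Returns {vertex: (distance, predecessor)} for reachable vertices.
    The source's predecessor is -1, a sentinel chosen for this sketch.
    """
    adjacency = {}
    for src, dst, weight in edges:
        adjacency.setdefault(src, []).append((dst, weight))

    result = {source: (0.0, -1)}
    heap = [(0.0, source)]
    done = set()
    while heap:
        dist, vertex = heapq.heappop(heap)
        if vertex in done:
            continue  # stale heap entry; a shorter route was already settled
        done.add(vertex)
        for neighbor, weight in adjacency.get(vertex, []):
            candidate = dist + weight
            if neighbor not in result or candidate < result[neighbor][0]:
                result[neighbor] = (candidate, vertex)
                heapq.heappush(heap, (candidate, neighbor))
    return result


def rebuild_path(result, target):
    """Follow predecessors from target back to the source, then reverse."""
    if target not in result:
        return []  # target is unreachable
    path = [target]
    while result[path[-1]][1] != -1:
        path.append(result[path[-1]][1])
    return path[::-1]


edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
table = sssp(edges, source=0)
path_to_3 = rebuild_path(table, 3)
```

Without the predecessor column you would know that vertex 3 is 8.0 away from vertex 0, but not that the cheapest route runs through vertices 2 and 1.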

A lot of the improvements in 0.8 are things that are not obvious, unless you like looking at code. The codebase, both the Python and C folders, has been restructured to better organize files based on analytic types; the Python APIs have been updated to better emulate the experience users have with NetworkX, and all reported bugs have been addressed.

While many of the 0.8 cuGraph improvements are subtle, they’ve made a major impact on people who use them. Here are some examples from our good friend and contributor John Murray.

What’s Coming in RAPIDS 0.9

We’ve got more great stuff coming in the 0.9 release. On the cuDF side, we’re adding support for the NEP 18 __array_function__ protocol for more ecosystem interoperability, and we’re hoping to have an accelerated Avro reader. Of course, we’re also pushing toward even more Pandas API compatibility, and doing under-the-hood refactors and optimizations.

For cuML, we’re continuing to improve the experimental Random Forest algorithm we previewed in v0.8 by adding support for regression models and increasing performance. And we’re excited to be launching our very first multi-node (MN), multi-GPU (MG) algorithms, both K-means and Random Forest. cuML will add a new module for GPU inferencing for Random Forest and Gradient Boosted Decision Tree models as well. This will allow users to manipulate data, train models, and run inference with the two most popular tree algorithms, end to end, for the first time.

For cuGraph, we will see the first MG graph analytic with the release of MG PageRank. To accomplish that, a number of subfeatures will also be released, like MG COO-to-CSR conversion and MG degree computation. As mentioned earlier, release v0.9 will also include Strongly Connected Components. Release v0.9 will also see the expansion of cuGraph features and programmability with the integration of the dynamic graph data structure Hornet from Georgia Tech. And lastly, we are planning on expanding the number of primitives and utility functions to make interacting with graph data and results easier.
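For anyone unfamiliar with the two graph formats mentioned above, the following single-process sketch converts a COO edge list (parallel source and destination arrays) into CSR (row offsets plus column indices) and reads out-degrees off the offsets. It only illustrates the data layout: the function names are hypothetical, and it says nothing about how the multi-GPU version partitions the work.

```python
def coo_to_csr(num_vertices, sources, destinations):
    """Convert a COO edge list to CSR (offsets, indices) with a counting sort."""
    # Count the edges leaving each vertex, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for src in sources:
        offsets[src + 1] += 1
    # A prefix sum turns the counts into row start positions.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each destination into the next free slot of its source row.
    indices = [0] * len(sources)
    cursor = offsets[:-1].copy()
    for src, dst in zip(sources, destinations):
        indices[cursor[src]] = dst
        cursor[src] += 1
    return offsets, indices


def out_degrees(offsets):
    """Out-degree of vertex v is the length of its CSR row."""
    return [offsets[v + 1] - offsets[v] for v in range(len(offsets) - 1)]


sources = [0, 2, 0, 1, 2]
destinations = [1, 0, 2, 2, 3]

offsets, indices = coo_to_csr(4, sources, destinations)
degrees = out_degrees(offsets)
```

CSR groups every vertex’s neighbors contiguously, which is why algorithms such as PageRank usually prefer it to a raw edge list.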

Liberty, Freedom, and Happiness

“Open source software is free” can make people’s minds go to dollars and cents. When I think of that phrase, it means much more to me. OSS means you’re free to choose: do you want to work with files in ORC or Parquet? You’re free to impact the course of the project: do you want to contribute? You’re free (and encouraged) to check out a good first issue and start contributing. And with OSS you’re free to experiment, whether with the new, experimental version of Random Forests, available in cuML 0.8, or your own creations using Numba.

At RAPIDS, one of the things that makes us happy is the GPU data science community and all of our partners and friends in it. We want interoperability and to enable an ecosystem, not to build walls.

We want to make more friends and build more bridges to other ecosystems. It’s easier than ever to get started with RAPIDS. In fact, at the push of a button, you can start a notebook in Google Colab and try out all the new things we’ve added. Join in! Help us build the next platform for computing.