RAPIDS release blog 22.06

Sophie Watson
Published in RAPIDS AI
8 min read · Jun 30, 2022

RAPIDS 22.06 brings new support for massive graphs, and expands the support for Multi-Node Multi-GPU algorithms.

We’re excited to announce the release of RAPIDS 22.06. Whether you are working in a single- or multi-GPU environment, this release brings you new functionality and more ways to accelerate your data science workloads.

Major updates from the RAPIDS 22.06 release include:

  • The release of a new RAPIDS Graph-as-a-Service library
  • The addition of Multi-Node Multi-GPU (MNMG) algorithms across many libraries, including cuML and cuGraph
  • Support for interactive debugging and profiling of memory usage with RMM

There have also been big updates to the cuSpatial and cuCIM libraries, and the RAPIDS team continues to make many behind-the-scenes changes to optimize code, and set us up for success as we continue to expand the functionality of RAPIDS.

Release of Graph-as-a-Service

There are numerous cases where sharing graph processing and analytics on the same GPU can be problematic, especially as the size of the graph grows and uses more of the limited GPU memory space. For instance, consider training a Graph Neural Network, using DGL or PyG, where the graph has tens or hundreds of billions of edges and requires 32 or more GPUs for processing. Having the graph and training processes share GPUs would result in resource contention. There are also situations where there is a single graph and multiple distributed analytic clients. Consider a data warehouse where the graph represents historical data, housed on a thousand GPUs, and a collection of analysts who each explore a different portion of the graph from a single-GPU desktop. Graph-as-a-Service, or GaaS, is our solution to these problems.

RAPIDS GaaS is a lightweight wrapper around RAPIDS cuGraph that provides access to graph functionality via an RPC API. This allows graph processing to run on separate hardware from analysis. GaaS uses cuGraph, cuDF, and other libraries on the server to execute graph data prep and analysis on server-side GPUs. Multiple clients can connect to the server, giving different users and processes access to graph data that is too large to handle with the client's own resources.
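To make the client/server split concrete, here is a minimal sketch of the general pattern, not of GaaS itself. It assumes Python's standard-library xmlrpc as a stand-in transport and a plain dictionary in place of a cuGraph graph; the remote functions degree and neighbors are hypothetical names chosen for illustration and are not part of the GaaS API.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# --- Server side: the graph lives here (on GPUs, in a real service). ---
# A plain adjacency dict stands in for a cuGraph graph in this sketch.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
ADJACENCY = {}
for src, dst in EDGES:
    ADJACENCY.setdefault(src, set()).add(dst)
    ADJACENCY.setdefault(dst, set()).add(src)


def degree(vertex):
    """Hypothetical remote call: number of neighbours of a vertex."""
    return len(ADJACENCY.get(vertex, ()))


def neighbors(vertex):
    """Hypothetical remote call: sorted neighbour list of a vertex."""
    return sorted(ADJACENCY.get(vertex, ()))


# Port 0 asks the OS for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(degree)
server.register_function(neighbors)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# --- Client side: holds no graph data, only issues remote calls. ---
client = ServerProxy(f"http://127.0.0.1:{port}")
remote_degree = client.degree(2)
remote_neighbors = client.neighbors(2)
print(remote_degree, remote_neighbors)

server.shutdown()
server.server_close()
```

The client never holds the graph; it only sends small requests and receives small results, which is what lets a modest client machine query a graph hosted on much larger hardware.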

Support for scaling out

When working with massive amounts of data, it’s not uncommon to run out of room on a single GPU. We continue to expand support for Multi-Node Multi-GPU (MNMG) workloads across RAPIDS libraries.

MNMG Logistic Regression in cuML

In this release, cuML adds experimental support for multi-node Logistic Regression, based on Dask-GLM and Dask’s CuPy array support. This first release focuses on adding support for processing data much larger than a single GPU can hold.

The required components can be added to a RAPIDS environment by running the following commands:

mamba install -c conda-forge scikit-learn
pip install "sparse>=0.7.0" "multipledispatch>=0.4.9" --no-deps
pip install "git+https://github.com/dask/dask-glm@main" --force-reinstall --no-deps

Then the code has the familiar, Scikit-learn based API of regular Logistic Regression:

from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
# Start a local Dask cluster with one worker per GPU
cluster = LocalCUDACluster(threads_per_worker=1)
client = Client(cluster)
# Use one data partition per worker
n_total_partitions = len(list(client.has_what().keys()))
from cuml.dask.extended.linear_model.logistic_regression import LogisticRegression
from cuml.dask.datasets import make_classification
# Generate a distributed synthetic dataset: 1,000,000 samples, 10 features
distX, distY = make_classification(1000000, 10,
                                   n_parts=n_total_partitions)
logreg = LogisticRegression(fit_intercept=True, max_iter=50)
logreg.fit(distX, distY)

Future versions will remove the optional component steps, and focus on optimizing memory consumption and runtime.

More Multi-Node Multi-GPU algorithms in cuGraph

In RAPIDS 22.04, the cuGraph team released the first implementation of a Property Graph, allowing you to model diverse networks and keep track of metadata and dependencies between nodes. With the 22.06 release, you can now use the Property Graph on MNMG architectures to model large graph networks. The code snippet below shows how to create a property graph, add vertex and edge data, select vertices and edges, and extract a subgraph using cuGraph.

import cugraph
# customers_df and purchases_df are assumed to be existing DataFrames
pG = cugraph.experimental.PropertyGraph()
pG.add_vertex_data(customers_df,
                   type_name="customers",
                   vertex_col_name="Cust_ID")
pG.add_edge_data(purchases_df,
                 type_name="purchases",
                 vertex_col_names=("Cust_ID", "Store_num"))
# Select by type ('stores' and 'merch_xfers' data would be added
# in the same way as above), then extract the matching subgraph
selection = pG.select_vertices(f"{pG.type_col_name}=='stores'")
selection += pG.select_edges(f"{pG.type_col_name}=='merch_xfers'")
G = pG.extract_subgraph(selection=selection,
                        edge_weight_property="Amount")

And the MNMG support in cuGraph doesn’t stop there: you can also use the Eigenvector centrality and Triangle counting algorithms on data distributed across multiple GPUs.

Seamless support for Dask DataFrames in cuXfilter

The RAPIDS ecosystem continues to expand support and integration with Dask, allowing you to simply scale across cores and machines, when working with large datasets. As of RAPIDS 22.06, cuXfilter now more seamlessly integrates with Dask, allowing you to use dask_cudf.DataFrame as a direct replacement for cudf.DataFrame in your large cuXfilter visualisations.

[Video: an area of the graph is selected and zoomed in on, in real time, in a Jupyter Notebook.]
Figure 1. A single-GPU cuXfilter visualisation of a graph with 300M nodes and 10M edges, using Dask integration.

Figure 1 shows a cuXfilter visualisation of a graph with 300M nodes and 10M edges. A dask_cudf.DataFrame will also work for larger, more complex graphs and scatterplots.

There have also been a bunch of updates made to the cuXfilter example notebooks, so if you want to explore the capabilities of the library, and try out the new dask_cudf compatibility, check out these notebooks and try it out for yourself!

Improved efficiency (and how to monitor it!)

Across all RAPIDS libraries we continue to make your code run faster, and it’s now possible to interactively profile your code to gain more insight into how memory is being used.

Efficient complex DataFrame expressions with cudf.DataFrame.eval

If you’ve ever evaluated complex expressions on large pandas DataFrames, you may have made use of DataFrame.eval to accelerate those calculations. You can now reap those benefits with cuDF as well. cudf.DataFrame.eval never allocates memory for intermediates, which makes it both faster and more memory efficient than the naive alternative. For example, df.eval('a+b+c') never creates the column df['a']+df['b'] that would result from writing out df['a']+df['b']+df['c']. For complex expressions, the performance gains can be significant, and the reduced memory usage can be critical in resource-constrained workflows.
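As a small illustration of the two forms being compared, the sketch below evaluates the same expression both ways and checks that the results agree. It uses cuDF when it is available and otherwise falls back to pandas, which exposes the same DataFrame.eval call; the fallback is only there so the example can run without a GPU and says nothing about cuDF's memory behaviour. The column names and values are made up for illustration.

```python
# Prefer cuDF; fall back to pandas so the sketch also runs without a GPU.
try:
    import cudf as xdf
except Exception:  # cuDF not installed, or no usable GPU
    import pandas as xdf

df = xdf.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 30.0],
    "c": [100.0, 200.0, 300.0],
})

# Naive form: df["a"] + df["b"] is built as a temporary column first.
naive = df["a"] + df["b"] + df["c"]

# eval form: the whole expression is passed as a single string.
fused = df.eval("a + b + c")

# Both forms should produce the same values.
same = bool((naive == fused).all())
print(same)
```

On real workloads the difference shows up in memory use and runtime rather than in the result, so it is worth timing both forms on your own data before switching.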

Interactive debugging and profiling with RAPIDS Memory Manager

Memory allocations in RAPIDS are performed by the RAPIDS Memory Manager (RMM). In the 22.06 release of RMM, users can now trigger a callback function when RAPIDS libraries allocate or free memory for interactive debugging, profiling, and a wide range of other use cases.

This is supported through a new class, the callback memory resource, which can be used together with any other memory resource and takes a pair of callback functions that are triggered when GPU memory is allocated or deallocated. The callback memory resource can be constructed in C++ or Python, and assigned as the current device’s memory resource. Note that the allocate/deallocate callbacks add significant overhead, so they should only be used for debugging memory usage or when performance is not crucial.

The allocate function accepts a size and returns a pointer to the allocated memory. The deallocate function accepts a pointer and a size and returns None (or void in C++).

Below, we show how the callback can use Python’s logging module:

import rmm
import logging
# Log at INFO level
logging.basicConfig(
    format='%(levelname)s:%(message)s', level=logging.INFO)
# Using a CudaMemoryResource as the backing MR,
# define allocation and deallocation functions that
# log the amount of memory being (de)allocated.
base_mr = rmm.mr.CudaMemoryResource()
def allocate(size):
    logging.info(f"Allocating {size} bytes")
    return base_mr.allocate(size)
def deallocate(ptr, size):
    logging.info(f"Deallocating {size} bytes")
    base_mr.deallocate(ptr, size)
# Create a CallbackMemoryResource and set it to be
# the default memory resource used by RMM:
mr = rmm.mr.CallbackMemoryResource(allocate, deallocate)
rmm.mr.set_current_device_resource(mr)

After setting the current device resource, all RAPIDS memory (de)allocations will trigger callbacks. For example, creating a cuDF Series will show how much memory is allocated:

>>> import cudf
>>> s = cudf.Series([0, 1, 2])
INFO:Allocating 24 bytes
>>> del s
INFO:Deallocating 24 bytes
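The same pair of callbacks can do more than log. The sketch below is one possible way to track current and peak allocated bytes, assuming the callback signatures described above (allocate(size) returning a pointer, deallocate(ptr, size) returning nothing). AllocationTracker and FakeMemoryResource are hypothetical helpers written for this example; the fake resource only hands out made-up addresses so the bookkeeping can be exercised without a GPU.

```python
class AllocationTracker:
    """Hypothetical helper: forwards to a base memory resource and
    records the current and peak number of allocated bytes."""

    def __init__(self, base_mr):
        self.base_mr = base_mr
        self.current_bytes = 0
        self.peak_bytes = 0

    def allocate(self, size):
        ptr = self.base_mr.allocate(size)
        self.current_bytes += size
        self.peak_bytes = max(self.peak_bytes, self.current_bytes)
        return ptr

    def deallocate(self, ptr, size):
        self.base_mr.deallocate(ptr, size)
        self.current_bytes -= size


class FakeMemoryResource:
    """Hypothetical CPU-only stand-in for a real memory resource.
    It returns made-up integer addresses and frees nothing."""

    def __init__(self):
        self._next_address = 0x1000

    def allocate(self, size):
        ptr = self._next_address
        self._next_address += size
        return ptr

    def deallocate(self, ptr, size):
        pass


tracker = AllocationTracker(FakeMemoryResource())
a = tracker.allocate(24)
b = tracker.allocate(100)
tracker.deallocate(a, 24)
print(tracker.current_bytes, tracker.peak_bytes)

# With RMM and a GPU available, the tracker could be wired in the same
# way as the logging example above (untested assumption):
#   tracker = AllocationTracker(rmm.mr.CudaMemoryResource())
#   mr = rmm.mr.CallbackMemoryResource(tracker.allocate, tracker.deallocate)
#   rmm.mr.set_current_device_resource(mr)
```

Keeping the counters on an object rather than in module-level globals makes it easy to reset them between experiments or to run several trackers side by side.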

New functionality across the libraries

There have been lots of new additions to the RAPIDS libraries in this release, and below we highlight some from cuDF, cuSpatial and cuCIM.

cuDF

We continue to expand DataFrame functionality. With this release you can now use the DataFrame.applymap method to apply an elementwise function to a DataFrame. The method maps from a scalar DataFrame element to another scalar DataFrame element and is able to handle nulls within the DataFrame, as shown in the example below:

>>> df = cudf.DataFrame({"a": [0.01, None, 1, 10],
...                      "b": [0.12, 1.2, 12.3, None]})
>>> df
       a      b
0   0.01   0.12
1   <NA>    1.2
2    1.0   12.3
3   10.0   <NA>
>>> df.applymap(lambda x: 42 if x is cudf.NA else x - 1)
       a      b
0  -0.99  -0.88
1  42.00   0.20
2   0.00  11.30
3   9.00  42.00

cuSpatial

Vast amounts of location data are recorded every second from a wide range of sensors, including mobile phones, vehicles and cameras. Traditionally, processing these complete datasets was not possible, due to their size and the computational complexity required to transform the data.

The cuSpatial library provides a suite of functionality to accelerate the common operations needed to process and understand geographic information system (GIS) data from sensors. And, with RAPIDS 22.06, cuSpatial introduces a new feature called pairwise_linestring_distance for computing the shortest distances between pairs of linestrings.
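To make "shortest distance between linestrings" concrete, here is a small CPU reference sketch in NumPy. It is not cuSpatial's implementation or API: the function names are hypothetical, and it uses a brute-force comparison of every segment pair, where the distance is zero if two segments cross and otherwise the smallest endpoint-to-segment distance.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    ab = b - a
    denom = np.dot(ab, ab)
    if denom == 0.0:  # degenerate segment: a and b coincide
        return float(np.linalg.norm(p - a))
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def cross2(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]


def segments_cross(a, b, c, d):
    """True if a-b and c-d properly cross. Touching and overlapping
    cases are handled by the endpoint distances (which are then 0)."""
    d1, d2 = cross2(b - a, c - a), cross2(b - a, d - a)
    d3, d4 = cross2(d - c, a - c), cross2(d - c, b - c)
    return d1 * d2 < 0 and d3 * d4 < 0


def segment_distance(a, b, c, d):
    if segments_cross(a, b, c, d):
        return 0.0
    return min(point_segment_distance(a, c, d),
               point_segment_distance(b, c, d),
               point_segment_distance(c, a, b),
               point_segment_distance(d, a, b))


def linestring_distance(ls1, ls2):
    """Minimum distance over all segment pairs of two linestrings."""
    ls1 = np.asarray(ls1, dtype=float)
    ls2 = np.asarray(ls2, dtype=float)
    return min(segment_distance(ls1[i], ls1[i + 1], ls2[j], ls2[j + 1])
               for i in range(len(ls1) - 1)
               for j in range(len(ls2) - 1))


def pairwise_linestring_distance_reference(lines1, lines2):
    """Distance between the i-th linestring of each input list."""
    return [linestring_distance(a, b) for a, b in zip(lines1, lines2)]


lines1 = [[(0, 0), (1, 0)], [(0, 0), (2, 2)]]
lines2 = [[(0, 1), (1, 1)], [(0, 2), (2, 0)]]
distances = pairwise_linestring_distance_reference(lines1, lines2)
print(distances)
```

The first pair is two parallel segments one unit apart and the second pair crosses, so the two distances are 1 and 0. The work grows with the product of the segment counts in each pair, which is why this kind of computation benefits from GPU parallelism.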

Figure 2 shows that the performance of pairwise_linestring_distance decreases as the linestrings become more complex.

The performance of pairwise linestring distance generally decreases as the number of linestrings increases, and increases as the number of segments per string increases.
Figure 2. The performance of pairwise_linestring_distance shown over datasets with varied number of linestrings and segments per string.

We compared the performance of pairwise_linestring_distance in cuSpatial to the equivalent function in Shapely on two realistic datasets: the Transport Dataset of California and TrajAir, a General Aviation Trajectory Dataset. As shown in Table 1, cuSpatial delivered a significant speedup over Shapely on the CPU for both datasets.

Table 1. Performance of pairwise_linestring_distance on the Transport Dataset of California and the TrajAir Datasets, compared to the performance in Shapely.

cuCIM

cuCIM, the library for accelerated n-dimensional image processing and image I/O, has also seen some exciting new features added for this release.

First, the cuCIM release adds two new functions for stain extraction and normalization of digital pathology slides stained with hematoxylin and eosin (H&E): cucim.core.operations.color.stain_extraction_pca and cucim.core.operations.color.normalize_colors_pca.

Figure 3. Result of using the clear_border function to go from an image containing blobs that overlap the image border (top left) to the same image with all border-overlapping blobs removed (bottom right).

There is also a new function, cucim.skimage.segmentation.clear_border, that can be used to remove any labels touching the image/volume border. In the following code we use this clear_border functionality to remove labels at the boundary of an image. The resultant images are shown in Figure 3.

import cupy as cp
import matplotlib.pyplot as plt
import numpy as np
from cucim.skimage import (color, data, measure, segmentation)
# generate synthetic data (binary blobs)
blobs = data.binary_blobs(1024, n_dim=2,
                          blob_size_fraction=0.04, volume_fraction=0.35,
                          seed=5)
# Assign unique labels to each blob
labels = measure.label(blobs)
# discard any blobs that are touching the border
labels2 = segmentation.clear_border(labels)
# assign randomized RGB colors to each label
labels2_rgb = color.label2rgb(labels2)
# Determine the area in pixels of each label
properties = measure.regionprops_table(labels2, properties=['area'])
areas = properties['area']
print(f"areas = {areas}")
# Visualize the result with Matplotlib
fig, axes = plt.subplots(2, 2, figsize=(8, 8))
fontdict = dict(fontweight='bold', fontsize=16)
axes[0][0].imshow(cp.asnumpy(blobs), cmap=plt.cm.gray)
axes[0][0].set_title("synthetic blobs", fontdict=fontdict)
axes[0][1].imshow(cp.asnumpy(labels), cmap=plt.cm.gray)
axes[0][1].set_title("labeled blobs", fontdict=fontdict)
axes[1][0].imshow(cp.asnumpy(labels2), cmap=plt.cm.gray)
axes[1][0].set_title("trimmed blobs", fontdict=fontdict)
axes[1][1].imshow(cp.asnumpy(labels2_rgb))
axes[1][1].set_title("trimmed blobs (RGB)", fontdict=fontdict)
for ax in axes.ravel():
    ax.set_axis_off()
plt.tight_layout()
plt.show()
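For readers who want to see what clearing the border means on a tiny array, here is a NumPy-only sketch of the idea for 2D label images. It is a CPU illustration under simple assumptions (label 0 is background, only the outermost rows and columns count as the border), not cuCIM's implementation; clear_border_reference is a hypothetical name.

```python
import numpy as np


def clear_border_reference(labels):
    """Set every label that touches the image border to 0 (background)."""
    labels = np.asarray(labels)
    # Collect all pixels on the four edges of the image.
    border = np.concatenate([labels[0, :], labels[-1, :],
                             labels[:, 0], labels[:, -1]])
    # Non-zero labels that appear on an edge.
    touching = np.unique(border[border != 0])
    cleared = labels.copy()
    cleared[np.isin(labels, touching)] = 0
    return cleared


labels = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0],
    [3, 0, 0, 0, 0],
])
cleared = clear_border_reference(labels)
remaining = np.unique(cleared).tolist()
print(cleared)
print(remaining)
```

Labels 1 and 3 touch an edge and are removed, while label 2 sits entirely inside the image and is kept.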

In addition, some of the existing cuCIM functions have been updated for performance improvements. Specifically, edge detection with cucim.skimage.feature.canny should be 3–4x faster than the previous release. Binary and grayscale morphological operations can now be performed much faster for large footprint sizes.

Conclusion

On top of the updates discussed here, we’ve been working on additional improvements to bring you updated library versions, the newest versions of dependent libraries, better user experience and more functionality and reliability across all the RAPIDS libraries. Be sure to check out the release notes for all the details.

We’re looking forward to hearing from you on how you are using these new capabilities of RAPIDS. As always, reach us on GitHub, follow us on Twitter, and check out our documentation and getting started resources.
