RAPIDS cuGraph adds NetworkX and DiGraph Compatibility

Published in

RAPIDS AI

4 min readOct 2, 2020

RAPIDS cuGraph is happy to announce that NetworkX Graph and DiGraph objects are now valid input data types for graph algorithms. Furthermore, when a NetworkX Graph is passed in, the returned data type will match the corresponding NetworkX algorithm’s return type — with some exceptions that this blog will cover. Existing analytics using NetworkX can be accelerated by simply replacing the module name.

answer = nx.pagerank(G) ==> answer = cugraph.pagerank(G) # same G

Example Using Betweenness Centrality

Betweenness Centrality (BC) is a measure of the relative importance of a node based on the number of shortest paths that cross through the node. This is under the assumption that the more information flows through a node, the more important it is. Since a single source shortest path (SSSP) algorithm needs to be run for each node, BC can be slow. Consider the following code sample that computes BC using NetworkX and the same code with the new cuGraph feature.

It’s that easy. Replace the module name, and you have access to RAPIDS accelerated algorithms.

Running the above code on a range of random graphs produces the results in Table 1. The random graphs are created using the preferential attachment model, but other models could be used, and NetworkX contains a wide assortment of graph generators. For this test, the number of nodes started at 100 and was double each iteration. Also, the average degree, M argument, was set to 16. You can find the code in the cuGraph notebook folder.

Table 1: cuGraph runtimes for BC vs. NetworkX

Compare 47,763 seconds, which is a little over 13 hours, to the cuGraph time of 145.6 seconds, which is under 3 minutes, and you get a sense of the potential performance improvement that can be achieved by making a simple code change (a 328x speedup).

The example does use Betweenness Centrality, which is known to be slow. To improve performance, estimation techniques can be employed to use a sample of nodes rather than all of them. Setting the k argument to 25% of nodes (k = N // 4) will reduce runtime of both NetworkX and cuGraph by 75%, but also reduce accuracy.

The performance speedups listed above are typical. However, if you are working with a small dataset, like Moreno’s seventh-grade friend network or Zachary’s Karate club, where the number of nodes is less than 50, then GPU acceleration really won’t help. Once you get to a few hundred nodes, the benefits become noticeable. And once you are into the tens of thousands plus range, acceleration is a must.

The Betweenness performance numbers were generated on a randomly created preferential attachment style graph. Let’s look at a sample of other large public datasets across a range of algorithms.

Figure 1: Speedup of cuGraph over NetworkX for typical graph algorithms

Table 2: Dataset structures used in the generation of Figure 1

An additional benefit of cuGraph now accepting NetworkX Graph objects is that it also allows NetworkX user access to algorithms that are not in the current NetworkX release, for example, Louvain, Ensemble Cluster for Graphs (ECG), and Lieden, to name a few. More algorithms are coming.

Current List of Algorithms and Areas of Difference

There’s an important difference to note. Currently, cuGraph does not support a rich property set on nodes or edges. When a NetworkX graph is imported, only the source, target, and a single weight column are copied over. That means that algorithms that return a graph, like k_truss, will not have any additional attributes. The solution would be to run a networkx subgraph extraction on the returned graph.

Algorithms that exactly match, exactly match but do not copy over additional attributes, and are not available in NetworkX (Table 3).

Table 3: cuGraph algorithms that exactly match their NetworkX equivalents

Algorithms where not all arguments are supported in cuGraph (Table 4).

Table 4: cuGraph algorithms that do not support all NetworkX arguments

Algorithms where the results are different (Table 5). For example, the NetworkX traversal algorithms typically return a generator rather than a dictionary.

Table 5: Algorithms that produce different results on cuGraph vs. NetworkX

Conclusion:

RAPIDS cuGraph now provides an accelerated graph analytics library that integrates the RAPIDS ecosystem with NetworkX. Providing a friendly, easy-to-use user experience is almost as important as awesome performance. The RAPIDS cuGraph team will continue to expand the list of available algorithms, with scaling to hundreds of billions of edges, but more importantly, also focus on expanding and enhancing interoperability with NetworkX and other Python frameworks.

Your feedback and comments are greatly welcome. We are available on Google Group, or you can file a GitHub issue with suggested enhancements. Please consider giving cuGraph a star on GitHub.

RAPIDS cuGraph adds NetworkX and DiGraph Compatibility

Example Using Betweenness Centrality

Current List of Algorithms and Areas of Difference

Conclusion:

Written by Brad Rees