GPU Parallelization of Community Detection using OpenCL

Utilizing GPU through OpenCL programs to parallelize Brandes’ Algorithm for betweenness centrality.

Alexander Cerpa
smucs
3 min readDec 7, 2021

--

Algorithm Pseudo-code

Graph Data Structures for OpenCL

For storing the graph information within a GPU buffer, vertices are assumed to be sequentially numbered and edges are stored within a sparse matrix to conserve memory space.

Parallelization of Dijkstra’s Algorithm

The OpenCL implementation of Dijkstra for single source shortest paths that I am using is similar to that of the example within the OpenCL Programming Guide. My implementation does not consider the weights of the edges, but keeps track of the number of shortest paths for each vertex in the sigmaArray. The information for vertex, edge, and current cost per vertex are stored in read-only arrays. The algorithm is split into two stages.

The first stage, any vertex marked for exploration within the maskArray has its neighbors explored. Initially, only the source vertex is marked for exploration within the maskArray. By exploring a vertex, its cost is updated within the updatingCostArray , which is a writable copy of the costArray.

The second stage updates costArray with any changes from updatingCostArray that results in a higher cost for that vertex. That vertex is also marked within the maskArray for exploration for the next iteration of the first stage.

Both kernels are continuously ran until maskArray does not contain a 1 at any index.

Parallelization of Edge Betweenness Centrality

Brandes’ algorithm for betweenness centrality states that when calculating the dependency of a vertex, the partial sums of the pair-wise dependencies follow a recurrence relation with each other if they are on the same path.

We start with edges farthest from the source vertex, decrementing our current depth with each iteration of the kernel. edgeArray2 provides a way to access the vertex of an edge, given an edge index. The vertices contained in this array will be the neighbors of vertices contained within edgeArray (same order of edges). For each pair of connected vertices, v and w, we calculate the partial pair dependency of vertex v and accumulate it to the dependency of v within the dependencyArray. We are using atomic addition to avoid data races between threads on the GPU.

After running the above kernel for each depth level, the centrality of each vertex is updated, ignoring the current source vertex.

--

--