Using Girvan-Newman to Identify Communities — Part 1

Kate Bouis
smucs
Published in
3 min readApr 11, 2022

These past few weeks, I worked with two others to code community detection in C++ using the Boost Graph Library (BGL). The two other team members wrote articles that extend my discussion of the project, and these articles are linked below. This is Part 1 of our project explanation, which essentially explains how we implemented our project, including the basics of the Girvan Newman algorithm and how to read and write graphs using BGL.

We used the Girvan Newman algorithm to detect communities in graph networks. By running an algorithm on a data set of college football teams in 2000 and their games played, linked here, we separated them by conference. Below is an example of one of the clusters of our output. We tested output by both conference number, which was coded into the graphml file, and then again by college name, to give real world meaning to our findings. Here is the same cluster in the two different formats, for example:

One cluster showing the connections between three different football conferences, using numbers to show how they’re grouped
Output by Conference Number
One cluster showing the connections between three different football conferences, using team names to give real life meaning to our findings
Output by College Name

We used some of the BGL built in functions to read in graphml files and write out graphviz files. The datasets we used were originally in gml format, and converting from gml to graphml was simple using the networkx python library. By writing our files out in graphviz format, we allowed our results to be visualized as graph pictures. After using graphviz to write out to a .dot file, we pasted the code here, which turned the .dot file into the graphs you see above.

While reading and writing graphs using BGL is necessary to the project, our central focus was on the Girvan Newman algorithm and implementing it so that we can detect communities in very large graph datasets. The algorithm consists of two main parts — calculating edge betweenness to determine which edges to remove, and calculating modularity to determine when the strongest communities have been detected and hence when to stop removing edges. I am going to discuss the first part in my article, with the latter being covered in Part 2 linked below.

The Girvan-Newman algorithm essentially works by removing one edge at a time from a graph until the graph is separated into distinct communities. The shortest path between any two nodes is calculated and stored. Then the algorithm removes the edges by using edge-betweenness centrality, which is how many shortest paths pass through one particular edge. If an edge has a relatively high number of shortest paths passing through it, that edge is removed, as it is considered to connect multiple communities rather than existing within one community. After the edge is removed, these steps are repeated to remove more edges until the communities have been distinctly separated from one another.

In our specific implementation of the Girvan Newman algorithm, we utilized breadth-first search (BFS) to calculate betweenness centrality. The BGL has a built in BFS function that we called, using as parameters our graph, a vertex iterator, and a custom visitor object. As we iterated through the graph to calculate edge betweenness after every edge removal, we calculated the modularity of the new graph, compared this modularity to the maximum modularity, and if our new modularity exceeded the maximum, we stored the new modularity as well as the new graph.

To read how we calculated modularity in order to identify the community structures, read Part 2, linked here.

Here is Part 3 of our project, which discusses an extension where we alter the Girvan Newman search method slightly to produce nearly identical results with a 30% time reduction.

Thank you for reading Part 1, which detailed our methods for reading, writing, and visualizing graphs, as well as explained the first few steps in implementing the Girvan Newman algorithm for community detection.

This article was written by Kate Bouis on April 11, 2022. Kate is currently a sophomore at Southern Methodist University studying Computer Science and Mathematics. This is her first publication, and in the future she aims to do research in the field of AI/Machine Learning.

--

--