Girvan Newman Part 1 — Description and Components

Austin Werth
smucs
Published in
3 min readApr 30, 2022

By Austin Werth

Links to other parts: Part 2, Part 3

The Girvan Newman Algorithm

The Girvan Newman algorithm is a community detection algorithm that aims to detect and separate communities based on the connections between each node in a graph (or some other series of items with connections). This is done by iteratively removing edges between communities until each community is separated or “detected”. However, the process used to properly remove edges only between communities uses the main idea of what is known as edge betweenness centrality.

Girvan Newman algorithm separating a graph of nodes into 4 communities

Edge betweenness centrality is a measure of centrality between two nodes based on the shortest path between the nodes. In other words, if the betweenness centrality of every edge in a graph was to be calculated for every pair of nodes, then one would find that the edges connecting two communities together would have the greatest betweenness centrality. This is true because the shortest paths between nodes of two separate communities would almost always be required to traverse that one (or multiple) edge(s) that bridges the two communities together. By knowing this, it can be assumed that the edges with the greatest betweenness centrality would also be the edges that connect multiple communities together. Therefore, if all the edges with a very high between centrality were removed from a graph, it would effectively separate all of the nodes into their respective “communities”. However, when does the algorithm know when to stop removing edges in order to accurately separate the communities in the graph? The answer, modularity.

Graph showing the higher edge betweenness centrality for the edge connecting the two communities

Modularity is a measure of the strength of division of a graph/network into groups or communities. In other words, it serves as an indicator for the algorithm on whether it should continue to remove edges to separate the communities or if no more edges need to be removed since the communities are now separated. Modularity can be implemented in many different ways using an equation such as

or by checking the number of separated communities every time an edge is removed. Modularity is a very important aspect of the community detection algorithm since it ensures that the communities will be accurately separated. For instance, if the modularity value is off it could either not remove enough edges, defeating the entire point of the algorithm, or remove too many edges, isolating individual and small groups of nodes that are supposed to belong to a larger community.

The Girvan Newman algorithm community detection has several complex parts that can be implemented in a variety of ways, with some bad and some good. These possible implementation differences are some of the main reasons why there are nearly endless posts, coding libraries, etc. relating to the Girvan Newman algorithm as well as why it is still widely used today. The next part of this article will explore one possible implementation of the Girvan Newman algorithm using the Boost Graph Library for C++.

Link to other parts: Part 2, Part 3

References

[1] The Boost Graph Library — 1.78.0

[2] Girvan-Newman algorithm | NetworkX Guide

[3] https://www.researchgate.net/publication/271472127_Community_structure_in_networks_Girvan-Newman_algorithm_improvement

[4] smu-cs-3353/22s-pa03-girvan-newman-ian_maxwell: 22s-pa03-girvan-newman-ian_maxwell created by GitHub Classroom

Writer Bio

Austin Werth is a Sophomore at SMU majoring in Computer Science. He is passionate about algorithm analysis, software development, and app development.

--

--