Girvan-Newman and Louvain Algorithms for Community Detection

Cullen Watson
smucs
Published in
5 min readApr 25, 2022

An in-depth introduction of community detection algorithms with Cullen Watson

The Girvan-Newman algorithm is the most widely used algorithm to evaluate the quality of a partition of a network. However, since its inception in 2002, there have been many algorithms that have been devised to improve its performance or to fit certain use cases. In this article, we take a look at both the Girvan-Newman algorithm and the Louvain algorithm.

Basics of the Girvan-Newman Algorithm

Let’s first understand the basics of the Girvan-Newman algorithm and understand why it is so computationally expensive.

Edge Betweenness

It uses edge betweenness to find and remove central edges that connect communities within a larger graph.

The formula for edge betweenness is [1]:

where σᵤ,ᵥ is the number of shortest paths between two distinct vertices and σᵤ,ᵥ(e) is the corresponding number of shortest paths containing a particular edge.

Modularity

After removing an edge, the Girvan-Newman algorithm calculates the modularity (Q) of the graph, which is a value between the range [-0.5,1]. A higher value suggests a more significant community structure. Therefore, we can identify communities by maximizing modularity.

The formula for modularity is [2]:

(m = number of modules, ls = the number of edges inside module s, L = the number of edges in the network, d = total degree of the nodes in module s)

This process of removing an edge and calculating the modularity is iteratively repeated. The algorithm will stop when the new modularity is no longer greater than the modularity from the previous iteration. The ending modularity is usually around 0.6.

Pseudo Code

The high-level code for the algorithm becomes:

Algorithm Caveat

One caveat to this algorithm is that it is difficult to find smaller communities. As noted in [2], due to the modularity optimization, the algorithm fails to detect “modules smaller than a scale which depends on the total size of the network and on the degree of interconnectedness of the modules.”

As a result, one should use this algorithm for detecting larger community structures and then examine the detected communities for sub communities.

Time Complexity

Despite Girvan-Newman’s popularity and quality of community detection, it has a high time complexity [3], increasing up to O(m²n) on a sparse graph having m edges and n nodes. As a result, Girvan-Newman is generally not used on large scale networks. Its optimal node count is a few thousand nodes or less.

Because of this, there exist greedy algorithms for detecting communities to reduce the time but at the same time sacrificing the most accurate results. One such example is the Louvain algorithm.

Basics of the Louvain Algorithm

The Louvain algorithm is a fast implementation of community detection. It can be used to analyze a network of 2 million nodes in only 2 minutes [4].

It is a hierarchical clustering algorithm that involves two phases: modularity optimization and community aggregation [5]:

Modularity Optimization

As seen in the figure, the first step is to optimize the modularity of the entire graph. In this example, it splits the nodes into four communities.

To find these clusters, each node is moved into its neighboring community. If the change in modularity (ΔQ) is greater than 0, it is moved into the neighboring community, Otherwise, it remains in its current community. This process is repeated until ΔQ=0 for all nodes.

Community Aggregation

After modularity optimization, super nodes are created to represent each cluster. After the initial phase of the algorithm, there will exist many communities.

However, the two phases repeat, creating larger and larger communities. The algorithm stops only when no improvement can be made by any of the two operations [6].

Time Complexity

The Louvain algorithm is an improvement of the Girvan-Newman algorithm in terms of time complexity with a run time of O(n log n). This is currently the fastest complexity in community detection.

However, it does not guarantee that the communities found are the best possible. Moreover, research is still ongoing in community detection, and there is no “best” algorithm that exists.

Comparing the Algorithms on Datasets

After running both algorithms through 5 relatively small datasets from [6], I noticed that Louvain was much quicker than Girvan-Newman, confirming its time complexity of O(n log n). However, it was not feasible to continue comparing the algorithms after I reached a few thousand nodes due to Girvan-Newman’s expensive run time.

In terms of the clustering, both algorithms detected nearly the same number of communities except for the word adjacency data set, which would confirm that words have no distinct correlation in usage.

However, the similar number of clusters in the other datasets suggests a correlation between each module. For example, each module in the Les Misérables dataset suggests that are characters are related, or in the Football Conference 2000 Dataset, each cluster represents a Division I-A Conference.

Further Reading & References

[1] Revising the Girvan-Newman algorithm

[2] Resolution limit in community detection

[3] Comparing Algorithms of Community Structure in Networks

[4] Louvain method: Finding communities in large networks

[5] A Generalized and Adaptive Method for Community Detection

[6] Network Data

--

--