Girvan-Newman vs. Louvain for Community Detection

London Kasper is a second-year Computer Science student at Southern Methodist University. Her areas of concentration include Machine Learning and Artificial Intelligence, which she hopes to continue studying in the Computer Science graduate program at SMU.

London Kasper
smucs
Apr 14, 2022


This article is a continuation of our 3 part exploration of the Girvan-Newman and Louvain community detection algorithms. Here is the link to Part 1 covering Girvan-Newman, and here is the link to Part 2 covering Louvain. You can view the GitHub for this project here.

While the Girvan-Newman algorithm is one of the most popular community detection algorithms, its complexity makes it impractical for large datasets. Girvan-Newman runs in O(m²n) time for a network with n nodes and m edges, so it becomes unusable once a network grows beyond a few thousand nodes. This is a prominent issue because practical community detection datasets are often drawn from large populations: social media user bases, biological networks with millions of components, and world-scale population studies. For example, analyzing even 0.5% of Facebook’s user base (a dataset of at least 14,500,000 nodes) would far exceed what the algorithm can handle, especially since each node is bound to have multiple edges. This drawback is the reason Girvan-Newman’s applications lean toward smaller datasets, such as the ‘Karate Club’ dataset it was originally tested on (see our previous article here for a detailed explanation of the Girvan-Newman algorithm).
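To make the cost concrete, here is a minimal sketch of running Girvan-Newman with networkx on the small Karate Club graph mentioned above. This is not our project's implementation, just the library version; the O(m²n) cost comes from recomputing edge betweenness after every edge removal.

```python
# Sketch: Girvan-Newman on the Karate Club graph via networkx.
# The algorithm repeatedly removes the highest-betweenness edge,
# recomputing betweenness each time -- hence the O(m^2 n) cost.
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()  # 34 nodes, 78 edges

# girvan_newman returns an iterator of successively finer partitions;
# next() yields the first split of the graph into two communities.
first_split = next(girvan_newman(G))
for community in first_split:
    print(sorted(community))
```

Even on this 34-node graph, each iteration triggers a full betweenness recomputation, which is why the approach stops scaling past a few thousand nodes.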

On the other hand, the Louvain community detection algorithm has a much smaller computational complexity of O(n log n), where n is the number of nodes in the network. Notice that this complexity does not depend on the number of edges, in contrast to Girvan-Newman’s. The difference comes from Louvain’s modularity-based optimization. Rather than recalculating edge betweenness at every step (as Girvan-Newman does), the Louvain algorithm calculates the modularity gain of moving each node into each neighboring community, a far less computationally taxing process. Each node initially forms its own community, and the modularity change from placing it into every neighboring community is calculated. Once all of a node’s candidate moves have been evaluated, the node is placed into the community that yields the greatest positive modularity increase (see our previous article here for a detailed explanation of the Louvain algorithm).
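The move-and-merge procedure described above is available in networkx (version 2.8 or later) as `louvain_communities`; a minimal sketch, again using the Karate Club graph as a stand-in:

```python
# Sketch: Louvain via networkx (requires networkx >= 2.8).
# Each node starts as its own community; nodes are moved to the
# neighboring community with the greatest positive modularity gain,
# then communities are aggregated, until modularity stops improving.
import networkx as nx

G = nx.karate_club_graph()

# seed fixes the (randomized) node-visit order for reproducibility
communities = nx.community.louvain_communities(G, seed=42)

print(f"{len(communities)} communities found")
print(f"modularity Q = {nx.community.modularity(G, communities):.3f}")
```

Because each pass only examines a node's neighboring communities, no global edge-betweenness computation is ever needed.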

When comparing the performance of the two, Girvan-Newman pales in comparison to Louvain’s ability to handle large datasets. There are further drawbacks to Girvan-Newman’s modularity-based stopping process: it can lead to different communities being detected from run to run, and an early edge split can leave entire communities undetected. Small communities can go unnoticed under Girvan-Newman, and it is difficult to choose a precise modularity cutoff that leads the algorithm to an accurate conclusion.

After implementing both Girvan-Newman and the Louvain algorithm, we compared our resulting detected communities using the Football 2000 Dataset. We chose this dataset because the communities are already established; we didn’t want to reinvent the wheel, we simply wanted to check our algorithms’ accuracy.

Figure 1: The result of our Girvan-Newman implementation on the 2000 Football Conference dataset.
Figure 2: Louvain’s algorithm when applied to the same Football dataset.

Figure 1, the visualization of the graph our Girvan-Newman algorithm generated, displays the 12 different football conferences as separate clusters. Figure 2, by contrast, displays the detected communities as color-coded nodes within the original graph. The difference between the two visualizations reflects the way these two algorithms work. The Girvan-Newman algorithm splits communities by removing edges, essentially creating a new graph for each iteration. Louvain, on the other hand, simply compares node modularity and never alters the original graph’s structure. As you can see by examining the visualizations, our results are the same between the two algorithms. We compared our results with the actual conference assignments and found that both algorithms correctly placed the teams in their designated conferences.
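A comparison along these lines can be sketched with networkx. Since the football dataset isn't bundled with the library, this hedged example scores both algorithms on the Karate Club graph instead, using modularity as the common yardstick:

```python
# Sketch: score a Girvan-Newman partition and a Louvain partition
# of the same graph by their modularity Q (networkx >= 2.8).
import networkx as nx
from networkx.algorithms.community import (
    girvan_newman,
    louvain_communities,
    modularity,
)

G = nx.karate_club_graph()

# First Girvan-Newman split vs. the full Louvain partition
gn_partition = next(girvan_newman(G))
lv_partition = louvain_communities(G, seed=42)

q_gn = modularity(G, gn_partition)
q_lv = modularity(G, lv_partition)
print(f"Girvan-Newman (first split): Q = {q_gn:.3f}")
print(f"Louvain:                     Q = {q_lv:.3f}")
```

On a dataset with known ground-truth communities, like the football conferences, the partitions themselves can additionally be checked against the true labels, as we did above.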

In conclusion, when it comes to using community-detection algorithms, the choice of technique should be determined by the size of the dataset. While we didn’t run into any issues with our Girvan-Newman algorithm timing out, we were also using an extremely small dataset compared to the sizes that the Louvain algorithm is capable of handling. Had we tried to analyze communities within a social network (or any other large-scale practical application), using the Louvain algorithm would have been a better option.
