Community Detection: Label Propagation vs. Girvan-Newman (Part 2)

Michael Doherty
smucs
Published in
6 min readApr 13, 2022

This article, where my partner Braiden Hook and I explore different community detection algorithms, has been split into two parts. This part quantitatively compares the Girvan-Newman and Label Propagation algorithms and explores their advantages and disadvantages. The first part explores community detection and offers a brief overview of the two algorithms. The first part can be found below:

Community Detection: Label Propagation vs. Girvan-Newman (Part 1)

Efficiency Comparison

To determine the efficiency of each algorithm, I’ll compare the runtimes of each algorithm on randomly generated graphs of different sizes. To capture the runtime, I used the chrono library in C++. I ran each algorithm on randomly generated graphs with 100 nodes and a varying number of edges. The results are shown in the following graph:

It’s a little difficult to tell what times the Label Propagation algorithm had from the above graph, so why don’t we take a closer look:

As the above graphs reveal, the Label Propagation algorithm absolutely blew the Girvan-Newman algorithm out of the water in terms of faster runtime. Even on the lowest dataset size (100 edges), the Label Propagation algorithm had a runtime that was over 10 times faster than the Girvan-Newman algorithm’s runtime (0.0365 seconds for Label Propagation compared to 0.4519 seconds for Girvan-Newman). And for the biggest dataset (750 edges), the Label Propagation algorithm had a runtime over 1950 times faster than the Girvan-Newman algorithm’s runtime (0.0431 seconds for Label Propagation compared to 84.2766 seconds for Girvan-Newman ). Ultimately, this result was expected, as the Girvan-Newman algorithm is known to have a time complexity of O(n³), while the Label Propagation algorithm is known to have a time complexity of O(n). Thus, the results of my testing reaffirm the known time complexities of each algorithm.

It is worth noting, however, that the randomly generated graphs are not based on real data (and thus the likelihood of actual communities existing within the graphs is low). Thus, I decided to test each algorithm on a real dataset to see if the results stayed the same. I tested each algorithm on the Football Conference 2000 Dataset (which consists of 114 different College Football in 12 different conferences (or communities)). After running the dataset through each algorithm five times, I found the following average runtimes:

  • Girvan-Newman Algorithm: 27.14 seconds
  • Label Propagation Algorithm: 0.0871 seconds

Thus, after testing both algorithms on various graphs, it’s readily apparent that the Label Propagation algorithm is far more efficient than the Girvan-Newman Algorithm.

Consistency Comparison

One major aspect that differs between the two algorithms is their consistency. The Label Propagation algorithm runs the nodes of the given graph through Label Propagation in a random order; because of this, it can generate different communities every time it runs. Thus, I find it imperative to compare the consistency of each algorithm.

After running the Football Conference 2000 Dataset through each algorithm three times, I discovered that the Girvan-Newman algorithm generated the same result every time. On the other hand, the Label Propagation algorithm generated a different set of communities each time. Across the three different outputs, the first and second outputs generated generated 11 communities, while the third output generated 10 communities. It is worth noting that the three outputs from the Label Propagation algorithm did have 5 communities that were identical across all three outputs.

Thus, after testing both algorithms, it’s clear that the Girvan-Newman algorithm is easily more consistent than the Label Propagation algorithm.

Accuracy Comparison

To test the accuracy of each algorithm, I’ll compare their generated communities to the actual communities that exist within the Football Conference 2000 Dataset. As mentioned above, the Label Propagation algorithm generates different communities each time it is run; thus, I’ll calculate its average accuracy across the three datasets I used for the consistency comparison.

My definition of an accurate community is as follows: for each community generated, I deem that community as the conference that most of its members are a part of. For example, if one of the generated communities contained Arkansas, LSU, Ole Miss, Mississippi State, Alabama, Auburn, Kentucky, Georgia, South Carolina, Florida, Vanderbilt, Tennessee, UCF, and Florida State, I would deem that community as being the SEC (and thus 12 of the 14 teams would have been accurately placed, as UCF and Florida State were not in the SEC in 2000). However, a conference can only be assigned to one community; for example, if Community A consists of Iowa, Penn State, Northwestern, Wisconsin, and Michigan while Community B consists of Purdue, Ohio State, Minnesota, Illinois, Michigan State, and Indiana, I would label Community B as the Big Ten (as it has 6 Big Ten members), while Community A would not receive a label (and thus the algorithm would have only correctly identified 6 members from the Big Ten). I decided to do it this way since each conference is a single community and splitting a conference into two communities means the algorithm misidentified one of the communities as being a separate conference.

With this definition of accuracy in mind, the Girvan-Newman algorithm placed 105 out of 114 teams in their correct conference/community. That correlates to a 92% accuracy when determining which conference/community a team belongs to. On the other hand, the Label Propagation algorithm correctly placed 94, 94, and 96 teams in their correct conference/community (across its three different outputs). This averages to correctly placing 94.67 out of 114 teams in their correct conference/community, which correlates to an 83% accuracy.

Thus, as the data suggests, the Girvan-Newman algorithm is more accurate than the Label Propagation algorithm when determining what communities each node of a graph belongs to. It’s also worth noting that the Girvan-Newman algorithm generated the correct number of communities/conferences that existed within the graph (12), while the Label Propagation algorithm generated fewer communities/conferences (10 or 11, depending on the output).

The communities the Girvan-Newman algorithm and Label Propagation algorithm generated can be seen in the graphs below:

Graphed output of Girvan-Newman. Each different color represents a different community/conference.
One of the graphed outputs of Label Propagation. Each different color represents a different community/conference.

Conclusion

Thus, after comparing and contrasting the Girvan-Newman algorithm and Label Propagation algorithm, their strengths and weaknesses have become readily apparent. The Girvan-Newman algorithm is more consistent and accurate at generating communities, but at the cost of a slow runtime (especially for large graphs). On the other hand, the Label Propagation algorithm is extremely time efficient, but at the cost of being inconsistent and less accurate with the communities it generates. Nevertheless, both algorithms have proved they can detect communities within a given graph, verifying their validity as Community Detection algorithms.

References

[1] Raghavan, Usha Nandini, Réka Albert, and Soundar Kumara. “Near linear time algorithm to detect community structures in large-scale networks.” https://arxiv.org/pdf/0709.2938.pdf

[2] “Network data.” http://www-personal.umich.edu/~mejn/netdata/

Author

Hello there! My name is Michael Doherty, and I’m a sophomore majoring in Computer Science at Southern Methodist University. My current interests include software engineering and video game development.

--

--