Improving Efficiency of Girvan-Newman Algorithm

Ariyan Kumaraswamy
smucs
Published in
3 min readNov 17, 2021

Authors: Ariyan Kumaraswamy, Kyle Doran

When searching for ways to improve the Girvan-Newman algorithm, we came across a paper called “Revising the Newman-Girvan Algorithm” written by Jana Coronicová Hurajová and Tomáš Madaras [1]. In this paper, they discuss various methods for improving the efficiency of the algorithm.

One of the things they mention is how the Girvan-Newman algorithm only removes a single edge at a time before recalculating the edge betweenness for all edges in the graph. They also point out that the algorithm doesn’t account for the situation where two or more edges share the maximum betweenness value. To address this they decide to calculate ‘group edge betweenness’, where a set of edges will have one betweenness value. From their solution we had the idea to remove all edges that shared the maximum betweenness value simultaneously to determine if it would improve the runtime efficiency of the algorithm.

To test our implementation of this idea, we made use of the 2002 American College football data set [2] and a randomly-generated data set. For the college football data set our runtimes and community detection output were nearly identical to our initial implementation of the Girvan-Newman algorithm. This was because there were no instances where two edges shared the maximum betweenness value. For the randomly-generated data set, we noticed slight improvements in runtime efficiency when removing all edges that shared the maximum betweenness value. The community detection output was also similar.

Because the runtime hadn’t improved at all for the college football data set and only slightly for the randomly-generated data set, we modified the algorithm to remove the 5 edges with the highest betweenness values before recalculating. This dramatically improve the runtime efficiency of the algorithm while maintaining similar community detection output. We compiled timing data for 3 edge removal methods: removing a single edge with the highest betweenness value, removing all edges that share the maximum betweenness value, and removing the 5 edges with the highest values. The timing results for removing the first 160 edges are listed below:

The community graph outputs for the standard algorithm and its modifications are below:

Football dataset output (removeOne)
Football dataset output (removeMaxSharedValues)
Football dataset output (remove5Highest)
Random dataset output (removeOne)
Random dataset output (removeMaxSharedValues)
Random dataset output (remove5Highest)

Works Cited:

[1]J. C. Hurajová en T. Madaras, “Revising the Newman-Girvan Algorithm”, in Proceedings of the 16th ITAT Conference Information Technologies — Applications and Theory, Tatranské Matliare, Slovakia, September 15–19, 2016, 2016, vol 1649, bll 200–205.

[2]M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002).

--

--