Finding the Time Complexity of Girvan-Newman: Part 1

Wes Anderson · Published in smucs · Apr 13, 2022 · 3 min read

Authors — Wes Anderson and Ryan Schaefer

Note — This article is the first in a two-part series. After reading it, please go read the second part, written by Ryan Schaefer.

URL = URL->next;

Data Generation

For Project 3 in Algorithms (CS 3353), we had to implement the Girvan-Newman community detection algorithm. The premise of this algorithm is to take one large connected graph and break it into smaller disconnected graphs, or “communities,” by repeatedly removing edges until the graph falls apart. After we were done tweaking our implementation, we came up with the extension idea of predicting the time complexity of our implementation. This algorithm is far more complex than a sorting algorithm, so we had to take multiple factors into consideration: is the running time dependent on the number of edges, the number of nodes, the edges per node, or some other factor entirely?
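Our project used our own implementation, but for readers who want a feel for what the algorithm produces, NetworkX also ships a reference girvan_newman function. Below is a minimal sketch using it; the barbell graph is just an illustrative input, not one of our test graphs.

```python
# Minimal illustration of what Girvan-Newman produces, using NetworkX's
# reference implementation (our project used our own implementation).
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Two dense clusters of 5 nodes each, joined by a single "bridge" edge.
G = nx.barbell_graph(5, 0)

# girvan_newman yields successive splits; the first split removes the
# bridge edge and breaks the graph into two communities.
first_split = next(girvan_newman(G))
print([sorted(c) for c in first_split])
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```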

To test these theories about the time complexity of our implementation, we created 50 randomly generated graphs. These graphs were generated using gnm_random_graph, a function in the NetworkX Python library, and each had a set number of vertices and edges, ranging from 100–1,000 vertices and 500–5,000 edges. We ran each graph through the algorithm and recorded the run times in a CSV file. Once the required data had been generated, the resulting CSV was imported into R to begin our statistical analysis.
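A rough sketch of that generation-and-timing loop is shown below; the (vertices, edges) pairs and the run_girvan_newman hook are placeholders standing in for our actual harness, not the exact values we used.

```python
# Sketch of the data-generation harness; the size pairs and the
# run_girvan_newman() hook are placeholders, not our exact setup.
import csv
import time
import networkx as nx
from networkx.algorithms.community import girvan_newman

def run_girvan_newman(G):
    # Stand-in for our own implementation of the algorithm.
    return list(girvan_newman(G))

# Example (vertices, edges) pairs spanning the ranges described above.
sizes = [(100, 500), (250, 1250), (500, 2500), (750, 3750), (1000, 5000)]

with open("timings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["vertices", "edges", "seconds"])
    for n, m in sizes:
        G = nx.gnm_random_graph(n, m, seed=42)  # random graph, n nodes, m edges
        start = time.perf_counter()
        run_girvan_newman(G)                    # time one full run
        writer.writerow([n, m, time.perf_counter() - start])
```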

Early Estimates

The first step of our analysis was to generate two plots, comparing the number of vertices and the number of edges, respectively, to the run time. For each plot, we averaged the run times over all values of the other variable to single out the variable being plotted. Both Figure 1 and Figure 2 show a clear positive correlation between the number of vertices and the number of edges, respectively, and the run time of the algorithm. This leads us to believe that both the number of vertices and the number of edges influence the running time of the Girvan-Newman algorithm.
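Our actual analysis was done in R, but the averaging step looks roughly like the following Python/pandas sketch (the column names assume the CSV layout from the harness above).

```python
# Sketch of the averaging step (the real analysis used R; column names
# here assume the CSV written by the timing harness above).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("timings.csv")

# Average run time for each vertex count (collapsing over edge counts),
# and likewise for each edge count, to isolate one variable at a time.
by_vertices = df.groupby("vertices")["seconds"].mean()
by_edges = df.groupby("edges")["seconds"].mean()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
by_vertices.plot(ax=ax1, marker="o", title="Mean time vs. vertices")
by_edges.plot(ax=ax2, marker="o", title="Mean time vs. edges")
plt.show()
```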

Although Figures 1 and 2 show a correlation between the number of vertices/edges and the run time, these plots are not precise enough to estimate the time complexity of the Girvan-Newman algorithm. Figures 3 and 4 are scatter plots showing every data point, comparing the number of vertices and edges respectively to the run time, rather than averaging all data points with the same x-axis value. These figures show that both the vertex and edge datasets have large variation and are not tight enough around the line of best fit to provide an accurate representation of the time complexity. Additionally, a time complexity of O(Edges) or O(Vertices) does not make sense: for every edge removed, the algorithm must traverse every edge and every vertex to find all of the shortest paths. This means the time complexity must be greater than both O(Edges) and O(Vertices).
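To make that lower bound concrete, here is a rough sketch of the cost model, assuming at least one full traversal of the graph per edge removal:

```latex
T(V, E) \;\gtrsim\; (\text{number of edges removed}) \times (V + E)
```

which already grows faster than either V or E on its own.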

If you would like to keep reading, please go to Part 2, written by Ryan Schaefer.

URL = URL->next;

Wes Anderson — I am a sophomore at SMU majoring in Computer Science and minoring in Mathematics.

Ryan Schaefer — I am a sophomore at SMU triple majoring in Computer Science, Statistical Science, and Data Science.
