Finding the Time Complexity of Girvan-Newman: Part 2

Ryan Schaefer · Published in smucs · Apr 13, 2022

Authors — Wes Anderson and Ryan Schaefer

Note — This article is a continuation of a two-part series. If you have not already read Part 1, please see the article written by Wes Anderson.

Final Estimations

Since both the edges-vs-time and vertices-vs-time plots showed discernible trends, we next plotted the combined product (edges * vertices) against time. Figure 5 shows the result. While a few outliers are present, there is a clear trend: the variance between the data points is significantly lower than in Figures 1 and 2 from the previous article. From this we can conclude that some transformation of (edges * vertices) has a linear relationship with the algorithm's runtime.
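As a rough illustration of this step, the snippet below is a minimal sketch in R; the data frame `timings`, its columns `vertices`, `edges`, and `seconds`, and the CSV filename are hypothetical names we introduce here, not taken from our actual benchmarking code.

```r
# Assumed data frame of benchmark runs: one row per test graph,
# with columns vertices, edges, and seconds (all names hypothetical).
timings <- read.csv("girvan_newman_timings.csv")

# Combined predictor from Figure 5: edges * vertices.
timings$ve <- timings$vertices * timings$edges

# Scatter plot of the combined predictor against runtime.
plot(timings$ve, timings$seconds,
     xlab = "vertices * edges",
     ylab = "algorithm time (seconds)",
     main = "Runtime vs. (vertices * edges)")
```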

From here we plotted several transformations of the data, trying to get a more linear trend to appear. What we settled on was (vertices² * edges²) vs. time, shown in Figure 6. Despite a few outliers, this graph shows a clear linear trend. Having found a linear relationship with the algorithm time, we can compute a regression model to confirm it.

To create a regression model from our data, we used the lm function in R. The default regression model was not a good fit: because of the outliers in the top left and bottom right of the data distribution, the fitted slope was closer to 0 than the actual relationship. To compensate, we forced the model's intercept to be 0 so that a less horizontal line of best fit would be found. This constraint is reasonable because a graph with 0 vertices and 0 edges is by default already in a "community structure" and would not need to be run through the algorithm at all; if it were, it would take very close to 0 seconds.
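A minimal sketch of that fit in R is below, using the same assumed data frame as above. In an lm formula, `0 +` (equivalently `- 1`) drops the intercept, and `I()` makes R evaluate the product of squares as arithmetic rather than as formula syntax.

```r
# Zero-intercept fit of time against vertices^2 * edges^2.
# "0 +" removes the intercept; I() evaluates the expression arithmetically.
fit <- lm(seconds ~ 0 + I(vertices^2 * edges^2), data = timings)

summary(fit)  # slope estimate and R-squared of the fit
coef(fit)     # the slope by itself
```

One caveat of this design choice: for a zero-intercept model, R computes R² relative to zero rather than to the mean of the response, so it is not directly comparable to the R² of a model with an intercept.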

Our final regression model is shown below in Figure 7. An R² value of 0.8 indicates a strong linear trend, though not a perfect one, due to the previously mentioned outliers. The slope was approximately 7.5275 * 10^-10, which by itself does not mean much for our analysis of Girvan-Newman's time complexity. However, the linear relationship with the algorithm time means that any constant coefficient greater than this slope satisfies the definition of Big O, so we have found an approximate upper bound for our implementation of Girvan-Newman.
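Spelling out the Big-O step from the fitted slope:

```latex
% Fitted zero-intercept model (slope from Figure 7):
T(V, E) \approx c_0 \, V^2 E^2, \qquad c_0 \approx 7.5275 \times 10^{-10}

% Any constant c > c_0 large enough to also cover the outliers gives
T(V, E) \le c \, V^2 E^2 \quad \text{for all tested } V, E,
% which is the definition of
T(V, E) \in O(V^2 E^2)
```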

Conclusion

Based on the data we recorded, both the number of vertices and the number of edges influence the runtime of our implementation of the Girvan-Newman algorithm. This is captured by our predicted time complexity of O(Vertices² * Edges²). Since the regression model relating (Vertices² * Edges²) to time has an R² of 0.8, we are reasonably confident that this is an accurate estimate of the time complexity.

Wes Anderson — I am a sophomore at SMU majoring in Computer Science and minoring in Mathematics.

Ryan Schaefer — I am a sophomore at SMU triple majoring in Computer Science, Statistical Science, and Data Science. I…
