Personal Performance Analysis (GMZ Series Part 3)

Sofia Murillo Sanchez · Published in smucs · 4 min read · Apr 14, 2022

If you have not read parts 1 and 2 of this Girvan-Newman Algorithm series, we highly suggest you read them here and here, respectively.

This project that we took on was quite a journey. We (Eileen Garcia, Sofia Murillo, and Oliver Zhu) walked into this without knowing anything about graphs and networks beyond what we learned in class. And, as many of you already know, learning from lecture slides versus actually applying your knowledge are TWO very different experiences. Not only that, but we also had to learn an entirely new library, Boost, which comes with quite the learning curve.

Our Initial vs Final Approach to this Project:

When we first started this project, we did a lot of research regarding the Girvan-Newman Algorithm and its various implementations and modifications. We learned the base-level concepts and immediately jumped into coding, which was a big mistake. By the end of the project, we were deeply researching the equations and each individual component that makes up the Girvan-Newman Algorithm in order to accurately reflect their roles, and to make up for the fact that we had not truly understood those roles to begin with.

What was the most important thing we learned?

We learned many technical skills and important concepts while creating this project, and even a few life lessons. The Girvan-Newman Algorithm, as mentioned in Part 1, is “a hierarchical method used to detect communities in complex systems” [1]. Networks… Graphs… Boost… Modularity… everything felt overwhelming to learn in the moment. It is a lot to take in. However, out of all the things we learned, the most important takeaway was the importance of truly understanding the base concepts before jumping into code. Specific concepts like modularity took us days to code, simply because we did not understand the underlying ideas behind modularity well enough.
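To make “modularity” a little more concrete, here is a minimal sketch of the calculation in plain C++. This is not our Boost-based implementation; the function name, the edge-list representation, and the tiny example graph are purely illustrative. The formula is the standard per-community form Q = Σ_c [ e_c/m − (d_c/2m)² ], where e_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges in the graph.

```cpp
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Modularity Q of an undirected, unweighted graph under a given community
// assignment (labels 0..C-1). Uses Q = sum over communities of e_c/m - (d_c/2m)^2.
double modularity(int num_nodes,
                  const std::vector<std::pair<int, int>>& edges,
                  const std::vector<int>& community) {
    const double m = static_cast<double>(edges.size());
    if (m == 0.0) return 0.0;

    // Degree of every node.
    std::vector<double> degree(num_nodes, 0.0);
    for (const auto& [u, v] : edges) {
        degree[u] += 1.0;
        degree[v] += 1.0;
    }

    // Number of communities = largest label + 1.
    int num_communities = 0;
    for (int c : community) num_communities = std::max(num_communities, c + 1);

    std::vector<double> internal_edges(num_communities, 0.0); // e_c
    std::vector<double> total_degree(num_communities, 0.0);   // d_c
    for (const auto& [u, v] : edges)
        if (community[u] == community[v]) internal_edges[community[u]] += 1.0;
    for (int i = 0; i < num_nodes; ++i) total_degree[community[i]] += degree[i];

    double q = 0.0;
    for (int c = 0; c < num_communities; ++c) {
        const double frac = total_degree[c] / (2.0 * m);
        q += internal_edges[c] / m - frac * frac;
    }
    return q;
}

int main() {
    // Two triangles {0,1,2} and {3,4,5} joined by one bridge edge (2,3).
    std::vector<std::pair<int, int>> edges = {
        {0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
    std::vector<int> community = {0, 0, 0, 1, 1, 1};
    std::cout << "Q = " << modularity(6, edges, community) << "\n"; // ~0.357
}
```

The per-community form avoids summing the usual A_ij − k_i·k_j/2m term over every pair of nodes, which is part of what makes the concept deceptively simple on paper and fiddly in code.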

What would we change?

If we could go back in time and change something about the way we approached this project, it would be to slow down and actually take the time to understand the mathematical concepts behind the Girvan-Newman Algorithm. If we had done so, we could have dedicated more time to the modification portion of the project. In particular, there was one modification we were very excited to implement but could not, due to the time constraints of this project. That modification utilizes an extremal optimization of the modularity. To do so, its authors [2] took the following approach (we include a rough code sketch after the quoted steps):

“1) Initially, we split the nodes of the whole graph into two random partitions having the same number of nodes in each one. This splitting creates an initial community division, where communities are understood as connected components in each partition.

2) At each time step, the system self-organizes by moving the node with the lower fitness (extremal) from one partition to the other. In principle, each movement implies the recalculation of the fitness of many nodes because the right-hand side of Eq. 3 involves the pseudoglobal magnitude a_r(i).

3) The process is repeated until an “optimal state” with a maximum value of Q is reached. After that, we delete all the links between both partitions and proceed recursively with every resultant connected component. The process finishes when the modularity Q cannot be improved.”

By doing so, they were able to drastically improve on the modularity initially produced by the base implementation of the Girvan-Newman Algorithm, and their heuristic approach felt like a combination of everything we have learned up to this point in CS3353.
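We never got far enough to build this, but here is a rough sketch of what a single extremal-optimization pass over a bipartition could look like, reusing the modularity() helper from the sketch above. The fitness λ_i = κ_{r(i)}/k_i − a_{r(i)} follows the description in [2]; everything else (the function name, the fixed move budget, tracking the best Q seen so far) is our own simplifying assumption, not the authors' actual method.

```cpp
#include <utility>
#include <vector>

// One extremal-optimization pass over a bipartition (sides 0 and 1).
// Fitness of node i: lambda_i = kappa_i / k_i - a_{r(i)}, where kappa_i is the
// number of i's neighbors on its own side, k_i is its degree, and a_{r(i)} is
// the fraction of all edge endpoints that fall on i's side.
std::vector<int> extremal_optimization_pass(
    int num_nodes,
    const std::vector<std::pair<int, int>>& edges,
    std::vector<int> side,   // initial random bipartition, values 0 or 1
    int max_moves) {
    std::vector<std::vector<int>> adj(num_nodes);
    for (const auto& [u, v] : edges) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    std::vector<int> best = side;
    double best_q = modularity(num_nodes, edges, side); // from the sketch above
    const double endpoint_total = 2.0 * edges.size();

    for (int step = 0; step < max_moves; ++step) {
        // Degree mass on each side, used for a_{r(i)}.
        double side_degree[2] = {0.0, 0.0};
        for (int i = 0; i < num_nodes; ++i)
            side_degree[side[i]] += static_cast<double>(adj[i].size());

        // Find the node with the lowest fitness -- the "extremal" node.
        int worst = -1;
        double worst_fitness = 1e18;
        for (int i = 0; i < num_nodes; ++i) {
            if (adj[i].empty()) continue;
            double kappa = 0.0;
            for (int nb : adj[i])
                if (side[nb] == side[i]) kappa += 1.0;
            const double fitness = kappa / adj[i].size()
                                   - side_degree[side[i]] / endpoint_total;
            if (fitness < worst_fitness) { worst_fitness = fitness; worst = i; }
        }
        if (worst < 0) break;            // nothing movable (e.g., no edges)

        side[worst] = 1 - side[worst];   // move the extremal node across

        const double q = modularity(num_nodes, edges, side);
        if (q > best_q) { best_q = q; best = side; }
    }
    return best;   // best bipartition seen so far
}
```

In the full method described in the quote, once Q stops improving you would delete the edges between the two sides and recurse on each resulting connected component.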

Our own performance analysis:

Although we were able to implement the Girvan-Newman Algorithm and its modification, were we actually successful in our endeavors? To evaluate our success, we compared the modularity from our base implementation on the US Football Network Dataset (Q = 0.65597) to the published modularity [3] (Q = 0.4397), and calculated our error to be approximately 33%. In summary, we definitely have room for improvement in our implementation, but, taking our milestones and the learning curves into consideration, we performed quite well.
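For reference, that 33% figure is simply the gap between the two modularity values taken relative to our own result:

|0.65597 − 0.4397| / 0.65597 ≈ 0.2163 / 0.65597 ≈ 0.33, or roughly 33%.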

Lastly, the dream for the Ultimate Girvan-Newman Algorithm is far from over. We hope you stay tuned for our results on that implementation.

Sources:

[1] “Girvan–Newman algorithm.” Wikipedia, https://en.wikipedia.org/wiki/Girvan%E2%80%93Newman_algorithm

[2] Duch, Jordi, and Alex Arenas. “Community Detection in Complex Networks Using Extremal Optimization.” Physical Review E, vol. 72, no. 2, 2005, p. 027104, https://doi.org/10.1103/PhysRevE.72.027104 (accessed via https://journals-aps-org.proxy.libraries.smu.edu/pre/abstract/10.1103/PhysRevE.72.027104)

[3] Despalatović, L., T. Vojković, and D. Vukic̆ević. “Community Structure in Networks: Girvan-Newman Algorithm Improvement.” 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014, pp. 997–1002, doi: 10.1109/MIPRO.2014.6859714, https://ieeexplore.ieee.org/abstract/document/6859714
