Using Girvan-Newman to Identify Communities — Part 2: Modularity

Published in

smucs

3 min readApr 12, 2022

Following Kate’s discussion on the Girvan Newman algorithm, I will discuss modularity in more detail. Modularity measures the strength of community structure and therefore aided us in knowing when to stop removing edges in our algorithm implementation.

Essentially, given a network, the modularity Q of the network is the difference between the number of edges within each found component and the expected number of edges within each found component. The expected number of edges refers to if the same degree of edges were randomly placed between vertices. In other words, it is the number of edges falling within a network community minus the expected number of randomly placed edges in an equivalent network community. It can be a positive or negative number. The more positive the value of Q, the more significant the community structure.

The definition may sound confusing but in reality, it is quite simple. As seen in Figure 1, Q is given by the sum of Aᵢⱼ -(kᵢkⱼ/2m) over all pairs of vertices i and j that fall in the same community.

Aᵢⱼ represents the number of edges between vertex i and j. kᵢkⱼ/2m represents the expected number of edges between vertices i and j if edges are placed at random where kᵢ and kⱼ are the degrees of the two vertices and m represents the total number of edges in the network.

In the formula, there are two summations that we can simplify conceptually. Essentially we are considering every pair of vertices within the same community and calculating Aᵢⱼ-(kᵢkⱼ/2m) which we can call y. Then, if there is a connection between the two vertices, we subtract y from 1. If there is not a connection, we subtract y from 0. Therefore, more positive modularity suggests a better community structure as we subtract less from the total modularity.

https://www.youtube.com/watch?v=lG5hkAHo-zs

Please reference the video above for a great, simple example for calculating modularity!

We implemented a robust modularity function. After each edge is removed, we used the built-in boost connected components library to detect any partitions in the graph. The connected components library allows us to detect any partitions in the graph and the vertices associated with each partition. For more information on the connected components library, please see the documentation linked here. Then, we iterate through every pair of nodes in the graph using two nested for loops. We only calculate the modularity for vertices in the same component and pairs of vertices with unique elements. So, we skip over pairs 0–0, 1–1, etc. In order to consider each pair only once, we keep track of how many vertices we have already considered and can therefore skip. For example, if we have considered all vertice pairs with 0 as the first element, when we reach the end of the inner for loop, we know we need to skip one vertice in the next iteration of the inner for loop. So, we skip over 1–0.

To calculate the modularity, we first check if there is a connection between the two vertices. If there is, we subtract the Aᵢⱼ — kᵢkⱼ/2m from 1 and if there is not, we subtract it from 0. After the function has iterated over every pair of vertices, it returns the modularity for the given graph.

In order to optimize the community structure in our Girvan Newman implementation, we calculate the modularity after removing each edge. If the modularity of a given graph is greater than any previous modularity values, we store the modularity and the associated graph. After the algorithm has finished removing all of the edges, we have the graph with the optimal community structure saved.

To see how we further explored the Girvan-Newman algorithm using modularity, please continue reading part 3 linked here.

Github repository link: https://github.com/smu-cs-3353/22s-pa03-girvan-newman-kate-lydia-nicole

This article was written by Nicole Vivier on April 11, 2022. Nicole is currently a sophomore at Southern Methodist University studying Computer Science and competing for the women’s golf team. Nicole has an interest in the fields of AI/Machine Learning and Cyber Security.

Using Girvan-Newman to Identify Communities — Part 2: Modularity

Written by Nicole Vivier