Girvan Newman Algorithm — Community Detection in Network (Part 1)

Paige Weng
Published in smucs
5 min read · Apr 30, 2022

Ziyu Sun is a senior majoring in computer science and minoring in civil engineering at Southern Methodist University.

Paige Weng is a second-year computer science student at Southern Methodist University.

Introduction

Since Newman introduced the concept of “community” in 2002, community detection in networks has developed rapidly and has become an important technique for network analysis. For example, it can help find people on social media who share common interests. This article covers one of the classic community detection algorithms, the Girvan-Newman algorithm, and how it works on undirected, unweighted graphs. We hope this post helps you better understand the Girvan-Newman algorithm.

Community In Complex Networks

Complex networks are abstract models used to understand complex systems in the real world. A typical network is composed of nodes and edges: each node corresponds to an individual, and each edge represents a relationship between two nodes. One important feature of complex networks is community structure: groups of nodes that are densely connected to each other and only sparsely connected to the rest of the network.
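As a small illustration of the node-and-edge model described above, here is a minimal Python sketch that stores an undirected, unweighted network as an adjacency structure. The toy edge list (two triangles joined by one edge) and the helper name `build_graph` are assumptions made for this example only.

```python
def build_graph(edges):
    """Return an undirected graph as a dict: node -> set of neighbouring nodes."""
    graph = {}
    for u, v in edges:
        # Each edge is a relationship in both directions.
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

# Hypothetical toy network: two triangles (1-2-3 and 4-5-6) joined by edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
graph = build_graph(edges)

print(sorted(graph[3]))  # [1, 2, 4]
```

Intuitively, the two triangles are the communities here, and the single edge 3-4 is the sparse connection between them.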

Community Detection

As complex networks have been studied more closely, it has been found that many of them exhibit community structure. The communities in a network together make up its community structure, and the process of finding that structure in a given network is called community detection.

Community Detection Algorithm — Girvan-Newman Algorithm

When a network has hundreds or thousands of nodes and edges, finding communities by hand becomes impractical, so we need community detection algorithms that divide the network into communities accurately and quickly.

In their 2004 paper, Newman and Girvan described an algorithm for community detection based on edge betweenness centrality, known as the Girvan-Newman algorithm. Before we look at how the Girvan-Newman algorithm works, let’s take a look at betweenness centrality.

Edge Betweenness Centrality

Freeman defined betweenness centrality in 1977. It is one of the shortest-path-based centrality measures for network graphs: it measures how often a node lies on the shortest paths between other pairs of nodes, acting as a bridge between them. The more often a node acts as a bridge, the greater its betweenness centrality, and vice versa. Edge betweenness centrality applies the same idea to edges: it measures how many shortest paths run through a given edge.

To show how edge betweenness centrality is calculated, we will use the following diagrams to walk through the process step by step.

Photo By Prateek Joshi on Analytics Vidhya

First, find the shortest paths from one node to all other nodes using breadth-first search. The diagram below shows the shortest paths from node A to the other nodes. At the same time, we assign each node a score equal to the number of shortest paths from node A to it. Nodes B, C, and D each have a score of 1, because there is only one shortest path from node A to each of them. Node E has a score of 2 because there are two shortest paths to it, and in the same way node F gets a score of 3.
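The node-scoring step can be sketched in Python as a breadth-first search that counts shortest paths. This is a minimal sketch, not the authors' implementation. The edge list `EDGES` is an assumption: it is one graph consistent with the scores quoted above, since the diagram itself is not reproduced here. The helper names `build_graph` and `count_shortest_paths` are hypothetical.

```python
from collections import deque

# Assumed edge list: one graph that reproduces the node scores quoted in the text.
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"),
         ("D", "E"), ("C", "F"), ("E", "F")]

def build_graph(edges):
    """Return an undirected graph as a dict: node -> set of neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def count_shortest_paths(graph, source):
    """BFS from source; return (distance, number of shortest paths) for each node."""
    dist = {source: 0}
    score = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:              # first time we reach nb
                dist[nb] = dist[node] + 1
                score[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:  # node sits one level above nb
                score[nb] += score[node]    # every path to node extends to nb
    return dist, score

dist, score = count_shortest_paths(build_graph(EDGES), "A")
print(sorted(score.items()))
# [('A', 1), ('B', 1), ('C', 1), ('D', 1), ('E', 2), ('F', 3)]
```

A node's score is simply the sum of the scores of its neighbours one level closer to the source, which is why E (reached from B and D) gets 2 and F (reached from C and E) gets 3.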

Photo By Prateek Joshi on Analytics Vidhya

After scoring each node, we calculate the edge scores, starting from the node farthest from A and working back up. In this example, the computation starts at node F. The diagram below shows all the resulting edge scores.

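Edge score = (1 + sum of the scores of all edges below the lower node) × (upper node score / lower node score)

For example, edge FC = (1 + 0) × (1 / 3) = 0.33, since no edges lie below node F, node C has a score of 1, and node F has a score of 3.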
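The bottom-up edge scoring from a single source can be sketched as follows. The sketch uses the standard credit rule, edge score = (1 + sum of edge scores below the lower node) × (upper node score / lower node score), and the same assumed edge list as a stand-in for the diagram. The names `build_graph` and `edge_scores_from` are hypothetical, and this is an illustration rather than the authors' code.

```python
from collections import deque

# Assumed edge list: one graph consistent with the node scores quoted in the text.
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"),
         ("D", "E"), ("C", "F"), ("E", "F")]

def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def edge_scores_from(graph, source):
    """Edge scores contributed by shortest paths that start at source."""
    dist, paths, order = {source: 0}, {source: 1}, []
    queue = deque([source])
    while queue:                            # BFS: levels and shortest-path counts
        node = queue.popleft()
        order.append(node)
        for nb in graph[node]:
            if nb not in dist:
                dist[nb], paths[nb] = dist[node] + 1, 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                paths[nb] += paths[node]
    below = dict.fromkeys(order, 0.0)       # sum of edge scores below each node
    scores = {}
    for lower in reversed(order):           # farthest nodes first
        for upper in graph[lower]:
            if dist[upper] == dist[lower] - 1:
                s = (1 + below[lower]) * paths[upper] / paths[lower]
                scores[frozenset((upper, lower))] = s
                below[upper] += s
    return scores

scores = edge_scores_from(build_graph(EDGES), "A")
for edge, s in sorted(("".join(sorted(e)), s) for e, s in scores.items()):
    print(edge, round(s, 2))
# AB 3.17, AD 1.83, BC 1.33, BE 0.83, CF 0.33, DE 0.83, EF 0.67 (one per line)
```

The scores from one source always add up to the total length of all shortest paths from that source (here 1 + 2 + 1 + 2 + 3 = 9), which is a handy sanity check.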

Photo By Prateek Joshi on Analytics Vidhya

So far we have only the edge scores for the shortest paths that start at node A. We need to repeat the same computation starting from every other node, and then add up the scores each edge receives. The diagram below shows these sums divided by 2. We divide by 2 because the graph is undirected, so every shortest path is counted twice, once from each of its endpoints.
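Summing over all starting nodes and halving can be sketched like this. As before, `EDGES` is an assumed graph standing in for the diagram, and `build_graph` and `edge_betweenness` are hypothetical names. The per-source scoring uses the standard rule (1 + scores below) × (upper score / lower score).

```python
from collections import deque

# Assumed edge list standing in for the diagram.
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"),
         ("D", "E"), ("C", "F"), ("E", "F")]

def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def edge_betweenness(graph):
    """Edge betweenness centrality of every edge in an undirected, unweighted graph."""
    total = {}
    for source in graph:                    # repeat the computation from every node
        dist, paths, order = {source: 0}, {source: 1}, []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb], paths[nb] = dist[node] + 1, 0
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
        below = dict.fromkeys(order, 0.0)
        for lower in reversed(order):
            for upper in graph[lower]:
                if dist[upper] == dist[lower] - 1:
                    s = (1 + below[lower]) * paths[upper] / paths[lower]
                    below[upper] += s
                    edge = frozenset((upper, lower))
                    total[edge] = total.get(edge, 0.0) + s
    # Each shortest path was counted from both of its endpoints, so halve the sums.
    return {edge: s / 2 for edge, s in total.items()}

ebc = edge_betweenness(build_graph(EDGES))
for edge, s in sorted(("".join(sorted(e)), s) for e, s in ebc.items()):
    print(edge, round(s, 2))
# AB 4.0, AD 2.67, BC 4.0, BE 3.67, CF 2.67, DE 4.0, EF 4.0 (one per line)
```

On this assumed graph, four edges tie for the highest score of 4.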

Photo By Prateek Joshi on Analytics Vidhya

Finally, we remove the edge or edges with the highest edge betweenness centrality (EBC) score. In this example, that leaves three communities.
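The removal step can be sketched as dropping every edge tied for the highest score and then reading off the connected components that remain. The `EBC` table below is an assumption: it holds rounded scores for an assumed six-node graph standing in for the diagram, not values taken from the original figure. The function names are hypothetical.

```python
from collections import deque

# Assumed, rounded EBC scores for an assumed six-node example graph.
EBC = {("A", "B"): 4.0, ("B", "C"): 4.0, ("D", "E"): 4.0, ("E", "F"): 4.0,
       ("B", "E"): 3.67, ("A", "D"): 2.67, ("C", "F"): 2.67}

def remove_top_edges(ebc, tol=1e-9):
    """Return the edges left after dropping every edge tied for the top score."""
    top = max(ebc.values())
    return [edge for edge, s in ebc.items() if s < top - tol]

def connected_components(nodes, edges):
    """Group nodes into connected components using BFS."""
    graph = {n: set() for n in nodes}
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in part:
                    part.add(nb)
                    queue.append(nb)
        seen |= part
        parts.append(part)
    return sorted(sorted(part) for part in parts)

nodes = {n for edge in EBC for n in edge}
communities = connected_components(nodes, remove_top_edges(EBC))
print(communities)  # [['A', 'D'], ['B', 'E'], ['C', 'F']]
```

Each connected component that survives the removal is treated as one community.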

Photo By Prateek Joshi on Analytics Vidhya

Basic Flow of Girvan-Newman Algorithm

The following is the pseudocode of the Girvan-Newman algorithm:

1. Calculate the edge betweenness of all edges in the network;

2. Find the edge with the highest betweenness and remove it from the network;

3. Calculate the modularity of the network after edge removal;

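4. Repeat steps 1 through 3 until all edges are removed. Edge betweenness must be recalculated after every removal, because removing an edge changes the shortest paths in the network.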
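The steps above can be sketched end to end in Python. This is a minimal sketch under several assumptions, not the authors' C++ implementation: the toy graph (two triangles joined by one edge) is hypothetical, all function names are made up for this example, and modularity is computed with the standard formula Q = Σ over communities of (internal edges / m) − (total degree / 2m)², measured against the original graph. The sketch keeps the division with the highest modularity seen along the way.

```python
from collections import deque

def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def edge_betweenness(graph):
    total = {}
    for source in graph:
        dist, paths, order = {source: 0}, {source: 1}, []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb], paths[nb] = dist[node] + 1, 0
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
        below = dict.fromkeys(order, 0.0)
        for lower in reversed(order):
            for upper in graph[lower]:
                if dist[upper] == dist[lower] - 1:
                    s = (1 + below[lower]) * paths[upper] / paths[lower]
                    below[upper] += s
                    edge = frozenset((upper, lower))
                    total[edge] = total.get(edge, 0.0) + s
    return {edge: s / 2 for edge, s in total.items()}

def components(graph):
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in part:
                    part.add(nb)
                    queue.append(nb)
        seen |= part
        parts.append(part)
    return parts

def modularity(original, parts):
    """Standard modularity of a division, measured on the original graph."""
    m = sum(len(nbs) for nbs in original.values()) / 2
    q = 0.0
    for part in parts:
        inside = sum(1 for u in part for v in original[u] if v in part) / 2
        degree = sum(len(original[u]) for u in part)
        q += inside / m - (degree / (2 * m)) ** 2
    return q

def girvan_newman(edges):
    original = build_graph(edges)
    graph = {n: set(nbs) for n, nbs in original.items()}  # working copy
    best_parts = components(graph)
    best_q = modularity(original, best_parts)
    while any(graph.values()):               # step 4: until no edges remain
        scores = edge_betweenness(graph)     # step 1: recomputed every round
        u, v = max(scores, key=scores.get)   # step 2: one highest-scoring edge
        graph[u].discard(v)
        graph[v].discard(u)
        parts = components(graph)
        q = modularity(original, parts)      # step 3
        if q > best_q:
            best_q, best_parts = q, parts
    return best_parts, best_q

# Hypothetical toy graph: triangles 1-2-3 and 4-5-6 joined by edge 3-4.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
best_parts, best_q = girvan_newman(EDGES)
print(sorted(sorted(p) for p in best_parts), round(best_q, 3))
# [[1, 2, 3], [4, 5, 6]] 0.357
```

When several edges tie for the highest score, this sketch removes just one of them per round; removing all tied edges at once is another reasonable choice.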

To Be Continued

This article only introduces betweenness centrality as it is used for community detection in complex networks. We have also written about modularity and about the implementation and analysis of the Girvan-Newman algorithm. If you are interested, please head to Part 2.

Summary

In this article, we introduced complex networks, communities, community detection, and betweenness centrality, along with the basic flow of the Girvan-Newman algorithm. The algorithm still has a lot of room for optimization, including aiming for higher modularity to improve the accuracy of the detected communities. We strongly recommend learning more about other community detection algorithms and optimization methods.

Thanks for reading. We hope this article is helpful to you!

Code Implementation

If you need the code for our C++ implementation of the Girvan-Newman algorithm, please check our GitHub.
