Girvan–Newman — The Clustering Technique in Network Analysis Part 2

Jeffery chiang
Published in
4 min readDec 20, 2021

--

Girvan-Newman method separates the network based on the betweenness of the edges.

Introduction

Girvan-Newman method is one of the classic community clustering techniques, which separates the network based on the betweenness of the edges. By using the algorithm, we are able to separate the network into communities, and the community detection can be used as a good start of data preprocessing.

In this example we will implement the Girvan-Newman clustering algorithm using Python. Please refer to my previous post to see the details of the algorithm.

All code can be found on github.

Getting Started

First we will define an example graph using the edge_dict, where the edge dictionary contains the edge connections. We will use the example from the previous post.

Figure 1. sample graph
defaultdict(<function __main__.<lambda>()>,
{'a': ['b', 'c', 'd'],
'b': ['a', 'c'],
'c': ['a', 'b', 'd'],
'd': ['a', 'c', 'e'],
'e': ['d', 'f', 'g', 'h'],
'f': ['e', 'g'],
'g': ['e', 'f', 'h'],
'h': ['e', 'g']})

Calculate the betweenness value of the current graph

We use the following formula to calculate the edge credit. By traversing all the vertices and we can get the edge betweenness by summing all the edge credits.

defaultdict(<function __main__.calculate_btw_and_communities.<locals>.<lambda>()>,
{('e', 'f'): 5.5,
('e', 'g'): 5.0,
('e', 'h'): 5.5,
('d', 'e'): 16.0,
('a', 'b'): 3.5,
('a', 'c'): 1.0,
('a', 'd'): 7.5,
('c', 'd'): 7.5,
('b', 'c'): 3.5,
('g', 'h'): 1.5,
('f', 'g'): 1.5})
Figure 2. Edge Betweenness value

Compute the best community

We remove the edges with highest betweenness value iteratively and aim to optimize the modularity.

Modularity Formula
remove edges [('d', 'e')] with betweenness value around 16.0
update best modularity of community split -inf ---> 0.3016528925619835

remove edges [('a', 'b'), ('a', 'd'), ('c', 'd'), ('b', 'c'), ('e', 'f'), ('e', 'h'), ('g', 'h'), ('f', 'g')] with betweenness value around 1.5
modularity after split = 0.08677685950413223, which is lower than best split 0.3016528925619835

Based on the result, we are able to get the best community split by removing the edge (d,e).

Figure 3. best community split

A More Complex Network

Next, we will generate a more complex network.

Figure 4. sample graph 2
remove edges [('d', 'e')] with betweenness value around 64.0
update best modularity of community split -inf ---> 0.2797731568998111

remove edges [('a', 'p'), ('h', 'j')] with betweenness value around 16.0
update best modularity of community split 0.2797731568998111 ---> 0.3648393194706995

remove edges [('a', 'b'), ('a', 'd'), ('c', 'd'), ('b', 'c'), ('e', 'f'), ('e', 'h'), ('g', 'h'), ('f', 'g'), ('j', 'k'), ('k', 'l'), ('i', 'j'), ('i', 'l'), ('m', 'n'), ('m', 'p'), ('o', 'p'), ('n', 'o')] with betweenness value around 1.5
modularity after split = 0.08506616257088848, which is lower than best split 0.3648393194706995

Based on the result, we are able to get the best community split by removing the edge (d,e), (a,p), (h,j).

Figure 5. best community after split

Conclusion

Network analysis is an useful technique to process the network data. In practice, we are able to apply the graph theory in various situation, for example, social network or recommendation system based on user behavior.

Thank you for reading, and wish you a great day.

--

--

Jeffery chiang
Analytics Vidhya

Data Science | Machine Learning | Mathematics | DevOps