Girvan–Newman — The Clustering Technique in Network Analysis Part 2

Jeffery chiang

Published in

Analytics Vidhya

4 min readDec 20, 2021

Girvan-Newman method separates the network based on the betweenness of the edges.

Introduction

Girvan-Newman method is one of the classic community clustering techniques, which separates the network based on the betweenness of the edges. By using the algorithm, we are able to separate the network into communities, and the community detection can be used as a good start of data preprocessing.

In this example we will implement the Girvan-Newman clustering algorithm using Python. Please refer to my previous post to see the details of the algorithm.

All code can be found on github.

Girvan–Newman — The Clustering Technique in Network Analysis

Girvan-Newman method separates the network based on the betweenness of the edges.

medium.com

Getting Started

First we will define an example graph using the edge_dict, where the edge dictionary contains the edge connections. We will use the example from the previous post.

defaultdict(<function __main__.<lambda>()>,
            {'a': ['b', 'c', 'd'],
             'b': ['a', 'c'],
             'c': ['a', 'b', 'd'],
             'd': ['a', 'c', 'e'],
             'e': ['d', 'f', 'g', 'h'],
             'f': ['e', 'g'],
             'g': ['e', 'f', 'h'],
             'h': ['e', 'g']})

Calculate the betweenness value of the current graph

We use the following formula to calculate the edge credit. By traversing all the vertices and we can get the edge betweenness by summing all the edge credits.

defaultdict(<function __main__.calculate_btw_and_communities.<locals>.<lambda>()>,
            {('e', 'f'): 5.5,
             ('e', 'g'): 5.0,
             ('e', 'h'): 5.5,
             ('d', 'e'): 16.0,
             ('a', 'b'): 3.5,
             ('a', 'c'): 1.0,
             ('a', 'd'): 7.5,
             ('c', 'd'): 7.5,
             ('b', 'c'): 3.5,
             ('g', 'h'): 1.5,
             ('f', 'g'): 1.5})

Compute the best community

We remove the edges with highest betweenness value iteratively and aim to optimize the modularity.

remove edges [('d', 'e')] with betweenness value around 16.0
update best modularity of community split -inf ---> 0.3016528925619835

remove edges [('a', 'b'), ('a', 'd'), ('c', 'd'), ('b', 'c'), ('e', 'f'), ('e', 'h'), ('g', 'h'), ('f', 'g')] with betweenness value around 1.5
modularity after split = 0.08677685950413223, which is lower than best split 0.3016528925619835

Based on the result, we are able to get the best community split by removing the edge (d,e).

A More Complex Network

Next, we will generate a more complex network.

remove edges [('d', 'e')] with betweenness value around 64.0
update best modularity of community split -inf ---> 0.2797731568998111

remove edges [('a', 'p'), ('h', 'j')] with betweenness value around 16.0
update best modularity of community split 0.2797731568998111 ---> 0.3648393194706995

remove edges [('a', 'b'), ('a', 'd'), ('c', 'd'), ('b', 'c'), ('e', 'f'), ('e', 'h'), ('g', 'h'), ('f', 'g'), ('j', 'k'), ('k', 'l'), ('i', 'j'), ('i', 'l'), ('m', 'n'), ('m', 'p'), ('o', 'p'), ('n', 'o')] with betweenness value around 1.5
modularity after split = 0.08506616257088848, which is lower than best split 0.3648393194706995

Based on the result, we are able to get the best community split by removing the edge (d,e), (a,p), (h,j).

Conclusion

Network analysis is an useful technique to process the network data. In practice, we are able to apply the graph theory in various situation, for example, social network or recommendation system based on user behavior.

Thank you for reading, and wish you a great day.

Girvan–Newman — The Clustering Technique in Network Analysis Part 2

Introduction

Girvan–Newman — The Clustering Technique in Network Analysis

Girvan-Newman method separates the network based on the betweenness of the edges.

Getting Started

Calculate the betweenness value of the current graph

Compute the best community

A More Complex Network

Conclusion

Written by Jeffery chiang