Girvan–Newman — The Clustering Technique in Network Analysis Part 2
Girvan-Newman method separates the network based on the betweenness of the edges.
Introduction
Girvan-Newman method is one of the classic community clustering techniques, which separates the network based on the betweenness of the edges. By using the algorithm, we are able to separate the network into communities, and the community detection can be used as a good start of data preprocessing.
In this example we will implement the Girvan-Newman clustering algorithm using Python. Please refer to my previous post to see the details of the algorithm.
All code can be found on github.
Getting Started
First we will define an example graph using the edge_dict, where the edge dictionary contains the edge connections. We will use the example from the previous post.
defaultdict(<function __main__.<lambda>()>,
{'a': ['b', 'c', 'd'],
'b': ['a', 'c'],
'c': ['a', 'b', 'd'],
'd': ['a', 'c', 'e'],
'e': ['d', 'f', 'g', 'h'],
'f': ['e', 'g'],
'g': ['e', 'f', 'h'],
'h': ['e', 'g']})
Calculate the betweenness value of the current graph
We use the following formula to calculate the edge credit. By traversing all the vertices and we can get the edge betweenness by summing all the edge credits.
defaultdict(<function __main__.calculate_btw_and_communities.<locals>.<lambda>()>,
{('e', 'f'): 5.5,
('e', 'g'): 5.0,
('e', 'h'): 5.5,
('d', 'e'): 16.0,
('a', 'b'): 3.5,
('a', 'c'): 1.0,
('a', 'd'): 7.5,
('c', 'd'): 7.5,
('b', 'c'): 3.5,
('g', 'h'): 1.5,
('f', 'g'): 1.5})
Compute the best community
We remove the edges with highest betweenness value iteratively and aim to optimize the modularity.
remove edges [('d', 'e')] with betweenness value around 16.0
update best modularity of community split -inf ---> 0.3016528925619835
remove edges [('a', 'b'), ('a', 'd'), ('c', 'd'), ('b', 'c'), ('e', 'f'), ('e', 'h'), ('g', 'h'), ('f', 'g')] with betweenness value around 1.5
modularity after split = 0.08677685950413223, which is lower than best split 0.3016528925619835
Based on the result, we are able to get the best community split by removing the edge (d,e).
A More Complex Network
Next, we will generate a more complex network.
remove edges [('d', 'e')] with betweenness value around 64.0
update best modularity of community split -inf ---> 0.2797731568998111
remove edges [('a', 'p'), ('h', 'j')] with betweenness value around 16.0
update best modularity of community split 0.2797731568998111 ---> 0.3648393194706995
remove edges [('a', 'b'), ('a', 'd'), ('c', 'd'), ('b', 'c'), ('e', 'f'), ('e', 'h'), ('g', 'h'), ('f', 'g'), ('j', 'k'), ('k', 'l'), ('i', 'j'), ('i', 'l'), ('m', 'n'), ('m', 'p'), ('o', 'p'), ('n', 'o')] with betweenness value around 1.5
modularity after split = 0.08506616257088848, which is lower than best split 0.3648393194706995
Based on the result, we are able to get the best community split by removing the edge (d,e), (a,p), (h,j).
Conclusion
Network analysis is an useful technique to process the network data. In practice, we are able to apply the graph theory in various situation, for example, social network or recommendation system based on user behavior.
Thank you for reading, and wish you a great day.