Demystifying Markov Clustering

Introduction to the Markov clustering algorithm and how it can be a really useful tool for unsupervised clustering.

Anurag Kumar Mishra
Analytics Vidhya


Photo by Clint Adair on Unsplash

INTRODUCTION

In the vast ocean of data science, you may have encountered the island of clustering. When it comes to unsupervised learning and finding patterns in data, clustering is one of the oldest and most widely used techniques.

In this blog we will talk about one specific, and somewhat niche, technique known as MCL, or Markov Clustering. I will refer to Markov Clustering as MCL from here on.

What really differentiates MCL from other clustering algorithms is that it detects communities, as they are called, among the nodes of a graph. And since it is unsupervised, it does not require you to specify the number of clusters up front, unlike K-means and similar algorithms. This makes MCL especially useful for network data, social-network data, or similarity detection.

Usually clustering is bifurcated into two major parts:

  1. Vector clustering and
  2. Graph clustering,

which largely tell their own story through their names.

MCL is a type of graph clustering, so you need to understand a bit of graph theory, though nothing too fancy.

A bit of Graph Theory

If I were to explain a graph in plain and simple terms: a graph is a set of vertices and edges, represented as G = (V, E), where V = {v1, v2, v3, …, vn} is a set of n vertices and E = {e1, e2, e3, …, em} is a set of m edges. If vertices vi and vj are the endpoints of an edge then, when there is no ambiguity, we denote that edge by the pair (vi, vj).

Simple Unweighted Graph

While the graph shown above is a simple unweighted graph, we will work with weighted graphs in the MCL algorithm ahead. A weighted graph is simply a more informative graph: each edge weight can represent the probability of moving to the next node, the distance between two nodes, or the strength of the dependency of one node on another.

Generally, such information is stored in the adjacency matrix, which will come up again later if you are wondering. This is all you need to know for MCL..see, easy-peasy.
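As a quick illustration of the idea (the graph here is a made-up four-node example), a small weighted graph can be stored as an adjacency matrix with numpy:

```python
import numpy as np

# Hypothetical 4-node weighted graph: A[i, j] is the edge weight
# between node i and node j (0 means no edge). The graph is
# undirected, so the matrix is symmetric.
nodes = ["A", "B", "C", "D"]
A = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Neighbours of node C are the nodes with a nonzero entry in C's row.
neighbors_of_C = [nodes[j] for j in np.nonzero(A[2])[0]]
print(neighbors_of_C)  # ['A', 'B', 'D']
```

The same matrix could equally be built from a networkx graph, but plain numpy keeps the example self-contained.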

Random Walks and Markov Chains

Random walks are the core of MCL, so let's understand them using a simple example. If you remember the travelling salesman problem, this is somewhat more random than that, but along the same lines.

So, think of Sam the traveller, a true wanderluster who lives in London and wants to roam the globe. Think of the cities as nodes in our graph and the edges as the routes between them: one can travel from one city to another only if an edge connects them. If there's no edge between two cities, travelling between them isn't possible, simple as that.

Following his random whims, Sam doesn't plan and just sets out. Based on a hypothetical n-sided die, Sam chooses which city to travel to next. Mathematically, this is called a stochastic process, or random process, and the path of random steps it traces is called a random walk.

Now, coming to Markov chains,

A Markov chain is a system in which the probability of the next state depends only on the current state.

At any point in time, the next state is decided solely by the current state, irrespective of how one got there. That is a Markov chain, simply put. Finally, we model the random walk on the graph as a Markov chain. The intuition is that the randomness is not completely random: from any city, the probability of Sam visiting a nearby, connected city is higher than visiting a city far away. Still confused? Look at the example below.

As explained in the diagram above, this is how we get the transition matrix, or probability matrix, for the first time step.
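In code, the transition matrix is obtained by normalizing each column of the adjacency matrix so it sums to 1 (reusing the same hypothetical four-node graph as before):

```python
import numpy as np

# Adjacency matrix of a small undirected graph (hypothetical example).
A = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Column-normalize: T[i, j] is the probability of stepping from
# node j to node i, so each column must sum to 1.
T = A / A.sum(axis=0)
print(T[:, 3])  # from node D the walker can only move to C: [0. 0. 1. 0.]
```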

Inflation and Expansion

For implementing MCL, we need to look at two important operations involved known as Inflation and Expansion.

Inflation:

  • Under the inflation operation, the transition values of each vertex are changed so that strong neighbour values are strengthened and weak neighbour values are demoted. This is achieved by raising each column entry to a non-negative power and then re-normalizing the column.
  • For example, taking the inflation parameter as 2: square each element of the column and then re-normalize the values so the column sums to 1.
  • The inflation operator is responsible for both strengthening and weakening of currents (it strengthens strong currents and weakens already weak currents). In the end, this influences the granularity of the clusters.
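A minimal sketch of inflation in numpy (the sample column is made up for illustration):

```python
import numpy as np

def inflate(T, r):
    """Raise each entry to the power r, then re-normalize each column."""
    M = np.power(T, r)
    return M / M.sum(axis=0)

# A column with probabilities [0.5, 0.3, 0.2]: inflation with r = 2
# sharpens it -- the strong entry grows, the weak ones shrink.
col = np.array([[0.5], [0.3], [0.2]])
print(inflate(col, 2).ravel())  # roughly [0.66, 0.24, 0.11]
```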

Expansion:

  • Expansion helps in making the farther nodes or neighbours reachable. Mathematically, this is achieved by taking the e-th power of the matrix, where the expansion parameter "e" is chosen while implementing.
  • For example, take e = 2 and compute the second power of the matrix, A². Computing large matrix powers efficiently (via eigenvalues, eigenvectors and a diagonal matrix) would need a blog of its own, which I would explain if required; for MCL, repeated matrix multiplication works fine.
  • The expansion operator is responsible for allowing flow to connect different regions of the graph.
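Expansion is just a matrix power, so a sketch is short (the two-node chain below is a toy example):

```python
import numpy as np

def expand(T, e):
    """Take the e-th matrix power of the transition matrix."""
    return np.linalg.matrix_power(T, e)

# Two-step transition probabilities for a tiny two-node chain.
T = np.array([
    [0.0, 0.5],
    [1.0, 0.5],
])
T2 = expand(T, 2)
print(T2)  # [[0.5, 0.25], [0.5, 0.75]]; columns still sum to 1
```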

MCL Algorithm… finally

If you followed well till now, that’s more than half the job done. Ok, lets jump right to it. Steps for MCL:

  • The input is an undirected graph, the expansion parameter e, and the inflation parameter r.
  • Calculate the Markov matrix, or probability matrix, from the adjacency matrix. One important thing to note here is that we obtain the Markov matrix by column-normalizing the adjacency matrix.
  • Once the Markov matrix is obtained, the random walk is then simulated by alternating the two operations explained above, expansion and inflation.
  • Step i : Expand the matrix using the “e” parameter.
  • Step j: Inflate resulting matrix using “r” parameter.
  • Repeat steps i and j until convergence (i.e. the values have reached equilibrium and no significant changes are observed).
  • Interpret the resulting matrix to obtain the clusters: nodes that end up flowing to the same attractor belong to the same cluster.

Voila! MCL ready!

Freshly cooked MCL, ready to serve with some visible clusters. If you would like to write Markov clustering from scratch, it shouldn't be much of a problem, but be sure to add self-loops for better convergence. And in case you are wondering whether there's something ready-made, there is indeed a Python package, markov-clustering, to help you with that. If you have never used networkx, you might want to look at that as well: it's a Python package that makes graph operations easier to write.

Finally, take a look at the code below to see an implementation of the same in Python.
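Here is a minimal from-scratch sketch of the full loop in plain numpy, following the steps above (dense matrices, no pruning; the toy graph of two triangles joined by a bridge is made up for illustration, and reading clusters off the rows of the converged matrix is one common interpretation). The markov-clustering package implements the same idea in a more optimized way:

```python
import numpy as np

def mcl(A, e=2, r=2, max_iter=100, tol=1e-6):
    """Minimal MCL sketch: A is the adjacency matrix of an undirected
    graph, e the expansion power, r the inflation power."""
    A = A + np.eye(A.shape[0])                # self-loops for better convergence
    T = A / A.sum(axis=0)                     # column-normalized Markov matrix
    for _ in range(max_iter):
        M = np.linalg.matrix_power(T, e)      # expansion
        M = np.power(M, r)                    # inflation: raise entries...
        M = M / M.sum(axis=0)                 # ...and re-normalize columns
        converged = np.allclose(M, T, atol=tol)
        T = M
        if converged:
            break
    # Interpretation: a row that keeps a nonzero entry on its own node is
    # an "attractor"; the columns where that row is nonzero form a cluster.
    clusters = []
    for i in range(T.shape[0]):
        members = tuple(np.nonzero(T[i] > tol)[0])
        if i in members and members not in clusters:
            clusters.append(members)
    return clusters

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge (2,3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(mcl(A))  # the two triangles come out as the two clusters
```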

Thanks for your patience if you have made it this far. If you learned something from the blog, show it some appreciation, and if you have any questions, please shout. Adios..!!


Anurag Kumar Mishra
Analytics Vidhya

Masters Student of Data-Science. Ex-Data Scientist at McKinsey. Love the amalgamation of data and mathematics 😀