# Random Walk with Restart and its applications

Recently I had to read about Random Walk with Restart(RWR) for a project of mine. Although the algorithm is pretty basic, it took me a lot of time to find a source which not only explains the problem properly but also gives an intuition as to why it works.

# Introduction

Random Walk with Restart (RWR) is an algorithm which gives the closeness between two nodes in the graph. It was originally proposed for the task of image segmentation.

**Problem Statement**

We are given a connected graph with transition probabilities *pᵢⱼ*’s denoting the probability of transition from state/node j to i. In normal Random Walk at each stage, we can go to any of the adjacent nodes. So finally the task is to calculate the most likely location we are going to be.

In RWR the main difference is that at each stage there is a probability that we can restart i.e. go to the starting node.

Formally, we have to find a solution to the equation

r = cWr+ (1-c)e

where *c* is a number in the range(0,1), *W* is the matrix of transition probabilities where *W[i,j]* denotes probability of going from j to* i*, e is the starting vector i.e. *e[i] = 1* if *i* is the starting node else *0 *and *r* is a column vector where *r[i]* denotes the probability of being at node *i*

How we got this equation will be explained in the following sections.

One can relate the normal Random walk with a drunk person walking in a normal field where he can take a step in each direction with a fixed probability while RWR can be thought of as the same drunk person in a field where there is a finite probability at each step that he may touch a port key which would teleport him back to the starting position in the field.

# Explanation

Suppose that we are just at the starting node *e*. Then the probability *r *of being at any node i in the next move will be given by

r₀[i] = W[i].e

where *W[i]* represents the ith row of matrix W i.e. transition probabilities to node i from all the nodes. In the next turn the probability of being at node i will be

r₁[i] = W[i].r₀

So without restart, the probability of being at node i after k moves can be simply given by

rₖ[i] = W[i]ᵏ.e = W[i].rₖ-₁

Now taking restart into consideration we get

rₖ[i] = c.W[i].rₖ-₁ + (1-c)e

So the task is to find the value of pₖ such that further simulations won’t affect the probabilities. So we assume that r has converged i.e. rₖ-₁ = rₖ. Thus we can write the above equation as

r = cWr+ (1-c)e

The above equations might look very similar to the PageRank equation. In fact, the only difference between normal PageRank and RWR is that generally in page rank the probability of jumping to another page is taken as uniform i.e. we replace *e *with a column vector with all 1s. Compared to PageRank, RWR doesn’t talk about how the matrix W is constructed.

# Solutions

Traditionally two solutions have been used for solving RWR related problems.

## Iterate until convergence

The solution involves calculating the value of the vector r iteratively using

rₖ[i] = c.W[i].rₖ-₁ + (1-c)e

until we reach the point of convergence or a specified number of iterations. The solution thus requires no pre-computation and typically we reach convergence within a very small number of iterations. For example, in the original paper for PageRank, it was reported that the PageRank algorithm for a network consisting of 322 million links (in-edges and out-edges) converges to within a tolerable limit in 52 iterations. However, the number can be larger for a graph with low connectivity and a large number of nodes. The time required is linear to the iteration number and square to the number of nodes in the graph(due to matrix-vector multiplication).

## Using inverse of the matrix

The equation obtained above can be rewritten by taking the term with r to the LHS. Thus we get

( I — cW)r = (1-c)e

which gives us

r = (1-c)( I — cW)⁻¹e

In this method, the time taking and computation intensive step is finding the inverse of the matrix which has the complexity O(n³) using Gaussian elimination. The overall complexity is determined by the complexity of matrix inversion as matrix-vector multiplication has the complexity O(n²). Another important thing to note is that here by computing the matrix ( I - cW)⁻¹ once we can get the required probabilities for any starting node for the given graph. This is particularly useful if we need to perform several random walks on the same graph with different starting nodes(eg. in image segmentation task using Random Walker algorithm). However, it requires storing (I - cW)⁻¹ which requires O(n²) space.

Some optimisations also exist which reduce the amount of space required to store the inverse of the matrix by assuming that most of the values in the matrix are correlated(Tong et. al.).

An important thing to observe is that the initial value of the vector *r* doesn’t affect the probabilities we get on convergence as long as the graph is connected. This can be seen from the expression we get using the matrix inversion method for RWR.

# Applications

**Image segmentation**- Originally RWR was proposed for image segmentation. The image is converted into a graph where the weight of the edge between the adjacent pixels represents the similarity between the two pixels. The way it works is that we release multiple random walkers from some positions in the image(selected by a user or using an automated process). This gives us a score as to how close each pixel is to the given pixel. The pixels/nodes for which the algorithm yields a higher value for a given random walk are assigned the same cluster as the starting node of that random walk.**Automatic Image Captioning**- RWR can be used to automatically generate captions for an image. The way it works is that first a graph is made using the set of captioned images and query image. Then for each caption word it estimates the closeness with the image using RWR. However, due to the rise of deep learning based techniques, this method isn’t used much nowadays**Community detection**- Community detection methods generally cluster the nodes in the graph based on some function/distance. Since RWR gives the closeness of the nodes in the graph it can be used as the distance between a pair of nodes in the graph.**Recommender Systems**- RWR gives the closeness of different nodes in the graph. This property is naturally suited for recommender systems. Twitter uses a similar method for recommending users to follow.