What is Random Walk?

Understanding Random Walk and its types

Mehul Gupta
Data Science in your pocket


It’s time to come back to basics. And when we talk about basics in Data Science and AI, it’s usually statistics and probability. If you have worked with Graphs or Time Series, you might have come across this term at the very beginning of your book or course. You might not work with it directly, but many algorithms are built on top of it, making it a crucial topic to know if you’re in AI. As a matter of fact,

Graph algorithms like DeepWalk and Node2Vec use Random Walks to generate vector representations of nodes in a graph.

A Random Walk can be used as a baseline for Time Series prediction, and also for anomaly detection in time series.

In Reinforcement Learning, Random walks act as exploration strategies, enabling agents to discover new states and actions, which is crucial for learning optimal policies.

Search Engine ranking algorithms like PageRank are also built on Random Walks.

and many more use cases …

Now as we understand its significance, we must jump onto

What is Random Walk?

As the name suggests, a random walk is a sequence of random steps or events taken one after another, without following any fixed pattern or deterministic rule.

easy peasy

Now, interestingly, as we move from one domain to another, this definition gets slightly tweaked. Hence, a Random Walk in Time Series is not the same as a Random Walk in Graphs. Let’s understand these different Random Walks one by one. We will focus mainly on random walks in Time Series and Graphs in this post.

Random Walk in Time Series

In time series analysis, a random walk refers to a stochastic (random) process where each future value is the current value plus a white noise term. A Random Walk can be considered a baseline solution (weak learner) for a time series problem, much like decision trees are for classification problems. The equation is:

Yt = Yt−1 + εt

Where

Yt = value at time t (the predicted next value)

Yt−1 = current value

εt = white noise term

Note: White noise in time series is characterized by a constant mean (often set to zero), constant variance, no autocorrelation between current and past/future observations, and random, unpredictable values at different time points.
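To make this concrete, here is a minimal simulation of the equation above, using only Python’s standard library. The function name and parameters (`n_steps`, `sigma`, `seed`) are my own choices for illustration, not part of any particular library:

```python
import random

def random_walk(n_steps, y0=0.0, sigma=1.0, seed=42):
    """Simulate Yt = Yt-1 + eps_t with Gaussian white noise."""
    random.seed(seed)
    series = [y0]
    for _ in range(n_steps):
        eps = random.gauss(0.0, sigma)  # white noise: zero mean, constant variance
        series.append(series[-1] + eps)
    return series

walk = random_walk(100)  # 101 values: the starting point plus 100 steps
```

Note that each value depends only on the previous value and fresh noise, which is exactly why a random walk is so hard to beat as a forecasting baseline: the best prediction for tomorrow is simply today’s value.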

Random Walk with Drift: In this case, we add a constant D, called the drift constant, to the previous equation. The drift constant adds an upward or downward trend to your predictions, useful when a trend is observed in the time series. The equation changes slightly:

Yt = Yt−1 + εt + D

Where D is a constant called Drift
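Adding the drift term is a one-line change to the earlier sketch (again, function and parameter names are my own):

```python
import random

def random_walk_with_drift(n_steps, drift=0.5, y0=0.0, sigma=1.0, seed=0):
    """Simulate Yt = Yt-1 + eps_t + D; the constant D tilts the walk up or down."""
    random.seed(seed)
    series = [y0]
    for _ in range(n_steps):
        series.append(series[-1] + random.gauss(0.0, sigma) + drift)
    return series
```

A quick sanity check: if you set `sigma=0.0`, the noise vanishes and the walk becomes a straight line with slope D, which makes the role of the drift term obvious.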

Correlated Random Walk: Very similar to the baseline random walk, but here we introduce a coefficient ρ in the equation:

Yt = ρYt−1 + εt

Where ρ is the correlation coefficient. As you must have guessed, this helps incorporate autocorrelation, i.e. the next value depends on the previous value scaled by ρ (and with ρ = 1, we recover the plain random walk).
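The correlated variant is again a small change: the previous value is scaled by ρ before the noise is added (function and parameter names are mine, for illustration):

```python
import random

def correlated_random_walk(n_steps, rho=0.8, y0=0.0, sigma=1.0, seed=0):
    """Simulate Yt = rho * Yt-1 + eps_t; rho=1 recovers the plain random walk."""
    random.seed(seed)
    series = [y0]
    for _ in range(n_steps):
        series.append(rho * series[-1] + random.gauss(0.0, sigma))
    return series
```

With the noise switched off (`sigma=0.0`) and ρ between 0 and 1, the series decays geometrically toward zero, which shows how ρ controls how strongly each value is tied to the previous one.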

Random Walk in Graphs

In graph theory, we can have multiple types of random walks. Below are a few examples.

Simple Random Walk: It involves a walker moving from the current node to a randomly chosen neighboring node at each step. The walker has an equal probability of moving to any neighboring node (wherever an edge is present).
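A simple random walk needs nothing more than an adjacency structure and uniform choice. Here is a minimal sketch, representing the graph as a plain dict of neighbor lists (a choice I am making for illustration):

```python
import random

def simple_random_walk(graph, start, walk_length, seed=0):
    """Walk over graph (dict: node -> list of neighbors), choosing uniformly at each step."""
    random.seed(seed)
    walk = [start]
    for _ in range(walk_length):
        neighbors = graph[walk[-1]]
        if not neighbors:                      # dead end: stop early
            break
        walk.append(random.choice(neighbors))  # uniform over neighbors
    return walk

g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
path = simple_random_walk(g, "A", 5)
```

This is essentially the sampling step that DeepWalk performs to generate node sequences before feeding them to a skip-gram model.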

Weighted Random Walk: Instead of keeping the same probability for every neighbor at the next step, you can assign higher weights to some edges so that certain nodes are preferred over others.
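The weighted variant only changes how the next node is picked; `random.choices` handles the weighted draw. Here the graph maps each node to (neighbor, weight) pairs, a representation I am assuming for illustration:

```python
import random

def weighted_random_walk(graph, start, walk_length, seed=0):
    """Walk over graph (dict: node -> list of (neighbor, weight) pairs).

    Heavier edges are chosen proportionally more often."""
    random.seed(seed)
    walk = [start]
    for _ in range(walk_length):
        edges = graph[walk[-1]]
        if not edges:                          # dead end: stop early
            break
        neighbors, weights = zip(*edges)
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

Node2Vec’s biased walks are a well-known example of this idea: edge weights are adjusted on the fly to trade off breadth-first against depth-first exploration.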

Directed Random Walk: As you must have guessed, it is used with directed graphs, where a move to a node ‘X’ is only possible if a directed edge points from the current node to ‘X’.

Self-avoiding Random Walk: In such random walks, already-visited nodes are avoided in future steps, so any node can be visited at most once.
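The self-avoiding walk adds a visited set to the simple walk and stops when the walker is trapped, i.e. every neighbor has already been seen. A minimal sketch, with names of my own choosing:

```python
import random

def self_avoiding_walk(graph, start, max_steps, seed=0):
    """Walk over graph (dict: node -> list of neighbors), never revisiting a node."""
    random.seed(seed)
    walk = [start]
    visited = {start}
    for _ in range(max_steps):
        choices = [n for n in graph[walk[-1]] if n not in visited]
        if not choices:            # trapped: every neighbor was already visited
            break
        nxt = random.choice(choices)
        walk.append(nxt)
        visited.add(nxt)
    return walk
```

Note that, unlike the earlier walks, a self-avoiding walk may end before `max_steps`: on a triangle graph, for example, it can never be longer than three nodes.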

With this, I will wrap up this post. It will serve as a prerequisite for some of the graph theory algorithms in my upcoming Graph Neural Network blog series next month.
