Introduction to Markov Chains

Bharti Kindra
Feb 5, 2022


This article covers the very basics of Markov chains and the key terms and objects associated with them.

Markov chains are among the most important stochastic processes. They are stochastic processes for which the description of the present state fully captures all the information that could influence the future evolution of the process. Traffic flow, communication networks, genetics, and queues are examples of systems whose behaviour can be modelled with Markov chains. Google uses a Markov chain in its PageRank algorithm to determine the order of search results.

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Generally, we use data as “measurements” and build a “model” to make “predictions”. If this process is done in the context of time dependence, we can divide time into past, present, and future. We can treat past and present observations as measurements and predict the future based on them. Such a series of events forms a Markov chain.

Key terms

States: the possible outcomes are defined as states. For example, if the event is tossing a coin, the state space is {H, T}; for a die roll, the state space is {1, 2, 3, 4, 5, 6}.

Random variable: a variable whose value is unknown, or a function that assigns values to each of an experiment’s outcomes. In the case of two dice rolls, valid random variables are the sum or product of the outcomes, or the number of odd outcomes — basically, anything measurable that can be calculated from a defined event.

Random process: a series of outcomes of a random variable. For Markov chains, it is a series indexed by time t.

Transition probability: this represents the probability of moving to state j at time t given state i at time t−1. These values form an N × N matrix, where N is the number of states in the state space, with the (i, j)-th entry holding the probability of the transition from i to j.

Consider that a given day can be either sunny, cloudy, or rainy, denoted by A, B, and C respectively, and that the probability of a state depends only on the state on the previous day. The transition matrix here is 3 × 3, since this model has 3 states. This can also be represented as a graph with weighted edges:

graph representation of transition matrix

The transition matrix corresponding to this will be,

Transition matrix

Following from the axioms of probability, two other important features are that all elements must be non-negative and every row must sum to 1.
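As a concrete illustration, here is a minimal NumPy sketch of such a matrix for the weather example. The probability values are made up purely for illustration; only the two properties above are essential.

```python
import numpy as np

# Hypothetical transition matrix for the weather example.
# Rows/columns are ordered A = sunny, B = cloudy, C = rainy;
# the probability values are made up for illustration.
P = np.array([
    [0.6, 0.3, 0.1],  # sunny  -> sunny, cloudy, rainy
    [0.3, 0.4, 0.3],  # cloudy -> ...
    [0.2, 0.5, 0.3],  # rainy  -> ...
])

assert (P >= 0).all()                   # all elements non-negative
assert np.allclose(P.sum(axis=1), 1.0)  # every row sums to 1
```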

If the elements of the transition matrix do not depend on time, the chain is called a time-homogeneous Markov chain.

Initial probability: a 1 × N row vector whose i-th element is the probability of the process starting in the i-th state.

Distribution: sometimes we do not know the exact state at a given time, only the probabilities. In that case, the state at time t is given by a row vector whose i-th element is the probability of being in the i-th state. This 1 × N row vector is called the “distribution”. If the exact state is known, the distribution has a single 1, e.g. H(t) = (0, 0, 1, 0, 0, 0) when a six-state system is in its 3rd state at time t.
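Continuing the hypothetical sketch above (P is the made-up weather matrix), a distribution can be advanced one step by right-multiplying it with the transition matrix — a standard property of Markov chains, shown here only as an illustration of how the distribution and transition matrix fit together.

```python
import numpy as np

# Distribution at time t: a 1 x N row vector over the states (A, B, C).
h = np.array([0.0, 0.0, 1.0])  # the exact state is known: rainy

# One step of the chain: h(t+1) = h(t) P, with P the matrix defined above.
h_next = h @ P
print(h_next)  # -> [0.2 0.5 0.3], i.e. the rainy row of P
```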

Memoryless property

This is one of the most important properties, also called the Markov property. Consider a coin toss: say we get a head first; the probability of getting a head or a tail on the next toss is still 0.5 (given the coin is unbiased).

Thus, the future of a Markov process doesn’t depend on the past observations, just the present. Mathematically, it is expressed as,
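P(X_{t+1} = x | X_t = x_t, X_{t−1} = x_{t−1}, …, X_0 = x_0) = P(X_{t+1} = x | X_t = x_t)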

Fitting a Markov chain

This is the inverse problem for the implementation of Markov chains: how do we construct them? Or, essentially, how do we find the transition matrix?

There should be some data from which the matrix is estimated, which is then later used for prediction. Continuing our previous example with A = sunny, B = cloudy, C = rainy, suppose we are given data for 10 days (just as an example; in general, we need a bigger dataset to make meaningful predictions):

E = ABBAACBCAC

Then the (i, j)-th element of the transition matrix is given by,

P_ij = N_ij / Σ_k N_ik

where N_ij is the number of times state i is immediately followed by state j in the data.

This is very intuitive. For example, A appears 4 times and is followed by A once, B once, and C twice. Thus, the probabilities of A going to A, B, and C are 1/4, 1/4, and 2/4 respectively. The transition matrix based on the given data is
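      A     B     C
A   1/4   1/4   1/2
B   1/3   1/3   1/3
C   1/2   1/2    0

(rows give the current state, columns the next state). The same counting applies to the other rows: B appears 3 times and is followed by B, A, and C once each; C is followed by B once and A once, with the final C having no successor.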

Note that every row sums to 1.
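The same estimation can be written as a short Python sketch (NumPy is assumed; the sequence E and the state labels are the ones from the example above):

```python
from collections import Counter

import numpy as np

states = "ABC"
E = "ABBAACBCAC"  # the 10-day observation sequence from the example

# Count every consecutive pair (current state, next state) in the data.
pair_counts = Counter(zip(E, E[1:]))

# Fill an N x N count matrix, then normalise each row so it sums to 1.
P = np.zeros((len(states), len(states)))
for (i, j), n in pair_counts.items():
    P[states.index(i), states.index(j)] = n
P /= P.sum(axis=1, keepdims=True)

print(P)  # rows/columns ordered A, B, C; matches the matrix above
```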

This ends the very basic introduction to Markov chains and the related key terms.
