# Markov Chain Monte Carlo

## A visual interpretation with Python

A Markov chain can be defined as a stochastic process Y in which the value at each point at time t depends only on the value at time t-1. It means that the probability for our stochastic process to have state x at time t, given all its past states, is equal to the probability of having state x at time t, given only its state at t-1.

If the set of possible states, S, is finite, we can provide a visual representation of our chain as follows:

Each circle represents a state, in that case S={A, B, C}, while arrows represent the probability of our process to jump from one state to another. We can collect all these probabilities in a matrix P, called transition matrix, as such:

Where:

Then, at each point in time, we can describe the (unconditional) probability distribution of our process, which will be a vector with a number of components equal to the dimension of S. Each component represents the unconditional probability for our process to take value equal to a given state. Namely:

The interesting property about μ is that it is linked to the transition matrix by the following relation:

So, once we have a known starting value of our vector (which makes sense, since we start from an observable state, hence we will have a vector of zeros but a one in the position of the starting state) we can compute the distribution of the process at any point in time.

Furthermore, there is a specific value of our vector for which this equality holds:

If such value exists, we call the corresponding μ the invariant distribution of the process.

The invariant distribution is a pivotal concept when we talk about Markov Chain Monte Carlo (MCMC) methods. Those latter comprise a class of algorithms for sampling from a probability distribution which constructs a Markov chain that has the desired distribution as its invariant distribution.

Indeed, the goal of Montecarlo methods is finding ways to sample from distributions that are not easy to sample from. To bypass this problem, there are methods, like rejection sampling and importance sampling, which use an easier function, called ‘proposal’ (you can read more about it here).

Let’s simulate a Markov Chain, considering a variable where it is likely that the state of today depends only on the state of yesterday. This variable might be the weather. So let’s consider the following chain:

We can interpret this diagram in the same way as before. Namely, if today is sunny, the probability that tomorrow will be sunny again is 50%, that of being rainy is 15% and, finally, that of being cloudy is 35%.

We can collect the arrows in the following transition matrix:

import numpy as np

P = np.array([[0.5, 0.15, 0.35], [0.45, 0.45, 0.1], [0.1, 0.3, 0.6]]) P

Output: array([[0.5 , 0.15, 0.35], [0.45, 0.45, 0.1 ], [0.1 , 0.3 , 0.6 ]])

We also have a starting point, let’s say ‘cloudy’, hence we already have our starting distribution of y, that is μ _0=[0,0,1].

Since we have a starting μ and a transition matrix, we can compute μ at any point t in time. So, provided with these tools, I want to create a stochastic process (with Markov’s property, so depending only on the previous period of time) according to the probability distribution at every t.

It means that my resulting random variable Y will have a number of components equal to the number of instants and, each component, is the realization of my process according to the probability distribution of that instant of time. To do so, we want to generate a random number from a uniform distribution and set the following rule:

So let’s implement it with Python. For this purpose, I imagined to examine 50 days, then I encoded Sunny = 1, Rainy = 2, Cloudy = 3.

m=np.zeros(150).reshape(50,3)m=np.zeros(150).reshape(50,3)

m[0]=[0,0,1]

ndays = 50

Y=[0]*ndays

u = np.random.uniform(0,1,50)

for i in range(1, ndays):

tmp=[]

m[i] = m[i-1].dot(P)

if u[i] < m[i][0]:

Y[i]=1

elif u[i] < m[i][0] + m[i][1]:

Y[i] = 2

else:

Y[i] = 3Y[i] = 3

If I plot my random process, I obtain something like that:

The interesting thing about this procedure is that, if we compute the mean of our list of probability distributions (one for each t) we obtain:

[np.mean(m[:,0]), np.mean(m[:,1]), np.mean(m[:,2])]Output:

[0.3239190123456788, 0.2888770370370369, 0.3872039506172838]

Which approximates the invariant distribution, which can be computed as follows:

a=np.array([[-0.5, 0.45, 0.1], [0.15, -0.55, 0.3], [1,1,1]])

b=np.array([0,0,1])

mu = np.linalg.solve(a, b)

muOutput:

array([0.33777778, 0.29333333, 0.36888889])

So, we created a random sample from a probability distribution which is equal to the invariant distribution of a Markov Chain. If we consider that distribution being equal to our target distribution (which, remember, was hard to sample from), we found a way to bypass this problem.

*Originally published at **http://datasciencechalktalk.com** on October 5, 2019.*