A Brief Introduction to Markov Chains

A general guide on what makes the Markov Decision Process tick

Rishabh Anand
Sigmoid
4 min read · Feb 27, 2019


tl;dr: This is part of a series of reference articles teaching the concepts that drive Reinforcement Learning. We’ll be starting off simple, incrementally moving on to more complex concepts further down the road.

A Markov Chain is a stochastic (probabilistic) model used to predict the outcome of the next event given only the current state, with no memory of anything that came before. The state refers to the current ‘slice’ of the environment: the particular condition an agent is in at a specific time step.

For example, in Sonic the Hedgehog, a state refers to the current pixel intensity values that represent the game screen.

The state refers to the current pixel values that make up this particular game screen at time 0:05 seconds

This state can be represented as a matrix (or tensor) of pixel values. When fed into a Reinforcement Learning algorithm, it looks something like this:

The different color RGB channels that form the current state
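
To make that concrete, here is a minimal Python sketch of such a state (the 64×64 screen size and the random pixel values are purely illustrative assumptions, not what Sonic actually uses):

```python
import numpy as np

# A hypothetical game screen: 64x64 pixels with 3 colour channels (RGB).
# Each entry is a pixel intensity between 0 and 255.
HEIGHT, WIDTH, CHANNELS = 64, 64, 3

# The state at the current time step is just this array of numbers.
state = np.random.randint(0, 256, size=(HEIGHT, WIDTH, CHANNELS), dtype=np.uint8)

print(state.shape)   # (64, 64, 3)
print(state[0, 0])   # RGB intensities of the top-left pixel, e.g. [ 12 200  87]
```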

The Markov Chain

The Markov Chain is a model used to describe a sequence of consecutive events where the probability of each event depends only on the event immediately before it.

If a sequence of events exhibits the Markov Property, that is, each step relies only on the state before it and not on the rest of the history, then the sequence is called ‘Markovian’ in nature.

An example of a Markov Chain with numbers for visualization purposes

The figure above shows two states, A and E. The probability of the next event depends only on the current state: a movement from A to E or vice versa depends on whether the ‘character’ is currently on A or E.

If the current state or location is A, then the probability of going to E is 0.4 while the probability of staying on A is 0.6 (the A-to-A transition).

The probability of going from E to A is 0.7 while the probability of staying on E is 0.3 (E to E).

Hopefully, this helps you visualize the process of moving from state to state depending only on the current state. If we were to write this mathematically, it is simply the probability of the next state S_{t+1} taking the value s′, given that the current state S_t is s:

P(S_{t+1} = s′ | S_t = s)
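
As a minimal sketch of that idea in Python (the state names and trajectory length are arbitrary; the probabilities are the ones from the figure above), we can sample a trajectory where each step looks only at the current state:

```python
import random

# Transition probabilities from the two-state example above:
# from A: stay on A with probability 0.6, move to E with 0.4
# from E: move to A with probability 0.7, stay on E with 0.3
transitions = {
    "A": {"A": 0.6, "E": 0.4},
    "E": {"A": 0.7, "E": 0.3},
}

def step(current_state):
    """Sample the next state given only the current state (the Markov Property)."""
    next_states = list(transitions[current_state].keys())
    probabilities = list(transitions[current_state].values())
    return random.choices(next_states, weights=probabilities)[0]

# Simulate a short trajectory starting from A.
state = "A"
trajectory = [state]
for _ in range(10):
    state = step(state)
    trajectory.append(state)

print(trajectory)  # e.g. ['A', 'A', 'E', 'A', 'A', 'E', 'E', 'A', 'A', 'A', 'E']
```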

The advent of Markov Chains

For many problems in Reinforcement Learning, what happens next depends only on the current state, the action performed in that state, and the reward the agent receives upon performing that action, rather than on the entire history of states that came before.

This Markovian property of a problem greatly benefits us: the agent only ever needs to reason about the current state rather than the full history, which keeps it versatile and lightweight compared to heavily parameterized models such as deep neural networks.

Furthermore, Markov Chains are used in a process called the Markov Reward Process, where the agent (whose objective is to maximize the total reward earned over the task) receives a reward at each time step based on the state it is in.
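
As a rough sketch of that idea (the reward values, discount factor, and sample trajectory below are made-up numbers for illustration only), a Markov Reward Process attaches a reward to each state and scores a trajectory by its discounted sum of rewards:

```python
# A toy Markov Reward Process built on the two-state chain above.
# Each state pays out a reward; future rewards are discounted by gamma.
# All of these numbers are made up for illustration.
rewards = {"A": 1.0, "E": -0.5}
gamma = 0.9

# A sample trajectory through the chain (e.g. produced by the sampler above).
trajectory = ["A", "A", "E", "A", "E", "E"]

# The return G is the discounted sum of rewards along the trajectory:
# G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...
G = sum((gamma ** t) * rewards[state] for t, state in enumerate(trajectory))
print(round(G, 3))  # total discounted reward collected over this trajectory
```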

In a nutshell

Markov Chains are really useful in Reinforcement Learning, as they have enabled us to achieve and even exceed human performance in many areas, from teaching robots how to handle objects with care to beating difficult Atari games that were previously out of reach for statistical Machine Learning.

Over the next few articles, we’ll go through the various forms and applications of the Markov Property and how useful it is when dealing with certain problems.

For the next article, I am planning to write a small dictionary of all the terminology used in the field of Reinforcement Learning as a reference piece. If you have any suggestions for RL articles or questions in general, feel free to drop in a comment below.

With that, I’ll see you in the next one.

Original article by Rishabh Anand
