They call me Mr Stochastic: Markov chains, part I

Andy Elmsley
Published in The Sound of AI · Mar 13, 2019 · 4 min read

The latest post in our AI coding tutorial series.

A random walk through a Markovian space.

Welcome back, AI artisans. This week we'll continue the theme of stochastic models by implementing our own Markov chain. Before continuing, make sure you've warmed up on the basics of probabilistic terminology and systems, and the helper functions I covered in the previous post.

The trouble with simple distributions

Let’s consider the following sequence:

AEEAAAEAAA

At first glance, there’s not much logic to the sequence. It appears fairly random — a classic case of a stochastic process.

Just like we did in the previous post, we can create a simple, discrete distribution by counting up the As and Es and normalising by the total count. The distribution for this sequence would look something like this:
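Sketching that in Python, with the counts taken straight from the ten-symbol sequence above:

    # Frequency of each symbol in 'AEEAAAEAAA', normalised by the total count
    distribution = {
        'A': 0.7,  # 7 of 10 symbols
        'E': 0.3   # 3 of 10 symbols
    }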

We could use this to re-generate a sequence and we might get something like the following:

AEAEAAEEAA

This seems reasonable enough at first glance, but isn't it missing something from the original sequence? It's hard to tell with symbols as meaningless as As and Es, so let's look at a more meaningful sequence instead:

red flower red car red car red flower

The discrete distribution for the above sequence would look like this:
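As a Python sketch again, counting each word in the eight-word sequence:

    # Frequency of each word, normalised by the total word count
    distribution = {
        'red': 0.5,     # 4 of 8 words
        'flower': 0.25, # 2 of 8 words
        'car': 0.25     # 2 of 8 words
    }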

If we were to use this distribution, it’s entirely possible that we might randomly generate the following sequence:

red red car red red red car flower

Not quite what we’re looking for! What’s gone wrong is that the simple probability distribution doesn’t capture any of the local structure of the sequence. In the original, the next word depends on the current word, and that dependency can’t be captured by a frequency distribution, which treats each outcome as independent.

Markov models

Andrey Markov. Great beard, also good at maths.

Andrey Markov was a turn-of-the-20th-century Russian mathematician, best known for his work in number theory and stochastic processes. As you might’ve guessed, he’s the inventor of the Markov chain, which he first used to model the patterns of vowels and consonants in Russian literature.

Generally speaking, a Markov chain can be used to model a stochastic process where the next state depends on the current state and, in higher-order chains, a fixed number of previous states. Markov chains belong to a wider family of similar systems, collectively known as Markov models. In a later Code of AI tutorial we’ll delve into the Hidden Markov Model, but for now we’ll stick with the simpler Markov chain.

Markov chains have two components: an initial distribution and a transition matrix. The initial distribution is a discrete probability distribution that tells us which state to start with. The transition matrix is a set of nested discrete probability distributions, one per state, telling us how likely each next state is given the current one.

Let’s think of our initial ‘AE’ sequence in terms of its states. We could visualise the sequence like this:

A Markov chain with two states

The initial distribution in this case is easy: A and E each have an equal chance of being the first state.
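As a Python dictionary, that might look like this:

    # Equal chance of starting in either state
    initialDistribution = {'A': 0.5, 'E': 0.5}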

Now, if we are in A, we have a 60% chance of staying in A, and a 40% chance of moving to E. If we are in E, we have a 30% chance of staying in E and a 70% chance of moving back to A. We can represent that quite simply as a transition matrix with nested dictionaries:
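Something like the following would do the job:

    # Outer keys: the current state. Inner dicts: the distribution over the next state.
    transitionMatrix = {
        'A': {'A': 0.6, 'E': 0.4},
        'E': {'E': 0.3, 'A': 0.7}
    }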

And that’s all the data we need to represent a Markov chain!

Now, when we’re using the chain to predict or generate new values, we can sample from the initial distribution to get our starting state, and then sample from the transitionMatrix at each step after that.
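Here’s a minimal sketch of that process. The sample helper below, which draws one outcome from a discrete distribution, stands in for the sampling helper from the previous post:

    import random

    def sample(distribution):
        # Draw one outcome from a dict mapping outcome -> probability.
        # (A stand-in for the sampling helper from the previous post.)
        outcomes = list(distribution.keys())
        weights = list(distribution.values())
        return random.choices(outcomes, weights=weights)[0]

    # Sample the starting state from the initial distribution...
    state = sample(initialDistribution)
    sequence = [state]

    # ...then sample each following state from the current state's
    # row of the transition matrix.
    for _ in range(9):
        state = sample(transitionMatrix[state])
        sequence.append(state)

    print(''.join(sequence))  # e.g. 'AAEAAAEEAA'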

Running this code gives us an output sequence very similar to what we’re after, with the added benefit that it captures some of the dependent local structure of the sequence. And that is a Markov chain.

Mark-off

This week, we’ve seen the limits of discrete probability distributions for modelling sequences, and how Markov chains can solve that issue by modelling probabilistic state dependencies.

I’ll leave you with a few tough nuts to crack for next time:

  1. Can you write a Markov chain to describe the ‘red flower red car red car red flower’ sequence?
  2. Can you write a function that automatically creates a transition matrix from the words of the popular nursery rhyme Humpty Dumpty?

Humpty Dumpty sat on a wall

Humpty Dumpty had a great fall

All the king’s horses and all the king’s men

couldn’t put Humpty together again

  3. Can you use this transition matrix along with an initial distribution to create a new remix of Humpty Dumpty?

Next time we’ll walk through the solutions to these tasks (which, by the way, are your first machine learning exercise) and start to work with datasets.

As always, you can get the source code for this week on our GitHub.

Continue your training to supreme master-level AI coding here.

And give us a follow to receive updates on our latest posts.
