AI, probably

Andy Elmsley · Published in The Sound of AI · Mar 6, 2019 · 5 min read

The latest post in our AI coding tutorial series.

Schrödinger’s cat. (No cats were harmed while making this tutorial.)

Hello again, AI coding Padawans. I hope you found the last few posts on search easy to learn yet challenging enough to keep you going. I’d love to hear your feedback so I can improve these tutorials.

So far we’ve been discussing the topic of search, but the breadth-first search algorithm we implemented is hardly ‘intelligent’; the algorithm follows a simple set of rules to reach its goal state. To have the machine make more reasoned ‘choices’, we need to go beyond blindly following these rules. This week we’ll put more of the I into AI with a new topic: stochastic models. Over the next few posts you’ll get to know a simple AI stochastic model known as a Markov chain. By the end you’ll also know how to build one to generate new song lyrics — although it might be obvious that a robot wrote them.

As a reminder, if you’re looking for the solution to last week’s questions, you can find it on our GitHub, along with the full, updated source code for this entire series. All the code includes comments that should tell you everything you need to know. But if you’re stuck or need help, reach out to me on Twitter and I’ll jump right in.

Ok, so what are stochastic models?

Credit: https://xkcd.com/904/

You might be unfamiliar with the word ‘stochastic’, but don’t worry — the truth is that ‘stochastic’ is just a clever-sounding way of saying ‘random’. To create a stochastic model, then, is to formally define a seemingly random process or system.

Take the weather, for example (a favourite topic for British people like myself). The weather system is highly complex, affected by many factors including the season, prevailing winds and El Niño, to name a few. However, over time we can form an idea of how the weather will likely change from one day to the next. For instance, in Berlin (my current home), June has an average of eight days of rain. So, on any one day in June we can say that there’s around a 27% (8/30) chance that it will rain, ignoring other influences like drought. We now have a problem to tackle with AI (predicting whether it will rain), a data representation (a probability), and a stochastic model to solve it. For our toy example we can just use a random number generator (RNG) that outputs a number between 0 and 1. If the number is less than or equal to 0.27, the model has predicted it will rain. Ah, feels like home.
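Here’s a quick sketch of that toy model in Python (the constant and function names are just illustrative):

```python
import random

RAIN_PROBABILITY = 8 / 30  # roughly a 27% chance of rain on any June day

def will_it_rain():
    """Predict rain with a single random draw between 0 and 1."""
    return random.random() <= RAIN_PROBABILITY

print('Rain!' if will_it_rain() else 'Sun!')
```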

Probability states and distributions

Our toy weather example was a gross oversimplification. Let’s consider another simple, yet more realistic example: a coin toss. We have two possible outcomes: heads or tails. As with search, we can call these two outcomes states. We can represent these two states and their associated probabilities with a dictionary:
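```python
# A fair coin: two states, each with probability 0.5
# (variable name is illustrative)
coin_distribution = {
    'heads': 0.5,
    'tails': 0.5
}
```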

This is called a discrete probability distribution — where you have a number of specific outcome states and their associated probabilities. We can see from this example that our coin should land on heads half the time, and that the probabilities of all states add up to 1. We can therefore be certain that the coin will land on either heads or tails, and that our distribution is complete.

We can create probability distributions of stochastic processes, like a coin toss, by observing their behaviour over many iterations and deriving the probability distribution from the results. For example, the tally for a hundred coin tosses might look something like this:
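```python
# Observed outcomes of 100 tosses (made-up counts, for illustration)
coin_tally = {
    'heads': 56,
    'tails': 44
}
```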

To convert this into a usable probability distribution we need to normalise the values so that the probabilities all add up to 1. This is known as a normalised sum. You can implement it by first summing up all the numbers in the distribution, then dividing each number by that sum.

Below is one possible normalised-sum implementation: a generic function that normalises all the arguments passed into it, and a second function that normalises a dictionary:
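```python
def normalise(*values):
    """Normalise any number of values so that they sum to 1."""
    total = sum(values)
    return [value / total for value in values]

def normalise_dict(distribution):
    """Normalise a dictionary's values so that they sum to 1."""
    total = sum(distribution.values())
    return {state: value / total for state, value in distribution.items()}

# Turn the raw tally into a probability distribution
coin_distribution = normalise_dict(coin_tally)
# {'heads': 0.56, 'tails': 0.44}
```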

Sampling states from distributions

Now that we have a normalised discrete probability distribution, we can use it for prediction. The simplest way to do this is by sampling the distribution — where we randomly select values from it.

Generating random values is easy in most programming languages, and most implementations will output a random value between 0 and 1. To use this value to sample from our distribution, we need to transform the distribution one more time into a cumulative distribution. This is where we iterate through each state, adding up the probabilities as we go. In the sketch below I’m using Python’s OrderedDict, so that the order of the states stays the same in the cumulative version (useful for iterating over later):
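```python
from collections import OrderedDict

def to_cumulative(distribution):
    """Convert a normalised distribution into a cumulative one,
    keeping the states in their original order."""
    cumulative = OrderedDict()
    running_total = 0.0
    for state, probability in distribution.items():
        running_total += probability
        cumulative[state] = running_total
    return cumulative
```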

After conversion to a cumulative distribution, the values will look like this:
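```python
# Continuing the illustrative 56/44 tally from above:
OrderedDict([('heads', 0.56), ('tails', 1.0)])
```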

Now we can easily sample from the distribution: take a random number between 0 and 1, then iterate over the cumulative distribution until we reach a state whose cumulative probability covers that number:
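```python
import random

def sample(cumulative_distribution):
    """Pick a state: the first one whose cumulative probability
    is greater than or equal to a random number in [0, 1)."""
    random_value = random.random()
    for state, cumulative_probability in cumulative_distribution.items():
        if random_value <= cumulative_probability:
            return state
```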

Calling the sample function with a cumulative distribution will give us a new random state that follows the distribution. For example, tying the pieces above together:
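```python
# Normalise the tally, build the cumulative distribution,
# then draw ten samples from it
cumulative = to_cumulative(normalise_dict(coin_tally))
tosses = [sample(cumulative) for _ in range(10)]
print(tosses)  # e.g. ['heads', 'tails', 'heads', ...]
```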

That’s probably enough

That’s it for this week. We’ve covered stochastic models, discrete distributions, and how to create them from observed values with a normalised sum. Finally, we looked at how we can sample from this distribution to predict values.

Next time we’ll put all these probability tools to good use when we make a (slightly) more sophisticated stochastic model — a Markov chain.

As always, you can get the source code for this week on our GitHub.

To test your newly-found knowledge, take a crack at these challenges:

  1. Create a distribution for a fair six-sided die and simulate throwing it ten times.
  2. Create a distribution for a weighted die that is ten times more likely to land on a 1 than on any other face. Simulate throwing it ten times and compare the results with (1).

Continue your training to supreme master-level AI coding here.

And give us a follow to receive updates on our latest posts.
