Practical Swift Markov Model — Get the best out of ML 🏗

Artificial Intelligence -> Machine Learning -> Deep Learning -> Natural Language Processing. These are all broad, widely discussed topics nowadays. It’s not the goal of this article to cover all of them (even though each one is actually an inner circle of the previous one, which means we could). Instead, let’s talk about something that lives at the heart of each of them: a fundamental technique behind much of Machine Learning, a mathematical model called the Markov Model.

  • In this article I’ll present a small Markov Model library made in Swift;
  • Show how to use it to create a Text Generator;
  • Show how to implement your own, in case you want to.

Introduction to Markov Model 🛣

For those of you who have never heard of the Markov Model, here is the smallest and most concise description:

It is a mathematical technique that, given a known set of states, can predict future states based only on the current state, using a probabilistic forecasting technique.

Instead of considering the chain of all previous events in every single prediction (which would be intractable), the model defines what is called the Markov Chain, a square matrix over all the possible states. This matrix holds the probability of moving from each state to every other state. So, using a quick and straightforward Vector x Matrix calculation, the model can calculate any future state. By “any future state” I mean not just the very next one, but any given state in the future.

For example, let’s say that in a given city, when it is sunny there is a 60% chance that the next day will be sunny as well, a 30% chance it will be cloudy and a 10% chance it will be rainy. The Markov Model describes a relationship between every pair of states in the system, even when that relationship is zero.
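To make the Vector x Matrix idea concrete, here is a minimal sketch in plain Swift (independent of the library). The sunny row uses the probabilities from the example above; the cloudy and rainy rows are invented for illustration:

```swift
// States: 0 = sunny, 1 = cloudy, 2 = rainy.
// Each row holds the transition probabilities out of one state.
// Only the sunny row comes from the article; the rest is made up.
let transitions: [[Double]] = [
    [0.6, 0.3, 0.1],  // sunny  -> sunny / cloudy / rainy
    [0.4, 0.4, 0.2],  // cloudy -> (illustrative values)
    [0.3, 0.4, 0.3],  // rainy  -> (illustrative values)
]

// One Vector x Matrix step: the distribution over tomorrow's weather.
func nextDistribution(_ current: [Double], _ matrix: [[Double]]) -> [Double] {
    var result = [Double](repeating: 0, count: matrix[0].count)
    for (i, p) in current.enumerated() {
        for j in 0..<matrix[i].count {
            result[j] += p * matrix[i][j]
        }
    }
    return result
}

// Today is definitely sunny.
let today: [Double] = [1, 0, 0]
let tomorrow = nextDistribution(today, transitions)   // [0.6, 0.3, 0.1]
// Applying the same step again predicts the day after tomorrow,
// which is what "any future state" means in practice.
let dayAfter = nextDistribution(tomorrow, transitions)
```

Each extra multiplication pushes the prediction one more day into the future, which is why the model can answer about any step ahead, not just the next one.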

This is a visual representation of a Markov Chain

The Markov Model also has a variation called the “Hidden Markov Model”, which assumes you can’t observe the real state; instead, you rely on other observations (the system is partially observable). In the weather system, for example, suppose all the information we have is a person on the street and what they are wearing: an umbrella, a coat or sunglasses. On the mathematical side, those new observable states form another matrix, and in the end we’ll have an equation of Vector x Matrix x Matrix.
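Continuing the weather sketch, the hidden variant adds an emission matrix of P(observation | hidden state). All the probabilities below are invented for illustration; the point is only the shape of the Vector x Matrix x Matrix calculation:

```swift
// Hidden states: 0 = sunny, 1 = cloudy, 2 = rainy.
// Observations:  0 = sunglasses, 1 = coat, 2 = umbrella.

// Generic Vector x Matrix multiplication.
func multiply(_ vector: [Double], _ matrix: [[Double]]) -> [Double] {
    var result = [Double](repeating: 0, count: matrix[0].count)
    for (i, p) in vector.enumerated() {
        for j in 0..<matrix[i].count { result[j] += p * matrix[i][j] }
    }
    return result
}

// Transition probabilities between the hidden weather states
// (sunny row from the article, other rows illustrative).
let transitions: [[Double]] = [
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.3, 0.4, 0.3],
]

// emissions[i][k] = P(observation k | hidden state i), invented values.
let emissions: [[Double]] = [
    [0.8, 0.15, 0.05],  // sunny:  mostly sunglasses
    [0.2, 0.6, 0.2],    // cloudy: mostly coats
    [0.05, 0.15, 0.8],  // rainy:  mostly umbrellas
]

// Vector x Matrix x Matrix: today's state -> tomorrow's hidden states
// -> what we expect to see people wearing tomorrow.
let today: [Double] = [1, 0, 0]  // definitely sunny today
let whatWeSee = multiply(multiply(today, transitions), emissions)
```

The first multiplication moves the chain forward in time; the second projects the hidden state onto what is actually observable.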

The Markov Model is just one of the underlying techniques behind topics like Machine Learning. For example, the keyboard on our mobile devices uses a Markov Model to suggest the next word, and from time to time its internal Markov Chain is updated.

Enough of theory, let’s dive into Swift.

Applications 📱

This Swift Markov Model library can be used for anything involving state prediction: generating text, words, letters, game plays, numbers, etc. Anything with a known pattern of states where you want to start predicting the future states.

A simple example is keyboard prediction. A Markov Model can easily give us a set of possible next words based on the last word typed by the user. E.g.:

  • Given the word “Hello” the model can predict “I”, “my”, or “👋”
  • Given “I” the model can predict “wanna”, “just”, or “can”
  • Given “can” the model can predict “just”, “not”, or “do”

iOS does this combined with other techniques too, like dictionaries, spelling correction, etc. There are many other applications for the Markov Model, like in games, for predicting play movements, improving the AI of NPCs and more.

Here we’ll train our model with entire books and then start playing with text generation. Each book contains thousands of connections between words, which means the lexical and syntactic structure will be preserved. We’ll set the meaning aside, because semantic analysis is a topic for Deep Learning and it’s not our focus here. Also, keep in mind that this is just one of many techniques applied in Machine Learning.

Predicting words from a book 📙

We’ll be using a Playground for this example. All the results can be found at Swift Markov Model; you can download it and try all the samples. Our focus here is to highlight the parts that interact with the Markov Model. So in this exercise, we’ll be using books from https://archive.org/, such as:

'Black Riders!' cried Frodo. 'Where?'
'Here. In the village. I stayed indoors for an hour. Then as you did not come back, I went out for a stroll. I had come back again and was standing just outside the light of the lamp looking at the stars. Suddenly I shivered and felt that something horrible was creeping near: there was a sort of deeper shade among the shadows across the road, just beyond the edge of the lamplight. It slid away at once into the dark without a sound. There was no horse.'
J.R.R. Tolkien. (2003). The Lord of the Rings: The Return of the King.

Since we know this format, and our goal is to get a sequence of words, we can clean the text up and create a new model, using it as our base of transitions.
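The original embedded snippet isn’t reproduced here, but the cleanup step can be sketched in plain Swift. The `words(from:)` helper below is hypothetical (the playground’s exact preprocessing may differ): it lowercases the text and splits on anything that isn’t a letter:

```swift
import Foundation

// Hypothetical cleanup helper, not the library's API:
// lowercases the text and keeps only runs of letters as words.
func words(from text: String) -> [String] {
    return text.lowercased()
        .components(separatedBy: CharacterSet.letters.inverted)
        .filter { !$0.isEmpty }
}

let excerpt = "'Black Riders!' cried Frodo. 'Where?'"
let sequence = words(from: excerpt)
// ["black", "riders", "cried", "frodo", "where"]
```

The resulting array of words, in reading order, is exactly the sequence of transitions the model will be trained on.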

Under the hood, what just happened was the matrix construction. Let’s have a closer look at the word “I”, for example:

"I" -> "stayed" = 1
"I" -> "went" = 1
"I" -> "had" = 1
"I" -> "shivered" = 1

So now, if we give the model the word “I”, the only 4 possible next words are “stayed”, “went”, “had” or “shivered”. This happens thousands and thousands of times across a full book, building a huge and complex matrix in the process.
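Under stated assumptions (plain Swift, not the library’s internals), this counting step boils down to a dictionary of dictionaries, where each pair of adjacent words increments one cell of the matrix:

```swift
// Build transition counts: matrix["i"]?["stayed"] == 1 means the
// pair ("i", "stayed") was observed once in the training sequence.
func buildTransitions(_ words: [String]) -> [String: [String: Int]] {
    var matrix: [String: [String: Int]] = [:]
    for (current, next) in zip(words, words.dropFirst()) {
        matrix[current, default: [:]][next, default: 0] += 1
    }
    return matrix
}

// Simplified training sequence based on the excerpt above.
let training = ["i", "stayed", "i", "went", "i", "had", "i", "shivered"]
let matrix = buildTransitions(training)
// matrix["i"] == ["stayed": 1, "went": 1, "had": 1, "shivered": 1]
```

Storing counts instead of probabilities keeps training cheap: the row can be normalized on demand whenever a prediction is requested.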

Once we have this, it’s time to compose our text, one word at a time.
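The generation loop itself can be sketched like this (the names here are illustrative, not the library’s API). Starting from a seed word, it repeatedly picks one of the observed next words at random:

```swift
// Compose text one word at a time by repeatedly picking a next word
// uniformly among the observed transitions for the current word.
func generate(from matrix: [String: [String: Int]],
              start: String, length: Int) -> [String] {
    var current = start
    var result = [current]
    for _ in 1..<length {
        guard let candidates = matrix[current],
              let next = candidates.keys.randomElement() else { break }
        result.append(next)
        current = next
    }
    return result
}

// A tiny hand-built matrix for illustration.
let matrix: [String: [String: Int]] = [
    "i": ["stayed": 1, "went": 1],
    "stayed": ["indoors": 1],
    "went": ["out": 1],
]
let sentence = generate(from: matrix, start: "i", length: 3)
    .joined(separator: " ")
// "i stayed indoors" or "i went out"
```

Each step only looks at the current word, which is the Markov property in action: the whole sentence so far is never consulted.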

You may have noticed that the code above uses a static method with a closure; this is done for performance reasons, since each book may contain over 200,000 words. However, the library also offers other options, which you can check out. So, refactoring the first part, we now have:

Playing with the Decision Process 🚦

We can play with 3 different kinds of decision processes:

  • predict: Predicts the most likely next state; it is deterministic. If two or more states have the same probability, the last one will be taken.
  • random: Completely random among the possible future states, which means only states with at least some probability will be considered.
  • weightedRandom: Same as random, but considers the weight of each probability. For example, given A -> B = 80% and A -> C = 20%, this method will produce B roughly 80% of the time you call it.
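The three processes differ only in how they pick from a state’s row of counts. Here is a sketch of weightedRandom over the same counts structure as before (again, a plain Swift illustration, not the library’s implementation):

```swift
// Pick a next state with probability proportional to its observed count.
func weightedRandom(from candidates: [String: Int]) -> String? {
    let total = candidates.values.reduce(0, +)
    guard total > 0 else { return nil }
    // Draw a "ticket" and walk through the counts until it is spent.
    var ticket = Int.random(in: 0..<total)
    for (state, count) in candidates {
        ticket -= count
        if ticket < 0 { return state }
    }
    return nil
}

// A -> B 80% of the time, A -> C 20% of the time.
let row = ["B": 8, "C": 2]
// weightedRandom(from: row) returns "B" about 80% of the time.
```

The same row would answer “B” under predict (highest count) and a 50/50 coin flip under random, which is why the choice of process changes the flavor of the generated text so much.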

Name Generator 🤖

If you’re interested in the subject, I encourage you to try the example in the playground and play with its variables (all highlighted as “Experiment”). It can produce funny names. The idea is to create/invent new names following the standard patterns of a given language.

I need my own implementation (just give me the code! 😤)

Ok ok…

Sometimes we just need a small piece of code to copy into our project and modify according to our needs, without importing a new library, and that’s totally fine.
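Here is a minimal, self-contained sketch you could start from: a generic Markov chain over any Hashable state, with the three decision processes described above. It mirrors the ideas in this article, not the library’s actual source:

```swift
enum DecisionProcess {
    case predict
    case random
    case weightedRandom
}

struct MarkovModel<State: Hashable> {
    // chain[a][b] = number of times the transition a -> b was observed.
    private(set) var chain: [State: [State: Int]] = [:]

    init(transitions: [State]) {
        for (current, next) in zip(transitions, transitions.dropFirst()) {
            chain[current, default: [:]][next, default: 0] += 1
        }
    }

    func next(given current: State,
              process: DecisionProcess = .predict) -> State? {
        guard let candidates = chain[current], !candidates.isEmpty else {
            return nil
        }
        switch process {
        case .predict:
            // Most frequently observed next state.
            return candidates.max { $0.value < $1.value }?.key
        case .random:
            // Uniform among states with at least one observation.
            return candidates.keys.randomElement()
        case .weightedRandom:
            // Proportional to the observed counts.
            let total = candidates.values.reduce(0, +)
            var ticket = Int.random(in: 0..<total)
            for (state, count) in candidates {
                ticket -= count
                if ticket < 0 { return state }
            }
            return nil
        }
    }
}

// Usage:
let model = MarkovModel(transitions: ["I", "went", "out", "I", "went", "home"])
let afterI = model.next(given: "I")        // "went"
let afterWent = model.next(given: "went",
                           process: .random)  // "out" or "home"
```

Under 50 lines, no dependencies, and easy to extend, for example by persisting the chain or by keying it on pairs of words instead of single words for more coherent output.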

Remember to like and share the article if you’ve enjoyed it. Cya!