Implementing a Markov Chain in Swift. Theory.

A specific focus of stochastic process theory is the so-called Markov process. It was first introduced by the Russian mathematician Andrey Markov, who built a concept of tracking state transitions in terms of probability theory and presented it on January 23, 1913. To illustrate his research, Markov used one of the most famous Russian novels, Eugene Onegin, as a dataset and studied the probabilities of consonant and vowel alternation in its first chapters. Interestingly, the mathematician could hardly have predicted all the domains his new theory would be used in years later; nevertheless, playing with letters and text generation is still one of the most fun ways to study Markov processes. Chatbots, Twitter bots and Reddit bots form new texts based on forum talks, memes, scientific articles, classic literature and other source texts, producing surprisingly readable results.

Stochastic processes

Whenever we say “there’s a stochastic process currently happening in the system”, we mean that a random process is taking place in something we have previously defined as a system. For example, I am writing this post drinking a matcha latte, and Brownian motion (which is also a Markov process, by the way) is happening within my glass (which, of course, I can consider a system) as matcha particles collide with soy milk particles.
The simplest way to define the types of Markov processes is to first classify stochastic processes by the system’s time and state-space parameters.
Consider a system and a set of its possible states, which can be discrete or continuous. The system may be in only one state at a time, but it changes its state at certain points in time t; these state transitions are called steps.
If each next step may happen only at defined, countable time values, we’re dealing with a discrete-time process. If, alternatively, the time of the next step is unconstrained (there is a continuum of times t), it’s a continuous-time stochastic process. We can thereby divide stochastic processes into four basic types, although some complex real-life processes may be of a “mixed” type:

  • discrete-time processes on a finite set of states
  • discrete-time processes on an infinite set of states
  • continuous-time processes on a finite set of states
  • continuous-time processes on an infinite set of states

Markov processes

A random process taking place in a system is called a Markov process if at any time point tₓ the probability of the next state transition depends solely on the system’s current state and not on its past behaviour. The above classifications of stochastic processes apply to Markov processes as well.

A process may, however, depend on a fixed number of previous states; such a finite “memory” defines a higher-order Markov process.

Markov chains

A Markov chain is a Markov process taking place in a discrete-state system. It is usually defined by the following components (a minimal Swift sketch follows the list):

  • a set of states S = {S₀, S₁, S₂, … Sₙ}
  • an initial distribution, which gives the probability of the chain being in state Sᵢ at time t₀
  • a transition probability matrix P = {Pᵢⱼ}, which gives the probability of moving from the current state to the next one and thus determines the system’s behaviour
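
To make these components concrete, here is a minimal sketch of how they could be modelled in Swift. The MarkovChain type and its property names are placeholders of my own, not an established API:

// A minimal model of a finite Markov chain, mirroring the three components above.
struct MarkovChain {
    let states: [String]               // S = {S₀, S₁, … Sₙ}
    let initialDistribution: [Double]  // p⁰[i] = P(chain starts in Sᵢ)
    let transitionMatrix: [[Double]]   // P[i][j] = P(moving from Sᵢ to Sⱼ)
}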

The discrete-state system has a set of states that form an event space. All the events in this space are mutually exclusive (as mentioned before, the system may be in only one state at any time point) and collectively exhaustive (sooner or later the system will move to one of the states). According to the basics of probability theory, the probability of the whole event space equals 1, and so does the sum of the probabilities of moving out of any given state:

Σⱼ=1…n Pᵢⱼ = 1, for every state Sᵢ
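
This property is easy to verify in code; a small sketch extending the hypothetical MarkovChain type above:

extension MarkovChain {
    /// True if every row of the transition matrix sums to 1
    /// (within a small tolerance, since we are adding Doubles).
    var isStochastic: Bool {
        transitionMatrix.allSatisfy { row in
            abs(row.reduce(0, +) - 1.0) < 1e-9
        }
    }
}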


Markov chain as weighted graph

Let’s limit the set of states to four, so S = {S₁, S₂, S₃, S₄}. The system may then be presented as a weighted graph, with states as nodes and the probabilities of moving from one state (node) to another as edge weights.

[Figure: the Markov chain represented as a weighted graph]
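
In code, such a graph maps naturally onto an adjacency dictionary. Below is a sketch with made-up edge weights, since the actual probabilities live in the figure; only the S₁ → S₂ weight of 1 is chosen to match the worked example later in this post:

enum State: CaseIterable { case s1, s2, s3, s4 }

// Adjacency map: edges[a]?[b] is the probability (edge weight) of moving
// from state a to state b. All weights below are placeholders.
let edges: [State: [State: Double]] = [
    .s1: [.s2: 1.0],
    .s2: [.s1: 0.3, .s3: 0.7],
    .s3: [.s2: 0.5, .s4: 0.5],
    .s4: [.s4: 1.0]  // a state that only loops back to itself is absorbing
]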

States and classes of Markov chain

Let’s take a look at the graph. We can see that there is a non-zero probability of moving from S₁ to S₂; S₂ is therefore accessible from S₁. But there’s also a non-zero probability of going from S₂ back to S₁, that is to say, S₁ and S₂ are communicating. Communicating states form classes (a concept used simply to break the system into smaller subsets for better understanding).
Classes may be closed or communicating. The definition is easy to understand: a closed class is a class the system gets “stuck” in, with zero probability of moving to other classes. Otherwise, the class is communicating. If a closed class contains only one state, that state is absorbing.
A Markov chain where all the states communicate and thus form a single class is called an irreducible Markov chain.
Another characteristic of a state is its recurrence (or transience). A state is recurrent if a chain starting there returns to it infinitely often, with probability 1; otherwise it is transient, and the chain returns to it only finitely often.
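
Accessibility is simply reachability over non-zero-probability edges, so it can be checked with a plain graph search. A sketch over the hypothetical edges map from above:

/// True if `target` is accessible from `source`, i.e. the graph contains
/// a path of non-zero-probability edges leading from one to the other.
func isAccessible(from source: State, to target: State,
                  in edges: [State: [State: Double]]) -> Bool {
    var visited: Set<State> = [source]
    var stack: [State] = [source]
    while let state = stack.popLast() {
        for (neighbor, probability) in edges[state] ?? [:] where probability > 0 {
            if neighbor == target { return true }
            if visited.insert(neighbor).inserted { stack.append(neighbor) }
        }
    }
    return false
}

// S₁ and S₂ communicate when each is accessible from the other.
let communicating = isAccessible(from: .s1, to: .s2, in: edges) &&
                    isAccessible(from: .s2, to: .s1, in: edges)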

Building transition probability matrix

Let’s get back to the graph above and build a transition probability matrix for the given chain, given the initial distribution:

p⁰ = {1, 0, 0, 0}

[Figure: an example of a transition probability matrix for the chain above]

Let’s assume the given chain has the same transition probability distribution at every step (such a Markov chain is called time-homogeneous). The transition probabilities at the n-th step are then given by the matrix power Pⁿ. A stochastic vector (a vector consisting of the probabilities of the system being in each particular state at the n-th step) is computed using the formula below:

pⁿ = p⁰ × Pⁿ

For example, within the given system:

p¹ = {1, 0, 0, 0} × P = {0, 1, 0, 0}, so at the next step the system will be in S₂.
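
This propagation is straightforward to express in code. A sketch: propagate multiplies the distribution vector by the transition matrix n times, which is equivalent to computing p⁰ × Pⁿ without forming the matrix power explicitly:

/// Multiplies the distribution vector p0 by the transition matrix `steps`
/// times, returning the distribution pⁿ = p⁰ × Pⁿ.
func propagate(_ p0: [Double], through matrix: [[Double]], steps: Int) -> [Double] {
    var p = p0
    for _ in 0..<steps {
        var next = [Double](repeating: 0, count: p.count)
        for i in p.indices {
            for j in next.indices {
                next[j] += p[i] * matrix[i][j]  // pⁿ⁺¹[j] = Σᵢ pⁿ[i]·P[i][j]
            }
        }
        p = next
    }
    return p
}

// With the placeholder weights above: starting surely in S₁,
// one step takes the system to S₂ with probability 1:
// propagate([1, 0, 0, 0], through: transitionMatrix, steps: 1) == [0, 1, 0, 0]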

Fields of application and sources

Markov chains are well studied and widely used in biology, sociology, economics, natural language processing, etc. For details, I strongly recommend reading these works:

  • “The Five Greatest Applications of Markov Chains” by P. von Hilgers and A. N. Langville
  • “Understanding Markov Chains” by N. Privault


The second post in the series shows how texts can be generated with a Markov chain algorithm, so if you’re interested in the Swift implementation, stay tuned!