Markov Chains and Neural Networks

Elijah Jarocki
4 min read · Mar 28, 2022


Markov chains have many applications in chemistry, physics, economics, and mathematics. While recently studying neural networks, I noticed similarities between attributes of Markov chains and the architecture of densely connected neural networks, and I wondered how far these similarities extended. After doing some research, I came across a paper entitled "Markov Chain Neural Networks" by Maren Awiszus and Bodo Rosenhahn. My goal is to give a cursory introduction to Markov chains for a non-technical audience and to summarize the findings of that paper.

You may read the paper yourself at: https://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w42/Awiszus_Markov_Chain_Neural_CVPR_2018_paper.pdf

Andrey Markov was a Russian mathematician who studied and defined Markov processes. In short, these are processes in which the future outcome depends solely on the current state of the system: the probabilities of moving from one state to another are independent of how the process arrived at its present state. In a Markov chain, the set of possible states is countable (and often finite). This "memorylessness" sets Markov processes apart from processes whose future behavior must be predicted from their entire statistical history.
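To make this concrete, here is a minimal sketch in Python. The states and transition probabilities are invented for illustration (a toy weather model), not taken from the paper:

```python
import random

# A toy weather model: tomorrow's weather depends only on
# today's weather, not on any earlier day (the Markov property).
transitions = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def next_state(current):
    """Sample the next state using only the current state."""
    states, weights = zip(*transitions[current])
    return random.choices(states, weights=weights)[0]

# Simulate a short chain starting from a sunny day.
state = "sunny"
history = [state]
for _ in range(10):
    state = next_state(state)
    history.append(state)
print(" -> ".join(history))
```

Notice that `next_state` never looks at `history`; that is the whole point of the Markov property.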

Andrey Markov (image source: https://mathshistory.st-andrews.ac.uk/Biographies/Markov/)

Most current neural networks are deterministic: they are trained on sample data, and once trained, a given input always produces the same output. Awiszus and Rosenhahn sought to build a neural network that can simulate Markov chains and thereby make more random decisions in its output.

The key idea is to add an additional input node with a random variable which allows the network to use it as a switch node to produce different outcomes. Even though the network is acting in a deterministic fashion, due to the random input it produces random output with guaranteed statistical properties reflected in the training data. — Awiszus and Rosenhahn

Due to the nontechnical nature of this post, I will forgo entering into much detail about the mechanisms by which their neural network achieves these results. If you are interested, the details are in the paper linked above.
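Still, the core trick can be sketched in a few lines. Below is a toy illustration of my own (not the authors' code or architecture): a purely deterministic function of the current state and a uniform random input z reproduces the transition probabilities of a Markov chain, which is the behavior the trained network learns to imitate via its extra random input node.

```python
import numpy as np

# Toy transition matrix: row = current state, column = next state.
# (Invented numbers, for illustration only.)
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

def deterministic_step(state, z):
    """A deterministic map from (state, z) to a next state.
    Picking the state whose cumulative-probability interval
    contains z is inverse-CDF sampling; a network with z as an
    extra input node can learn an equivalent mapping from data."""
    return int(np.searchsorted(np.cumsum(P[state]), z, side="right"))

# With z drawn uniformly from [0, 1), the outputs follow the
# transition probabilities even though the function is deterministic.
rng = np.random.default_rng(0)
samples = [deterministic_step(0, rng.uniform()) for _ in range(100_000)]
print(np.bincount(samples) / len(samples))  # roughly [0.8, 0.2]
```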

The authors demonstrated their neural network on both Tic-Tac-Toe and the game Flappy Bird. When compared against standard neural networks at Tic-Tac-Toe, the Markov chain based network had better playing performance.

The network also exhibited more "human" behavior due to the randomness built into its architecture: instead of always playing the best move, it sometimes made a less predictable choice, just as you would expect from a casual game of Tic-Tac-Toe against a person.

A Markov Chain Neural Network playing the iPhone app “Flappy Bird”

On testing the network on Flappy Bird, the authors write:

This paradigm can also be applied in the context of reinforcement learning to balance possible reactions to their overall gain. In Q-learning an agent transitions between states, executes actions and gains a reward to be optimized. The non deterministic behavior of a Markov chain neural network can be easily integrated in an agent to explore the state space of a game. The rewards are correlated to the impact of an action, so that more successful activities appear more often in the training data and are thus more likely to be selected. Figure 8 shows three screen shots of a neural network which perfectly plays the game Flappy Bird, in which the inputs consist of the proposed random value, the position of the bird, the position of the pipes and the distance between the bird and the pipes. —Awiszus and Rosenhahn
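As a rough illustration of that last point (my own sketch, not the paper's implementation): an agent whose action selection is stochastic, with better-valued actions sampled more often, naturally explores the state space the way the quote describes. The Q-values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_action(q_values, temperature=1.0):
    """Softmax sampling over estimated action values: actions with
    higher expected reward are chosen more often, but every action
    retains some probability, so the agent keeps exploring."""
    logits = np.asarray(q_values, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Hypothetical Q-values for the two Flappy Bird actions
# (0 = flap, 1 = do nothing) in some state.
actions = [sample_action([1.5, 0.2]) for _ in range(10)]
print(actions)  # mostly 0 (flap), occasionally 1
```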

Finally, they used their neural network to reconstruct images given just a portion of the initial image. Thanks to the random input, the network could generate different facial emotions from the same partial image, independent of the (withheld) original. Their results are shown below:

Test results of a Markov Chain Neural Network recreating an image of a face, producing different emotions as output.

As you can see, Markov processes are not merely analogous to neural networks; they can be built into them to make networks more effective and more natural. These techniques could be applied further in natural language processing, image recognition, game theory, and other fields. Making neural networks behave more naturally can both model the real world more faithfully and make our technology more lifelike.

Thank you for reading!
