# Recording States of a Chain — Markov Chain Monte Carlo

In our last blog, we gave a very broad overview of the history of the Monte Carlo Simulation (MCS) and how it works. There are many different variations and applications of MCS — so many that we really needed more than one blog to delve into them! Today’s blog will begin with a type of algorithm many data scientists have utilized at one point or another — the Markov Chain Monte Carlo (MCMC).

# The Markov Chain

The MCMC will be easiest to convey using a common scenario: imagine you are a tourist traveling around a bustling city visiting various landmarks. Time is precious while on vacation so you plan your route ahead of time each day. How you choose to move from place to place would depend on several factors — proximity, the weather, your interests, the inherent popularity of each place, etc. The paths taken from place to place is determined by considering and weighing these factors which, in turn, determine the resultant path for a day. This is our Markov Chain.

Markov Chains exist inside of a state diagram, where each state is connected by probabilities. State diagrams have a finite number of states that can be thought of as points in space. Each of these states have one or more connections to itself or to other states. In our tourist simulation, each landmark can be thought of as a state — a location that you visit — whereas the road or path is the connection representing the probability that you choose to take that path to the next state. The Markov Chain is the overall path you travel for the day. The intrinsic randomness of state diagrams and the weighted probabilities that it handles can be used for a variety of things outside of this scenario. A few examples include examining migration patterns of animals, running macroeconomic simulations, and even describing frequencies of bound and unbound complexes on the molecular level.

That is — each transition is memory-less and only dependent on the current state and the static probability that you will move from the current state to the next state. If we were to visit the same bustling city a second time in our example, our choices would be influenced by our previous visit (we might be more inclined to visit some entirely new landmarks this time) and this travel path wouldn’t be considered a Markov Chain due to that reason.

Using all of this information taken together, if you were given a state diagram and ran a simulation to see which state you end up in after a limited number of transitions or time steps, you’d expect that some of these states might be visited very frequently, while some states might never be visited, even if there’s a path leading there.

# Markov Chain + Monte Carlo

Markov Chains are perfect for providing a framework for the MCS to work with. Readers from our last blog might remember that the MCS requires a model to act a set of random processes on over and over again to predict a specific pattern or outcome. State diagrams and Markov Chains provide a complex distribution to the MCS and a complicated, inherently uncertain tangle of possibilities to be explored. Putting the two pieces of the puzzle together, MCMC works by creating a dataset of random samples to place in a state diagram and iterates through them until it provides a desired output parameter — the most favorable Markov State. The “most favorable” when considering our example might be the least time spent traveling or most places visited within a given time. If the state diagram is set up accurately, regardless of why the transitions between the states are the way they are, we are presented with the simulation’s “educated guess” of its properties. This is often very close to, if not exactly, the theoretically correct answer. We can explain MCMC by extending our city-traveling analogy further: imagine throwing a bunch of tourists into this city, each with their own unique preferences for traveling between landmarks. With this subtle change in scope, our bustling city is now framed as a Markov Chain Monte Carlo simulation.

# More Subsets

Many methods of the MCS exist — each catered to a particular field or problem. MCMC is one specific subset of MCS, and even this can be subdivided further into various approaches — the Metropolis-Hastings algorithm and Gibbs Sampling, to name a few. Of course, the traveling tourist example is a simplified representation. In most MCMC simulations, the complex distribution trying to be replicated is continuous rather than discrete, making it harder to define the model using specific states that can be visited.

Even so, using a “random walk” approach to sampling makes it possible to converge to the most probable places on the distribution over time. MCMC has proven to be a great way to simplify long, complex calculations.

Interested in molecular simulations, biological art, or learning more about molecules? Subscribe to our Twitter and Instagram!

Written by