Markov Models, Basics

Luv Verma
3 min read · Apr 12, 2023


This blog is inspired by the lectures from CS5100 at NEU, taught by Prof. Smith, and from CS188 at UCB. I have written this blog because I use the concept heavily in deriving the objective function for the DDPM model in my other blog.

For the basics of Bayes' nets, refer to my other blog.

Markov Models

In this blog, we will discuss Markov models, which can be thought of as analogous to a chain-like, infinite-length Bayes' net.

The running example we’ll be working with in this section is the day-to-day fluctuations in weather patterns. Our weather model will be time-dependent (as are Markov models in general), meaning we’ll have a separate random variable for the weather on each day.

If we define W(i) as the random variable representing the weather on day i, the Markov model for our weather example is a simple chain:

W0 → W1 → W2 → W3 → …

What information should we store about the random variables involved in our Markov model? To track how our quantity under consideration (in this case, the weather) changes over time, we need to know both its initial distribution at time t = 0 and some sort of transition model that characterizes the probability of moving from one state to another between timesteps.

The initial distribution of a Markov model is enumerated by the probability table P(W0), and the transition model for moving from timestep i to i+1 is given by P(W(i+1) | W(i)).

Note that this transition model implies that the value of W(i+1) depends only on the value of W(i). In other words, the weather at time t = i + 1 satisfies the Markov property, or memoryless property: it is independent of the weather at all other timesteps besides t = i.

Using our Markov model for weather, if we wanted to reconstruct the joint distribution between W0, W1, and W2 using the chain rule, we would write (equation 1):

P(W0, W1, W2) = P(W0) P(W1 | W0) P(W2 | W0, W1)

However, with our assumption that the Markov property holds true, W0 is independent of W2 given W1:

P(W2 | W0, W1) = P(W2 | W1)

and the joint simplifies to (equation 2):

P(W0, W1, W2) = P(W0) P(W1 | W0) P(W2 | W1)

And we have everything we need to calculate equation 2 from the Markov model.
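As a minimal sketch of equation 2, the snippet below computes the three-day joint from an initial distribution and a transition table. The two weather states and all the probability numbers are made up for illustration; they are not from the original lectures.

```python
# Hypothetical two-state weather model; all numbers are illustrative.
P_W0 = {"sun": 0.8, "rain": 0.2}          # initial distribution P(W0)
P_trans = {                               # transition model P(W(i+1) | W(i))
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def joint(w0, w1, w2):
    """P(W0=w0, W1=w1, W2=w2) = P(W0) * P(W1|W0) * P(W2|W1), per equation 2."""
    return P_W0[w0] * P_trans[w0][w1] * P_trans[w1][w2]

# Sanity check: the joint over all eight assignments sums to 1.
total = sum(joint(a, b, c) for a in P_W0 for b in P_W0 for c in P_W0)
print(round(total, 10))  # 1.0
```

Note that the full joint is recovered from just the two small tables, which is exactly the point: the Markov assumption lets us avoid storing a table over every combination of W0, W1, and W2.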

More generally, Markov models make the following independence assumption at each timestep: W(i+1) is independent of W0, …, W(i−1) given W(i), i.e.

P(W(i+1) | W0, …, W(i)) = P(W(i+1) | W(i))

The above expression allows us to reconstruct the joint distribution over the first n+1 variables via the chain rule as follows (equation 3):

P(W0, W1, …, Wn) = P(W0) P(W1 | W0) P(W2 | W1) … P(Wn | W(n−1))

A final assumption that's typically made in Markov models is that the transition model is stationary. In other words, P(W(i+1) | W(i)) is identical for all values of i.
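Under stationarity, equation 3 becomes a product of the same transition table applied at every step, so the joint probability of an arbitrary-length sequence can be computed with one loop. Again, the states and numbers below are a made-up illustration:

```python
# Hypothetical stationary weather model; all numbers are illustrative.
P_W0 = {"sun": 0.8, "rain": 0.2}          # initial distribution P(W0)
P_trans = {                               # one transition table, reused at every step
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sequence_prob(seq):
    """P(W0, ..., Wn) = P(W0) * prod over i of P(W(i+1) | W(i)), per equation 3."""
    p = P_W0[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= P_trans[prev][nxt]           # stationarity: same table for every i
    return p

print(sequence_prob(("sun", "sun", "rain")))  # 0.8 * 0.9 * 0.1 ≈ 0.072
```

Stationarity is what makes the model practical for long (or infinite) chains: no matter how many days we track, we only ever store the initial distribution and a single transition table.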
