Hidden Markov Models — Part 1: Model Description

Mehran Ghamaty
4 min read · Oct 25, 2023
[Image: courtesy of Wikipedia]

A hidden Markov model lets us ask, a posteriori, for the most likely sequence of hidden events that produced what we observed, under the assumption that the system is Markovian. That assumption works well in most cases; the issue is that if you don't make it, the problem becomes close to impossible to compute with traditional computers.

Let's say we have a pair of stochastic processes Xn and Yn, both discrete. We assume each observation is taken at a fixed interval, where n indexes the interval; this is another assumption that is "good enough" in most cases, although it is extremely wasteful if you know the underlying model a priori. For the sake of this article we only use discrete processes, not continuous ones.

The pair (Xn, Yn) then forms a hidden Markov model. We typically say Xn is the process over the non-directly-observable (hidden) states and Yn is our process of observations (flipped from most "traditional" ML courses, although semantically it makes more sense).

We would then like to calculate the probability of seeing the observations Yn given the unseen "hidden" states Xn, i.e. P(Yn | Xn). To turn this expression into a problem we have to provide an objective function: we want to find the sequence of hidden states that maximizes the probability of what we have seen.
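
Written out (a standard formulation, using π for the start probabilities and q, p for the transition and emission probabilities we define below):

\hat{x}_{1:n} = \arg\max_{x_{1:n}} P(x_{1:n} \mid y_{1:n})
             = \arg\max_{x_{1:n}} \pi(x_1)\, p(y_1 \mid x_1) \prod_{k=2}^{n} q(x_{k-1}, x_k)\, p(y_k \mid x_k)

The two maximizations agree because, by Bayes' rule, the denominator P(y_{1:n}) is the same for every candidate sequence x_{1:n}.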

To make this problem better defined:

  1. We are given a set of observations.
  2. We assume an HMM as the underlying model that generates our observations.
  3. We make an assumption about the number of hidden states, which defines a state machine. These can correspond to semantically meaningful states, such as a patient sleeping, tired, waking up, alert, or dead.
  4. We also have to make assumptions about the initial distribution over the hidden states and about the transition probabilities. If we know the person in the operating room is asleep, we can start with all the probability mass on that state; the transition values can be guessed from domain expertise.

Let's define our transition matrix as Q, which describes movement between our hidden states, and our emission matrix as P, which gives the probabilities of our observations given a hidden state.
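
In symbols (a sketch, with i and j ranging over hidden states and k over observation symbols):

Q_{ij} = P(X_{n+1} = j \mid X_n = i), \qquad P_{ik} = P(Y_n = k \mid X_n = i)

so each row of Q and each row of P must sum to one.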

Now let's say we have a set of blood pressure observations, Z = {30, 40, 30, 50, 20}, where Z is an ordered sequence of diastolic measurements. We would have to make more assumptions to ever use this in practice (for one, I wouldn't use it without additional sensor inputs, at the very least systolic readings). I am not an anesthesiologist, so please take this as an example rather than a real-world application, and I'm not an HMM expert either, so please don't take my advice. Let's make our lives slightly easier and discretize the observations to {"low", "medium", "high"}; a sketch of one such step function follows.
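
As a minimal sketch (the 25 and 45 mmHg cutoffs are my own illustrative choice, not clinical guidance):

def discretize(diastolic: int) -> str:
    """Map a diastolic reading (mmHg) onto a coarse symbol.
    The 25/45 cutoffs are illustrative assumptions, not medical advice."""
    if diastolic < 25:
        return "low"
    if diastolic <= 45:
        return "medium"
    return "high"

Z = [30, 40, 30, 50, 20]
Y = [discretize(z) for z in Z]  # ['medium', 'medium', 'medium', 'high', 'low']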

After passing the readings through that step function we get our measurements Y = {"medium", "medium", "medium", "high", "low"}, or for brevity {'m', 'm', 'm', 'h', 'l'}. Taking the template from the Wikipedia article, our "model" can be described as follows:

states = ("sleeping", "waking_up", "tired", "alert", "dead")

observations = ("low", "medium", "high")

start_probability = {"sleeping": 0.9, "waking_up": 0.1, "tired": 0.0, "alert": 0.0, "dead": 0.0}

transition_probability = {
    "sleeping":  {"sleeping": 0.9, "waking_up": 0.1, "tired": 0.0, "alert": 0.0, "dead": 0.0},
    "waking_up": {"sleeping": 0.5, "waking_up": 0.4, "tired": 0.1, "alert": 0.0, "dead": 0.0},
    "tired":     {"sleeping": 0.5, "waking_up": 0.3, "tired": 0.1, "alert": 0.1, "dead": 0.0},
    "alert":     {"sleeping": 0.1, "waking_up": 0.2, "tired": 0.3, "alert": 0.3, "dead": 0.1},
    "dead":      {"sleeping": 0.0, "waking_up": 0.0, "tired": 0.0, "alert": 0.0, "dead": 1.0},
}

emission_probability = {
    "sleeping":  {"low": 0.8, "medium": 0.2, "high": 0.0},
    "waking_up": {"low": 0.6, "medium": 0.3, "high": 0.1},
    "tired":     {"low": 0.3, "medium": 0.4, "high": 0.3},
    "alert":     {"low": 0.3, "medium": 0.3, "high": 0.4},
    "dead":      {"low": 1.0, "medium": 0.0, "high": 0.0},
}

We tend to prefer matrices for the sake of computation, but this representation is slightly more readable, which is also why I refrained from using enums and functions.
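
As a sketch of that conversion (NumPy is my choice here; Q, P, and pi are just the matrices named earlier, with rows and columns ordered by the states and observations tuples above):

import numpy as np

# Q[i][j] = P(next state j | current state i)
Q = np.array([[transition_probability[s][t] for t in states] for s in states])

# P[i][k] = P(observation k | state i)
P = np.array([[emission_probability[s][o] for o in observations] for s in states])

# Initial distribution over hidden states as a vector.
pi = np.array([start_probability[s] for s in states])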

We are given only the observations; the transition values, the start probabilities, and the emission probabilities all have to be supplied as assumptions.
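
Since those assumed numbers are hand-written, it is worth a quick sanity check that every row is a valid probability distribution (continuing from the matrices built above):

assert np.isclose(pi.sum(), 1.0)        # start probabilities sum to 1
assert np.allclose(Q.sum(axis=1), 1.0)  # each transition row sums to 1
assert np.allclose(P.sum(axis=1), 1.0)  # each emission row sums to 1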
