Hidden Markov Models: The Secret Sauce in Natural Language Processing

om pramod
6 min read · Dec 30, 2023


Part 5: Diving into Hidden Markov Models: Steps and Fundamental Problems

An HMM can be built and applied using the following steps:

  1. Define the state space and observation space: The state space is the set of possible hidden states in the system, while the observation space is the set of possible observable outputs (emissions) corresponding to those hidden states. In an HMM, the states are “hidden” and not directly observable; instead, we see only the observations they generate.
  2. Initialize the parameters of the state transition probabilities and observation likelihoods: These parameters represent the likelihood of moving from one state to another (state transition probabilities) and the likelihood of making a specific observation given a certain state (observation likelihoods). They are often initialized randomly or based on prior knowledge about the system. (A minimal sketch of these first two steps appears after this list.)
  3. Train the model: Estimate the parameters from observed data using the Baum-Welch algorithm, which relies on the forward-backward procedure to iteratively re-estimate the transition and observation probabilities.
  4. Decode the most likely sequence of hidden states using the Viterbi algorithm.
  5. Evaluate the performance of the model: After training and decoding, assess how well the model performs by comparing its predictions against actual data, using metrics such as accuracy, precision, recall, or other domain-specific measures of how well the HMM’s predicted hidden states match the observed data.
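To make steps 1 and 2 concrete, here is a minimal sketch in Python (using NumPy) for the Sunny/Rainy weather example. The specific probability values are illustrative assumptions, not learned or prescribed values.

```python
import numpy as np

# Step 1: define the state space (hidden) and observation space (visible).
states = ["Sunny", "Rainy"]                  # hidden states
observations = ["Walk", "Shop", "Clean"]     # observable activities

# Step 2: initialize the parameters (illustrative values, assumed here).
# Initial state probabilities: P(first hidden state)
start_prob = np.array([0.6, 0.4])

# State transition probabilities: rows = current state, columns = next state
trans_prob = np.array([
    [0.7, 0.3],   # Sunny -> Sunny, Sunny -> Rainy
    [0.4, 0.6],   # Rainy -> Sunny, Rainy -> Rainy
])

# Observation (emission) likelihoods: rows = state, columns = observation
emit_prob = np.array([
    [0.6, 0.3, 0.1],   # P(Walk|Sunny), P(Shop|Sunny), P(Clean|Sunny)
    [0.1, 0.4, 0.5],   # P(Walk|Rainy), P(Shop|Rainy), P(Clean|Rainy)
])

# Each row of trans_prob and emit_prob must sum to 1.
assert np.allclose(trans_prob.sum(axis=1), 1.0)
assert np.allclose(emit_prob.sum(axis=1), 1.0)
```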

Fundamental HMM modelling problems: Hidden Markov Models (HMMs) are used to model systems with hidden states, and working with them involves three fundamental problems:

Evaluation Problem: Also known as the likelihood problem. Given an observation sequence and an HMM, compute the probability that the model generated that sequence, i.e. the probability of the observations given the model parameters. The forward algorithm (or the full forward-backward algorithm) is used to compute this probability efficiently. For example, given an HMM for weather prediction with hidden states Sunny, Cloudy, and Rainy and an observed sequence [Walk, Shop, Walk, Walk, Shop], we can compute the probability of this observation sequence occurring under the model.

Question answered by the Evaluation problem: Consider a simple weather model where the hidden states are “Sunny” and “Rainy” and the observations are activities such as “walking”, “shopping”, and “cleaning”. Given a sequence of activities, the evaluation problem asks: what is the probability that this sequence of activities was generated by the weather model? Similarly, if the states are ‘Rainy’ and ‘Sunny’ and the observations are ‘Umbrella’ and ‘No Umbrella’, and we observe the sequence [‘Umbrella’, ‘No Umbrella’, ‘Umbrella’], we want to know the probability of this sequence occurring given the model.
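To illustrate how the evaluation problem is answered, here is a small forward-algorithm sketch for the umbrella example above. The transition and emission probabilities are made-up illustrative numbers.

```python
import numpy as np

def forward_likelihood(obs_seq, start_prob, trans_prob, emit_prob):
    """Compute P(observation sequence | model) with the forward algorithm."""
    n_states, T = trans_prob.shape[0], len(obs_seq)
    alpha = np.zeros((T, n_states))
    # Initialization: P(state) * P(first observation | state)
    alpha[0] = start_prob * emit_prob[:, obs_seq[0]]
    # Recursion: sum over all ways of reaching each state at time t
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans_prob) * emit_prob[:, obs_seq[t]]
    # Termination: sum over the final states
    return alpha[-1].sum()

# Illustrative parameters for states [Rainy, Sunny] and
# observations [Umbrella=0, No Umbrella=1] (assumed values).
start_prob = np.array([0.5, 0.5])
trans_prob = np.array([[0.7, 0.3],
                       [0.3, 0.7]])
emit_prob = np.array([[0.9, 0.1],    # Rainy: mostly Umbrella
                      [0.2, 0.8]])   # Sunny: mostly No Umbrella

obs_seq = [0, 1, 0]   # ['Umbrella', 'No Umbrella', 'Umbrella']
print(forward_likelihood(obs_seq, start_prob, trans_prob, emit_prob))
```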

Decoding Problem: Given an observation sequence and the parameters of an HMM, find the most likely sequence of hidden states that could have produced it. The goal is to recover the most probable sequence of hidden states (e.g., weather conditions) behind the observed sequence (e.g., activities). For instance, if an HMM represents a person’s mood (states: Happy, Sad, Anxious) and we observe a sequence of actions (Eat, Read, Sleep), decoding finds the most likely sequence of moods that led to those actions. This problem is solved efficiently by the Viterbi algorithm.

Question answered by the Decoding problem: Given an HMM and an observed output sequence, find the most likely sequence of hidden states that produced it. Continuing with the weather prediction example, suppose we observe the activities “shopping”, “cleaning”, “biking”, and “painting”; decoding asks: what is the most likely sequence of weather states (Sunny, Rainy) that led to this sequence of activities? Likewise, for the umbrella model from the previous example, we might want the most likely sequence of weather states (Rainy or Sunny) behind the observed sequence of using or not using an umbrella.
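Here is a minimal Viterbi sketch for the same illustrative umbrella model, recovering the most likely weather sequence behind the observed umbrella usage.

```python
import numpy as np

def viterbi(obs_seq, start_prob, trans_prob, emit_prob):
    """Return the most likely hidden-state sequence for obs_seq (Viterbi)."""
    n_states, T = trans_prob.shape[0], len(obs_seq)
    delta = np.zeros((T, n_states))      # best path probability ending in each state
    backptr = np.zeros((T, n_states), dtype=int)
    delta[0] = start_prob * emit_prob[:, obs_seq[0]]
    for t in range(1, T):
        # Probability of the best path reaching state j via any previous state i
        scores = delta[t - 1][:, None] * trans_prob          # scores[i, j]
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emit_prob[:, obs_seq[t]]
    # Backtrack from the most probable final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return list(reversed(path))

# Same illustrative Rainy/Sunny umbrella model as before (assumed values).
states = ["Rainy", "Sunny"]
start_prob = np.array([0.5, 0.5])
trans_prob = np.array([[0.7, 0.3], [0.3, 0.7]])
emit_prob = np.array([[0.9, 0.1], [0.2, 0.8]])

obs_seq = [0, 1, 0]   # ['Umbrella', 'No Umbrella', 'Umbrella']
best_path = viterbi(obs_seq, start_prob, trans_prob, emit_prob)
print([states[s] for s in best_path])   # -> ['Rainy', 'Sunny', 'Rainy'] for these numbers
```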

Learning Problem: The learning problem focuses on estimating the parameters of an HMM (initial state probabilities, transition probabilities, emission probabilities) from observed data. Here the parameters are unknown, but one or more observation sequences are available, and the goal is to find the parameters that maximize the probability of observing those sequences. For example, given a sequence of weather-related observations, we might want to learn the parameters of an HMM that best explain the observed data. This problem is usually solved using the Baum-Welch algorithm.

Question answered by the Learning problem: In the context of our weather prediction example, suppose we have a large number of activity sequences but do not know the weather on those days. The learning problem asks: how can we adjust the model’s parameters so that the likelihood of these activity sequences under the model is maximized?
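As a sketch of the learning problem, the third-party hmmlearn library provides a Baum-Welch implementation. The code below assumes a recent hmmlearn release, where the discrete-observation model is called CategoricalHMM (older releases named it MultinomialHMM), and uses a made-up activity sequence purely for illustration.

```python
import numpy as np
from hmmlearn import hmm   # third-party library; pip install hmmlearn

# Encoded activity observations: 0 = Walk, 1 = Shop, 2 = Clean (illustrative data)
obs_seq = np.array([[0], [0], [1], [2], [2], [1], [0], [2], [1], [0]])

# Two hidden states (e.g. Sunny and Rainy), parameters initially unknown.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=42)
model.fit(obs_seq)   # Baum-Welch: iteratively re-estimates the parameters

print("Initial state probabilities:\n", model.startprob_)
print("Transition probabilities:\n", model.transmat_)
print("Emission probabilities:\n", model.emissionprob_)
```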

Note — Filtering and smoothing are two fundamental operations in Hidden Markov Models (HMMs).

Filtering in HMMs refers to the process of estimating the hidden states at each time step given the observable states up to that point. This is achieved using the Forward algorithm.

On the other hand, smoothing is the process of estimating the hidden state at each time step given the observable states from all time steps, including future ones. This is accomplished by combining the forward pass with the Backward algorithm.

The Forward-Backward algorithm combines these two passes to estimate, for each time step, the posterior probability of each hidden state given the entire observation sequence. Note that this gives the most likely state at each individual step, whereas the single most likely state sequence is found by the Viterbi algorithm.
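The following sketch contrasts filtering and smoothing on the same illustrative umbrella model: the filtered posteriors use only the observations seen so far, while the smoothed posteriors use the whole sequence.

```python
import numpy as np

def forward_backward(obs_seq, start_prob, trans_prob, emit_prob):
    """Return filtered and smoothed state posteriors for each time step."""
    n_states, T = trans_prob.shape[0], len(obs_seq)
    alpha = np.zeros((T, n_states))   # forward variables
    beta = np.ones((T, n_states))     # backward variables
    # Forward pass (filtering): proportional to P(state_t | observations up to t)
    alpha[0] = start_prob * emit_prob[:, obs_seq[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans_prob) * emit_prob[:, obs_seq[t]]
    # Backward pass: P(future observations | state_t)
    for t in range(T - 2, -1, -1):
        beta[t] = trans_prob @ (emit_prob[:, obs_seq[t + 1]] * beta[t + 1])
    filtered = alpha / alpha.sum(axis=1, keepdims=True)
    smoothed = (alpha * beta) / (alpha * beta).sum(axis=1, keepdims=True)
    return filtered, smoothed

# Same illustrative Rainy/Sunny umbrella model (assumed values).
start_prob = np.array([0.5, 0.5])
trans_prob = np.array([[0.7, 0.3], [0.3, 0.7]])
emit_prob = np.array([[0.9, 0.1], [0.2, 0.8]])
obs_seq = [0, 1, 0]   # ['Umbrella', 'No Umbrella', 'Umbrella']

filtered, smoothed = forward_backward(obs_seq, start_prob, trans_prob, emit_prob)
print("Filtered :\n", filtered.round(3))   # uses observations up to each step
print("Smoothed :\n", smoothed.round(3))   # uses the entire observation sequence
```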

Ending note- As we conclude this segment, I’m deeply grateful for your ongoing engagement. But our journey is far from over. In the forthcoming days, we’ll set sail on a new phase in Part 6: The Likelihood Problem in HMM: A Step-by-Step Guide. I am eager to share more insights and deepen our understanding of our topic.

Before we bid adieu, I would like to kindly request you to give this post a clap if you found it helpful. Moreover, if you found this content beneficial, I would be honored if you could share it with others. Once again, thank you for your time and interest. I look forward to continuing this journey with you in the next part.

Continue to push boundaries, embrace challenges, and nurture your growth. You’re doing great! Keep going and see you next time!
