Hidden Markov Model

Eugine Kang
5 min read · Aug 31, 2017


A Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e., hidden) states.

Hidden Markov models are especially known for their applications in reinforcement learning and temporal pattern recognition, such as speech, handwriting, and gesture recognition, part-of-speech tagging, musical score following, partial-discharge analysis, and bioinformatics.

Terminology in HMM

The term hidden refers to the first-order Markov process behind the observations. Observations are the data we know and can observe. The Markov process is shown by the interaction between “Rainy” and “Sunny” in the diagram below; each of these is a HIDDEN STATE.

OBSERVATIONS are the known data and refer to “Walk”, “Shop”, and “Clean” in the diagram above. In a machine learning sense, the observations are our training data, and the number of hidden states is a hyperparameter of our model. Evaluation of the model will be discussed later.

T = length of the observation sequence (we don’t have any observations yet), N = 2 (number of hidden states), M = 3 (number of observation symbols), Q = {“Rainy”, “Sunny”} (the hidden states), V = {“Walk”, “Shop”, “Clean”} (the observation symbols)

The state transition probabilities are the arrows pointing to each hidden state. The observation (emission) probability matrix corresponds to the blue and red arrows pointing from each hidden state to each observation. Both matrices are row stochastic, meaning each row adds up to 1.

Each matrix gives the probability of going from one state to another, or of emitting a particular observation from a given state.

The initial state distribution gets the model going by choosing a starting hidden state.

The full model, with known state transition probabilities A, observation probability matrix B, and initial state distribution π, is written as λ = (A, B, π).

How can we build the above model in Python?

In the case above, the emissions are discrete: {“Walk”, “Shop”, “Clean”}. MultinomialHMM from the hmmlearn library is used for this model. GaussianHMM and GMMHMM are other models in the library.
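A minimal sketch of how the model could be set up with hmmlearn. Note that in recent hmmlearn releases the discrete-emission model is called CategoricalHMM; older releases, as used here, called it MultinomialHMM. The transition probabilities below are the classic values for this Rainy/Sunny example and are consistent with the results quoted later in this post:

```python
import numpy as np
from hmmlearn import hmm

states = ["Rainy", "Sunny"]
observations = ["Walk", "Shop", "Clean"]

model = hmm.MultinomialHMM(n_components=2)  # CategoricalHMM in newer hmmlearn
model.startprob_ = np.array([0.6, 0.4])            # initial state distribution
model.transmat_ = np.array([[0.7, 0.3],            # Rainy -> Rainy / Sunny
                            [0.4, 0.6]])           # Sunny -> Rainy / Sunny
model.emissionprob_ = np.array([[0.1, 0.4, 0.5],   # P(Walk/Shop/Clean | Rainy)
                                [0.6, 0.3, 0.1]])  # P(Walk/Shop/Clean | Sunny)
```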

Now with the HMM what are some key problems to solve?

  1. Problem 1: Given a known model, what is the likelihood of an observation sequence O occurring?
  2. Problem 2: Given a known model and a sequence O, what is the optimal hidden state sequence? This is useful if we want to know whether the weather was “Rainy” or “Sunny”.
  3. Problem 3: Given a sequence O and a number of hidden states, what is the optimal model that maximizes the probability of O?

Problem 1 in Python

The probability of the first observation being “Walk” equals the initial state distribution multiplied by the corresponding column of the emission probability matrix: 0.6 × 0.1 + 0.4 × 0.6 = 0.30 (30%). Calling .score returns the log likelihood.
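A sketch using the model built above (observations are encoded as integer indices into V, so “Walk” is 0):

```python
import numpy as np

X = np.array([[0]])            # the single observation "Walk"
log_likelihood = model.score(X)
print(np.exp(log_likelihood))  # ~0.30, matching 0.6*0.1 + 0.4*0.6
```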

Problem 2 in Python
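A sketch of Viterbi decoding with the model above, using the observation sequence {“Shop”, “Clean”, “Walk”}:

```python
import numpy as np

X = np.array([[1], [2], [0]])  # "Shop", "Clean", "Walk" as symbol indices
logprob, state_seq = model.decode(X, algorithm="viterbi")
print(np.exp(logprob))                 # ~0.015
print([states[i] for i in state_seq])  # ['Rainy', 'Rainy', 'Sunny']
```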

Given the known model and the observation {“Shop”, “Clean”, “Walk”}, the weather was most likely {“Rainy”, “Rainy”, “Sunny”} with ~1.5% probability.

Given the known model and the observation {“Clean”, “Clean”, “Clean”}, the weather was most likely {“Rainy”, “Rainy”, “Rainy”} with ~3.6% probability.

Intuitively, when “Walk” occurs, the weather will most likely not be “Rainy”.

Problem 3 in Python

Speech recognition with audio files: predict these words

[‘apple’, ‘banana’, ‘kiwi’, ‘lime’, ‘orange’, ‘peach’, ‘pineapple’]

Raw amplitude could be used as the OBSERVATION for the HMM, but feature engineering will give us better performance.

The functions stft and peakfind generate features for the audio signal.

The example above was taken from here. Kyle Kastner built an HMM class that takes in 3D arrays; I’m using hmmlearn, which only allows 2D arrays. This is why I’m reducing the features generated by Kyle Kastner with X_test.mean(axis=2).

Going through this modeling took a lot of time to understand. I had the impression that the target variable needed to be the observation. This is true for time-series prediction, but classification is done by building an HMM for each class and comparing the outputs by calculating the logprob of your input under each model.
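A minimal sketch of that per-class setup. The function names, the 4-state choice, and the diagonal covariance here are assumptions for illustration, not from the original code:

```python
from hmmlearn import hmm

def train_word_models(train_data, n_states=4):
    """train_data maps each word to a 2D feature array (n_frames, n_features)."""
    models = {}
    for word, features in train_data.items():
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=100)
        m.fit(features)       # one HMM per word class
        models[word] = m
    return models

def classify(models, features):
    # Pick the word whose HMM assigns the highest log likelihood to the input
    return max(models, key=lambda word: models[word].score(features))
```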

Mathematical Solution to Problem 1: Forward Algorithm

The alpha pass gives the probability of the partial OBSERVATION sequence up to time t together with being in a particular STATE at time t, given the model: α_t(i) = P(O_0, …, O_t, x_t = q_i | λ).

At time t = 0, α_0(i) = π_i b_i(O_0): the initial state distribution of state i, times the emission probability of the first observation O_0.

At time t, α_t(i) = [Σ_j α_{t−1}(j) a_{ji}] b_i(O_t): the sum of the previous alpha pass into each hidden state, multiplied by the emission probability of O_t. Finally, P(O | λ) = Σ_i α_{T−1}(i).
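A minimal NumPy sketch of the alpha pass, assuming pi, A, and B are the arrays defined earlier and obs is a list of observation indices:

```python
import numpy as np

def forward(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]               # alpha_0(i) = pi_i * b_i(O_0)
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]  # sum over previous states, then emit O_t
    return alpha.sum()                      # P(O | lambda)

# forward(np.array([0.6, 0.4]), model.transmat_, model.emissionprob_, [1, 2, 0])
# gives ~0.033 for the sequence Shop, Clean, Walk
```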

Mathematical Solution to Problem 2: Backward Algorithm

The beta pass gives the probability of the remaining observations after time t, given the state at time t: β_t(i) = P(O_{t+1}, …, O_{T−1} | x_t = q_i, λ). Combining the two passes, γ_t(i) = α_t(i) β_t(i) / P(O | λ) is the probability of being in state q_i at time t, given the model and the observations.
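A matching sketch of the beta pass under the same assumptions as the forward sketch:

```python
import numpy as np

def backward(A, B, obs):
    beta = np.ones(A.shape[0])                # beta_{T-1}(i) = 1
    for t in range(len(obs) - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
        beta = A @ (B[:, obs[t + 1]] * beta)
    return beta

# gamma_t(i) = alpha_t(i) * beta_t(i) / P(O | lambda) is the state posterior at time t
```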

Mathematical Solution to Problem 3: Forward-Backward Algorithm

The di-gamma γ_t(i, j) is the probability of being in state q_i at time t and transitioning to state q_j at time t + 1, given the model and the observation sequence.

Summing the di-gammas over j gives γ_t(i), the probability of being in state q_i at time t.

The transition and emission probability matrices are re-estimated from the gammas and di-gammas.

Re-estimate and iterate as long as P(O | λ) increases.
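For reference, a standard statement of these Baum-Welch updates in the notation used above (a sketch, not tied to any particular library):

```latex
\gamma_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)},
\qquad
\gamma_t(i) = \sum_{j=1}^{N} \gamma_t(i, j)

a_{ij} = \frac{\sum_{t=0}^{T-2} \gamma_t(i, j)}{\sum_{t=0}^{T-2} \gamma_t(i)},
\qquad
b_j(k) = \frac{\sum_{t \,:\, O_t = k} \gamma_t(j)}{\sum_{t=0}^{T-1} \gamma_t(j)}
```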
