# Hidden Markov Model

A **hidden Markov model** (**HMM**) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. *hidden*) states.

Hidden Markov models are especially known for their applications in temporal pattern recognition, such as speech, handwriting, and gesture recognition, part-of-speech tagging, musical score following, partial discharges, and bioinformatics; they also appear in reinforcement learning.

**Terminology in HMM**

The term *hidden* refers to the first-order Markov process behind the observations. The observations are the data we know and can observe. In the diagram below, the Markov process is shown by the interaction between “Rainy” and “Sunny”; each of these is a **hidden state**.

**Observations** are the known data: “Walk”, “Shop”, and “Clean” in the diagram above. In a machine learning sense, the observations are our training data, and the number of hidden states is a hyperparameter of our model. Evaluation of the model will be discussed later.

- T = length of the observation sequence (we don’t have any observations yet)
- N = number of hidden states = 2
- M = number of observation symbols = 3
- Q = set of hidden states = {“Rainy”, “Sunny”}
- V = set of observation symbols = {“Walk”, “Shop”, “Clean”}

**State transition probabilities** are the arrows pointing between hidden states; the **observation probability matrix** corresponds to the blue and red arrows pointing from each hidden state to each observation. Both matrices are row stochastic, meaning each row adds up to 1.

Each matrix entry gives the probability of going from one state to another, or of emitting a given observation from a given state.

The **initial state distribution** gets the model going by choosing the starting hidden state.

The full model, with known state transition probabilities, observation probability matrix, and initial state distribution, is denoted λ = (A, B, π).

**How can we build the above model in Python?**

In the above case, the emissions are discrete: {“Walk”, “Shop”, “Clean”}. MultinomialHMM from the hmmlearn library is used for this model (in recent hmmlearn releases, CategoricalHMM is the class for discrete emissions). GaussianHMM and GMMHMM are other models in the library.
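As a sketch, the model parameters can be written down directly. The numeric values below are assumptions, chosen to be consistent with the arithmetic quoted later in this post (e.g. 0.6 × 0.1 + 0.4 × 0.6 = 0.30 for the first observation being “Walk”):

```python
import numpy as np

# Hypothetical parameter values for the Rainy/Sunny model; they match
# the probabilities quoted later in this post, but are assumptions here.
states = ["Rainy", "Sunny"]
symbols = ["Walk", "Shop", "Clean"]

start_prob = np.array([0.6, 0.4])            # pi: initial state distribution
trans_prob = np.array([[0.7, 0.3],           # A: Rainy -> {Rainy, Sunny}
                       [0.3, 0.7]])          #    Sunny -> {Rainy, Sunny}
emit_prob = np.array([[0.1, 0.4, 0.5],       # B: Rainy -> {Walk, Shop, Clean}
                      [0.6, 0.3, 0.1]])      #    Sunny -> {Walk, Shop, Clean}

# Both matrices are row stochastic: each row sums to 1.
assert np.allclose(trans_prob.sum(axis=1), 1.0)
assert np.allclose(emit_prob.sum(axis=1), 1.0)

# With hmmlearn, these arrays would be assigned to the fitted-parameter
# attributes of a discrete-emission model, roughly:
#   model = hmm.MultinomialHMM(n_components=2)
#   model.startprob_ = start_prob
#   model.transmat_ = trans_prob
#   model.emissionprob_ = emit_prob
print(start_prob @ emit_prob[:, 0])  # P(first observation = "Walk")
```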

**Now with the HMM what are some key problems to solve?**

**Problem 1**: Given a known model, what is the likelihood of sequence O happening?

**Problem 2**: Given a known model and sequence O, what is the optimal hidden state sequence? This is useful if we want to know whether the weather was “Rainy” or “Sunny”.

**Problem 3**: Given sequence O and the number of hidden states, what is the optimal model that maximizes the probability of O?

**Problem 1 in Python**

The probability of the first observation being “Walk” equals the dot product of the initial state distribution and the “Walk” column of the emission probability matrix: 0.6 × 0.1 + 0.4 × 0.6 = 0.30 (30%). The log likelihood is provided by calling .score.
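A minimal forward-pass sketch that reproduces this number (the parameter values are assumptions consistent with the arithmetic above):

```python
import numpy as np

# Assumed Rainy/Sunny parameters (columns of B: Walk, Shop, Clean).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])

def likelihood(obs):
    """P(O | model) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

p_walk = likelihood([0])   # observation sequence of length 1: "Walk"
print(p_walk)              # 0.6*0.1 + 0.4*0.6 = 0.30
print(np.log(p_walk))      # log likelihood, comparable to .score output
```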

**Problem 2 in Python**

Given the known model and the observation sequence {“Shop”, “Clean”, “Walk”}, the weather was most likely {“Rainy”, “Rainy”, “Sunny”}, with ~1.5% probability.

Given the known model and the observation sequence {“Clean”, “Clean”, “Clean”}, the weather was most likely {“Rainy”, “Rainy”, “Rainy”}, with ~3.6% probability.

Intuitively, when “Walk” occurs the weather will most likely not be “Rainy”.
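The decodings above can be reproduced with a small Viterbi sketch (the parameter values are assumptions consistent with the ~1.5% and ~3.6% figures quoted above):

```python
import numpy as np

# Assumed Rainy/Sunny parameters (columns of B: Walk, Shop, Clean).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
states = ["Rainy", "Sunny"]

def viterbi(obs):
    """Most likely hidden state path and its joint probability."""
    delta = pi * B[:, obs[0]]
    back = []
    for o in obs[1:]:
        trans = delta[:, None] * A        # trans[i, j]: best path so far via i -> j
        back.append(trans.argmax(axis=0))
        delta = trans.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(back):            # follow back-pointers to recover the path
        path.append(int(ptr[path[-1]]))
    path.reverse()
    return [states[i] for i in path], delta.max()

path, p = viterbi([1, 2, 0])   # "Shop", "Clean", "Walk"
print(path, p)                 # ['Rainy', 'Rainy', 'Sunny'] 0.01512
```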

**Problem 3 in Python**

Speech recognition with an audio file: predict which of these words was spoken:

[‘apple’, ‘banana’, ‘kiwi’, ‘lime’, ‘orange’, ‘peach’, ‘pineapple’]

The raw amplitude could be used as the observation for the HMM, but feature engineering gives better performance.

The functions **stft** and **peakfind** generate features from the audio signal.

The example above was taken from here. Kyle Kastner built an HMM class that takes in 3D arrays; I’m using hmmlearn, which only allows 2D arrays. This is why I’m reducing the features generated by Kyle Kastner with X_test.mean(axis=2).

Going through this modeling took a lot of time to understand. I had the impression that the target variable needed to be the observation. This is true for time-series prediction. **Classification, however, is done by building one HMM per class** and comparing their outputs by calculating the log probability of your input under each model.
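A minimal sketch of this idea, with two hypothetical hand-built discrete models standing in for trained per-class HMMs:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under one HMM."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return np.log(alpha.sum())

# Two hypothetical class models (not trained on real audio):
# "class_0" tends to emit symbol 0, "class_1" tends to emit symbol 1.
models = {
    "class_0": (np.array([0.5, 0.5]),
                np.array([[0.9, 0.1], [0.1, 0.9]]),
                np.array([[0.8, 0.2], [0.6, 0.4]])),
    "class_1": (np.array([0.5, 0.5]),
                np.array([[0.9, 0.1], [0.1, 0.9]]),
                np.array([[0.2, 0.8], [0.4, 0.6]])),
}

def classify(obs):
    # Score the input under every class model; pick the best scorer.
    scores = {name: forward_loglik(obs, *m) for name, m in models.items()}
    return max(scores, key=scores.get)

print(classify([0, 0, 1, 0]))  # class_0: sequence is mostly symbol 0
print(classify([1, 1, 1, 0]))  # class_1: sequence is mostly symbol 1
```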

**Mathematical Solution to Problem 1: Forward Algorithm**

The alpha pass is the joint probability of the observations up to time t and of being in a given state at time t, given the model.

At time t = 0, the alpha pass for state i is the initial state distribution at i multiplied by the probability of emitting the first observation O0 from state i.

At time t, the alpha pass for state i is the sum over all states of the previous alpha pass multiplied by the transition probability into state i, all multiplied by the probability of emitting Ot from state i.
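Written out in standard HMM notation (with a_{ji} the transition and b_i(·) the emission probabilities), the recursion is:

```latex
\alpha_0(i) = \pi_i \, b_i(O_0), \qquad
\alpha_t(i) = \Big[\sum_{j=1}^{N} \alpha_{t-1}(j)\, a_{ji}\Big]\, b_i(O_t), \qquad
P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_{T-1}(i)
```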

**Mathematical Solution to Problem 2: Backward Algorithm**

Given the model and the observation sequence, the backward (beta) pass, combined with the alpha pass, gives the probability of being at state qi at time t.
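A sketch of the backward pass combined with the forward pass to get these state posteriors, using the same assumed Rainy/Sunny parameters as before:

```python
import numpy as np

# Assumed Rainy/Sunny parameters (columns of B: Walk, Shop, Clean).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])

def posteriors(obs):
    """gamma[t, i] = P(q_t = i | O, model) via forward-backward."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

g = posteriors([1, 2, 0])  # "Shop", "Clean", "Walk"
print(g)  # each row: [P(Rainy), P(Sunny)] at that time step
```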

**Mathematical Solution to Problem 3: Forward-Backward Algorithm**

The di-gamma is the probability of being in state qi at time t and in state qj at time t+1, given the model and the observation sequence.

Summing the di-gamma over time gives the expected number of transitions from state i to state j.

The transition and emission probability matrices are re-estimated from the gamma and di-gamma quantities.

Iterate as long as the probability P(O | model) increases.
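A compact sketch of one Baum-Welch re-estimation step on a toy sequence (a full implementation would iterate to convergence and rescale alpha/beta to avoid numerical underflow); the starting parameters are the assumed Rainy/Sunny values used earlier:

```python
import numpy as np

def forward(obs, pi, A, B):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs, A, B):
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(obs, pi, A, B):
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = alpha[-1].sum()
    gamma = alpha * beta / p_obs                  # P(q_t = i | O)
    # di-gamma xi[t, i, j] = P(q_t = i, q_{t+1} = j | O)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                   # re-estimate emissions
        mask = np.array(obs) == k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
obs = [0, 1, 2, 2, 0, 1]

before = forward(obs, pi, A, B)[-1].sum()
pi, A, B = baum_welch_step(obs, pi, A, B)
after = forward(obs, pi, A, B)[-1].sum()
print(before, after)  # EM guarantees the likelihood does not decrease
```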