Hidden Markov Models with Python
Modelling Sequential Data Stochastically
Of the several models available for modelling discrete sequential data, hidden Markov models are among the simplest yet most powerful — they have been applied successfully to a wide variety of fields such as statistical mechanics, DNA analysis, speech recognition and stock market prediction. In this article, we will explore hidden Markov models with Python!
Hidden Markov Processes
The first order Markov process makes a very important simplifying assumption about observed sequential data — the current system state depends only on the previous system state.
Additionally, hidden Markov models make one more important modification to the Markov process — the actual system states are assumed to be unobservable, i.e. hidden. For a sequence of hidden states Z, the hidden Markov process emits a corresponding sequence of observations X. Using the observations X, we try to guess what Z really is — this is exactly what hidden Markov models are for!
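In symbols, the Markov property states that P(Zₜ | Zₜ₋₁, …, Z₁) = P(Zₜ | Zₜ₋₁), and the hidden Markov assumption adds that each observation depends only on the current hidden state: P(Xₜ | Z₁, …, Zₜ, X₁, …, Xₜ₋₁) = P(Xₜ | Zₜ).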
Hidden Markov Models
As mentioned in the previous section, hidden Markov models are used to model a hidden Markov process. Hidden Markov models are defined by the following 3 model parameters:
- Initial hidden state probabilities 𝝅 = [𝝅₀, 𝝅₁, 𝝅₂, …]ᵀ. This vector describes the initial probabilities of the system being in a particular hidden state.
- Hidden state transition matrix A. Each row in A corresponds to a particular hidden state, and the columns for each row contain the transition probabilities from the current hidden state to a new hidden state. For example, A[1, 2] contains the transition probability from hidden state 1 to hidden state 2.
- Observable emission probabilities 𝜽 = [𝜽₀, 𝜽₁, 𝜽₂, …]ᵀ. This vector describes the emission probabilities for the observable process Xᵢ given some hidden state Zᵢ.
Let us next study an extremely simple case study to get a better appreciation of how hidden Markov models work.
Guessing Someone’s Mood
An example of a hidden Markov process is guessing someone’s mood. We cannot directly observe or measure the mood of a person (at least not without sticking electrodes in the person’s brain); instead, we observe his or her facial features and then try to guess the mood.
We assume that moods can be described as a Markov process, and that there are 2 possible moods — good and bad. We also assume that there are 2 possible observable facial features — smiling and frowning.
Initial Hidden State Probabilities
When we first meet someone, we assume that there is a 70% chance that the person is in a good mood and a 30% chance that the person is in a bad mood. These are our initial probabilities: 𝝅 = [0.7, 0.3]ᵀ, i.e. P(initially good mood) = 0.7 and P(initially bad mood) = 0.3.
Hidden State Transition Matrix
We also assume that when a person is in a good mood, moments later there is an 80% chance that he or she will still be in a good mood, and a 20% chance that he or she will now be in a bad mood. We assume the same probabilities for the opposite situation in order to simplify the problem. These are the elements of the hidden state transition matrix: A = [[0.8, 0.2], [0.2, 0.8]], i.e. P(good to good) = 0.8, P(good to bad) = 0.2, P(bad to good) = 0.2, P(bad to bad) = 0.8.
Observable Emission Probabilities
Finally, we assume that when a person is in a good mood, there is a 90% chance that he or she will be smiling, and a 10% chance that he or she will be frowning. We assume the same probabilities for the opposite situation in order to simplify the problem. These are the emission probabilities: 𝜽 = [[0.9, 0.1], [0.1, 0.9]], i.e. P(smiling|good mood) = 0.9, P(frowning|good mood) = 0.1, P(smiling|bad mood) = 0.1, P(frowning|bad mood) = 0.9.
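Before working through the calculation, it can help to write these three parameters down in code. A minimal sketch with NumPy (the variable names pi, A and theta are just illustrative):
import numpy as np
# Hidden states: 0 = good mood, 1 = bad mood.
# Observations: 0 = smiling, 1 = frowning.
pi = np.array([0.7, 0.3])                   # initial hidden state probabilities
A = np.array([[0.8, 0.2],                   # hidden state transition matrix
              [0.2, 0.8]])
theta = np.array([[0.9, 0.1],               # emission probabilities P(observation | hidden state)
                  [0.1, 0.9]])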
Guessing Someone’s Mood from their Facial Features
Now, if for example we observed that over two consecutive moments someone had the facial feature sequence [smiling, frowning], we can use the 3 model parameters above to calculate the probabilities of their actual hidden moods:
P([good, good]) = (𝝅[0]·𝜽[0, 0])·(A[0, 0]·𝜽[0, 1]) = (0.7·0.9)·(0.8·0.1) = 0.0504,
P([good, bad]) = (𝝅[0]·𝜽[0, 0])·(A[0, 1]·𝜽[1, 1]) = (0.7·0.9)·(0.2·0.9) = 0.1134,
P([bad, good]) = (𝝅[1]·𝜽[1, 0])·(A[1, 0]·𝜽[0, 1]) = (0.3·0.1)·(0.2·0.1) = 0.0006,
P([bad, bad]) = (𝝅[1]·𝜽[1, 0])·(A[1, 1]·𝜽[1, 1]) = (0.3·0.1)·(0.8·0.9) = 0.0216.
By normalizing the sum of the 4 probabilities above to 1, we get the following normalized joint probabilities:
P’([good, good]) = 0.0504 / 0.186 = 0.271,
P’([good, bad]) = 0.1134 / 0.186 = 0.610,
P’([bad, good]) = 0.0006 / 0.186 = 0.003,
P’([bad, bad]) = 0.0216 / 0.186 = 0.116.
From these normalized probabilities, it might appear that we already have the best guess: the person’s mood was most likely [good, bad]. However, this is not the actual final result we are looking for when dealing with hidden Markov models — we still have one more step to go in order to marginalise the joint probabilities above.
We calculate the marginal mood probabilities for each element in the sequence to get the probabilities that the 1st mood is good/bad, and the 2nd mood is good/bad:
P’(1st mood is good) = P’([good, good]) + P’([good, bad]) = 0.881,
P’(1st mood is bad) = P’([bad, good]) + P’([bad, bad]) = 0.119,
P’(2nd mood is good) = P’([good, good]) + P’([bad, good]) = 0.274,
P’(2nd mood is bad) = P’([good, bad]) + P’([bad, bad]) = 0.726.
The optimal mood sequence is obtained simply by picking, at each position, the mood with the highest marginal probability — P’(1st mood is good) is larger than P’(1st mood is bad), and P’(2nd mood is good) is smaller than P’(2nd mood is bad). In this case, it turns out that the optimal mood sequence is indeed [good, bad].
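With only 2 moods and 2 observations, these numbers are easy to verify by brute force. A minimal sketch that enumerates all 4 hidden mood sequences for the observed [smiling, frowning] and reproduces the joint, normalized and marginal probabilities above (the array names are illustrative):
import itertools
import numpy as np
pi = np.array([0.7, 0.3])                   # P(initial mood): [good, bad]
A = np.array([[0.8, 0.2], [0.2, 0.8]])      # mood transition probabilities
theta = np.array([[0.9, 0.1], [0.1, 0.9]])  # P(observation | mood): [smiling, frowning]
obs = [0, 1]                                # observed sequence: [smiling, frowning]
# Joint probability of each possible hidden mood sequence together with the observations.
joint = {}
for z in itertools.product([0, 1], repeat=len(obs)):
    p = pi[z[0]] * theta[z[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[z[t - 1], z[t]] * theta[z[t], obs[t]]
    joint[z] = p
total = sum(joint.values())                 # P(X) = 0.186
posterior = {z: p / total for z, p in joint.items()}
# Marginal probability that each mood in the sequence is good (state 0).
for t in range(len(obs)):
    p_good = sum(p for z, p in posterior.items() if z[t] == 0)
    print("P(mood %d is good) = %.3f" % (t + 1, p_good))
Running this prints 0.881 for the 1st mood and 0.274 for the 2nd, matching the hand calculation.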
While this example was extremely simple (in order to keep things short), it illustrates the basics of how hidden Markov models work! Everything else is essentially a more complex version of this example, for example with much longer sequences, or more hidden states and observations.
Three Classes of Problems
Generally speaking, the three typical classes of problems which can be solved using hidden Markov models are:
1. Given a set of observations X and the 3 model parameters 𝝅, A and 𝜽, calculate the occurrence probability of the observations X.
This is the more complex version of the simple case study we encountered above. For a given set of model parameters λ = (𝝅, A, 𝜽) and a sequence of observations X, calculate P(X|λ). This problem is solved using the forward algorithm.
2. Given a set of observations X and the 3 model parameters 𝝅, A and 𝜽, determine the optimal set of hidden states Z that result in X.
For a given set of model parameters λ = (𝝅, A, 𝜽) and a sequence of observations X, find the most likely sequence of hidden states Z (the maximum a posteriori estimate). This problem is solved using the Viterbi algorithm.
3. Given only a set of observations X, determine the optimal set of model parameters 𝝅, A and 𝜽.
For a sequence of observations X, guess an initial set of model parameters λ = (𝝅, A, 𝜽) and iteratively use the forward-backward algorithm to recompute P(X|λ) as well as to readjust λ. The iterations stop when P(X|λ) stops increasing, or after a set number of iterations. This problem is solved using the Baum-Welch algorithm, which is an expectation-maximization algorithm.
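As a preview of the hmmlearn library introduced below, these three problems map onto its score, decode and fit methods respectively. A hedged sketch using the mood model (note that the class name for discrete emissions depends on your hmmlearn version: MultinomialHMM in older releases, CategoricalHMM in newer ones):
import numpy as np
from hmmlearn import hmm
# Mood model: hidden states 0 = good, 1 = bad; observations 0 = smiling, 1 = frowning.
model = hmm.CategoricalHMM(n_components=2)  # hmm.MultinomialHMM on older hmmlearn versions
model.startprob_ = np.array([0.7, 0.3])
model.transmat_ = np.array([[0.8, 0.2], [0.2, 0.8]])
model.emissionprob_ = np.array([[0.9, 0.1], [0.1, 0.9]])
X = np.array([[0], [1]])                    # observed sequence: [smiling, frowning]
# Problem 1: occurrence probability of the observations (forward algorithm).
print(np.exp(model.score(X)))               # P(X|lambda), approximately 0.186
# Problem 2: most likely hidden state sequence (Viterbi algorithm).
logprob, Z = model.decode(X)
print(Z)
# Problem 3: learn the parameters from observations alone (Baum-Welch algorithm).
# new_model = hmm.CategoricalHMM(n_components=2, n_iter=100)
# new_model.fit(X_long)                     # X_long: a longer sequence of observed symbols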
The mathematical details of the algorithms are rather heavy for this blog (lots of equations are involved), and we will skip them for now — the full details can be found in the references. Instead, for the time being, we will focus on utilizing a Python library which will do the heavy lifting for us: hmmlearn.
The hmmlearn Library
hmmlearn is a Python library which implements hidden Markov models! hmmlearn provides three models out of the box — a multinomial emissions model, a Gaussian emissions model and a Gaussian mixture emissions model — although the framework does allow for the implementation of custom emissions models.
The Multinomial Emissions Model
The multinomial emissions model assumes that the observed process X consists of discrete values, such as in the mood case study above.
We will next take a look at 2 models used to model continuous values of X.
The Gaussian Emissions Model
The Gaussian emissions model assumes that the values in X are generated from multivariate Gaussian distributions (i.e. N-dimensional Gaussians), one for each hidden state. Each multivariate Gaussian distribution is defined by a multivariate mean and covariance matrix.
hmmlearn allows us to place certain constraints on the covariance matrices of the multivariate Gaussian distributions:
- covariance_type = “diag” — the covariance matrix for each hidden state is a diagonal matrix, and these matrices may differ between states.
- covariance_type = “spherical” — the covariance matrix for each hidden state is proportional to the identity matrix, and these matrices may differ between states.
- covariance_type = “full” — no restrictions are placed on the covariance matrices for any of the hidden states.
- covariance_type = “tied” — all hidden states share the same full covariance matrix.
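As a rough illustration, here is a small sketch that builds a 2-state Gaussian emissions model by hand and samples from it; the parameter values are made up purely for illustration, and the expected shape of covars_ depends on the chosen covariance_type:
import numpy as np
from hmmlearn import hmm
# Illustrative 2-state model emitting 2-dimensional Gaussian observations.
model = hmm.GaussianHMM(n_components=2, covariance_type="full")
model.startprob_ = np.array([0.6, 0.4])
model.transmat_ = np.array([[0.7, 0.3], [0.3, 0.7]])
model.means_ = np.array([[0.0, 0.0], [5.0, 10.0]])   # one mean vector per hidden state
# For "full", covars_ has shape (n_states, n_features, n_features);
# for "diag" it would be (n_states, n_features), for "spherical" (n_states,)
# and for "tied" a single (n_features, n_features) matrix.
model.covars_ = np.tile(np.identity(2), (2, 1, 1))
X, Z = model.sample(100)                    # 100 observations and their hidden states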
Gaussian Mixture Emissions Model
This is the most complex model available out of the box. The Gaussian mixture emissions model assumes that the values in X are generated from a mixture of multivariate Gaussian distributions, one mixture for each hidden state. Each multivariate Gaussian distribution in the mixture is defined by a multivariate mean and covariance matrix.
As with the Gaussian emissions model above, we can place certain constraints on the covariance matrices for the Gaussian mixture emissions model as well:
- covariance_type = “diag” — for each hidden state, the covariance matrix of each multivariate Gaussian distribution in the mixture is diagonal, and these matrices may be different.
- covariance_type = “spherical” — for each hidden state, the covariance matrix of each multivariate Gaussian distribution in the mixture is proportional to the identity matrix, and these matrices may be different.
- covariance_type = “full” — no restrictions are placed on the covariance matrices for any of the hidden states or mixture components.
- covariance_type = “tied” — for each hidden state, all mixture components share the same full covariance matrix. These matrices may differ between the hidden states, unlike in the Gaussian emissions model.
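Instantiating the mixture model is very similar; a brief sketch (the values of n_components and n_mix here are arbitrary examples):
from hmmlearn import hmm
# 3 hidden states, each emitting from a mixture of 4 multivariate Gaussians,
# with a diagonal covariance matrix for every mixture component.
model = hmm.GMMHMM(n_components=3, n_mix=4, covariance_type="diag", n_iter=100)
# model.fit(X)                              # X: array of shape (n_samples, n_features)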
Modeling Historical Gold Prices with hmmlearn
As an application example, we will analyze historical gold prices using hmmlearn, downloaded from https://www.gold.org/goldhub/data/gold-prices.
We import the necessary libraries as well as the data into Python, and plot the historical data. We also calculate the daily change in gold price and restrict the data to 2008 onwards (the Lehman shock and COVID-19!). In general, dealing with the change in price rather than the actual price itself leads to better modeling of the actual market conditions.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from hmmlearn import hmm
base_dir = "https://github.com/natsunoyuki/Data_Science/blob/master/gold/gold/gold_price_usd.csv?raw=True"
data = pd.read_csv(base_dir)
# Convert the datetime from str to datetime object.
data["datetime"] = pd.to_datetime(data["datetime"])
# Determine the daily change in gold price.
data["gold_price_change"] = data["gold_price_usd"].diff()
# Restrict the data to later than 2008 Jan 01.
data = data[data["datetime"] >= pd.to_datetime("2008-01-01")]
# Plot the daily gold prices as well as the daily change.
plt.figure(figsize = (15, 10))
plt.subplot(2,1,1)
plt.plot(data["datetime"], data["gold_price_usd"])
plt.xlabel("datetime")
plt.ylabel("gold price (usd)")
plt.grid(True)
plt.subplot(2,1,2)
plt.plot(data["datetime"], data["gold_price_change"])
plt.xlabel("datetime")
plt.ylabel("gold price change (usd)")
plt.grid(True)
plt.show()
Instead of modeling the gold price directly, we model the daily change in the gold price — this allows us to better capture the state of the market. We fit the daily change in gold prices to a Gaussian emissions model with 3 hidden states. The reason for using 3 hidden states is that we expect at least 3 different regimes in the daily changes — low, medium and high volatility.
# Use the daily change in gold price as the observed measurements X.
X = data[["gold_price_change"]].values
# Build the HMM model and fit to the gold price change data.
model = hmm.GaussianHMM(n_components = 3, covariance_type = "diag", n_iter = 50, random_state = 42)
model.fit(X)
# Predict the hidden states corresponding to observed X.
Z = model.predict(X)
states = pd.unique(Z)
We find that the model does indeed return 3 unique hidden states. These numbers do not have any intrinsic meaning; which state corresponds to which volatility regime must be confirmed by looking at the model parameters.
print("Unique states:")
print(states)

Unique states:
[0 1 2]
We find that for this particular data set, the model will almost always start in state 0.
print("\nStart probabilities:")
print(model.startprob_)

Start probabilities:
[1.00000000e+00 4.28952054e-24 1.06227453e-46]
The transition matrix for the 3 hidden states shows that the diagonal elements are large compared to the off-diagonal elements. This means that the model tends to remain in whichever state it is currently in — the probability of transitioning to another state is not high.
print("\nTransition matrix:")
print(model.transmat_)

Transition matrix:
[[8.56499275e-01 1.42858023e-01 6.42701428e-04]
 [2.43257082e-01 7.02528333e-01 5.42145847e-02]
 [1.33435298e-03 1.67318160e-01 8.31347487e-01]]
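One way to read these diagonal values is through the expected dwell time of a Markov chain: the expected number of consecutive steps spent in state i is 1 / (1 − A[i, i]), which here works out to roughly 7 days for state 0, 3 days for state 1 and 6 days for state 2.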
Finally, we take a look at the Gaussian emission parameters. Remember that each observable is drawn from a multivariate Gaussian distribution. For state 0, the Gaussian mean is 0.28, for state 1 it is 0.22 and for state 2 it is 0.27. The fact that states 0 and 2 have very similar means is problematic — our current model might not be too good at actually representing the data.
print("\nGaussian distribution means:")
print(model.means_)

Gaussian distribution means:
[[0.27988823]
 [0.2153654 ]
 [0.26501033]]
We also have the Gaussian covariances. Note that because our data is 1 dimensional, the covariance matrices are reduced to scalar values, one for each state. For state 0, the covariance is 33.9, for state 1 it is 142.6 and for state 2 it is 518.7. This seems to agree with our initial assumption about the 3 volatility regimes — for low volatility the covariance should be small, while for high volatility the covariance should be very large.
print("\nGaussian distribution covariances:")
print(model.covars_)

Gaussian distribution covariances:
[[[ 33.89296208]]
 [[142.59176749]]
 [[518.65294332]]]
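Since the observations are 1-dimensional, these covariances are simply variances, and their square roots give the typical size of a daily move in each state: roughly 5.8 USD for state 0, 11.9 USD for state 1 and 22.8 USD for state 2.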
Plotting the model’s state predictions with the data, we find that the states 0, 1 and 2 appear to correspond to low volatility, medium volatility and high volatility.
plt.figure(figsize = (15, 10))
plt.subplot(2,1,1)
for i in states:
    want = (Z == i)
    x = data["datetime"].iloc[want]
    y = data["gold_price_usd"].iloc[want]
    plt.plot(x, y, '.')
plt.legend(states, fontsize=16)
plt.grid(True)
plt.xlabel("datetime", fontsize=16)
plt.ylabel("gold price (usd)", fontsize=16)
plt.subplot(2,1,2)
for i in states:
    want = (Z == i)
    x = data["datetime"].iloc[want]
    y = data["gold_price_change"].iloc[want]
    plt.plot(x, y, '.')
plt.legend(states, fontsize=16)
plt.grid(True)
plt.xlabel("datetime", fontsize=16)
plt.ylabel("gold price change (usd)", fontsize=16)
plt.show()
From the graphs above, we find that periods of high volatility correspond to difficult economic times such as the Lehman shock from 2008 to 2009, the recession of 2011–2012 and the COVID-19 pandemic-induced recession in 2020. Furthermore, we see that the price of gold tends to rise during times of uncertainty, as investors increase their purchases of gold, which is seen as a stable and safe asset.
Concluding Remarks
In this article we took a brief look at hidden Markov models, which are generative probabilistic models used to model sequential data. We reviewed a simple case study on people’s moods to show explicitly how hidden Markov models work mathematically. We then introduced a very useful hidden Markov model Python library, hmmlearn, and used that library to model actual historical gold prices using 3 different hidden states corresponding to 3 possible market volatility levels.
References
[1] C. M. Bishop (2006), Pattern Recognition and Machine Learning, Springer.
[2] M. Stamp (2021), A Revealing Introduction to Hidden Markov Models, Department of Computer Science, San Jose State University.
[3] hmmlearn documentation: https://hmmlearn.readthedocs.io/en/latest/.