Unsupervised Machine Learning with Gaussian Mixture Models

Isuru Alagiyawanna
Mar 29, 2020


fig 1) Multi-modal Gaussian distribution

A Gaussian Mixture Model is a form of density estimation: it approximates the probability density of our data. Gaussian Mixture Models are most suitable when the data is multi-modal. In the probability density graph above you can see 3 bumps, made by 3 underlying Gaussian distributions. Put simply, the whole density can be written as a sum of weighted Gaussians.

fig 2) Sum of weighted Gaussian distributions

According to the above equation, the whole distribution depends on the weight pi(k) of each Gaussian, and the sum of all pi values must equal 1. Because the weights sum to 1, they can be interpreted as the distribution of a latent variable Z that selects the component, denoted P(Z = k) = pi(k).
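In symbols (a standard way of writing it, not copied exactly from the figure), the mixture density is

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \qquad P(Z = k) = \pi_k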

Now let's see how we can build and train a Gaussian Mixture Model. An algorithm like K-means clustering decides the membership of a data point just by comparing its distances to the cluster centers, but that can introduce errors into the model. See the following example of how K-means can fail. (I assume you have some knowledge of hard K-means clustering.)

fig 3) A case where hard K-means can fail

In this example there are 2 cluster centers (yellow and purple) and one test point (blue). K-means directly assigns the test point to the purple cluster, which introduces some error, because it is clearly better if both clusters share the test point. For this purpose we introduce a fuzzy membership for each class, calculated as the responsibility of each class (in a Gaussian Mixture Model, each Gaussian distribution) for such a data point. Here is how we can calculate the responsibilities.

fig 4) Calculating responsibilities

For each distribution we calculate the responsibilities as in the above equation: we take the weighted density of a particular Gaussian and normalize it by the sum of the weighted densities of all the Gaussians. Besides pi, you need to calculate 2 more parameters for each distribution: the mean and the covariance matrix. Let's see how to calculate these 2 parameters.
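Written out in standard notation (the symbols may differ slightly from fig 4), the responsibility of Gaussian k for data point x_n is

r_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}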

fig 5) Calculating the parameters

The responsibility matrix (R) has shape (N, K), where N is the number of data points and K is the number of Gaussian distributions in the model. The mean matrix (Mu) has shape (K, f), where f is the number of features in the data points. The covariance matrix (Cov) has shape (K, f, f), and the weight distribution pi is a vector of length K.

After calculating all these values, 1 iteration has been completed. Repeat the process for a number of epochs until the cost converges. The cost is calculated by summing, over all data points, the log of the row sums of the unnormalized responsibility matrix, which is exactly the log-likelihood of the data.
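For reference, the parameter updates of fig 5 and the cost described above correspond to the standard EM updates (written in my own notation, so the symbols may differ slightly from the figures):

N_k = \sum_{n=1}^{N} r_{nk}, \qquad \pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \, x_n

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} (x_n - \mu_k)(x_n - \mu_k)^{\top}, \qquad \text{cost} = \sum_{n=1}^{N} \log \Big( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big)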

Now let's walk through how to code a simple Gaussian Mixture Model on our own data.

Let’s Build The Model

For this sample code I created 1200 points with exactly 2 features, for visualization purposes. The multi-modal data contains 4 Gaussian distributions, and using the value m you can shift the 4 distributions as you like; in this case I used m = 4. After creating the data we can plot it, and fig 6 shows the resulting distribution.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from scipy.stats import multivariate_normal

class GaussianMixtureModels(object):
    def __init__(self, K):
        self.f = 2    # no: of features
        self.N = 1200 # dataset size
        self.K = K    # cluster centers
        self.minimal_cost_difference = 0.1
        self.smoothing = 1e-2

    def create_data(self, m):
        X = np.zeros((self.N, self.f))
        mu1 = np.array([0, 0])
        mu2 = np.array([m, m])
        mu3 = np.array([m, 0])
        mu4 = np.array([0, m])
        X[:300, ] = np.random.randn(300, self.f) + mu1
        X[300:600, ] = np.random.randn(300, self.f) + mu2
        X[600:900, ] = np.random.randn(300, self.f) + mu3
        X[900:1200, ] = np.random.randn(300, self.f) + mu4
        plt.scatter(X[:, 0], X[:, 1])
        plt.title('before_apply_GMM')
        plt.savefig('before_apply_GMM.png')
        plt.show()
        self.X = X
fig 6) Data distribution before applying GMM

As the first step we need to initialize all the parameters (Mean, Covariance, pi). From the 1200 data points, choose K points at random and assign them as the mean values. The covariance matrices are initialized to identity matrices instead of random values, and the initial weight distribution pi is initialized with the uniform probability 1/K.

    def initialize_param(self):
        Mu = np.zeros((self.K, self.f))
        Cov = np.zeros((self.K, self.f, self.f))
        for k in range(self.K):
            idx = np.random.choice(self.N)
            Mu[k] = self.X[idx]
            Cov[k] = np.eye(self.f)

        self.Mu = Mu
        self.Cov = Cov
        self.pi = np.ones(self.K) / self.K

In the following manner you can calculate the cost and the responsibility matrix for one iteration.

R = np.zeros((self.N, self.K))
for k in range(self.K):
    for n in range(self.N):
        # weighted density of Gaussian k at data point n
        R[n, k] = self.pi[k] * multivariate_normal.pdf(self.X[n],
                                                       self.Mu[k], self.Cov[k])
# log-likelihood of the data under the current parameters
cost = np.log(R.sum(axis=1)).sum()
# normalize each row so the responsibilities sum to 1
R = R / R.sum(axis=1, keepdims=True)
self.R = R

After calculating the responsibility matrix and the cost, we need to recalculate the distribution parameters (Mean, Covariance and pi). It can be done as follows. Then one iteration has been completed; repeat the process until the cost converges.

for k in range(self.K):
    Nk = self.R[:, k].sum()               # effective number of points in cluster k
    self.pi[k] = Nk / self.N              # update the mixture weight
    Mu_k = self.R[:, k].dot(self.X) / Nk  # responsibility-weighted mean
    self.Mu[k] = Mu_k
    delta = self.X - Mu_k
    Rdelta = np.expand_dims(self.R[:, k], -1) * delta
    # responsibility-weighted covariance, plus a small diagonal term for stability
    self.Cov[k] = Rdelta.T.dot(delta) / Nk + np.eye(self.f) * self.smoothing
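To tie the two steps together, here is a minimal sketch of a fit method that repeats the E-step and M-step until the cost stops improving. The method name fit and the epoch limit are my own additions (the post shows the two steps separately), but it only uses the attributes already defined above, including minimal_cost_difference:

    def fit(self, max_epochs=100):
        costs = []
        for epoch in range(max_epochs):
            # E-step: responsibilities and current log-likelihood
            R = np.zeros((self.N, self.K))
            for k in range(self.K):
                for n in range(self.N):
                    R[n, k] = self.pi[k] * multivariate_normal.pdf(self.X[n], self.Mu[k], self.Cov[k])
            cost = np.log(R.sum(axis=1)).sum()
            R = R / R.sum(axis=1, keepdims=True)
            self.R = R

            # M-step: re-estimate pi, Mu and Cov from the responsibilities
            for k in range(self.K):
                Nk = self.R[:, k].sum()
                self.pi[k] = Nk / self.N
                Mu_k = self.R[:, k].dot(self.X) / Nk
                self.Mu[k] = Mu_k
                delta = self.X - Mu_k
                Rdelta = np.expand_dims(self.R[:, k], -1) * delta
                self.Cov[k] = Rdelta.T.dot(delta) / Nk + np.eye(self.f) * self.smoothing

            costs.append(cost)
            # stop once the improvement in cost is small enough
            if epoch > 0 and abs(costs[-1] - costs[-2]) < self.minimal_cost_difference:
                break
        return costs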

Now let's see the results and visualizations for several K values. I used five K values. (Here we know the actual K equals 4, because we created the data ourselves, but on real datasets you need to test several K values and find the optimal one.)
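As a rough usage sketch (assuming the fit method sketched above, and colouring each point by its most responsible component; this plotting choice is mine, not from the original figures):

for K in [2, 3, 4, 5, 6]:
    gmm = GaussianMixtureModels(K)
    gmm.create_data(m=4)
    gmm.initialize_param()
    gmm.fit()
    assignments = gmm.R.argmax(axis=1)  # hard assignment from the soft responsibilities
    plt.scatter(gmm.X[:, 0], gmm.X[:, 1], c=assignments)
    plt.title('GMM with K = %d' % K)
    plt.show()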

Comparing the visualizations above, you can see that K = 4 gives the most accurate model. So this is how you can build a Gaussian Mixture Model, but there are some constraints in this type of model. Let's discuss those constraints.

The first constraint is the K value. On more complex datasets you need to try many K values, because the structure of the data cannot be guessed easily.

The next one is the singular covariance problem. Sometimes the data points assigned to a component get closer and closer together; then the covariance matrix approaches zero and its inverse blows up. The small diagonal term added in the code above (the smoothing value) is one way to keep the covariance matrices well conditioned.

That is the basic idea of a Gaussian Mixture Model, and you can find the full code on my GitHub under Gaussian Mixture Models.

Thank You !!!
