Expectation Maximization
Expectation Maximization (EM) is an unsupervised learning technique.
The EM framework can be used to train latent variable models, in which the variables of interest are not directly observable. In this discussion, we examine how EM can be applied to a Gaussian Mixture Model (GMM).
Say the observed data is X. If we make a modeling assumption that the data was generated by a mixture of K Gaussian distributions, EM provides an iterative procedure to estimate the parameters of the underlying K distributions, namely:
a) the weight of each distribution in the mixture
b) the mean of each distribution
c) the covariance of each distribution
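Concretely, this modeling assumption says each observation is drawn from a density of the form below, where $\pi_k$, $\mu_k$, and $\Sigma_k$ denote the weight, mean, and covariance of the $k$-th component (standard GMM notation, introduced here for reference):

$$
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0.
$$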
As is usual in many ML tasks, we can set up a log-likelihood function here as well. However, maximizing it does not yield a closed-form solution. Instead, we have an iterative scheme that one can show never decreases the log-likelihood and converges to a (local) maximum. This algorithm is the Expectation Maximization (EM) algorithm. It is a fundamental and powerful algorithm in probabilistic ML, much like gradient descent.
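For a dataset $X = \{x_1, \dots, x_N\}$, this log-likelihood is

$$
\log p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right),
$$

and the sum sitting inside the logarithm is exactly what blocks a closed-form solution.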
The expectation maximization (EM) algorithm predictably has two steps: an expectation step and a maximization step. You will notice similarities between them and the k-means algorithm.
Expectation Step
1. We will first calculate the mixture probabilities $\pi_1$ and $\pi_2$ (taking a two-component mixture as the running example).
For each data point, every component is assigned a responsibility: the posterior probability, under the current parameters, that the point came from that component. We total the responsibilities assigned to each class, and each class's fraction of the total is assigned as its mixture probability (see the sketch below).
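As a minimal sketch of this step (assuming a data matrix `X` of shape `(N, D)` and current parameter arrays `weights`, `means`, `covs`; the function names are illustrative, not from the original discussion):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Compute responsibilities: P(component k | point n) under the current parameters."""
    N, K = X.shape[0], len(weights)
    resp = np.zeros((N, K))
    for k in range(K):
        # weighted density of each point under component k
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
    resp /= resp.sum(axis=1, keepdims=True)   # normalize across components
    return resp

def update_weights(resp):
    """Each component's share of the total responsibility becomes its mixture probability."""
    return resp.sum(axis=0) / resp.shape[0]
```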
Maximization Step
2. Now we find the new “centroids” (as in k-means), i.e. the component means. Each new mean is the weighted average of the data points, weighted by the responsibilities (see the code sketch after this list).
3. The updates for the variances should now be obvious. They follow from the definition of covariance, except that the responsibilities act as proxies for probabilities: each component's covariance is the responsibility-weighted spread of the points around its new mean.
4. We now repeat these two steps until convergence, for example until the log-likelihood stops improving appreciably.
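Putting the pieces together, here is a minimal end-to-end sketch of the loop described above. It reuses the illustrative `e_step` helper from the earlier sketch, assumes a data matrix `X` of shape `(N, D)`, and uses the change in log-likelihood as the convergence check; it is meant as an illustration of the procedure, not a production implementation (in practice a library such as scikit-learn's `GaussianMixture` would normally be used).

```python
def m_step(X, resp):
    """Update weights, means, and covariances from the responsibilities."""
    N, D = X.shape
    Nk = resp.sum(axis=0)                      # effective number of points per component
    weights = Nk / N                           # each component's share of responsibility
    means = (resp.T @ X) / Nk[:, None]         # responsibility-weighted averages of the points
    covs = []
    for k in range(len(Nk)):
        diff = X - means[k]
        # responsibility-weighted covariance of the points around the new mean
        covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
    return weights, means, np.array(covs)

def log_likelihood(X, weights, means, covs):
    dens = np.zeros((X.shape[0], len(weights)))
    for k in range(len(weights)):
        dens[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
    return np.log(dens.sum(axis=1)).sum()

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # simple initialization: uniform weights, K random points as means, identity covariances
    weights = np.full(K, 1.0 / K)
    means = X[rng.choice(N, size=K, replace=False)]
    covs = np.array([np.eye(D) for _ in range(K)])
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp = e_step(X, weights, means, covs)    # expectation step
        weights, means, covs = m_step(X, resp)    # maximization step
        ll = log_likelihood(X, weights, means, covs)
        if ll - prev_ll < tol:                    # stop once the log-likelihood plateaus
            break
        prev_ll = ll
    return weights, means, covs
```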