# K-Means Clustering Is a Special Case of Gaussian Mixture Models

K-means clustering is a simple algorithm for clustering data points in unsupervised learning. Strictly speaking, it is not fully unsupervised, because it requires a hyperparameter to be chosen in advance: n, the number of clusters. The same is true of Gaussian Mixture Models (GMMs).

The k-means algorithm, in less rigorous words, is:

- Initialize n random cluster centroids.
- Assign each data point to the closest centroid based on the L2 distance.
- Recompute each centroid as the mean of the data points assigned to it.
- Repeat the last two steps until convergence.
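The steps above can be sketched in a few lines of Python (a toy one-dimensional version; the function name and data are my own, not a standard API):

```python
import math
import random

def kmeans(points, n, iters=100, seed=0):
    """Toy 1-D k-means, following the steps listed above."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n)              # step 1: random init
    for _ in range(iters):
        # step 2: assign each point to its closest centroid (L2 distance)
        clusters = [[] for _ in range(n)]
        for x in points:
            k = min(range(n), key=lambda j: (x - centroids[j]) ** 2)
            clusters[k].append(x)
        # step 3: recompute each centroid as the mean of its assigned points
        new = [sum(c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:                       # step 4: stop at convergence
            break
        centroids = new
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
print(kmeans(points, 2))  # two centroids, near 1.0 and 10.0
```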

What about GMM? The EM algorithm used to fit a GMM looks complex, but at its core it is a trick for maximizing the likelihood of the data points. Along the way, it finds the corresponding parameters for the clusters, i.e. a mean vector and a covariance matrix for each. Concretely, it repeats two steps: in iteration i, compute the expected log-likelihood based on the parameters found in iteration i-1 (the E-step), and then maximize that expectation to obtain new parameters (the M-step). But how is GMM related to k-means?
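For concreteness, here is a minimal sketch of those two steps for a one-dimensional GMM. To keep the parallel with k-means visible, this sketch fixes every variance at 1 and updates only the means and mixing weights (a simplification of mine, not part of the general algorithm):

```python
import math

def em_gmm_1d(points, n, iters=50):
    """Sketch of EM for a 1-D GMM with all variances fixed at 1."""
    mus = points[:n]                    # naive init: first n points as means
    weights = [1.0 / n] * n
    for _ in range(n and iters):
        # E-step: responsibility of each cluster for each point,
        # computed from the previous iteration's parameters
        resp = []
        for x in points:
            dens = [w * math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)
                    for w, mu in zip(weights, mus)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: maximize the expected log-likelihood -> new parameters
        for k in range(n):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * x for r, x in zip(resp, points)) / nk
            weights[k] = nk / len(points)
    return mus

print(sorted(em_gmm_1d([1.0, 1.2, 0.8, 10.0, 10.4, 9.6], 2)))
```

With well-separated data like this, the responsibilities become nearly hard 0/1 assignments and the means settle close to the cluster averages, much like k-means.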

Note the second step above, where we assign data points based on the L2 distance. Consider the probability density function of a normal distribution; for simplicity, take the one-dimensional case:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

If we set $\sigma = 1$, then it becomes

$$f(x) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2}\right)$$

Let $X = \{x_1, \dots, x_m\}$ be the set of data points and assume they are i.i.d. normal. Also assume there exist n clusters with centroids $\mu_1, \dots, \mu_n$, and write $\mu_{k_i}$ for the centroid of the cluster that $x_i$ is assigned to. The likelihood of $X$ is

$$L(X) = \prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(x_i-\mu_{k_i})^2}{2}\right)$$

To maximize this function (equivalently, its logarithm), clearly we want to minimize this part:

$$\sum_{i=1}^{m}(x_i-\mu_{k_i})^2$$

We can see that “Assign data points to the closest centroids based on the L2 distance” just minimizes this part and thus maximizes the likelihood function of X.
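As a quick numerical check of this equivalence (with toy numbers of my own): the centroid closest to a point in L2 distance is exactly the one under which the point has the highest unit-variance Gaussian density.

```python
import math

def unit_gaussian_density(x, mu):
    # N(x; mu, 1) density, as in the likelihood above
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

centroids = [1.0, 10.0]
x = 2.5
# nearest centroid by L2 distance ...
nearest = min(centroids, key=lambda mu: (x - mu) ** 2)
# ... is also the centroid under which x has the highest density
most_likely = max(centroids, key=lambda mu: unit_gaussian_density(x, mu))
print(nearest == most_likely)  # True
```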

Note that we set the covariance matrix to be the identity matrix in the multi-dimensional case and the variance to 1 in the one-dimensional case. This is the main difference between k-means and GMM: k-means implicitly assumes every cluster is a sphere of the same size, while in GMM each cluster can take an arbitrary ellipsoidal shape determined by its covariance matrix.
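The same reduction holds in the multi-dimensional case: with $\Sigma = I$, the multivariate normal density depends on $x$ only through the squared L2 distance,

$$\mathcal{N}(x;\mu, I) = \frac{1}{(2\pi)^{d/2}}\exp\!\left(-\frac{1}{2}\lVert x-\mu\rVert^2\right),$$

so maximizing the density over clusters is again the same as picking the nearest centroid.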

Side note: it is easy to prove that k-means is guaranteed to converge to a local minimum, since there are only finitely many partitions of the data and the objective never increases. EM for GMM, however, does not have to converge to a sensible solution: the likelihood can grow without bound, for example when one component's variance collapses toward zero around a single point.