Handbook of Anomaly Detection: With Python Outlier Detection — (7) GMM

Chris Kuo / Dr. Dataman · Published in Dataman in AI · Oct 8, 2022


One of the best parts of a camping trip, in my view, is stargazing in the evening. Stargazing on a tranquil night brings a sense of mesmerizing, childlike wonder. One time while I was watching the stars, I was thinking about my data science lecture for the next day. Some stars in the sky cluster together and some are far apart. Stars do not line up artificially like a man-made grid. Many natural distributions are just like that: points cluster together or stand far from others.

Speaking of clustering, a well-known clustering technique in statistics is K-means. Imagine the stars in the sky. K-means may identify several clusters, and each star belongs to one and only one cluster. Gaussian Mixture Model (GMM) offers a fresh view. It assumes the stars follow several different Gaussian distributions, and what you see in the sky is the mixture of those distributions. GMM is more flexible than K-means: instead of a hard label, each point receives a probability of belonging to each cluster, and K-means can be viewed as a special limiting case of GMM. The sketch below makes this contrast concrete.
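
A minimal sketch of the hard-versus-soft assignment contrast, assuming scikit-learn and a synthetic dataset (the cluster counts and random seeds here are illustrative choices, not from the original text):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Synthetic "stars": 300 points drawn around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K-means: each point belongs to exactly one cluster.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:5])                  # hard labels, e.g. [2 0 0 1 2]

# GMM: each point gets a probability for every component.
gmm = GaussianMixture(n_components=3, random_state=42).fit(X)
print(gmm.predict_proba(X)[:5].round(3))   # soft assignments; rows sum to 1
```

A point sitting between two clusters illustrates the difference: K-means forces it into one cluster, while GMM may report something like a 55/45 split across two components.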

Few people today know who invented GMM. The invention goes back to the 1973 book “Pattern Classification and Scene Analysis” by Duda and Hart. GMM is an unsupervised learning algorithm. Today GMM is used in anomaly detection, signal processing, language identification, and even audio-clip classification. In this chapter, I…
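
To make the anomaly-detection use concrete, here is a minimal sketch of one common approach, assuming scikit-learn: fit a mixture, score every point by its log-likelihood under the fitted model, and flag the lowest-scoring points as outliers. The 2% cutoff below is an illustrative assumption, not a recommendation from the original text:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data with 3 natural clusters.
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Fit the mixture and compute per-point log-likelihood.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
log_likelihood = gmm.score_samples(X)      # higher = more "normal"

# Flag the lowest 2% of points as anomalies.
threshold = np.percentile(log_likelihood, 2)
outliers = X[log_likelihood < threshold]
print(f"{len(outliers)} points flagged as anomalies")
```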
