Opening the Black Box of Clustering — KMeans
First of a three-part series on Unsupervised Learning
This is the first of a three-part series on clustering, where I will cover some of the most popular clustering algorithms: K-Means, Agglomerative clustering and Gaussian Mixture Models. These methods are based on partitioning by distance, hierarchy and probability distributions respectively.
This article specifically covers K-Means clustering.
“The unsupervised learning is the way most people will learn in the future. You have this model of how the world works in your head and you’re refining it to predict what you think is going to happen in the future.” — Mark Zuckerberg
Unsupervised learning is a relatively niche part of Machine Learning, simply because most tasks come with labelled data (supervised learning). However, in cases where we lack 'labelled' data, clustering methods can help us find patterns by inferring structure directly from the dataset. Common areas in which clustering is applied include customer segmentation (for ad targeting), population analysis (understanding demographics) and anomaly detection.
Some people view unsupervised learning as a ‘grey area’ in Machine Learning because it can sometimes be hard to interpret the types of clusters that the algorithms output…