K-Means Clustering Algorithm

Ribhu Nirek
Analytics Vidhya
Published in
3 min readApr 10, 2020

--

Brief: K-means clustering is an unsupervised learning method. In this post, I introduce the idea of unsupervised learning and why it is useful. Then I talk about K-means clustering: mathematical formulation of the problem, python implementation from scratch and also using machine learning libraries.

Unsupervised Learning

Typically, machine learning models make prediction on data, learning previously unseen patterns to make important business decisions. When the data set consists of labels along with data points, it is known as supervised learning, with spam detection, speech recognition, handwriting recognition being some of its use cases. The learning methods where insights are drawn from data points without any ground truth or correct labels falls under the category of unsupervised learning.

Unsupervised learning is one of the basic techniques used in exploratory data analysis to make sense of the data before preparing to make complex machine learning models to make inferences. As this does not consist of human-labelled data, bias is minimized. Also, as there are no labels, there are no correct answers. From a probabilistic standpoint the contrast between supervised and unsupervised learning is the following: supervised learning infers the conditional probability distribution p(x|y), whereas unsupervised learning is concerned with the prior probability p(x).

K-Means Clustering Algorithm

Objective of clustering methods is to…

--

--