K-Means Clustering

An in-depth explanation of the popular k-means algorithm, including an implementation from scratch in Python

Dr. Roi Yehoshua
AI Made Simple


Clustering is the task of grouping a set of objects so that objects in the same group (cluster) are more similar to each other than to objects in other groups.

K-means is a centroid-based clustering technique that partitions the dataset into k distinct clusters, where each data point belongs to the cluster with the nearest center. It is one of the simplest and most efficient clustering methods, and it has been applied in diverse areas such as customer segmentation, image compression, document clustering, and anomaly detection.
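To make the "nearest center" rule concrete, here is a minimal sketch of the assignment step in plain Python. It assumes Euclidean distance and that the centers are already given; the function name `assign_to_nearest` and the sample data are made up for illustration.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean distance)."""
    labels = []
    for p in points:
        # Distance from this point to every center
        distances = [math.dist(p, c) for c in centers]
        # The point joins the cluster whose center is closest
        labels.append(distances.index(min(distances)))
    return labels

# Illustrative data: two points near the first center, two near the second
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_to_nearest(points, centers)
print(labels)  # [0, 0, 1, 1]
```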

In this article, we will explore the k-means algorithm, implement it from scratch in Python, and discuss its variants and limitations.

Centroid-Based Clustering

In centroid-based clustering, each cluster is represented by a central vector, called the cluster center or centroid, which is not necessarily a member of the dataset. The cluster centroid is typically defined as the mean of the points that belong to that cluster.
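As a small illustration of this definition, the sketch below computes a centroid as the coordinate-wise mean of a cluster's points and shows that the result need not be one of those points. The helper name `centroid` and the three sample points are assumptions chosen for the example.

```python
def centroid(cluster_points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(cluster_points)
    dim = len(cluster_points[0])
    # Average each coordinate separately
    return tuple(sum(p[d] for p in cluster_points) / n for d in range(dim))

# Illustrative cluster of three 2-D points
cluster = [(1.0, 1.0), (2.0, 1.0), (3.0, 4.0)]
center = centroid(cluster)

print(center)             # (2.0, 2.0)
print(center in cluster)  # False: the centroid is not a member of the dataset
```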

Our goal in centroid-based clustering is to divide the data points into k clusters in such a way that the points are as close as possible to the centroids of the clusters they belong to.
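One common way to quantify "as close as possible" is the sum of squared Euclidean distances between each point and the centroid of its assigned cluster. The sketch below assumes that measure; the function name `within_cluster_ss` and the data are illustrative only. For a fixed pair of centroids, it compares a sensible assignment with a poor one.

```python
def within_cluster_ss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for p, k in zip(points, labels):
        # Squared distance between point p and the centroid of cluster k
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
    return total

# Illustrative data: two well-separated pairs of points
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

good = within_cluster_ss(points, [0, 0, 1, 1], centroids)  # each point with its nearby centroid
bad = within_cluster_ss(points, [0, 1, 0, 1], centroids)   # two points sent to the far centroid

print(good)  # 4.0
print(bad)   # 204.0
```

The lower value corresponds to the grouping in which every point sits close to its own centroid, which is the kind of partition centroid-based clustering looks for.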


Dr. Roi Yehoshua
AI Made Simple

Teaching Professor for Data Science and ML at Northeastern University | Top Writer in AI | 200K+ Views on Medium | https://www.linkedin.com/in/roi-yehoshua/