Introduction to Clustering

Dr. Roi Yehoshua
Published in AI Made Simple
6 min read · Nov 24, 2023

Clustering is an unsupervised learning technique that groups a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups.

Clustering helps in understanding the structure of the data and revealing hidden patterns. It can also be used in combination with supervised learning algorithms when labeled data is scarce or expensive to obtain, in what is known as semi-supervised learning.

Cluster analysis dates back to the 1960s, when it was used as part of numerical taxonomy in biology for the classification of species. Since then, it has expanded across many fields, from marketing to social network analysis, and has become a fundamental tool in pattern recognition, data analysis, and interpretation.

This article provides an overview of clustering, including its types, applications, main challenges, and common algorithms.

Clustering Definition

Formally, we are given a set of n data points {x₁, …, xₙ}, where each xᵢ is an m-dimensional vector, and we would like to partition the points into k sets (clusters) {C₁, …, Cₖ} such that:

  1. Intra-cluster distances are minimized: data points within the same cluster are as close as possible to one another.
  2. Inter-cluster distances are maximized: data points in different clusters are as far as possible from one another.
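
To make the two criteria concrete, here is a minimal sketch that measures them for a given assignment of points to clusters. It assumes Euclidean distance and uses the mean pairwise distance as the summary statistic. Both are illustrative choices rather than part of the definition above, and the function names and toy data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Return the n x n matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def intra_inter_distances(X, labels):
    """Mean distance between same-cluster pairs and between different-cluster pairs.

    Assumes at least one pair of points shares a cluster and at least one
    pair does not; otherwise the corresponding mean is NaN.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = pairwise_distances(X)
    same = labels[:, None] == labels[None, :]
    # Keep each unordered pair once and skip the zero diagonal.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)
    intra = D[same & upper].mean()
    inter = D[~same & upper].mean()
    return intra, inter

# Toy data (made up for illustration): two well-separated groups in 2D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

good = np.array([0, 0, 0, 1, 1, 1])  # matches the visible grouping
bad = np.array([0, 1, 0, 1, 0, 1])   # mixes the two groups

intra_good, inter_good = intra_inter_distances(X, good)
intra_bad, inter_bad = intra_inter_distances(X, bad)
print(f"good: intra={intra_good:.2f}, inter={inter_good:.2f}")
print(f"bad:  intra={intra_bad:.2f}, inter={inter_bad:.2f}")
```

On this toy data, the assignment that follows the visible grouping yields a smaller intra-cluster distance and a larger inter-cluster distance than the mixed one, which is the behavior the two criteria ask for.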

Dr. Roi Yehoshua

Teaching Professor for Data Science and ML at Northeastern University | Top Writer in AI | 200K+ Views on Medium | https://www.linkedin.com/in/roi-yehoshua/