Published in Analytics Vidhya

What is Clustering?

Introduction: Have you ever thought of grouping data by similar features without having the actual labels/classes/targets for it? This article explains what clustering is, describes its main types, and walks through practical use cases.

Image source: heyerlein via Unsplash

Unsupervised Machine Learning

Unsupervised machine learning is the category of algorithms in which the data has only features. Labels/classes/targets are not available.

In other words, for any particular record we do not know which class or group it belongs to. This is the defining characteristic of this category of machine learning algorithms.

Clustering Explanation

Clustering falls under unsupervised machine learning, so we do not have target classes for our data.

Clustering algorithms aim to divide the data into a few groups (clusters) based on the similarity between records, or to find patterns in the data.

In a nutshell, clustering forms several clusters so that records within the same cluster are as similar as possible, while records in different clusters are as dissimilar as possible.

It follows that the goal of clustering is to minimize the intra-cluster distance (between points in the same cluster) and maximize the inter-cluster distance (between points in different clusters).
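To make the two distances concrete, here is a minimal sketch that measures them as average pairwise Euclidean distances on a small, hypothetical 2-D dataset. The function names are illustrative, not from any library. Averaging pairwise distances is only one of several ways to quantify this. Centroid-based measures and the silhouette score are common alternatives, and this one is used here only because it is simple.

```python
import math
from itertools import combinations

def mean_intra_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in the SAME cluster."""
    dists = [math.dist(p, q)
             for cluster in clusters
             for p, q in combinations(cluster, 2)]
    return sum(dists) / len(dists)

def mean_inter_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in DIFFERENT clusters."""
    dists = [math.dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a
             for q in b]
    return sum(dists) / len(dists)

# Hypothetical toy data: two tight, well-separated groups of 2-D points.
clusters = [
    [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
]

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
print(f"intra-cluster: {intra:.2f}, inter-cluster: {inter:.2f}")
```

For a good clustering such as this one, `intra` comes out much smaller than `inter`.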

Types of Clustering

  1. Partition-based Clustering
  2. Hierarchical Clustering
  3. Density-based Clustering

Partition-based Clustering

  • These algorithms tend to generate sphere-like (convex) clusters.
  • They are relatively efficient.
  • Suitable for medium- or large-sized databases.
  • Examples: K-Means, Fuzzy C-Means, K-Median.
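As an illustration of partition-based clustering, here is a minimal sketch of K-Means in its classic form (Lloyd's algorithm), written in plain Python. The helper names `assign` and `k_means`, the random initialisation from data points, and the toy data are all assumptions made for this example. It is not the API of any particular library. The algorithm alternates two steps until the centroids stop moving. First, it assigns every point to its nearest centroid. Then it moves every centroid to the mean of its points.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def k_means(points, k, max_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) for points given as tuples."""
    rng = random.Random(seed)
    # Initialise centroids with k data points chosen at random (no replacement).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged: centroids (and therefore assignments) are stable
        centroids = new_centroids
    return assign(points, centroids), centroids

# Hypothetical toy data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

The cluster IDs themselves (0 or 1) are arbitrary. Only the grouping matters. Each point is assigned to the nearest centre, so the resulting clusters are convex regions around each centroid, which is why partition-based methods produce the roughly spherical clusters described above.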

Hierarchical Clustering

  • These algorithms build a tree (hierarchy) of clusters, often drawn as a dendrogram, by grouping similar data.
  • Very intuitive.
  • Generally suited to small datasets.
  • Examples: Agglomerative (bottom-up), Divisive (top-down).
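The bottom-up (agglomerative) variant can be sketched in a few lines. It assumes single linkage, which measures the distance between two clusters as their closest pair of points. Complete, average, and Ward linkage are common alternatives. The function names and toy data are hypothetical. The process starts with every point in its own cluster, the leaves of the tree, and repeatedly merges the two closest clusters. Stopping at `n_clusters` corresponds to cutting the tree at that level.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Cluster distance = distance of the closest pair of points across a and b."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, n_clusters):
    """Bottom-up (agglomerative) clustering with single linkage."""
    # Leaves of the tree: every point starts in its own cluster.
    clusters = [[p] for p in points]
    merge_heights = []  # distance at which each merge happened (tree levels)
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) ...
        d, i, j = min((single_linkage(clusters[i], clusters[j]), i, j)
                      for i, j in combinations(range(len(clusters)), 2))
        # ... and merge them into one node of the tree.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        merge_heights.append(d)
    return clusters, merge_heights

# Hypothetical toy data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
clusters, heights = agglomerative(points, n_clusters=2)
print(clusters)
print(heights)
```

Each pass compares every pair of clusters, so this naive version gets slow quickly as the data grows. That cost is one reason hierarchical methods are usually applied to smaller datasets.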

Density-based Clustering

  • They produce clusters of arbitrary shape.
  • They are a good choice when the dataset contains noise, because points in sparse, low-density regions are labelled as noise instead of being forced into a cluster.
  • Example: DBSCAN.
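Here is a minimal DBSCAN sketch. It uses a brute-force O(n²) neighbour search, which is fine for illustration but not for large data. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself) follow the usual DBSCAN terminology. Labelling noise as `-1` is an assumed convention, and the toy data is hypothetical. The algorithm works as follows:

- A point with at least `min_pts` neighbours is a core point.
- Clusters grow outward from core points.
- Points that cannot be reached from any core point remain noise.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep growing
        cluster_id += 1
    return labels

# Hypothetical toy data: two dense groups plus one isolated outlier.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5),
          (5.0, 15.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-Means, the number of clusters is not given in advance. It emerges from `eps` and `min_pts`, and the outlier is reported as noise rather than being attached to the nearest group.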

Use Cases of Clustering

Retail/Marketing:

  • Identifying buying patterns of customers.
  • Recommending new movies to customers.
  • Recommending new gadgets to customers, etc.

Banking:

  • Identifying customer segments (e.g., loyal customers, customers likely to churn).
  • Fraud detection.

Insurance:

  • Fraud detection in claims analysis.

Publication:

  • Automatically categorizing news based on its content.
  • Recommending similar news articles.
  • Identifying reader segments (e.g., loyal readers, readers likely to churn).

Medicine:

  • Characterising patient behaviour in response to a medicine.
  • Identifying similar drugs by clustering them.

Biology:

  • Clustering genetic markers to identify family ties and generations.
  • Identifying a particular species.

Clustering vs. Classification

The most significant difference between them is that in classification each record has a corresponding label, whereas in clustering there are no labels at all.
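A small sketch can make the difference concrete. It uses hypothetical toy data and a simple nearest-centroid classifier, chosen here only for illustration. The same records appear twice, once with labels (classification) and once without (clustering). The classifier can learn one centroid per known label and predict those labels for new points. With only the unlabelled features, a clustering algorithm such as K-Means can group the points, but the groups it returns are anonymous IDs rather than the names "A" and "B".

```python
import math

# Hypothetical toy data. Classification: each record has features AND a label.
labelled = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((8.0, 8.0), "B"), ((8.5, 9.0), "B"), ((9.0, 8.5), "B"),
]

# Clustering: the same records, but only the features are available.
unlabelled = [features for features, _label in labelled]

def nearest_centroid_classifier(data):
    """Supervised: learn one centroid per known label, predict the nearest one."""
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in by_label.items()
    }

    def predict(point):
        return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

    return predict

predict = nearest_centroid_classifier(labelled)
print(predict((2.0, 2.0)))  # A
```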

I hope this article has given you a clear overview of clustering and its use cases. Thank you so much for investing your time in reading it and boosting your knowledge!

Harshit Dawar

Big Data enthusiast with a demonstrated history of delivering large and complex projects. Interested in working in the field of AI and Data Science.