Photo by Aaron Burden on Unsplash

Hands-on Tutorial

The k-modes as Clustering Algorithm for Categorical Data Type

The explanation of the theory and its application in real problems

Published in Geek Culture · 10 min read · Jun 22, 2021


The basic theory of k-Modes

Real-world data often mixes data types, such as numerical and categorical variables. When performing an analysis such as clustering, we should take the types of our variables into account. k-Means is a commonly used clustering algorithm that is efficient on large datasets, but it only works for numerical data: it relies on means and Euclidean distance, neither of which is meaningful for categories. It is therefore not suitable for data that contains categorical variables. To address this, Huang proposed the k-Modes algorithm, which adapts k-Means to cluster categorical data.

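k-Modes modifies k-Means in three ways to handle categorical variables: it replaces the Euclidean distance with a simple matching dissimilarity measure, replaces cluster means with modes, and updates the modes using a frequency-based method.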

The mathematical formula for the k-Modes clustering algorithm (Image by Author)
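To make the idea concrete, here is a minimal sketch of the two building blocks of k-Modes in plain Python: a simple matching dissimilarity (the number of attributes on which two records differ) and a mode update (the most frequent category per attribute). The function names, the toy records, and the hand-picked initial modes are all assumptions made for illustration. The loop is a simplified batch version rather than Huang's exact procedure, which updates the modes as each record is reassigned.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent category of each attribute (the cluster mode)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=10):
    """Simplified k-Modes: assign records to the nearest mode, then update modes."""
    modes = list(initial_modes)
    labels = []
    for _ in range(max_iter):
        # Assignment step: pick the mode with the fewest mismatches.
        labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        # Update step: recompute each mode from its members (keep old mode if empty).
        new_modes = []
        for j, old_mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(column_modes(members) if members else old_mode)
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return labels, modes


# Toy data, made up for illustration: (color, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]

# Initial modes are chosen by hand here so the run is deterministic.
labels, modes = k_modes(records, initial_modes=[records[0], records[3]])
print(labels)
print(modes)
```

On this toy data the red records and the blue records end up in separate clusters. In practice the initial modes are usually chosen by an initialization scheme rather than by hand, and the result can depend on that choice.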

The application of k-Modes

A few modules are used for data preprocessing, data exploration with exploratory data analysis, and the k-Modes…


Audhi Aprilliant
Geek Culture

Data Scientist. Tech Writer. Statistics, Data Analytics, and Computer Science Enthusiast. Portfolio & social media links at http://audhiaprilliant.github.io/