Clustering on numerical and categorical features.
Using Gower Distance in Python.
Introduction
Over the past year, I have been working on projects related to Customer Experience (CX). In these projects, Machine Learning (ML) and data analysis techniques are applied to customer data to improve the company’s knowledge of its customers. Recently, I have focused on finding groups of customers that share certain characteristics, so that specific actions can be targeted at each group.
As you may have already guessed, the project relied on clustering. For those unfamiliar with the concept, clustering is the task of dividing a set of objects or observations (e.g., customers) into groups (called clusters) based on their features or properties (e.g., gender, age, purchasing trends). The division should be done so that observations within the same cluster are as similar to each other as possible, while each cluster is as far from the others as possible. [1]
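To make the two goals concrete, here is a minimal sketch in plain Python. The customers, their two numeric features (age and purchases per month), and the split into two clusters are all made up for illustration. The helper names `mean_within` and `mean_between` are hypothetical too, and Euclidean distance is just one possible choice of similarity measure.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Hypothetical toy data: each customer is (age, purchases per month).
cluster_a = [(22, 1), (25, 2), (24, 1)]
cluster_b = [(58, 9), (61, 8), (60, 10)]


def mean_within(cluster):
    """Average Euclidean distance over all pairs of points inside one cluster."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))


def mean_between(first, second):
    """Average Euclidean distance over all pairs drawn from two different clusters."""
    return mean(dist(p, q) for p, q in product(first, second))


within_a = mean_within(cluster_a)
within_b = mean_within(cluster_b)
between = mean_between(cluster_a, cluster_b)

# A good partition shows small within-cluster values and a large between-cluster value.
print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

Note that this sketch only works because both features are numeric: Euclidean distance has no natural definition for a value such as gender.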
One of the main challenges was finding a way to apply clustering algorithms to data that contained both categorical and numerical variables. In the real world (and especially in CX), a lot of information is stored in categorical…