TDS Archive

Hands-on Tutorials

Clustering on numerical and categorical features.

Using Gower Distance in Python.

Jorge Martín Lasaosa
Published in TDS Archive
10 min read · May 29, 2021

Photo by Munro Studio on Unsplash

Introduction

Over the past year, I have been working on projects related to Customer Experience (CX). In these projects, Machine Learning (ML) and data analysis techniques are applied to customer data to improve the company’s knowledge of its customers. Recently, I have focused on finding groups of customers that share certain characteristics, so that specific actions can be targeted at each group.

As you may have already guessed, the project relied on clustering. For those unfamiliar with the concept, clustering is the task of dividing a set of objects or observations (e.g., customers) into groups (called clusters) based on their features or properties (e.g., gender, age, purchasing trends). The division should make the observations within a cluster as similar to each other as possible, and each cluster as far away from the others as possible. [1]
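To make the two goals concrete, here is a minimal sketch that compares the average distance inside each cluster with the average distance between clusters. The toy customers, the cluster assignment, and the helper names are all hypothetical, and Euclidean distance is assumed only because every feature in this example is numerical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available since Python 3.8

# Hypothetical toy data: (age, purchases per year), already split into two clusters.
clusters = {
    "A": [(25, 2), (27, 3), (26, 2)],
    "B": [(60, 20), (62, 22), (61, 19)],
}


def mean_within_distance(points):
    """Average distance over all pairs of points inside one cluster."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(points_a, points_b):
    """Average distance over all pairs made of one point from each cluster."""
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))


within = {name: mean_within_distance(pts) for name, pts in clusters.items()}
between = mean_between_distance(clusters["A"], clusters["B"])

print("within-cluster distances:", within)
print("between-cluster distance:", between)
```

A good partition keeps every within-cluster value small relative to the between-cluster value, which is the case for this toy split.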

One of the main challenges was finding a way to apply clustering algorithms to data that had both categorical and numerical variables. In the real world (and especially in CX) a lot of information is stored in categorical…
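Gower distance, the measure named in this article's subtitle, is one common way to compare observations that mix numerical and categorical features. What follows is a minimal pure-Python sketch of its unweighted form, not necessarily the implementation the article goes on to use: each numerical feature contributes its absolute difference divided by that feature's range, each categorical feature contributes 0 for a match and 1 for a mismatch, and the result is the average over all features. The customer records, column positions, and function name are hypothetical.

```python
# Hypothetical customers: (age, yearly_spend, gender, preferred_channel)
customers = [
    (22, 150.0, "F", "online"),
    (25, 180.0, "F", "online"),
    (58, 900.0, "M", "store"),
]
NUMERIC = (0, 1)      # positions of the numerical features
CATEGORICAL = (2, 3)  # positions of the categorical features

# Range (max - min) of every numerical feature, computed over the whole dataset.
ranges = {
    k: max(c[k] for c in customers) - min(c[k] for c in customers)
    for k in NUMERIC
}


def gower_distance(a, b):
    """Unweighted Gower distance between two mixed-type records."""
    parts = []
    for k in NUMERIC:
        # Range-normalised absolute difference; a constant feature contributes 0.
        parts.append(abs(a[k] - b[k]) / ranges[k] if ranges[k] else 0.0)
    for k in CATEGORICAL:
        # Simple matching: 0 when the categories agree, 1 otherwise.
        parts.append(0.0 if a[k] == b[k] else 1.0)
    return sum(parts) / len(parts)


# Pairwise distance matrix; every entry lies between 0 and 1.
matrix = [[gower_distance(a, b) for b in customers] for a in customers]

for row in matrix:
    print([round(d, 3) for d in row])
```

A matrix like this can be passed to any clustering algorithm that accepts precomputed distances, which is what makes the measure convenient for mixed data.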

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.
