Customer Segmentation with Fuzzy Segmentation: Using SQL and Python

Masoud Saedi
7 min readSep 15, 2023

--

Understanding your customer base is crucial in today’s highly competitive business environment. One way to achieve this is through customer segmentation — a process that involves dividing a company’s customers into groups based on different characteristics like buying habits, age, and interests. By doing this, businesses can create targeted marketing strategies that cater to each segment’s unique needs and preferences, ultimately resulting in higher customer satisfaction and loyalty.

Fuzzy segmentation takes this a step further. While traditional segmentation methods assign customers to distinct groups, fuzzy segmentation acknowledges that a customer can belong to multiple segments, but to varying degrees. This nuanced approach allows for more flexible and potentially more insightful analyses, bringing a more human perspective to the data.

This article will explore the interesting challenge faced by TravelTide — a fictional travel booking platform. Our primary objective is to assist TravelTide in creating a rewards program that resonates with various customer segments, thereby improving their loyalty and engagement with the platform.

Customer Segmentation divides a company’s customers into groups based on different characteristics.

What You’ll Find in This Article:

Let’s take a deep dive into the world of customer segmentation through the lens of fuzzy segmentation. With the help of SQL and Python, we’ll explore a hands-on approach to implementing this innovative technique in a real-world business scenario. This journey will enhance your skill set and equip you with the knowledge to gain a fresh perspective on customer data, potentially uncovering untapped opportunities. Once the analysis is complete, I’ll provide a sample executive summary report and presentation slides to share the key findings and recommendations with high-level stakeholders.

Context of the Customer Segmentation Analysis

TravelTide, a travel booking platform, is looking to enhance customer engagement by implementing a well-crafted rewards program. The marketing team has identified five potential perks to include in the program, which are as follows:

1. Free hotel meals
2. Free checked bags
3. No cancellation fees
4. Exclusive discounts
5. One night of free hotel stay with a flight booking

As we proceed with this project, let me show you the end goal through two mock-up examples of rewards program invitation emails which you will see in the following image. The email on the right presents a list of rewards with equal emphasis, expecting customers to look through and find what suits them. In contrast, the one on the left pinpoints a specific reward — the “No cancellation fees” perk, offering clear, targeted information that is likely to resonate more with a segment of customers who value this flexibility. Such targeted emails can significantly increase the signup rate for the rewards program.

Bearing this in mind, the objective of this analysis is to identify customer segments that are more likely to appreciate each of the proposed perks based on their past behaviour. This initial step will help in the design of a rewards program. However, it is important to note that the actual validation of the program will occur once the perks are rolled out, and customer response is observed.

A good, clear, targeted email (Left) and a bad, vague email (Right)

Step 1: Cohort Selection and Data Extraction Using SQL

Defining the Cohort

The first step involves selecting the right set of customers to analyze. Considering the time customers have spent interacting with the platform plays a critical role in ensuring the reliability of the segmentation analysis.

Thus, based on the guidance provided by the Marketing Team, we defined our cohort as users who have had a browsing session after the New Year holiday (starting from January 4, 2023) and engaged with the platform in more than seven sessions during this period. This approach ensures that we have a substantial behavioural data set to work with, excluding recent users with limited interactions.

Cohort definition

Understanding the Dataset

The dataset is stored in four primary tables: users, sessions, flights, and hotels. These tables contain a range of information including user demographic details and transaction history.

  • Users: This table encompasses user demographics including unique ID, gender, marital status, home location, and details regarding their children, among others.
  • Sessions: Details of individual browsing sessions are captured here, recording vital data points such as session duration, discounts offered, and booking information, to name a few.
  • Flights: Here, we find specifics of the purchased flights, including the departure and return details, airline information, and fare details.
  • Hotels: This section contains all the information about the hotel stays booked by the users, including check-in/check-out times and the cost of stay.

You can download the complete dataset through this link, if you are interested follow the steps along with me.

Feature Engineering: Extracting the Right Metrics

With the understanding of the dataset in place, the next crucial step is data extraction. Leveraging SQL’s capabilities, I crafted queries to extract a range of metrics, considering deeply which metrics would potentially be significant in analyzing the affinity for proposed perks.

In another article titled “Feature Engineering with SQL: Painting a Canvas with Data”, I take you on a detailed journey through the SQL query that stands as the backbone of our analysis. From understanding the Vincenty formula’s role in calculating distances to the thoughtful aggregation of various data points.

I have also documented all SQL queries with line-by-line comments in a GitHub gist, which you can access here.

Step 2: Exploratory Data Analysis with Python

In this phase, we will conduct an in-depth analysis of the extracted data using Python. The Jupyter Notebook, which can be found in this GitHub repository, provides detailed documentation of each step in the workflow, allowing others to easily identify areas that could be improved. Here’s a preview of what you can expect:

Introduction:

Setting the stage for the analysis with the necessary Python libraries and data loading.

Data Understanding and Handling Null Values:

A comprehensive understanding of the dataset, its features, and handling missing values judiciously to maintain data integrity.

Statistical Summary:

A statistical overview of the dataset, offering insights into the distribution and central tendencies of different features.

Outlier Treatment:

Identifying and treating outliers in the dataset to ensure more reliable and robust results in the subsequent analysis.

Scaling:

Scaling the data to bring all the features to a comparable range, setting a stable ground for further analysis.

Creating New Metrics:

Formulating new metrics essential for evaluating the potential perks, accompanied by a thoughtful exploration of correlations between different metrics.

Fuzzy Segmentation:

Utilizing fuzzy segmentation to outline customer segments according to potential interest in various perks. The process involves ranking the perk indexes for every customer, and then identifying the perk with the highest affinity for each individual based on the minimum rank principle.

Visualizations:

Illustrative visualizations bring to light the intricate patterns and relationships in the data, aiding in the intuitive understanding of customer behaviours and preferences.

By exploring this notebook, you can expect a well-rounded understanding of the analytical process, right from data cleaning to deriving meaningful insights through segmentation and visualization. Moreover, the notebook encourages collaborative learning by inviting others to find potential flaws and offer feedback, fostering a community of learning and growth.

The distribution of customers across different segments, as the outcome of the analysis.

Step 3: Crafting an Executive Summary and Presentation Slides

In the dynamic landscape of business analytics, the true power of a well-conducted analysis is realized through effective communication. It is the bridge that connects the analytical minds with the strategists, ensuring that the insights carved out from data are not just understood but are ready to be transformed into actionable strategies.

Audience and Purpose: The Blueprint of Communication

Understanding our audience and the core purpose of our communication is pivotal. It forms the blueprint that guides the structure, content, and tone of our communication. In our scenario, our primary audience is the Marketing Manager, a seasoned marketer overseeing the strategic narratives at TravelTide. But our communication canvas extends beyond, reaching to our technical colleagues as well. For them, the in-depth Jupyter Notebook serves as a rich resource, laying out the technical complexities of the project in detail, thus fostering a collaborative and knowledge-rich ecosystem. Breaking down the jargon and technical complexities, we aim to simplify the complex and make the simple compelling, offering resources tailored to varying levels of expertise.

Executive Summary and Presentation Slides templates

As we conclude our analytical journey, I would like to share with you the executive summary report and the presentation slides that I have created for this project. These resources are not just a reflection of our analytical efforts, but they also serve as an inspiration and a blueprint for other analysts.

Through these materials, you will witness the art of transforming data into compelling and easily understandable narratives. I tried to keep a balance between depth and simplicity, catering to both our primary and secondary audiences. The goal is to ensure that our insights not only capture attention but also retain it, leading to meaningful discussions and informed decisions.

Both the presentation and full report were crafted with the help of Canva. The executive summary report can be accessed through this link, and the presentation slides are available here.

Closing Note

As we reach the end of this walkthrough, I hope it has been helpful in shedding light on a practical approach to customer segmentation using fuzzy clustering, SQL, and Python. This is a continuous learning journey for me, and I believe in the collaborative spirit of the analytical community.

I look forward to hearing your thoughts, feedback, and ways this approach can be further refined. Feel free to share your insights, and let’s learn and grow together.

--

--