Getting to Know Your Customers: A Segmentation Analysis

Sep 13, 2023

By Geetanjali Makineni, Evgeniya Dontsova, and Madhu Sowmya Bandi, Members of Scientific Staff, DISH Wireless

In today’s digital era, businesses are inundated with data. From online purchases to social media interactions, every customer touchpoint generates a wealth of information. However, this abundance of data can quickly become overwhelming, making it challenging for businesses to synthesize it all and effectively cater to their diverse customer base.

Customers play a pivotal role in driving business success and fostering a strong reputation. Their value extends beyond mere transactions, as they contribute to the overall goodwill and prosperity of a company. To ensure continuous progress and growth, it is crucial for businesses to gain a comprehensive understanding of their customers and their behaviors within the realm of products and/or services usage. This understanding serves as a foundation for effective customer categorization and segmentation techniques tailored to the specific needs and objectives of the business. By adopting a holistic approach to customer analysis, businesses can harness the full potential of customer data and leverage it to achieve overarching goals.

What is Customer Segmentation?

Customer segmentation is an invaluable tool that allows organizations to unlock the true potential of their data by grouping customers based on shared characteristics, behaviors, and preferences. Businesses can gain valuable insights that drive strategic decision-making, enhance customer experiences, and ultimately achieve long-term success. By studying different customer groups, you learn what they value the most about your company. This information will help you create personalized products and services that fit your customers’ preferences. (See the summary here: Implementing Customer Segmentation Using Machine Learning.) Segmentation also increases the likelihood that customers will engage with the brand, and reduces the potential for communications fatigue. (See, e.g., Customer Preference Profiling.)

Invest in Customer Segmentation Analysis

In today’s highly competitive business landscape, providing exceptional customer experiences has become paramount for organizations across all industries. Customers expect personalized interactions and tailored solutions that address their specific needs. This is where the power of data comes in. By harnessing the vast amounts of data available, businesses can gain deep insights into their customers.

Customer segmentation empowers businesses to devise marketing campaigns and loyalty programs that resonate with specific customer groups, driving retention and profitability. Leveraging data in this way fosters customer loyalty and positive brand perception and, ultimately, drives business success in today’s customer-centric marketplace.

DISH Wireless took a new approach to group our mobile subscribers. We took a number of defining aspects, including call detail records, financial transactions, customer care interaction, and general attributes, to focus on understanding our customers in a 360-degree view.

The DISH Wireless Customer Segmentation Approach

Our customer segmentation is a fusion of various customer segmentation methods, taking into account a comprehensive 360-degree look at customer personas.

Data Domains

At DISH Wireless, we understand the importance of utilizing data-driven insights to create a seamless and personalized customer experience. As part of our customer segmentation process, we embarked on a journey to optimize our data management by categorizing our mobile subscriber data into three distinct domains: Call Detail Records (CDR), Customer Experience (CX), and Customer Finance (CF).

Service Usage-Based and Customer Movements (CDR): Customer segmentation based on call data usage records allows us to group customers by their communication habits, such as frequent callers, data-heavy users, or occasional users, enabling targeted marketing strategies and personalized service offerings. Monitoring their movements over time and space provides valuable insights into their geographic preferences, as well.

User Engagement With Support (CX): By analyzing customer experience data and their engagement with support services, we can identify and categorize customers based on their interaction frequency, satisfaction levels, and issue resolution preferences, helping us tailor support approaches and improve overall customer satisfaction.

Purchase Behavior (CF): Tracking customer finance data and monitoring spending patterns allows us to provide timely recommendations and avoid service interruptions, improving customer loyalty.

By dividing our mobile subscriber data into these three domains — CDR, CX, and CF — we have transformed the way we approach customer segmentation. The insights gathered from each domain now serve as a solid foundation for crafting personalized marketing strategies, targeted product recommendations, and optimized customer support. With this data-driven approach, we aim to not only exceed customer expectations, but also foster long-term relationships and sustainable business growth.

Exploratory Data Analysis (EDA) & Features

With data organized into the three domains, we began performing individual Exploratory Data Analysis (EDA) on each segment.

During the EDA phase, we meticulously examined the data within each domain. With CDR, we explored call logs, message records, and data usage patterns to unveil valuable insights into our customers’ communication behaviors. This helped us identify features such as monthly median data used on international roaming, monthly median fraction of voice out, overall dropped calls fraction, and more.
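As an illustration of how features like these might be derived, here is a minimal pandas sketch. The table layout, the column names, and the definition of “fraction of voice out” (outgoing voice minutes divided by total voice minutes) are assumptions made for this example, not the actual DISH Wireless schema.

```python
import pandas as pd

# Hypothetical CDR rows; the column names are illustrative only.
cdr = pd.DataFrame({
    "subscriber_id": ["A", "A", "A", "A", "B", "B"],
    "month": ["2023-01", "2023-01", "2023-02", "2023-03", "2023-01", "2023-02"],
    "voice_out_min": [30, 30, 20, 10, 0, 10],
    "voice_in_min": [10, 50, 20, 30, 40, 10],
    "calls": [10, 10, 5, 15, 4, 6],
    "dropped_calls": [1, 0, 0, 1, 1, 0],
})

# Aggregate voice minutes per subscriber and month, then take the
# per-month fraction of outgoing voice (assumed definition).
monthly = cdr.groupby(["subscriber_id", "month"], as_index=False)[
    ["voice_out_min", "voice_in_min"]
].sum()
monthly["voice_out_fraction"] = monthly["voice_out_min"] / (
    monthly["voice_out_min"] + monthly["voice_in_min"]
)

# Monthly median per subscriber.
features = monthly.groupby("subscriber_id").agg(
    median_voice_out_fraction=("voice_out_fraction", "median")
)

# Overall dropped calls fraction across the whole observation window.
totals = cdr.groupby("subscriber_id")[["calls", "dropped_calls"]].sum()
features["dropped_calls_fraction"] = totals["dropped_calls"] / totals["calls"]

print(features)
```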

In the CX domain, we referenced the dataset collected from various customer touch points: website, mobile app, and customer chats with bot/agent. This comprehensive analysis enabled us to map out the customer journey, pinpoint areas of customer satisfaction, and uncover potential bottlenecks in the user experience. From there, we could identify frequent customer issues, average resolution time, and the number of customers contacting Care service.

Simultaneously, for the CF domain, we conducted an in-depth exploratory data analysis to understand our customers’ financial behaviors. Similar to other domains, we focused on various data sources to identify key features that could offer crucial financial information, like customer balances, funds, credits, and holds. Analyzing the intricacies of the various account balance buckets and their relationship with the billing ledger enabled us to learn about our customers’ spending patterns, credit utilization, and spending power.

Armed with new information from the CDR, CX, and CF domains, we built features. Our Feature Engineering Pipeline allowed us to transform raw data into actionable features, tailored to the unique demands of DISH’s diverse customer base.

Sampling Strategy

The analysis presented is based on a subscriber sample for each plan. To obtain sample data of unique subscriber IDs, we implemented an unbiased sampling strategy. Our primary objective was to ensure that the sample accurately represented the diversity of our extensive subscriber pool.

To compose a sample of 1,000 unique subscribers, we selected the top 350, bottom 350, and middle 300, which correspond to highly engaged, less engaged, and moderately engaged subscribers, respectively. Additional processing and consideration of CDR and CF data left us with 378 prepaid and 594 postpaid subscribers. To ensure accurate representation and unbiased insights, we applied a distinct sampling method for customers within each plan. By incorporating this approach, we captured a balanced representation of subscriber engagement levels in different plans.
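The top/middle/bottom selection can be sketched as follows. The article does not say how engagement was measured, so the engagement_score column and the helper sample_by_engagement are hypothetical, and the subscriber pool below is synthetic.

```python
import numpy as np
import pandas as pd

def sample_by_engagement(df, score_col="engagement_score",
                         n_top=350, n_bottom=350, n_middle=300):
    """Pick the most, least, and moderately engaged subscribers by rank."""
    ranked = df.sort_values(score_col, ascending=False).reset_index(drop=True)
    if len(ranked) < n_top + n_bottom + n_middle:
        raise ValueError("not enough subscribers for the requested sample")

    top = ranked.iloc[:n_top]
    bottom = ranked.iloc[len(ranked) - n_bottom:]

    # Take the middle group from the center of what remains, so the
    # three groups can never overlap.
    remaining = ranked.iloc[n_top:len(ranked) - n_bottom]
    start = (len(remaining) - n_middle) // 2
    middle = remaining.iloc[start:start + n_middle]

    return pd.concat(
        [
            top.assign(engagement="high"),
            middle.assign(engagement="moderate"),
            bottom.assign(engagement="low"),
        ],
        ignore_index=True,
    )

# Synthetic subscriber pool with a made-up engagement score.
rng = np.random.default_rng(0)
subscribers = pd.DataFrame({
    "subscriber_id": np.arange(5000),
    "engagement_score": rng.random(5000),
})
sample = sample_by_engagement(subscribers)
print(sample["engagement"].value_counts())
```

In practice this would be run once per plan, consistent with the per-plan sampling described above.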

Clustering

Once we had the sample data of our subscribers, we took the necessary pre-processing steps to ensure accurate and meaningful results.

Data Cleaning: Our first step was to meticulously clean the data by removing any inconsistencies, missing values, and duplicates. This ensured our working data was free from errors that could skew our analysis.
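A minimal sketch of this kind of cleaning with pandas. Whether rows with missing values were dropped or imputed is not stated in the article; dropping them is an assumption here, and the columns are made up.

```python
import numpy as np
import pandas as pd

# Toy data with one duplicated row and one missing value.
raw = pd.DataFrame({
    "subscriber_id": [1, 2, 2, 3, 4],
    "monthly_data_gb": [3.2, 10.5, 10.5, np.nan, 0.8],
    "care_contacts": [0, 4, 4, 1, 2],
})

# Remove exact duplicate rows, then rows with any missing value.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)
print(clean)
```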

Handling Categorical Features: Dealing with categorical features is essential in any data analysis. We applied the appropriate encoding techniques to transform categorical data into numerical representations that could be readily utilized by our machine learning algorithms.

For example, we used Label Encoding for the Mobile Operating System feature and Frequency Encoding for the Frequent Support Intent Code feature. For the latter, we computed and normalized the value counts of the intent codes and then assigned them to four categories using uniform binning.
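The two encodings can be sketched as below, assuming scikit-learn and pandas. The column names and category values are invented for illustration, and “uniform binning” is interpreted here as four equal-width bins via pd.cut.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; the values are illustrative, not real intent codes.
df = pd.DataFrame({
    "mobile_os": ["android", "ios", "android", "other",
                  "ios", "android", "android", "ios"],
    "support_intent": ["billing", "billing", "network", "billing",
                       "device", "billing", "network", "plan"],
})

# Label encoding: each distinct OS becomes an integer code.
df["mobile_os_encoded"] = LabelEncoder().fit_transform(df["mobile_os"])

# Frequency encoding: replace each intent code by its normalized count...
intent_freq = df["support_intent"].value_counts(normalize=True)
df["intent_freq"] = df["support_intent"].map(intent_freq)

# ...then assign the frequencies to four equal-width bins (0 to 3).
df["intent_freq_bin"] = pd.cut(df["intent_freq"], bins=4, labels=False)

print(df)
```

Label encoding imposes an arbitrary ordering on the categories, which distance-based methods such as K-means will treat as meaningful; that trade-off is worth keeping in mind when choosing it.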

Normalizing Data With Max ABS Scaler: To ensure fair representation of features across different scales, we employed the Max Absolute Scaler technique for data normalization. This process divides each numerical feature by its maximum absolute value, scaling the data to the range [-1, 1] without shifting or centering it, so zero entries remain zero.
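A short sketch of this step using scikit-learn’s MaxAbsScaler; the toy matrix is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Three features on very different scales.
X = np.array([
    [10.0, -200.0, 0.50],
    [5.0, 100.0, 0.00],
    [-2.5, 50.0, 0.25],
])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.max_abs_)  # per-feature maximum absolute value
print(X_scaled)
```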

Addressing High Correlation: Identifying and addressing high correlation among features is crucial to avoid redundancy and prevent overfitting in models. By conducting correlation analysis, we carefully identified features with high correlation and made informed decisions on which ones to retain and which ones to exclude, optimizing the efficiency and performance of our customer segmentation models.
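One common way to do this is sketched below. The 0.9 threshold and the rule of dropping the later column of each correlated pair are assumptions; the article does not state which threshold or selection rule was used, and drop_highly_correlated is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(features, threshold=0.9):
    """Drop one feature from every pair whose |correlation| exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop

# Synthetic features: "b" is almost a rescaled copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
features = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

reduced, dropped = drop_highly_correlated(features)
print(dropped)
```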

Dimensionality Reduction With PCA: As our dataset grew in complexity, we recognized the importance of addressing dimensionality issues to maintain model interpretability and reduce computational complexity. Principal Component Analysis (PCA) was used to reduce the dimensionality of our data while preserving its key features and minimizing information loss.

Upon conducting PCA, we analyzed the cumulative explained variance plot to assess the trade-off between dimensionality reduction and information preservation. After thoughtful consideration, we made an informed decision to retain seven principal components, preserving 80 percent of the original data’s variability.
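The selection of components from the cumulative explained variance can be sketched as follows, assuming scikit-learn. The data here is synthetic, so the number of components it yields will not match the seven reported above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic correlated features standing in for the scaled feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))

# Fit PCA with all components and accumulate the explained variance ratios.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 80%.
n_components = int(np.searchsorted(cumulative, 0.80) + 1)

X_reduced = PCA(n_components=n_components).fit_transform(X)
print(n_components, X_reduced.shape)
```

Plotting cumulative against the component index gives the cumulative explained variance plot referred to above.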

We made a conscious decision to eliminate irrelevant features from our dataset that did not contribute to a deeper understanding of customer behavior. By dropping these unnecessary features, we streamlined our data and focused solely on the most informative attributes.

After precise data pre-processing, using the steps described above, we refined our working data for customer segmentation algorithms. With a clean and optimized dataset at hand, we were well-prepared to extract actionable insights and deliver personalized experiences to each of our valued customers. The next step was to identify subscriber personas using data from all three domains. To accomplish this, we utilized the K-means unsupervised algorithm.

There are two common methods for determining the optimal number of clusters for K-means: the silhouette score and the elbow method. The silhouette score measures how well each data point fits its assigned cluster compared to other clusters; higher silhouette scores indicate better-defined clusters. The elbow method plots inertia against the number of clusters and looks for the “elbow point” on the line plot, where adding more clusters stops yielding large improvements. This point represents the balance between improved clustering performance and increasing model complexity.

Figure 1: Determining optimal number of clusters. Plot on the left, using the elbow method, shows inertia versus number of clusters for mobile subscribers on prepaid plan. Plot on the right shows silhouette score versus number of clusters for postpaid plan subscribers. Chosen optimal number of clusters is highlighted by a red circle.

Based on the plots shown in Fig. 1, we determined the optimal number of clusters to be three for prepaid subscribers and four for postpaid subscribers, which are each highlighted by red circles. This clustering helped us group subscribers with similar characteristics for targeted strategies and improved customer experiences.
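Both diagnostics can be computed in one loop, as in this sketch that assumes scikit-learn. The data comes from make_blobs rather than real subscribers, and the range of k, the seed, and n_init=10 are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the PCA-reduced subscriber data.
X, _ = make_blobs(n_samples=400, centers=3, cluster_std=0.8, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squares (elbow plot)
    silhouettes[k] = silhouette_score(X, model.labels_)

# The silhouette criterion picks the k with the highest score; the elbow
# is usually read off a plot of inertias against k.
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```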

For hyper-parameter tuning in the K-means model, we focused on two crucial parameters:

random_state: This parameter ensures reproducibility by setting a specific random seed for the algorithm’s initialization. It allows us to obtain consistent results when running the K-means model multiple times.

n_clusters: This parameter defines the number of clusters we want the K-means algorithm to identify in the data. By selecting an appropriate value for n_clusters, we can create meaningful and distinct customer segments.
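A brief sketch of how these two parameters are passed to scikit-learn’s KMeans. The data is synthetic (378 rows and seven features only echo the prepaid sample size and the number of principal components), and the seed value and n_init=10 are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the prepaid sample after PCA.
X, _ = make_blobs(n_samples=378, centers=3, n_features=7, random_state=0)

# Same n_clusters and random_state on the same data give the same labels.
labels_a = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
labels_b = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print(np.array_equal(labels_a, labels_b))
print(np.bincount(labels_a))  # cluster sizes
```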

Figure 2: Clustering analysis with K-means algorithm shown using two principal components for mobile subscribers on prepaid (left) and postpaid (right) plans.

Fig. 2 demonstrates the clustering of subscriber data from each plan in a 2-dimensional space. The clusters indicate groups with similar characteristics or patterns within the data. (See the tables below with detailed descriptions of each cluster’s characteristics.)

Customer Personas — Prepaid Plan

Customer Personas — Postpaid Plan

Summary

Customer segmentation analysis is a foundational tool that helps build customer understanding and can inform the design of products and services that fulfill customer needs and improve overall satisfaction. Clustering analysis can also serve as a testing framework for monitoring how customer behavior changes after actions are applied to identified segments. The current segmentation can be used as a baseline to define “ideal” subscriber clusters, to develop and test actions that move subscribers toward the desired cluster distribution, and even to build a KPI around it.

About the Authors

Geetanjali Makineni is a Data Scientist at DISH Wireless and a member of the Research & Development team. She has three years of experience in IT and is enthusiastic about data science, machine learning, data-driven products, and cloud-based technologies. Her expertise lies in utilizing data to develop solutions and advancements within the telecom industry, leveraging her passion for cutting-edge technologies.

Madhu Sowmya Bandi is a Data Scientist at DISH Wireless with three years of experience in IT. She is passionate about data products, AI, and 5G network development and is constantly seeking opportunities to explore and innovate in these domains. The potential to harness data-driven technologies and create impactful solutions drives her enthusiasm.

Evgeniya Dontsova is a Staff Data Scientist at DISH Wireless. She is part of the R&D team, which focuses on solving emerging problems in 5G network connectivity. Network optimization and network user experience assessment are her primary interests. Previously, she worked on solving optimization problems and applying machine learning modeling in the oil and gas industry, and on academic research in the area of computational modeling of materials.