Customer Segmentation with Clustering Algorithms in Python

Muhammet Bektaş
6 min read · May 16, 2020


Unlike Supervised Learning, Unsupervised Learning has only independent variables and no corresponding target variable. In short, the data is unlabeled. The aim of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about it.

We are going to examine a dataset about credit card users for segmentation. There is no feature labeling the customers; that is to say, we have no prior information about each customer’s characteristics. We are going to try clustering the clients with machine learning algorithms. Customer segmentation has a pretty significant position for companies in modern marketing disciplines: because of increasing costs, firms must reach the right target audiences with the right approaches.

First of all, we start with data preprocessing, such as filling the missing values and standardization. Then we can move on to the most used clustering algorithms.
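A minimal sketch of that preprocessing step, assuming the raw data is loaded into a numeric DataFrame df (df_norm is the standardized copy used in the rest of the post):

import pandas as pd
from sklearn.preprocessing import StandardScaler

# One common choice: fill missing values with column medians, then standardize
df = df.fillna(df.median())
df_norm = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)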

1. K-Means Algorithm

K-Means is probably the most famous clustering algorithm. To select the number of groups, we plotted the inertia (the sum of squared distances of samples to their closest cluster center) against the number of clusters, and we also looked at the Silhouette and Davies Bouldin scores.

from sklearn.cluster import KMeans
from sklearn import metrics

# Score k-means for several candidate cluster counts on the normalized data
for i in range(5, 11):
    kmeans_labels = KMeans(n_clusters=i, random_state=123).fit_predict(df_norm)
    print("Silhouette score for {} clusters k-means: {}".format(i, metrics.silhouette_score(df_norm, kmeans_labels, metric='euclidean').round(3)))
    print('Davies Bouldin Score: ' + str(metrics.davies_bouldin_score(df_norm, kmeans_labels).round(3)))
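The inertia curve for the elbow technique can be produced along the same lines; a minimal sketch:

import matplotlib.pyplot as plt

# Inertia for each candidate k, for the elbow plot
inertias = [KMeans(n_clusters=k, random_state=123).fit(df_norm).inertia_ for k in range(2, 11)]
plt.plot(range(2, 11), inertias, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()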

After evaluating the elbow technique together with the Silhouette and Davies Bouldin scores, the optimal number of clusters is 7 for K-Means, so we have fit the final model with k = 7 (a minimal sketch follows). Looking at the 3D scatter plot of the result, I think the separation is still insufficient, furthermore…
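The final fit, plus the 3-component projection used for the 3D scatter plots. I’m assuming the projection was produced with scikit-learn’s PCA; pca is the array referenced in the plotting code below:

from sklearn.decomposition import PCA

# Final k-means fit with k = 7, and a 3-component projection for plotting
kmeans_labels = KMeans(n_clusters=7, random_state=123).fit_predict(df_norm)
pca = PCA(n_components=3).fit_transform(df_norm)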

2. MiniBatch K-Means

As you all know, MiniBatch K-Means is faster than K-Means. However, it sometimes gives slightly different results; here, the number of clusters is determined as 6 according to the metrics above. You can see the scatter plot below.
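The minikm_labels used in the plot come from the MiniBatch model; a minimal sketch of that fit, with the 6 clusters determined above:

from sklearn.cluster import MiniBatchKMeans

minikm_labels = MiniBatchKMeans(n_clusters=6, random_state=123).fit_predict(df_norm)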

import matplotlib.pyplot as plt

# 3D scatter of the PCA-projected data, colored by the MiniBatch k-means labels
fig = plt.figure(figsize=(12, 7), dpi=80, facecolor='w', edgecolor='k')
ax = plt.axes(projection="3d")
ax.scatter3D(pca.T[0], pca.T[1], pca.T[2], c=minikm_labels, cmap='Spectral')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

3. Hierarchical Clustering

Hierarchical clustering is a clustering technique that aims to build a tree-like clustering hierarchy within the data. With this model, we can use a dendrogram to determine n_clusters.

The x-axis contains the samples and the y-axis represents the distance between them. There are three clusters according to the dendrogram, as you can also see in the scatter diagram below.
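A minimal sketch of drawing that dendrogram with SciPy, assuming df_norm, the standardized data, as before:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(df_norm, method='ward')  # Ward linkage, matching the model below
plt.figure(figsize=(12, 7))
dendrogram(linked, truncate_mode='level', p=5)  # show only the top levels
plt.xlabel('Samples')
plt.ylabel('Distance')
plt.show()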

from sklearn.cluster import AgglomerativeClustering

# Note: affinity was renamed to metric in newer scikit-learn versions
hcluster = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='ward')
hcp = hcluster.fit_predict(df_norm)
print('Silhouette Score for Hierarchical Clustering: ' + str(metrics.silhouette_score(df_norm, hcp, metric='euclidean')))
print('Davies Bouldin Score: ' + str(metrics.davies_bouldin_score(df_norm, hcp)))

4. DBSCAN

DBSCAN, as the name implies, is a density-based clustering algorithm. Density refers to the proximity of data points in a cluster, and the method works well for data that contains clusters of similar density. First, we need to choose two parameters: a positive number epsilon and a natural number minPoints. Then we build the model.

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Grid search over eps and min_samples; scores need at least two distinct labels
rows = []
for i in range(1, 12):
    for j in range(1, 12):
        clusters = DBSCAN(eps=i * 0.5, min_samples=j).fit_predict(df_norm)
        if len(np.unique(clusters)) >= 2:
            rows.append({'Eps': i * 0.5, 'Min_Samples': j, 'Number of Cluster': len(np.unique(clusters)), 'Silhouette Score': metrics.silhouette_score(df_norm, clusters), 'Davies Bouldin Score': metrics.davies_bouldin_score(df_norm, clusters)})
results = pd.DataFrame(rows)

We have determined 5 clusters. Still, it seems that DBSCAN is not an appropriate method for this dataset.

5. GMM Algorithm

Gaussian Mixture Models (GMMs) assume there are a number of Gaussian distributions, and each of them represents a cluster. Therefore a Gaussian Mixture Model tends to group together the data points that belong to a single distribution.

Firstly, we should determine the n_clusters. The optimal number of clusters is the value that minimizes the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).

In the present case, the AIC and BIC values keep decreasing as the number of clusters increases, which doesn’t give us a suitable solution.
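A minimal sketch of that check (the exact range of cluster counts is an assumption):

import numpy as np
from sklearn.mixture import GaussianMixture

n_components = np.arange(1, 21)
models = [GaussianMixture(n, random_state=123).fit(df_norm) for n in n_components]
aic = [m.aic(df_norm) for m in models]
bic = [m.bic(df_norm) for m in models]
# Both curves keep decreasing here, so they don't pick a clear optimum on their own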

So let’s look at the performance metrics for different parameter combinations instead; we will also calculate the Davies Bouldin score.

# Try every covariance type and cluster count, recording both scores
parameters = ['full', 'tied', 'diag', 'spherical']
n_clusters = np.arange(1, 21)
rows = []
for i in parameters:
    for j in n_clusters:
        clusters = GaussianMixture(n_components=j, covariance_type=i, random_state=123).fit_predict(df_norm)
        if len(np.unique(clusters)) >= 2:
            rows.append({'Covariance Type': i, 'Number of Cluster': j, 'Silhouette Score': metrics.silhouette_score(df_norm, clusters), 'Davies Bouldin Score': metrics.davies_bouldin_score(df_norm, clusters)})
results_ = pd.DataFrame(rows)

We have selected ‘spherical’ as the covariance type and 5 as the number of clusters; these parameters give a Silhouette Score of 0.207.
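A sketch of the final fit with those parameters:

gmm_labels = GaussianMixture(n_components=5, covariance_type='spherical', random_state=123).fit_predict(df_norm)
print('Silhouette Score:', metrics.silhouette_score(df_norm, gmm_labels).round(3))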

6. MeanShift

MeanShift is another powerful clustering algorithm used in unsupervised learning. Unlike K-Means, it does not require the number of clusters to be fixed in advance; hence it is a non-parametric algorithm.

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Estimate the kernel bandwidth from the data, then fit MeanShift
est_bandwidth = estimate_bandwidth(df_norm, quantile=.1, n_samples=10000)
mean_shift = MeanShift(bandwidth=est_bandwidth, bin_seeding=True).fit(df_norm)
labels_unique = np.unique(mean_shift.labels_)
n_clusters_ = len(labels_unique)
print("Number of estimated clusters : %d" % n_clusters_)

Comparison of Results

Finally, we have tried six algorithms. K-Means has the best Silhouette and Davies Bouldin scores, so it is the most suitable algorithm for this customer segmentation. Thus we have 7 customer types. Let’s try to understand the behaviours of the customers in each cluster.

We have chosen some columns that are significant for identifying the clusters; a sketch of that profiling step follows.
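The column names here are assumptions based on typical credit card datasets, not necessarily the exact ones in the notebook:

# Attach the k-means labels to the original (unscaled) data and profile each cluster
df['Cluster'] = kmeans_labels
profile_cols = ['BALANCE', 'PURCHASES_FREQUENCY', 'CREDIT_LIMIT', 'TENURE']  # assumed names
print(df.groupby('Cluster')[profile_cols].mean())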

Cluster 0 : The highest purchase frequency; these customers tend to pay in installments, have higher credit limits, and have long tenures.

Cluster 1 : Pretty low balance and purchase frequency. They rarely use their credit cards and have lower credit limits.

Cluster 2 : The largest group of customers, with the lowest card usage: inactive but long-standing customers.

Cluster 3 : A high tendency to pay in installments, higher purchase frequency, and above-average tenure.

Cluster 4 : The highest balances, but purchase frequency is not that good. They tend to take cash advances and have higher credit limits than the others; they don’t like spending money.

Cluster 5 : The second highest purchase frequency and also a high tendency to pay in installments. They are long-standing customers.

Cluster 6 : The smallest group of customers, with below-average purchase frequency and short tenures.

Summary

Firstly, we started with data preprocessing. Then we applied the clustering algorithms and compared the resulting models, deciding on K-Means. We divided the data into seven clusters, because seven clusters can easily be used to describe the behaviours of customers, and each cluster has its own characteristics.

Thank you for reading. You can visit my GitHub account for the full notebook:

https://github.com/muhammetbektas/Unsupervised-Learning/blob/master/Segmentation_of_Credit_Card_Users_in_Python.ipynb
