Mall Customers Segmentation — Using Machine Learning

Bringing Data Science, Machine Learning and Business together to enhance the profits of malls and shopping complexes

Shubhankar Rawat
5 min read · May 20, 2019

Machine learning can be seen almost everywhere around us, be it Facebook recognizing you or your friends, or YouTube recommending you a video or two based on your history.
However, the ‘magic’ of machine learning is not limited to these areas.
Machine Learning is broadly categorized into Supervised and Unsupervised Learning.
In Supervised Learning, we teach the machine by providing both the independent variables and the dependent variable (the labels); examples include classification and predicting numeric values.
Unsupervised Learning mainly deals with identifying the structure or patterns in the data. In this type of algorithm, we do not have labeled data (that is, the dependent variable is absent); examples include clustering and recommendation systems.
Unsupervised Learning can be very useful, as it can reveal hidden relations between different attributes or features.

In this article, I will discuss a specific problem based on a clustering technique (Unsupervised Learning). However, my main aim is to show how much machine learning has to offer for business and profit enhancement.

The Problem

Malls and shopping complexes are often in a race to attract more customers and thereby make larger profits. Many stores already apply machine learning to this task.
It is striking how much machine learning can help with such ambitions. Shopping complexes use their customers’ data to build ML models that target the right customers. This not only increases sales but also makes the complexes more efficient.

Enough talk, let’s get into the action.

You can find the code in my GitHub repository here

The Data

Dataset of the mall customers

Here we have the following features:
1. CustomerID: The unique ID given to a customer
2. Gender: The gender of the customer
3. Age: The age of the customer
4. Annual Income (k$): The annual income of the customer, in thousands of dollars
5. Spending Score: The score (out of 100) given to a customer by the mall authorities, based on the money spent and the behavior of the customer

Data Preprocessing
Checking for null values:

None of the columns contains null values. Sounds good!
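A null check like this can be sketched in pandas roughly as follows. The rows below are made-up stand-ins so that the snippet runs on its own, and the column names are assumptions modeled on the feature list above (the headers in the actual file may differ).

```python
import pandas as pd

# Hypothetical rows standing in for the mall dataset; the values are
# invented purely so the snippet is self-contained.
df = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4],
        "Gender": ["Male", "Female", "Female", "Male"],
        "Age": [19, 35, 52, 28],
        "Annual Income (k$)": [15, 40, 78, 120],
        "Spending Score": [39, 55, 17, 88],
    }
)

# Count missing values per column; all zeros means nothing needs imputing.
null_counts = df.isnull().sum()
print(null_counts)
```

If any count were non-zero, the affected rows would have to be dropped or imputed before clustering.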

We see that there is only one categorical feature, Gender, so we will one-hot encode it.
Data after one-hot encoding:
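One common way to one-hot encode a column is `pd.get_dummies`. The sketch below uses a few made-up rows and assumed column names; whether the original notebook kept both indicator columns or dropped one is not stated in the article, so this version simply keeps both.

```python
import pandas as pd

# Hypothetical rows; only the Gender column matters for this illustration.
df = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3],
        "Gender": ["Male", "Female", "Female"],
        "Age": [19, 35, 52],
    }
)

# Replace the categorical Gender column with one 0/1 indicator column
# per category (Gender_Female, Gender_Male).
encoded = pd.get_dummies(df, columns=["Gender"], dtype=int)
print(encoded)
```

Each row now has a 1 in exactly one of the two indicator columns, which is the numeric form a distance-based algorithm such as K-Means can work with.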

With the data preprocessing done, let us move on to building the clustering model.

I will use the K-Means Clustering algorithm to cluster the data.
K-Means needs the number of clusters to be chosen in advance, so we first look at the Elbow Method.

The Elbow Method is a heuristic for finding an appropriate number of clusters in a dataset: we run the clustering for a range of cluster counts, plot the within-cluster sum of squares (WCSS) against the number of clusters, and pick the point where the curve bends (the ‘elbow’).
The following figure demonstrates the elbow method:

The figure suggests taking the number of clusters equal to 5, as the curve flattens out after that point and adding more clusters brings only small improvements.
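The elbow computation can be sketched with scikit-learn as below. This is not the mall dataset: the points are synthetic, generated around five centers that I picked to loosely resemble an income/spending layout, so the numbers only illustrate the shape of the curve. Clustering on two features is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: 5 blobs in (annual income, spending score) space.
rng = np.random.default_rng(0)
centers = np.array(
    [[25, 20], [25, 80], [55, 50], [88, 82], [88, 17]], dtype=float
)
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

# Fit K-Means for k = 1..10 and record the within-cluster sum of squares,
# which scikit-learn exposes as the fitted model's inertia_ attribute.
ks = range(1, 11)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)

for k, w in zip(ks, wcss):
    print(f"k={k:2d}  WCSS={w:.1f}")
```

Plotting `wcss` against `ks` gives the elbow curve; on data like this the drop in WCSS becomes much smaller once k goes past 5.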

Finally, let us plot the clusters:

The clusters are plotted as spending score versus annual income.
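A plot of this kind can be produced along the following lines. Again, the data is synthetic (five hand-picked blobs, not the mall customers), the output file name is hypothetical, and the colors and cluster numbering will not match the figure in this article.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: 5 blobs in (annual income, spending score) space.
rng = np.random.default_rng(0)
centers = np.array(
    [[25, 20], [25, 80], [55, 50], [88, 82], [88, 17]], dtype=float
)
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

# Fit K-Means with the 5 clusters suggested by the elbow method.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# One scatter call per cluster so each gets its own color and legend entry.
fig, ax = plt.subplots()
for cluster_id in range(5):
    pts = X[labels == cluster_id]
    ax.scatter(pts[:, 0], pts[:, 1], s=20, label=f"Cluster {cluster_id + 1}")
ax.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    c="black",
    marker="x",
    s=80,
    label="Centroids",
)
ax.set_xlabel("Annual Income (k$)")
ax.set_ylabel("Spending Score")
ax.legend()
fig.savefig("clusters.png")  # hypothetical output file name
```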
Let us now analyze the results of the model.

Analyzing the Results
We can see that the mall customers can be broadly divided into 5 groups based on their annual income and spending score.
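Besides reading the groups off the plot, one can summarize each cluster numerically. The sketch below does that on synthetic stand-in data with assumed column names; the cluster ids that K-Means assigns are arbitrary, so they will not line up with the cluster numbers and colors discussed below.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

INCOME = "Annual Income (k$)"  # assumed column name
SCORE = "Spending Score"       # assumed column name

# Synthetic stand-in data: 5 blobs in (annual income, spending score) space.
rng = np.random.default_rng(0)
centers = np.array(
    [[25, 20], [25, 80], [55, 50], [88, 82], [88, 17]], dtype=float
)
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])
df = pd.DataFrame(X, columns=[INCOME, SCORE])

# Assign each customer to one of 5 clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["Cluster"] = kmeans.fit_predict(df[[INCOME, SCORE]])

# Average income and spending score per cluster, plus the cluster size.
profile = df.groupby("Cluster")[[INCOME, SCORE]].mean().round(1)
profile["Customers"] = df.groupby("Cluster").size()
print(profile)
```

A table like this makes it easy to label each cluster as, for example, "high income, low spending" before deciding whom to target.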

In cluster 4 (yellow), people have low annual incomes and low spending scores. This is quite reasonable, as people with low salaries tend to buy less; in fact, these are sensible people who know how to spend and save money. The shops/mall will be least interested in people belonging to this cluster.

In cluster 2 (blue), people have low incomes but high spending scores. These are people who, for some reason, love to buy products often even though they have a low income. Maybe it’s because they are more than satisfied with the mall’s services. The shops/mall might not target these people very actively, but still will not lose them.

In cluster 5 (pink), people have average incomes and average spending scores. Again, these people will not be the prime targets of the shops or mall, but they will still be considered, and other data analysis techniques may be used to increase their spending scores.

In cluster 1 (red), people have high incomes and high spending scores. This is the ideal case for the mall or shops, as these people are the prime sources of profit. They might be regular customers of the mall who are convinced by its facilities.

In cluster 3 (green), people have high incomes but low spending scores, which is interesting. Maybe these are people who are unsatisfied or unhappy with the mall’s services. They can be prime targets for the mall, as they have the potential to spend money. So, the mall authorities will try to add new facilities to attract these people and meet their needs.

Finally, based on our machine learning technique, we may deduce that to increase the mall’s profits, the mall authorities should target people belonging to cluster 3 and cluster 5, and should also maintain their standards to keep the people belonging to cluster 1 and cluster 2 happy and satisfied.

To conclude, I would like to say that it is amazing to see how machine learning can be used in businesses to enhance profit.

Shubhankar Rawat

I am a data science and machine learning enthusiast, who loves to share knowledge.