ELO MERCHANT LOYALTY RECOMMENDATION

Jul 30, 2020

USING MACHINE LEARNING TECHNIQUES

Pushparaj BS, Praveen Perumal, A M Sharan Kumar, Sindhu Pushparaj, Dr. D Narayana

1. Introduction

1.1 ELO customer loyalty program

This problem revolves around customer loyalty for credit card consumers of Elo, one of the largest payment brands in Brazil. Loyalty programs are rewards programs that the owners of an establishment put in place to encourage customers to return.

These programs are extremely popular in businesses where customers make frequent purchases. Every rewards program is unique to the company that implements it, but they all operate on roughly the same premise: the more a customer shops at your establishment, the more rewards they receive, and the more incentive they have to come back and continue shopping at your place of business. Implementing a solid loyalty program is not only an important way for you as a business owner to make sure that your customers keep returning, it also gives you an important signal about your churn rate. It is important to do your research before adopting any loyalty solution for your company, but in this day and age, adopting some sort of loyalty program is becoming more and more advantageous.

2. Problem Statement

Elo has built tie-ups with merchants to offer its customers discounts. These can take any of the following forms:

* Cash Back or Rebate Program: Customers earn money back from their purchases.

* Discount Program: Offers a discount off the original price of any given item.

* Frequency Program: Rewards repeat purchases (for example, "Buy 5 subs and the 6th is on us!").

* Points Program: Customers are given points, often in proportion to the amount they spend in your store.

* Tier System: Customers must first make a purchase. The cost of their purchases then determines the level of rewards.

Hence, it is important to determine whether the above loyalty programs actually work: whether customers use their cards for purchases often, and at which merchants. This insight helps to fine-tune the offerings and maximize profits.

Elo picked new merchants to recommend for each card holder. The date when Elo began providing recommendations is called the ‘reference date’. After the reference date, for each card Elo gathered transaction history for all new merchants that appeared on the card. By comparing each card’s new merchant activity and the list of the merchants recommended by Elo, the loyalty score was calculated. Our goal is to evaluate Elo’s recommendation algorithm by trying to predict in which cases it is going to work well (yielding a high loyalty score) and in which cases it is not (yielding a low loyalty score).

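Submissions are scored on the root mean squared error (RMSE), which is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$

where $P_i$ is the predicted loyalty score for each card_id, $O_i$ is the actual loyalty score assigned to that card_id, and $n$ is the number of card_ids being scored.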
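As a minimal sketch of this metric, assuming the predicted and actual loyalty scores are available as array-like sequences of equal length (the function name `rmse` and the toy values are ours, not from the competition):

```python
import numpy as np


def rmse(predicted, observed):
    """Root mean squared error between predicted and actual loyalty scores."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Square the per-card errors, average them, then take the square root.
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Toy values for illustration only, not taken from the Elo data.
example = rmse([0.5, -1.0, 2.0], [0.0, -1.0, 4.0])
```

Because the errors are squared before averaging, a handful of cards with very large errors can dominate the score.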

3. Data Collection

The data sets are available on the Kaggle page for the Elo competition.

The datasets are fictional, and we have five .csv files:

train.csv (201917 rows, 6 columns)
test.csv (123623 rows, 5 columns)
historical_transactions.csv (29112361 rows, 14 columns)
new_merchant_transactions.csv (1963031 rows, 14 columns)
merchants.csv (334696 rows, 22 columns)

3.1 Train.csv/Test.csv:

3.2 Historical_transactions.csv/new_merchant_transactions.csv:

3.3 Merchants.csv:

4. Data Preparation

Since there are multiple data sets available, each of the data sets is analyzed separately for data inconsistencies.

Once EDA has been performed on each of the data sets, the features that are consistent across all of the data sets are identified.

5. Exploratory Data Analysis

5.1 General overview of training data

Record count:

Total number of observations: 201917
Number of features: 5

In the distribution plot of the training data, the Loyalty Score on the x-axis represents the target value.

We notice that a group of target values (below -30) lies far apart from the rest.

All 2207 outlier records have the same target value: -33.22. For the time being, we retain these records in the data set.
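A check like this can be sketched as follows. The small DataFrame stands in for train.csv, and the column name `target`, the card ids, and the values are assumptions made for illustration:

```python
import pandas as pd

# Toy stand-in for train.csv; card ids and targets are made up.
train = pd.DataFrame({
    "card_id": ["C_1", "C_2", "C_3", "C_4"],
    "target": [0.39, -33.22, 1.25, -33.22],
})

# Flag the records whose target falls below -30.
outliers = train[train["target"] < -30]
n_outliers = len(outliers)

# If every outlier shares one value, this array has a single entry.
distinct_values = outliers["target"].round(2).unique()
```

On the real training file, the same filter would be expected to return the outlier records described above.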

Next, we analyzed the three anonymized feature columns. Based on the violin plots of the target value for these feature fields and the bar charts of the number of records, we infer that although the record counts differ across the values of these features, the distribution of the target is almost the same across all three features.

5.2 Overview of transaction data (Historical transactions)

Total number of observations (historical transactions): 29,112,361
Number of features: 14
Total number of observations (new merchant transactions): 1,963,031

Analysis of Installment feature

Count of records against each installment category

The largest volume of transactions is with installment values 0 and 1.
We see unusual values such as -1 and 999 as installments. They may represent missing values.
Only 3% of the historical transactions with installment 999 are authorized.
The higher the number of installments, the lower the authorization rate.
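The counts and authorization rates per installment value can be computed with a group-by. This is a sketch on made-up rows; the column names `installments` and `authorized_flag` and the "Y"/"N" encoding of the flag are assumptions about the file layout:

```python
import pandas as pd

# Toy stand-in for historical_transactions.csv (values are made up).
hist = pd.DataFrame({
    "installments":    [0, 0, 1, 1, 1, 999, 999, -1],
    "authorized_flag": ["Y", "Y", "Y", "N", "Y", "N", "N", "Y"],
})

# For each installment value: number of records and share that were authorized.
summary = (
    hist.assign(authorized=hist["authorized_flag"].eq("Y"))
        .groupby("installments")["authorized"]
        .agg(count="size", authorization_rate="mean")
        .sort_index()
)
```

Sorting by the installment value makes it easy to see how the authorization rate changes as the number of installments grows, and where placeholder values such as -1 and 999 sit.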

The historical and new transactions datasets were concatenated into an “All_transactions” dataset. From it, we created a train transaction dataset containing only the transactions whose card ids are found in the train dataset and whose authorized flag is set to “Yes”. Similarly, a test transaction dataset was created from the card ids found in the test dataset, using the same authorized flag filter.
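A minimal sketch of this step on toy data is shown below. The frame names, the column names, and the "Y" encoding of the authorized flag are assumptions for illustration:

```python
import pandas as pd

# Toy stand-ins for the two transaction files and for train/test.
hist = pd.DataFrame({
    "card_id": ["C_1", "C_1", "C_2", "C_3"],
    "authorized_flag": ["Y", "N", "Y", "Y"],
    "purchase_amount": [-0.7, -0.5, 0.1, -0.3],
})
new = pd.DataFrame({
    "card_id": ["C_1", "C_3"],
    "authorized_flag": ["Y", "Y"],
    "purchase_amount": [-0.6, -0.2],
})
train = pd.DataFrame({"card_id": ["C_1", "C_2"]})
test = pd.DataFrame({"card_id": ["C_3"]})

# Stack historical and new transactions into one frame.
all_transactions = pd.concat([hist, new], ignore_index=True)

# Keep authorized transactions only, then split by card id membership.
authorized = all_transactions[all_transactions["authorized_flag"] == "Y"]
train_transactions = authorized[authorized["card_id"].isin(train["card_id"])]
test_transactions = authorized[authorized["card_id"].isin(test["card_id"])]
```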

Next, we performed missing value processing on the train and test transaction datasets and label encoded the categorical values in the Category_1, Category_2 and Category_3 columns.
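One possible way to do this is sketched below. The fill values (-1 and "missing") are our assumptions, since the exact missing value treatment is not specified here, and the lowercase column names and toy values are likewise illustrative:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the transaction data (values are made up).
transactions = pd.DataFrame({
    "category_1": ["Y", "N", "N", "Y"],
    "category_2": [1.0, None, 3.0, 1.0],
    "category_3": ["A", "B", None, "C"],
})

# Assumed treatment: give missing entries their own explicit category.
transactions["category_2"] = transactions["category_2"].fillna(-1)
transactions["category_3"] = transactions["category_3"].fillna("missing")

# Label encode each categorical column into integer codes.
for col in ["category_1", "category_2", "category_3"]:
    encoder = LabelEncoder()
    transactions[col] = encoder.fit_transform(transactions[col].astype(str))
```

In practice the encoder would be fitted on the combined train and test values so that both datasets share the same codes.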

We aggregated the historical and new transactions data grouped by card_id. Once the data is aggregated, it is merged with the training and test datasets based on card_id.
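The aggregation and merge can be sketched as follows. The choice of aggregates and the column names `purchase_amount`, `installments` and `target` are assumptions for illustration, not the exact feature set used in the project:

```python
import pandas as pd

# Toy stand-ins for the train transactions and train.csv.
train_transactions = pd.DataFrame({
    "card_id": ["C_1", "C_1", "C_2"],
    "purchase_amount": [-0.7, -0.6, 0.1],
    "installments": [0, 1, 1],
})
train = pd.DataFrame({"card_id": ["C_1", "C_2"], "target": [0.39, -1.2]})

# One row per card, with a few summary statistics of its transactions.
card_features = (
    train_transactions.groupby("card_id")
    .agg(
        transaction_count=("purchase_amount", "size"),
        purchase_amount_sum=("purchase_amount", "sum"),
        purchase_amount_mean=("purchase_amount", "mean"),
        installments_max=("installments", "max"),
    )
    .reset_index()
)

# Left join keeps every card in train, even one without transactions.
train_merged = train.merge(card_features, on="card_id", how="left")
```

The same `card_features` construction would be applied to the test transactions before merging with the test dataset.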

6. Data Regularization: Lasso, Ridge and ElasticNet

Lasso Coefficients of training data

Using the Lasso model, we identified the features that influence the target more than others.

Ridge Coefficients of training data

The following are the features and their coefficients listed based on Ridge regularization

ElasticNet Coefficients of training data

The following are the features and their coefficients listed based on ElasticNet regularization
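A comparison of coefficients across the three regularized models can be sketched with scikit-learn. The data here is synthetic, and the feature names and the `alpha` and `l1_ratio` values are arbitrary assumptions rather than the settings used in the project:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two features drive the target.
rng = np.random.default_rng(0)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Standardize so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

models = {
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "elasticnet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# One column of coefficients per model, one row per feature.
coefficients = pd.DataFrame(
    {name: model.fit(X_scaled, y).coef_ for name, model in models.items()},
    index=feature_names,
)

# Rank features by the absolute size of their Lasso coefficient.
order = coefficients["lasso"].abs().sort_values(ascending=False).index
ranked = coefficients.reindex(order)
```

Lasso tends to push the coefficients of uninformative features to exactly zero, while Ridge only shrinks them, which is why the Lasso column is convenient for ranking.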

7. Building the Loyalty Score model

7.1 Linear Regression Model

We used K-Means clustering to find the optimal number of clusters in the data. We standardized the data before applying clustering. We then calculated the distortion of the clusters using Euclidean distance and plotted the average distortion against the number of clusters (the “Elbow Method”).

The result is a fairly smooth curve, so it is unclear which value of k is best, which indicates that the data is not very strongly clustered. If we do need to cluster, the candidate elbows are at k=4 and k=6.
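The elbow computation can be sketched as below, using synthetic data in place of the merged training set. The range of k and the KMeans settings are assumptions; distortion is taken here as the mean Euclidean distance from each point to its nearest cluster centre:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic, unclustered data standing in for the real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))

# Standardize before clustering so no feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

distortions = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    # transform() returns the Euclidean distance to every cluster centre;
    # the row-wise minimum is the distance to the assigned centre.
    nearest = km.transform(X_scaled).min(axis=1)
    distortions[k] = float(nearest.mean())
```

Plotting `distortions` against k gives the elbow curve; on data without strong cluster structure, such as this synthetic sample, the curve declines smoothly with no sharp bend.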

7.6 AdaBoosting Regressor

7.7 Gradient Boosting Regressor

7.8 XG Boosting Regressor

We built a k-fold cross-validation module with 5 folds and ran it for the following algorithms:

Linear Regression
ElasticNet, Lasso and Ridge
SVR
LGBM Regression
Gradient Boosting Regressor
XG Booster Regressor
Decision Tree Regressor
KNN Regressor
Random Forest Regressor
AdaBoost Regression
Bagging and Stacking Regressors
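Such a module might look like the sketch below, which covers only a few of the listed algorithms and uses synthetic data. The hyperparameters are arbitrary assumptions, and the score reported per model is the mean of the per-fold RMSE values:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the merged training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Decision Tree Regressor": DecisionTreeRegressor(max_depth=4, random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# The same 5 shuffled folds are reused for every model.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

rmse_scores = {}
for name, model in models.items():
    # scikit-learn reports negated MSE so that larger is always better.
    neg_mse = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    rmse_scores[name] = float(np.sqrt(-neg_mse).mean())
```

Using one shared `KFold` object keeps the comparison fair, since every algorithm is evaluated on identical train and validation splits.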

The RMSE scores for each of the algorithms from the cross-validation module are shown below:

We see that the LightGBM model gave the lowest RMSE value, 3.74.

We then used Keras, a high-level neural network API, to build a neural network model, with TensorFlow as the backend. This improved the RMSE value substantially, to 3.04.

8. Challenges

Data volume is very high

Mitigation: We used a Kaggle kernel to run the code.

Identifying the characteristics of some of the columns, such as Feature_1, 2, 3 or Category_1, 2, 3, since the data description provided by Elo was minimal
Finding the relationships between the various data sets provided and building a common training set for modelling

9. Conclusion

We made use of the various machine learning techniques learned during our course to solve a business problem. We analyzed both supervised and unsupervised learning models during the project. We will continue to improve the accuracy of the model by working on the various features identified so far and optimizing them for use in the model.