Elo Merchant Category Recommendation-Understand Customer Loyalty

Published in CodeX · Feb 13, 2021

Author: Praveen Jalaja

Image credit: Marcelo Souza

To understand the machine learning world better, building a model helps much more than reading articles and binge-watching videos online. Hands-on work is required to scale one's understanding of machine learning concepts and their proper application. This is one such hands-on case study, done to get a better understanding of the ML world. The case study discussed here is the Kaggle competition Elo Merchant Category Recommendation.

Introduction

Elo Merchant Category Recommendation is a Kaggle competition hosted by Elo, one of the largest payment brands in Brazil, which provides debit and credit cards to its customers. For a payment brand, offering promotions and discounts with merchants is a good marketing strategy: cardholders can use promotional discounts with various merchants. Elo wants to personalize the cardholder's experience with these promotions. Providing promotions to users at different merchants, based on their payment behavior, can enrich the user experience and attract new customers.

Two years ago, Elo put up this Kaggle competition for competitors to build a solution to personalize promotions and discounts. Elo had already built ML models to understand other aspects such as customers' food and shopping preferences. This competition is specifically about building a model that can uncover customer loyalty towards the brand, in order to personalize discount and promotional campaigns.

Overview:

The case study on the Elo Merchant Category Recommendation is done by adhering to the following steps:

  1. Defining Business Problem
  2. Machine Learning Problem Formulation
  3. Exploratory Data Analysis
  4. Data Cleaning
  5. Feature Engineering
  6. Exploration of Engineered Features
  7. Model Building
  8. Conclusion.

Defining Business Problem

Elo as a payment brand needs its customers to keep using its cards for payments across the different merchants it partners with. In other words, customer loyalty towards the brand is the key aspect here. Elo needs to retain customers who have high loyalty towards the brand, which can be achieved through the different promotional campaigns targeted at them. There can be millions of Elo cardholders, but the campaigns have to be targeted at them based on their loyalty towards the Elo brand.

For example, if a customer has used the Elo card with diverse merchants and for a long time, that customer's loyalty is high. To keep the customer as a subscriber, Elo can run a discount campaign with the customer's favorite or most frequently used merchants. At large scale, these loyal customers get their discounts and stay with Elo for their future purchases. This kind of personalization brings in many new customers and keeps the existing ones.

The problem is to find a metric that reflects the cardholder's loyalty to the Elo payment brand, so that Elo can use it for business decisions about its promotional campaigns. Elo can thereby reduce unwanted campaigns and focus on the areas where they are required.

Machine Learning Problem Formulation

In terms of machine learning, we need a metric to measure customer loyalty. The loyalty score is a measure given by Elo to quantify a customer's loyalty towards the brand. These loyalty scores depend on many aspects of the customer: purchase history, usage time, merchant diversity, and so on. The loyalty score can be predicted from information about the customer's purchases and card usage. Predicting the loyalty score from a customer's purchase data is the crux of our problem, so the loyalty score is the target variable the machine learning model should be built to predict.

Target Variable — Loyalty Score

Input Features — Cardholder’s purchase history, usage time, etc.

The constraint is that the data provided is not real customer data. It is fictitious, simulated data, due to privacy and legal constraints. Simulated data sometimes has an artificially induced bias that will affect the prediction model's performance. Identifying this bias and accounting for it in the final model is also part of the solution to the problem.

Exploratory Data Analysis

train.csv — This CSV file contains the basic information about each customer's Elo card subscription. There are six variables in train.csv. The target variable (loyalty score) used for training the model is given in this file.

test.csv — This is the test data on which we have to test our final model. It has the same columns as train.csv, except for the target variable.

train.csv

The train data has the target value; a simple PDF of it reveals outliers around -30 and shows that the target is standardized with mean 0.

The distribution of the target with respect to the categories of the anonymized features (feature_1, feature_2, feature_3) does not differ between categories. This shows that the target values are not skewed for different categories of these three anonymized features, so we need to dig deeper to establish the relationship between card_id and the target value. Also, the target variable (loyalty score) behaves like a damped oscillation with respect to first_active_month: cards first active around 2012–2015 have large peaks and valleys, but after 2015 the target becomes nearly linear and rises. This could be because the types of transactions made with the cards differ between the two time periods.

Since the distributions of the train and test data are almost identical, there is no time-based split in the make-up of the data, which is reassuring for prediction on the test data.

The VIF values for all three features are well under 10, so there is no problem of multicollinearity in the train data.
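For reference, this multicollinearity check can be sketched with statsmodels (assuming the train data is loaded into a DataFrame called train; the column names follow the competition files):

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# VIF for the three anonymized features in train.csv
X = add_constant(train[["feature_1", "feature_2", "feature_3"]].astype(float))
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
).drop("const")
print(vif)  # values under 10 suggest no serious multicollinearity
```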

historical_transactions.csv — This CSV file contains 14 different variables about each customer's transactions.

new_merchant_transactions.csv — It has the same set of columns as historical_transactions.csv but is recorded over a different time frame: it contains information about transactions during the period before the loyalty score was calculated, which is the most recent transaction information for the customers.

historical_transactions.csv

In both the historical and new merchant transactions data, features like category_1, category_2, category_3, installments and month_lag show almost no difference in distribution with respect to the target value, and they do not reveal any details about the relationship between the cards and the target variable.

One important trait of the new merchant transactions data is that it contains no unauthorized transactions: all of its transactions have the authorized flag set to True. Another important piece of information from the transactions data is that the purchase amount has been normalized and had constant noise added. Since the data is manipulated, identifying the manipulation technique can help with modeling. Also, the historical transactions have fewer outliers in purchase amount than the new merchant transactions, which again shows the two types of transactions differ in some way.

The purchase date feature can reveal the inherent, time-dependent nature of the transactions, so features engineered from it will be useful for prediction.

The two plots above show the trend of purchase counts over the days of the week and the hours of the day. We can clearly see a pattern: transactions increase through the weekdays and drop on Sunday, and the afternoon-to-night hours have the higher number of transactions. One important piece of information from the hourly view is a huge spike at the zeroth hour of the day. Most subscriptions with online retailers are charged at the zeroth hour of the day, which means the data contains not only direct card payments but also subscription-style online transactions.

There are many missing values in the merchants, historical and new merchant transactions data, so these missing values must be imputed in an intuitive and effective way for better prediction.

The VIF value for the authorized flag is somewhat high, around 32, which indicates possible multicollinearity, so this variable needs further investigation. Other than the authorized flag, the remaining variables do not look correlated.

merchants.csv

The exploration of the merchants data shows that it holds additional information about the merchants which does not seem to help in predicting the target value, while increasing the complexity of the problem. So, in this study, the merchants data is not used in any way to build the models.

At the end of the exploration of the transactions, merchants and train data, the given transaction features do not appear to be a big factor in the calculation of the target score. There must exist aggregated or engineered features that can be helpful in predicting the target score. With different feature engineering and market research techniques, we have to produce new features, which may or may not be very useful in the prediction model. By implementing the major feature engineering ideas, we produce features and build models on top of them.

Data Cleaning

Before jumping into feature engineering, one important finding from the EDA is the null values present in the data files. These null values have to be imputed before feature engineering. There are many ways to impute null values. Here, the null values of a feature are imputed by models built on its non-null values, using the other features as predictors.
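A minimal sketch of this idea follows, assuming a historical transactions DataFrame called hist; the model choice (a random forest) and the example column names are illustrative, since this study does not fix one specific imputation model:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def impute_with_model(df: pd.DataFrame, target_col: str, feature_cols: list) -> pd.DataFrame:
    """Fill NaNs of target_col by predicting them from the other columns."""
    known = df[df[target_col].notna()]
    unknown = df[df[target_col].isna()]
    if unknown.empty:
        return df
    model = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    model.fit(known[feature_cols].fillna(-1), known[target_col])
    df.loc[df[target_col].isna(), target_col] = model.predict(unknown[feature_cols].fillna(-1))
    return df

# e.g. fill the missing category_2 values from a few numeric columns
# hist = impute_with_model(hist, "category_2", ["purchase_amount", "installments", "month_lag"])
```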

Feature Engineering

One Hot Encoding-The categorical features (category_3, category_2, month_lag) in the transactions data are one-hot encoded before further features are engineered from them.
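With pandas this is one line per DataFrame (a sketch; hist and new are assumed to be the historical and new merchant transaction DataFrames):

```python
import pandas as pd

cat_cols = ["category_2", "category_3"]   # month_lag can be encoded the same way
hist = pd.get_dummies(hist, columns=cat_cols)
new  = pd.get_dummies(new,  columns=cat_cols)
```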

De-anonymizing purchase amount-As discussed in the EDA, the purchase amount feature is anonymized by mean centering and scaling. With the help of raddar's insights shared in a Kaggle kernel, the purchase amount is de-anonymized by dividing by 0.00150265118 and adding 497.06.
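A sketch of that transformation, using the constants above:

```python
import numpy as np

for df in (hist, new):
    # reverse the scaling and centering applied to purchase_amount
    df["purchase_amount_new"] = np.round(df["purchase_amount"] / 0.00150265118 + 497.06, 2)
```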

Aggregation-The transactions data contains both numerical and categorical features. Basic statistics such as the number of unique values, mean, max, min, sum, standard deviation and skew are calculated over the transactions, to produce a single value per card_id for each feature.
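A sketch of this aggregation with pandas groupby (the exact feature/statistic pairs below are illustrative, not the full specification behind the 280 final features):

```python
# hist: historical transactions after the steps above
agg_spec = {
    "purchase_amount_new": ["mean", "max", "min", "sum", "std", "skew"],
    "installments":        ["mean", "max", "sum", "nunique"],
    "merchant_id":         ["nunique"],
}
hist_agg = hist.groupby("card_id").agg(agg_spec)
hist_agg.columns = ["hist_" + "_".join(col) for col in hist_agg.columns]
hist_agg = hist_agg.reset_index()

# the same is done for the new merchant transactions, and the results are
# later merged onto train/test on card_id
train = train.merge(hist_agg, on="card_id", how="left")
```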

Date Time Features-The purchase date feature records the timestamp of each transaction. In addition to basic statistical features over the hour, day, week, month and year, other features are derived, such as:

  1. whether the day of the transaction falls on a weekend,
  2. the average time between purchases,
  3. whether the purchase was made on a holiday.

Then these features are aggregated with the required statistics and fed into the model, as sketched below.
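A sketch of these purchase_date features (the helper names and the aggregations chosen here are illustrative; a holiday flag would be built the same way from a list of Brazilian holidays):

```python
import pandas as pd

hist["purchase_date"] = pd.to_datetime(hist["purchase_date"])
hist["hour"]       = hist["purchase_date"].dt.hour
hist["month"]      = hist["purchase_date"].dt.month
hist["is_weekend"] = (hist["purchase_date"].dt.dayofweek >= 5).astype(int)

# average time between consecutive purchases, per card (in hours)
hist = hist.sort_values("purchase_date")
hist["purchase_gap_hours"] = (
    hist.groupby("card_id")["purchase_date"].diff().dt.total_seconds() / 3600
)

date_agg = hist.groupby("card_id").agg(
    purchase_gap_mean=("purchase_gap_hours", "mean"),
    weekend_ratio=("is_weekend", "mean"),
    hour_mean=("hour", "mean"),
).reset_index()
```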

Other than the major features mentioned above, some ratios of the engineered features are calculated, such as the ratio between the transaction count and the purchase-date difference, etc.

The train and test data are then merged with these engineered transaction features. Basic time features are also derived from the first_active_month feature.

RFM analysis is a market research methodology which uses three measures of the customers' transactions to categorize them:

  • R — Recency — days since the last transaction
  • F — Frequency — frequency of the transactions
  • M — Monetary value — total money spent by the customer/cardholder

An RFM score and an RFM index are calculated to understand customer loyalty. Here I used quantiles to categorize the R, F and M values; the RFM score is derived by simple addition of the three quantile labels, and the RFM index by concatenating them, as sketched below.
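A sketch of the quantile-based RFM derivation (the number of quantile bins and the label ordering are assumptions made for illustration):

```python
import pandas as pd

snapshot = hist["purchase_date"].max()
rfm = hist.groupby("card_id").agg(
    recency=("purchase_date", lambda s: (snapshot - s.max()).days),
    frequency=("purchase_date", "count"),
    monetary=("purchase_amount_new", "sum"),
)

def quartile_score(s: pd.Series, ascending: bool = True) -> pd.Series:
    """Rank first to avoid duplicate bin edges, then cut into quartile labels 1..4."""
    labels = [1, 2, 3, 4] if ascending else [4, 3, 2, 1]
    return pd.qcut(s.rank(method="first"), 4, labels=labels).astype(int)

rfm["R"] = quartile_score(rfm["recency"], ascending=False)  # fewer days since last purchase is better
rfm["F"] = quartile_score(rfm["frequency"])
rfm["M"] = quartile_score(rfm["monetary"])

rfm["rfm_score"] = rfm["R"] + rfm["F"] + rfm["M"]                                      # simple addition
rfm["rfm_index"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)  # concatenation
```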

In the end, I derived 280 numerical features from the transactions, train and test data. These 280 features are fed into the model for training and prediction.

Exploration of Engineered Features

The final set of engineered features has a lot of NaN values: out of 280 features, 158 contain NaNs, so more than half of the features have to be imputed. Since imputation by building models takes a lot of memory and time, a simple imputation technique suffices here. In this study I experimented with mode imputation and imputation with zero; imputing the NaN values with zero yields the better prediction results.
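The comparison itself is only a couple of lines with pandas (features is assumed to be the DataFrame of the 280 engineered features):

```python
# two simple alternatives tried in this study; zero imputation worked better
features_mode = features.fillna(features.mode().iloc[0])  # fill each column with its mode
features_zero = features.fillna(0)                         # fill every NaN with zero (used)
```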

Model Building

Evaluation metric-Root Mean Squared Error (RMSE). In competitions on platforms like Kaggle, the metric for evaluating model predictions is given by the competition provider itself.
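For reference, RMSE can be computed directly with NumPy:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted loyalty scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```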

From the research done on the already available solutions and kernels, simple models like linear regression and KNN do not work well for predicting the target value, so I jumped to more complex models like GBDTs.

LightGBM- A gradient boosting model is the first step towards the solution. The LightGBM model without hyperparameter tuning gives a poor RMSE value of 3.744407, but with hyperparameter tuning using the Optuna package it gives a better RMSE value of 3.61634. Optuna is a great hyperparameter optimization framework which provides far more control over the optimization than the scikit-learn grid and random search methods.
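A minimal sketch of the Optuna tuning loop for LightGBM (the search space, validation split and trial count below are illustrative, not the exact settings used in this study; X and y are the engineered feature matrix and loyalty scores):

```python
import lightgbm as lgb
import numpy as np
import optuna
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    params = {
        "objective": "regression",
        "metric": "rmse",
        "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.1, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 31, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 20, 200),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
    }
    model = lgb.LGBMRegressor(n_estimators=2000, **params)
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
              callbacks=[lgb.early_stopping(100, verbose=False)])
    return np.sqrt(mean_squared_error(y_val, model.predict(X_val)))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```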

XGBoost- The XGBRegressor model with tuned hyperparameters does not give a better RMSE score than the LightGBM model, but it trains faster with the use of a GPU.

Stacking - A simple blend of the XGBoost and LightGBM model predictions improves the RMSE score over the XGBoost predictions alone. The two models' predictions are then stacked using a Ridge meta-learner. This stacked model gave a better RMSE score on Kaggle than all the other models' predictions.
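A sketch of the blend and the Ridge-stacked ensemble (the out-of-fold setup and variable names are assumptions; the study only specifies the Ridge meta-learner):

```python
import numpy as np
from sklearn.linear_model import Ridge

# simple blend of the two models' test predictions
blend_test = 0.5 * lgb_test_pred + 0.5 * xgb_test_pred

# stacking: fit Ridge on out-of-fold predictions of the two base models,
# then apply it to their test predictions
oof_stack  = np.column_stack([lgb_oof_pred, xgb_oof_pred])    # shape (n_train, 2)
test_stack = np.column_stack([lgb_test_pred, xgb_test_pred])  # shape (n_test, 2)

meta = Ridge(alpha=1.0)
meta.fit(oof_stack, y)                  # y: training loyalty scores
stacked_test_pred = meta.predict(test_stack)
```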

Conclusion

The stacked model built on the XGBoost and LightGBM models with a Ridge meta-learner gives a better Kaggle score of 3.61596 than the other models. This study reveals the true power of the feature engineering aspect of building machine learning models: almost all the features fed into the models are engineered features.

Future Work

  1. Inclusion of the engineered features from the merchants data can help improve the model predictions.
  2. Feature Selection can improve the model’s performance by reducing the bias added by the unwanted features.
