Supervised Learning on Python — Predicting Customer Churn 3

Machine Learning & Business Strategy

Cheer Hung

Published in Cheer and Utkarsh’s trial on Machine Learning

9 min read · Dec 17, 2019

This article is part of a series. Check out the full series:

Chapter 1 Data Preparation and Feature Visualization

Chapter 2 Deal with the skewness & Create dummy variables

Supervised Machine Learning

Before we start training the model, let’s go through the usual steps of supervised learning:

Split the data into training and testing
Define the model to be trained
Train the model on the training data
Use the trained model to predict values on the testing data
Measure model performance on testing data
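The five steps above can be sketched in a few lines. This is only a minimal illustration under our own assumptions: the data is synthetic (generated with make_classification, roughly 73/27 imbalanced) and LogisticRegression is a stand-in classifier, not one of the models selected later in this article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 73/27), NOT the Telco data set
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.73, 0.27], random_state=42)

# 1. Split the data into training and testing (stratified on the target)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 2. Define the model to be trained (stand-in classifier)
model = LogisticRegression(max_iter=1000)

# 3. Train the model on the training data
model.fit(X_train, y_train)

# 4. Use the trained model to predict values on the testing data
y_pred = model.predict(X_test)

# 5. Measure model performance on the testing data
test_f1 = f1_score(y_test, y_pred)
print(f"F1 on the test set: {test_f1:.3f}")
```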

In our training process, we add an intermediate step between steps 3 and 4: tuning the hyperparameters to optimize for our selected metric, which gives us a better model and helps avoid overfitting.

Split the data into training and testing

So, let’s start with the first step and split the data. First, we check the class split of our target variable:
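One way to run this check is with pandas value_counts. The sketch below uses a small toy DataFrame in place of the real telco1, and it assumes that Churn is already encoded as 0/1; both are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the article's telco1 DataFrame: 73 non-churners, 27 churners
telco1 = pd.DataFrame({"Churn": [0] * 73 + [1] * 27})

# Share of each class in the target variable
class_share = telco1["Churn"].value_counts(normalize=True)
print(class_share)
```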

The target variable is imbalanced (73/27), so when we create the train and test sets we have to make sure that both classes are present in both of them. A purely random train-test split is risky here: it can leave the minority class under-represented in one of the sets (in the extreme case, most of the minority class lands in one group), which hurts model performance a lot. To overcome this issue, we use a stratified shuffle split, which keeps both classes of our target variable in the train and test sets in the same proportion.

# import the library
from sklearn.model_selection import StratifiedShuffleSplit

# use StratifiedShuffleSplit from sklearn to set up the split
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# loop over the (single) split, stratified by Churn, and store the train and test sets
for train_index, test_index in split.split(telco1, telco1["Churn"]):
    # split() yields positional indices, so .iloc is the safe accessor
    strat_train = telco1.iloc[train_index]
    strat_test = telco1.iloc[test_index]
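To confirm that the stratified split keeps the churn rate the same in both sets, we can compare the two rates afterwards. The sketch below runs on a toy DataFrame with a made-up tenure column and a 0/1 Churn column (both assumptions). The last lines show one possible way to derive the x_train, y_train, x_test and y_test variables used later; the original code does not show that step, so treat it as a guess.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Toy stand-in for telco1: 730 non-churners and 270 churners
telco1 = pd.DataFrame({"tenure": range(1000), "Churn": [0] * 730 + [1] * 270})

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_index, test_index = next(split.split(telco1, telco1["Churn"]))
strat_train = telco1.iloc[train_index]
strat_test = telco1.iloc[test_index]

# With 0/1 labels, the mean of Churn is the churn rate
train_rate = strat_train["Churn"].mean()
test_rate = strat_test["Churn"].mean()
print(f"churn rate: train {train_rate:.3f}, test {test_rate:.3f}")

# Assumed follow-up step: separate the features from the target
x_train, y_train = strat_train.drop(columns="Churn"), strat_train["Churn"]
x_test, y_test = strat_test.drop(columns="Churn"), strat_test["Churn"]
```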

Evaluation Metric

How are we going to evaluate the models’ performance, and which model should we choose? By default, most classifiers are scored on accuracy. However, we believe that accuracy can be misleading. Sometimes it is preferable to select a model with a lower accuracy because it has greater predictive power on the problem. With a strong class imbalance, a model can predict the majority class for every customer and still achieve a high classification accuracy, yet such a model is not useful.
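A tiny example makes the point concrete. Assume a hypothetical test set of 100 customers with the same 73/27 split as our data, and a "model" that predicts no churn for everyone:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical test set with a 73/27 class split (0 = stays, 1 = churns)
y_true = [0] * 73 + [1] * 27

# A "model" that predicts the majority class (no churn) for every customer
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall on churners = {rec:.2f}")
# prints: accuracy = 0.73, recall on churners = 0.00
```

The accuracy looks respectable, but the model does not find a single churner.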

The F1 score is an overall measure of a model’s accuracy that combines precision and recall. Accuracy can be high simply because of a large number of True Negatives (predicted not to churn and actually not churning), which in most business cases we do not focus on much, whereas False Negatives and False Positives usually have business costs associated with them. Thus, the F1 score might be a better measure to optimize for when we need a balance between Precision and Recall and the class distribution is uneven, as in our dataset. The formula below is the harmonic mean of Precision and Recall, and it gives a better measure of the incorrectly classified cases than the accuracy metric.

F1 = 2 × (Precision × Recall) / (Precision + Recall)
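To see why the harmonic mean matters, compare two hypothetical models whose precision and recall have the same arithmetic mean (0.5). The helper f1_from is our own illustrative name, not part of any library.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two made-up models with the same arithmetic mean of precision and recall
balanced = f1_from(0.5, 0.5)   # precision and recall in balance
lopsided = f1_from(0.9, 0.1)   # high precision, very low recall

print(f"balanced: F1 = {balanced:.2f}")  # prints: balanced: F1 = 0.50
print(f"lopsided: F1 = {lopsided:.2f}")  # prints: lopsided: F1 = 0.18
```

The F1 score punishes the lopsided model, which is exactly the behaviour we want when neither missed churners nor false alarms are free.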

To avoid writing the grid-search code again and again, we define a function that takes care of all our requirements, and later we simply call it whenever we need it.

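# Defining the estimator, param_grid & scorers to optimize for.
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score)

estimator =        # the classifier to train (to be filled in)
param_grid = { }   # the hyperparameter grid for that classifier (to be filled in)
scorers = {
    'precision_score': make_scorer(precision_score),
    'recall_score': make_scorer(recall_score),
    'accuracy_score': make_scorer(accuracy_score),
    'f1_score': make_scorer(f1_score)
}
grid_search_ = grid_search_wrapper(refit_score='f1_score')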

The code above requires us to provide the following:

estimator: the classifier to be used to train the model.
param_grid: the grid of hyperparameters to search over, defined according to the classifier.
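As an illustration of how the two blanks could be filled in, here is a sketch for a support vector classifier. The values in the grid are arbitrary examples chosen by us, not the grid we actually searched.

```python
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

# Hypothetical choices for illustration only
estimator = SVC()
param_grid = {
    "C": [0.1, 1, 10],            # regularization strength
    "kernel": ["linear", "rbf"],  # kernel type
}

# Every key in the grid must be a valid parameter of the estimator
valid_keys = set(param_grid) <= set(estimator.get_params())

# Number of hyperparameter combinations the grid search would try
n_candidates = len(ParameterGrid(param_grid))
print(valid_keys, n_candidates)
```

With 5-fold cross-validation, each of these combinations is fitted five times, so the grid size directly drives the running time.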

Remember the intermediate step that we mentioned in the beginning?

The code below is the whole process:

# Defining a wrapper function to call and refit the best model.
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold

def grid_search_wrapper(refit_score):
    """
    fit a GridSearchCV classifier using refit_score for optimization
    prints classifier performance metrics and the confusion matrix
    """
    # Creating a stratified cross-validation
    skf = StratifiedKFold(n_splits=5)

    # define the grid (estimator, param_grid and scorers are defined beforehand)
    grid_search = GridSearchCV(estimator, param_grid, scoring=scorers,
                               refit=refit_score, cv=skf,
                               return_train_score=True, n_jobs=4, verbose=10)
    grid_search.fit(x_train, y_train)

    # make the predictions on the test dataset
    y_pred = grid_search.predict(x_test)

    # print the best parameters obtained from the grid search
    print('Best params for {}'.format(refit_score))
    print(grid_search.best_params_)

    # print the classification report on test data
    target_names = ['class_0', 'class_1']
    print(classification_report(y_test, y_pred, target_names=target_names))

    # confusion matrix on the test data
    print('\nConfusion matrix optimized for {} on the test data:'.format(refit_score))
    print(pd.DataFrame(confusion_matrix(y_test, y_pred),
                       columns=['pred_class_0', 'pred_class_1'],
                       index=['class_0', 'class_1']))
    return grid_search

Finally, to execute everything described above, we call grid_search_wrapper with the name of the score to optimize for, for example grid_search_wrapper(refit_score="f1_score").
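Because the wrapper returns the fitted GridSearchCV object, we can also inspect how every candidate scored on each metric. The sketch below is self-contained and therefore uses assumptions of our own: synthetic data, LogisticRegression as a stand-in classifier, and a tiny grid over C. With multi-metric scoring, cv_results_ holds one mean_test_<name> column per key in the scorers dictionary.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data, NOT the Telco data set
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.73, 0.27], random_state=0)

scorers = {"recall_score": make_scorer(recall_score),
           "f1_score": make_scorer(f1_score)}

grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),   # stand-in estimator
    param_grid={"C": [0.01, 1, 100]},    # illustrative grid
    scoring=scorers,
    refit="f1_score",                    # pick the best model by F1
    cv=StratifiedKFold(n_splits=5),
)
grid_search.fit(X, y)

# One row per candidate, one column per metric
results = pd.DataFrame(grid_search.cv_results_)
summary = results[["param_C", "mean_test_f1_score", "mean_test_recall_score"]]
print(summary)
print("best C by F1:", grid_search.best_params_["C"])
```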

Model Selection — SVM Model & XGBoost

Overall, we tried 9 different classifiers; you can find the complete code in our GitHub. Here we only discuss the final two models that we chose: the SVM model and XGBoost.

Business Strategy

From my experience working in CRM, where I had a tough time as a project leader budgeting the P&L and tuning the customer journey, I believe that big multinational enterprises and SMEs take different marketing approaches to customer retention. We therefore offer two solutions, built on different classifiers, for two different business sizes.

SMEs — Lean into your best customers

For SMEs and start-ups, we pick the XGBoost model to identify a pool of customers who are on the brink of churning, rather than every customer who might possibly leave the brand.

If I offer an incentive to the customers most likely to churn, they may not leave the company, but will it be profitable for me?

In the results, this model does not identify the most true positives of all the models we tried, but it is the most precise. Our test set has 1409 customers, of which 302 are predicted to churn. Of these 302 predicted churners, 67.2% (i.e. 203) are correctly classified. This is the highest share of correct churn predictions (precision) among all our models.
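The 67.2% figure is the model's precision, and it can be recomputed from the counts quoted above. The number of false positives tells an SME how many incentives would go to customers who were not going to leave anyway.

```python
# Counts quoted in the text for the XGBoost model on the test set
predicted_churn = 302   # customers the model flags as churners
true_positives = 203    # flagged customers who actually churn

# Precision: share of flagged customers who really churn
precision = true_positives / predicted_churn

# Flagged customers who would have stayed anyway (wasted incentives)
false_positives = predicted_churn - true_positives

print(f"precision = {precision:.1%}")         # prints: precision = 67.2%
print(f"false positives = {false_positives}")  # prints: false positives = 99
```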

We suggest this model for SMEs and start-ups because we assume they have tight budget constraints and would want a conservative approach to marketing.

We recommend that the enterprise consider the likelihood that a customer will respond to its re-engagement initiative, whether that is a phone call or an email. To do so, we first check the feature importance of the XGBoost model. We find that the month-to-month contract and paperless billing are the most important variables affecting churn in the XGBoost model.

Month-to-month contract and paperless billing are the most important features.
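For readers who want to reproduce this kind of ranking, here is a minimal sketch of reading feature importances from a fitted tree ensemble. To keep it runnable without extra installs it uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (XGBClassifier exposes the same feature_importances_ attribute). The data is synthetic and the column names are only labels we attached for illustration, so the ranking printed here says nothing about the real Telco features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative column names attached to synthetic data (assumptions)
feature_names = ["Contract_Month-to-month", "PaperlessBilling",
                 "tenure", "MonthlyCharges"]
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
X = pd.DataFrame(X, columns=feature_names)

# Stand-in for the XGBoost model
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Importances sum to 1; sort them from most to least important
importances = pd.Series(model.feature_importances_,
                        index=feature_names).sort_values(ascending=False)
print(importances)
```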

We assume that this group’s churn is Reactive Churn: these customers forget to update their credit card information, so their number gets cancelled. Digging into the demographics of this group, we find that the majority belong to the young generation.

Churn-Distribution among different age groups who use Paperless billing and have a Monthly contract
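A chart like this can be built by filtering the segment and counting churners per age group. The sketch below uses a toy table, and the column names (SeniorCitizen, PaperlessBilling, Contract, Churn) and their encodings are assumptions; adjust them to your own data.

```python
import pandas as pd

# Toy customer table; column names and encodings are assumptions
telco = pd.DataFrame({
    "SeniorCitizen":    [0, 0, 0, 1, 1, 0, 0, 1],
    "PaperlessBilling": ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Contract":         ["Month-to-month"] * 6 + ["Two year"] * 2,
    "Churn":            [1, 1, 0, 1, 0, 1, 0, 0],
})

# Keep only paperless-billing customers on a month-to-month contract
segment = telco[(telco["PaperlessBilling"] == "Yes")
                & (telco["Contract"] == "Month-to-month")]

# Count the churners in that segment per age group (0 = younger, 1 = senior)
churners_by_age = segment[segment["Churn"] == 1].groupby("SeniorCitizen").size()
print(churners_by_age)
```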

What shall we do:

We believe the answer is a better recurring-billing solution, one that lets the brand automatically notify these customers when their charge is declined. Here are two steps that we suggest to enterprises:

Personalize the invitation.
As a first step, we send the young customers an invitation with personalized interactions, for example using user data such as their first name to customize the message, and ask them to download the app and log in to their account. This way, users feel that you are actually speaking to them. If the SME has brick-and-mortar stores, simple POSM (point-of-sale materials) such as a flier with a QR code, together with promotion by the front-line staff, will be a big help as well.
Leverage push notifications.
Based on our earlier investigation, we found that the young generation (without dependents and partners) has a high potential to churn. In addition, the majority of these young customers have internet service. For this segment, we can send automated push notifications to the user’s home screen to encourage repeat visits and engagement and to remind them to pay. This is a cost-saving approach, and the notifications can reactivate users who are at risk of churning.

Don’t let your best customers break up with you; never lose sight of the customers who have been there for you through it all. They’re the ones who you should be focusing on; it’s better for a new, recent customer to churn than a long-term one.

Big Company — Offer incentives

For multinational companies, we believe the SVM model is the best choice. Its share of correct churn predictions (precision) is not the highest among the models, but it identifies a relatively higher percentage of the customers who actually churn. Our test set has 1409 customers, of which 374 actually churn. Of these 374 churners, 76.7% (i.e. 287) are correctly classified. This is the highest share of actual churners identified (recall) among all our models.
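The 76.7% figure is the model's recall. Using only counts quoted in this article (374 actual churners, 287 caught by the SVM, 203 caught by XGBoost), we can compare how many churners each model misses:

```python
# Counts quoted in the text for the 1409-customer test set
actual_churn = 374   # customers who actually churn
tp_svm = 287         # churners correctly identified by the SVM model
tp_xgb = 203         # churners correctly identified by the XGBoost model

# Recall: share of actual churners the model catches
recall_svm = tp_svm / actual_churn
recall_xgb = tp_xgb / actual_churn

# Churners each model fails to flag (false negatives)
missed_svm = actual_churn - tp_svm
missed_xgb = actual_churn - tp_xgb

print(f"SVM:     recall = {recall_svm:.1%}, missed churners = {missed_svm}")
print(f"XGBoost: recall = {recall_xgb:.1%}, missed churners = {missed_xgb}")
# prints: SVM:     recall = 76.7%, missed churners = 87
# prints: XGBoost: recall = 54.3%, missed churners = 171
```

A company with a larger budget can afford to contact more customers in exchange for missing far fewer churners.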

For multinational enterprises, which have a relatively larger marketing budget, we suggest providing incentives to customers who are on the brink of churning. Don’t underestimate the power of incentives. This small effort can go a long way in showing your existing customers how much you value their business, especially when they are approaching the end of their contract and you are not sure whether they will stick with your brand. One tip: don’t wait until the last week or month before the contract is over to reach your customers. Since we found that most of the churning customers are reactive churners, remind them several times with incentives to bring them back to the brand. This is what you can do:

3 months before: Send information about the newest cellphone that has just launched, telling them they can renew the contract to get the latest mobile at a lower price.
1 month before: Send the bill along with a reminder to renew the contract, even if they are paperless billing customers, since customers with paperless billing are two times more likely to churn. Keep in mind that the payment process should be as simple as possible.
The last week: Offer a discounted renewal rate, which could be the push they need to stick around.

Last but not least, stay competitive.

Market conditions are constantly changing, and as new software and technologies enter the space, the needs and demands of your customers will inevitably shift. So ask yourself: what’s next? Keep an eye on trends, technology, and product advancements so that you are well positioned when disruption comes.

After using the classifiers to identify the hidden variables and combinations of variables that predict customer churn, these are what we suggest for the next few miles:

Break the customer base into scores of microsegments. The full value of data analytics can only be realized when companies can personalize the treatment of a precisely targeted group of customers with the highest propensity to leave. Such a tailored approach requires a granular micro-segmentation of the customer base which is then matched to a broad, well-classified library of offers.
Introduce agile test-and-learn processes. While data analytics can predict customer behavior, true value is only realized when operators are able to change that behavior. We have found that leaders in churn management are highly skilled at identifying — and quickly testing — new offers for individual microsegments. Doing so requires setting up a structured testing methodology and trying various offers for a given microsegment, such as different permutations of value, messaging, and mode of delivery.

Last, if you still have no clue how to start your plan: I always take note of our target competitors’ customer success initiatives to make sure that they aren’t lapping us.

Here are the previous chapters in case you missed them:

Supervised Learning on Python — Predicting Customer Churn 1

Data Preparation and Feature Visualization

Supervised Learning on Python — Predicting Customer Churn 2

Deal with the skewness & Create dummy variables