Telecom Churn Prediction

Shivali · Published in Analytics Vidhya · 11 min read · Apr 6, 2020
Photo credit: Superoffice.com

Customers are the most important resource for any company or business. What if these customers leave due to high charges, better competitor offers, poor customer service, or something unknown? This is why customer churn rate is one of the key metrics companies use to evaluate their performance.

Customer churn rate is a KPI that measures how many customers are leaving. It represents the percentage of customers the company lost during an interval, relative to the number of customers it had at the beginning of that interval.

For example, if a company had 400 customers at the beginning of the month and only 360 at the end, its churn rate is 10%, because it lost 10% of its customer base (40 out of 400). Companies always try to keep the churn rate as close to 0% as possible.
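To make the arithmetic concrete, here is a quick illustrative calculation in Python (the numbers are just the ones from the example above):

# Illustrative churn-rate calculation for the example above
customers_at_start = 400
customers_at_end = 360

churned = customers_at_start - customers_at_end   # 40 customers lost
churn_rate = churned / customers_at_start         # 0.10

print(f"Churn rate: {churn_rate:.0%}")            # Churn rate: 10%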

Table of contents

1) Introduction

  • Dataset, features and target value
  • Problem description

2) Descriptive analysis and EDA (Exploratory Data Analysis)

  • Churn rate and Correlation between features
  • Profile of Churn vs Existing customers
  • Tenure and Monthly charges distribution

3) Cluster analysis

  • Churn cluster detection
  • Churn customer cluster analysis — by Demographic, Account type and Service Usage

4) Churn customer prediction model

  • Prediction model process
  • Model evaluation

5) Retention plan

Source code — Notebook

Link — Github

1. Introduction

Dataset, Features and Target value

Source: https://www.kaggle.com/blastchar/telco-customer-churn (IBM sample dataset)

Here, IBM has provided telecom customer data for predicting churn based on demographic, usage, and account information. The main objective is to analyze the behavior of churned customers and develop strategies to increase customer retention.

Assumption — The data source does not provide any time-related information, so I have assumed that all records refer to the same particular month.

Dataset has information related to,

Demographic:

  • Gender — Male / Female
  • Age range — captured through the Senior Citizen, Partner, and Dependents indicators

Services:

  • Phone service — whether the customer has phone service and, if so, phone-related options such as multiple lines
  • Internet service — whether the customer has internet service and, if so, internet-related options such as Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, and Streaming Movies

Account type:

  • Tenure — how long the customer has been with the company
  • Contract type — what kind of contract the customer has: month-to-month, or a longer commitment (one-year or two-year contract)
  • Paperless billing — whether the customer has opted for paperless billing
  • Payment method — how the customer pays: mailed check, electronic check, credit card (automatic), or bank transfer (automatic)

Usage:

  • Monthly charges
  • Total charges

Target:

  • Churn — whether the customer left the company or is still with the company

Problem Description

Why are customers leaving the company?

The reasons behind a customer leaving could be:

  • High charges
  • Better offer from competitor
  • Poor customer service
  • Some unknown reasons

How to detect churn customers?

  • Monitoring usage
  • Analyzing complaints
  • Analyzing competitors' offers

How to prevent customers from leaving the company?

Once you detect high-risk customers, apply:

  • Retention plans
  • Improve customer service

2. Descriptive analysis and EDA (Exploratory Data Analysis)

Calculate Churn Rate

Churn rate = # of Churn customers / # of total customers

Churn_rate = df_cal['Churn'].value_counts() / df_cal.shape[0]

Generate_bar_graph(Churn_rate.index.map({0: 'Existing', 1: 'Churn'})
                   , Churn_rate.values
                   , 'Customers'
                   , 'Percentage'
                   , 'Customer Distribution')

print(Churn_rate)

The analysis shows that the churn rate of this telecom company is around 26%.

Correlation between features

def Generate_heatmap_graph(corr, chart_title, mask_uppertri=False):
    """ Based on features, generate a correlation heatmap """
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = mask_uppertri

    fig, ax = plt.subplots(figsize=(12, 12))
    sns.heatmap(corr
                , mask=mask
                , square=True
                , annot=True
                , annot_kws={'size': 10.5, 'weight': 'bold'}
                , cmap=plt.get_cmap("YlOrBr")
                , linewidths=.1)
    plt.title(chart_title, fontsize=14)
    plt.show()

var_corr = round(df_cal.corr(), 2)
Generate_heatmap_graph(var_corr
                       , chart_title='Correlation Heatmap'
                       , mask_uppertri=True)

From the correlation matrix, features like Tenure, Monthly Charges, and Total Charges are highly correlated with service features such as Multiple Lines and the internet-based services (Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, and Streaming Movies).

Distribution of Categorical and Binary variables by target (Churn vs not churn)

def Create_data_label(ax):
    """ Display data labels for the given axis """
    for bar in ax.patches:
        ax.text(bar.get_x() + bar.get_width() / 2
                , bar.get_height() + 0.01
                , str(round(100 * bar.get_height(), 2)) + '%'
                , ha='center'
                , fontsize=13)


def Categorical_var_churn_dist(data, cols, distribution_col):
    """ Distribution of categorical variables based on the target variable """

    for i, feature in enumerate(cols):

        feature_summary = data[feature].value_counts(normalize=True).reset_index(name='Percentage')

        plt_cat = sns.catplot(x=feature
                              , y='Percentage'
                              , data=feature_summary
                              , col=distribution_col
                              , kind='bar'
                              , aspect=0.8
                              , palette=plotColor
                              , alpha=0.6)

        if feature == 'PaymentMethod':
            plt_cat.set_xticklabels(rotation=65, horizontalalignment='right')

        for ax1, ax2 in plt_cat.axes:
            Create_data_label(ax1)
            Create_data_label(ax2)

        plt.ylim(top=1)
        plt.subplots_adjust(top=0.9)
        plt.gcf().suptitle(feature + " distribution", fontsize=14)
        plt.show()

churn_summary = df_cal.groupby('Churn')
Categorical_var_churn_dist(churn_summary, cat_cols, 'Churn')

Profile of Churn vs Existing customers based on above analysis

Churn customers are more likely to:

  • not have a partner or dependents, meaning they are likely to be single
  • have internet service, specifically fiber optic
  • not have online security, online backup, device protection, or tech support services
  • have Streaming TV and Streaming Movies services
  • be on a month-to-month plan
  • have paperless billing
  • pay by electronic check

Distribution of Tenure, Monthly Charges and Total Charges

# Mean summary of customers (Churn vs Non-churn)
print(churn_summary[['Tenure', 'MonthlyCharges', 'TotalCharges']].mean())
           Tenure  MonthlyCharges  TotalCharges
Churn
0       37.569965       61.265124   2549.911442
1       17.979133       74.441332   1531.796094

The result shows that churned customers have higher monthly charges than existing customers. There is also a drastic difference in tenure and total charges between churned and existing customers.

Let's check the distribution of each feature against the target variable.

def Numerical_distribution(df_cal, feature):
    """ Distribution of a numerical variable based on the target variable """
    fig = plt.figure(figsize=(15, 10))

    plt.subplot(2, 1, 1)
    ax = sns.kdeplot(df_cal[feature]
                     , color='g'
                     , shade=True)

    title_str = "Original " + feature + " Distribution"
    plt.title(title_str)

    plt.subplot(2, 1, 2)
    ax = sns.kdeplot(df_cal.loc[(df_cal['Churn'] == 1), feature]
                     , color='g'
                     , shade=True
                     , label='Churn')
    ax = sns.kdeplot(df_cal.loc[(df_cal['Churn'] == 0), feature]
                     , color='b'
                     , shade=True
                     , label='No churn')

    title_str = feature + " Distribution: Churn vs No churn"
    plt.title(title_str)
    plt.show()

Tenure vs Churn Distribution

Numerical_distribution(df_cal,'Tenure')

MonthlyCharges vs Churn Distribution

Numerical_distribution(df_cal,'MonthlyCharges')

From these distributions, churned customers are:

  • more likely to have a tenure of less than a year
  • more likely to have monthly charges above $65

3. Cluster analysis

Let’s check if there is any relationship between Tenure and MonthlyCharges.

sns.lmplot(x='Tenure'
           , y='MonthlyCharges'
           , data=df_cal
           , hue='Churn'
           , fit_reg=False
           , markers=["o", "x"]
           , palette=plotColor)
plt.show()

From the plot, there appear to be a few clusters based on Tenure and Monthly Charges.

Let's apply the K-means clustering algorithm to find them. Before passing the data to K-means, we need to normalize Tenure and Monthly Charges.

def Normalize_feature(feature):
    """ Return the normalized feature """
    return prepro.StandardScaler().fit_transform(feature)

# Normalized Tenure and MonthlyCharges
df_cal['Tenure_norm'] = Normalize_feature(df_cal[['Tenure']])
df_cal['MonthlyCharges_norm'] = Normalize_feature(df_cal[['MonthlyCharges']])

def Create_elbow_curve(data):
    """ Display the elbow curve of the K-means algorithm for the given data """
    df_kmeans_data = data
    k = range(1, 10)
    kmeans = [KMeans(n_clusters=i) for i in k]
    score = [kmeans[i].fit(df_kmeans_data).score(df_kmeans_data) for i in range(len(kmeans))]

    plt.figure(figsize=(10, 6))
    plt.plot(k, score)
    plt.xlabel("Clusters")
    plt.ylabel("Score")
    plt.title("Elbow curve", fontsize=15)
    plt.show()

# Checking the number of clusters
Create_elbow_curve(df_cal[df_cal.Churn == 1][['Tenure_norm', 'MonthlyCharges_norm']])

From the elbow curve, 3 clusters seems the most efficient choice.

def Create_kmeans_cluster_graph(df_cal, data, n_clusters, x_title, y_title, chart_title):
    """ Display K-means clusters for the given data """

    kmeans = KMeans(n_clusters=n_clusters        # number of clusters in the data
                    , random_state=random_state  # reproducible cluster assignment
                    )

    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]

    fig = plt.figure(figsize=(12, 8))
    plt.scatter(x=x_title + '_norm'
                , y=y_title + '_norm'
                , data=data
                , color=kmean_colors   # color of data points
                , alpha=0.25           # transparency of data points
                )

    plt.xlabel(x_title)
    plt.ylabel(y_title)

    plt.scatter(x=kmeans.cluster_centers_[:, 0]
                , y=kmeans.cluster_centers_[:, 1]
                , color='black'
                , marker='X'   # marker for cluster centers
                , s=100        # marker size
                )

    plt.title(chart_title, fontsize=15)
    plt.show()

    return kmeans.fit_predict(df_cal[df_cal.Churn == 1][[x_title + '_norm', y_title + '_norm']])

df_cal['Cluster'] = -1   # by default, set Cluster to -1
df_cal.loc[(df_cal.Churn == 1), 'Cluster'] = Create_kmeans_cluster_graph(df_cal
                                                                         , df_cal[df_cal.Churn == 1][['Tenure_norm', 'MonthlyCharges_norm']]
                                                                         , 3
                                                                         , 'Tenure'
                                                                         , 'MonthlyCharges'
                                                                         , "Tenure vs Monthlycharges : Churn customer cluster")

df_cal['Cluster'].unique()

Based on Monthly Charges and Tenure, there are three types of clusters:

  • Low Tenure and Low Monthly Charges (Blue)
  • Low Tenure and High Monthly Charges (Green)
  • High Tenure and High Monthly Charges (Red)

# Distribution of clusters
churn_distribution = df_cal[df_cal['Churn'] == 1].Cluster.value_counts(normalize=True).sort_index()

Generate_bar_graph(x=churn_distribution.index
                   , y=churn_distribution.values
                   , x_title='Clusters'
                   , y_title='Percentage'
                   , chart_title='Cluster distribution'
                   , color=plotColor)

Around 50% of churned customers belong to the Low Tenure and High Monthly Charges cluster.

When I dove deeper into each group, I found some interesting patterns.
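To sanity-check such per-group findings, one option is to compare feature averages across the three clusters. The snippet below is a minimal sketch; it assumes the demographic and service columns (names as in the IBM dataset) have already been numerically encoded in df_cal:

# Minimal sketch: compare feature averages across the three churn clusters.
# Assumes the demographic/service columns are already label/one-hot encoded.
churn_clusters = df_cal[df_cal['Churn'] == 1].groupby('Cluster')

profile_cols = ['SeniorCitizen', 'Partner', 'Dependents',
                'Contract', 'PaperlessBilling', 'InternetService']

# Higher mean values indicate the trait is more common in that cluster
# (e.g. senior citizens, paperless billing).
print(churn_clusters[profile_cols].mean())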

Based on Demographic information,

Low Tenure and Low Monthly Charges customers

  • Male, Dependents

Low Tenure and High Monthly Charges customers

  • Senior citizens, Female

High Tenure and High Monthly Charges customers

  • Male, Partner, Dependents and Senior Citizen

Based on Account information,

Low Tenure and Low Monthly Charges customers

  • Month-to-month contract plan

Low Tenure and High Monthly Charges customers

  • Paperless billing, Month-to-month contract plan

High Tenure and High Monthly Charges customers

  • Paperless billing, One/Two year contract type

Based on Usage information,

Low Tenure and Low Monthly Charges customers

  • Have DSL internet service

Low Tenure and High Monthly Charges customers

  • Have Streaming TV / Streaming Movies, Fiber optic internet service

High Tenure and High Monthly Charges customers

  • Online services like Online Backup, Device Protection and Tech Support, Fiber optic internet service, Have Streaming TV / Streaming Movies

4. Churn customer prediction model

Data Preprocessing

  • Splitting dataset into two groups — Training & Testing

def Train_test_df(feature, target, test_size):
    """ Split data into train and test sets """
    return train_test_split(feature
                            , target
                            , test_size=test_size
                            , random_state=random_state)

x_train, x_test, y_train, y_test = Train_test_df(df_model_feature
                                                 , df_model_target
                                                 , test_size=0.2)
----------------------------
Original features shape, (7043, 28)
Original target shape, (7043,)
x train shape, (5634, 28)
y train shape, (5634,)
x test shape, (1409, 28)
y test shape, (1409,)
----------------------------
  • Class imbalance issue due to the unequal distribution of existing and churned customers

# Upsampling the minority class with SMOTE
# (newer imbalanced-learn versions use sampling_strategy / fit_resample
#  instead of the older ratio / fit_sample)
sm = SMOTE(random_state=random_state
           , sampling_strategy=1.0)
x_train_sm, y_train_sm = sm.fit_resample(x_train, y_train)

print("----------------------------")
print("Original x train shape, ", x_train.shape)
print("Resample x train shape, ", x_train_sm.shape)
print("----------------------------")

Hyper-parameter tuning

  • Using the GridSearchCV() method, find the best parameters for the respective models

def Parameter_tunning(x, y, models, clsModelsNm, parameters, score):
    """ Tune the hyper-parameters of each model with GridSearchCV """
    tuned_params = {}
    for i, model in enumerate(models):
        print(clsModelsNm[i])
        grid = GridSearchCV(estimator=model
                            , cv=5
                            , param_grid=parameters[clsModelsNm[i]]
                            , scoring=score
                            , n_jobs=3)
        grid.fit(x, y)
        print(grid.best_score_)
        print(grid.best_params_)
        tuned_params[clsModelsNm[i]] = {'params': grid.best_params_}

    return tuned_params
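A call to this helper might look like the sketch below. The model list, parameter grids, and scoring choice are illustrative, not the exact grids used in the notebook:

# Illustrative usage of Parameter_tunning (models and grids are examples only)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

clsModelsNm = ['LogisticRegression', 'RandomForest', 'GradientBoosting']
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=random_state),
          GradientBoostingClassifier(random_state=random_state)]

parameters = {'LogisticRegression': {'C': [0.1, 1, 10]},
              'RandomForest': {'n_estimators': [100, 300], 'max_depth': [5, 10]},
              'GradientBoosting': {'n_estimators': [100, 300], 'learning_rate': [0.05, 0.1]}}

tuned_params = Parameter_tunning(x_train, y_train, models, clsModelsNm,
                                 parameters, score='f1')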

Model Comparison

  • Comparing models like logistic regression, random forest & gradient boosting using the cross_val_score() method
  • Measuring metrics like accuracy, precision, recall, and F1

# Graph of precision & recall against threshold
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], label="Precision")
    plt.plot(thresholds, recalls[:-1], label="Recall")
    plt.plot(thresholds, 2 * (precisions[:-1] * recalls[:-1]) / (precisions[:-1] + recalls[:-1]), label="F1")
    plt.title("Precision, recall & F1 vs threshold")
    plt.xlabel("Threshold")
    plt.legend(loc='lower right')
    plt.show()

def Cross_validation_score(clsModels, clsModelsNm, clsSample, scoreMatrix):
    """ Cross validation using the cross_val_score method """
    for i, model in enumerate(clsModels):
        print("===============================================")
        print(clsModelsNm[i])

        for j, sample in enumerate(clsSample):
            print("************************************************")
            print(sample[2])

            for score in scoreMatrix:
                scores = cross_val_score(model, sample[0], sample[1], cv=5, scoring=score)
                print(score, " score:", scores.mean())

            y_scores = cross_val_predict(model, sample[0], sample[1], cv=5, method="predict_proba")

            precisions, recalls, thresholds = metrics.precision_recall_curve(sample[1], y_scores[:, 1])
            plot_precision_recall_vs_threshold(precisions, recalls, thresholds)

            score_matrix = pd.DataFrame({'Precisions': precisions[:-1]
                                         , 'Recalls': recalls[:-1]
                                         , 'F1': 2 * (precisions[:-1] * recalls[:-1]) / (precisions[:-1] + recalls[:-1])
                                         , 'Threshold': thresholds
                                         })
            # print("When precision and recall are the same \n", score_matrix[score_matrix['Precisions'] == score_matrix['Recalls']])
            print("When F1 score is max \n", score_matrix[score_matrix['F1'] == max(score_matrix['F1'])])

Model Evaluation

  • Using the classification report & log loss score, determine the best model for our data

def Cus_log_loss(target, predicted):
    """ Custom implementation of the log loss score """
    if len(predicted) != len(target):
        print("Lengths of target and predicted do not match")
        return

    target = [float(x) for x in target]                                  # converting target into float
    predicted = [min([max([x, 1e-15]), 1 - 1e-15]) for x in predicted]   # clipping probabilities away from 0 and 1

    return -1.0 / len(target) * sum([target[i] * math.log(predicted[i]) + (1.0 - target[i]) * math.log(1.0 - predicted[i])
                                     for i in range(len(predicted))])

def Model_evaluation(models, clsModelsNm, x_train, y_train, x_test, y_test, threshold):
    """ Fit each model and report classification metrics on the test set """
    predicted_val = {}
    for i, model in enumerate(clsModelsNm):
        models[i].fit(x_train, y_train)
        predicted_proba = models[i].predict_proba(x_test)

        predicted = predicted_proba[:, 1].copy()
        predicted[predicted >= threshold[i]] = 1
        predicted[predicted < threshold[i]] = 0

        confusion_matrix_matrix = metrics.confusion_matrix(y_true=y_test
                                                           , y_pred=predicted
                                                           # , normalize='true'
                                                           )

        print("***********", clsModelsNm[i], "*************")
        print(metrics.classification_report(y_test, predicted))
        print("*******************************************")
        # print("Log loss score", round(metrics.log_loss(y_test, models[i].predict_proba(x_test)[:, 1]), 2))
        print("Log loss score", round(Cus_log_loss(y_test, predicted_proba[:, 1]), 2))
        print("*******************************************")
        print("Confusion matrix")
        sns.heatmap(confusion_matrix_matrix
                    , annot=True
                    , fmt="d"
                    )
        plt.xlabel("Predicted label")
        plt.ylabel("Actual label")
        plt.show()
        print("*******************************************")

        predicted_val[model] = predicted
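An illustrative call to this evaluation helper might look as follows; the per-model thresholds here are example values read off the precision/recall curves, not the notebook's exact numbers:

# Illustrative call (thresholds are example values, one per model)
Model_evaluation(models, clsModelsNm,
                 x_train, y_train, x_test, y_test,
                 threshold=[0.5, 0.5, 0.4])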

Model conclusion

Based on the model comparison and evaluation process, upsampled data works better during training, but not on unseen data (based on the log loss score). One of the reasons could be data leakage in the cross_val_score step, since SMOTE was applied before cross-validation.
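One way to avoid that leakage is to perform SMOTE inside each cross-validation fold, for example by wrapping the resampler and the classifier in an imbalanced-learn Pipeline. The snippet below is a minimal sketch of that idea (the estimator choice is illustrative):

# Sketch: keep SMOTE inside the CV loop so validation folds are never resampled
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

smote_pipeline = ImbPipeline(steps=[
    ('smote', SMOTE(random_state=random_state)),           # resamples training folds only
    ('model', GradientBoostingClassifier(random_state=random_state))
])

scores = cross_val_score(smote_pipeline, x_train, y_train, cv=5, scoring='f1')
print("F1 with fold-wise SMOTE:", scores.mean())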

The log loss score for the original (non-resampled) dataset, however, remains consistent between the training and testing datasets.

From the above analysis, gradient boosting on the original dataset gives the most stable and best score, so I have used it for the prediction process.

The gradient boosting model suggested important features such as:

  • Total charges, Tenure, Monthly charges, Contract type, Payment method, Internet service type, Paperless billing

Most of these we already analyzed during the EDA process.
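For reference, these importances can be pulled directly from the fitted model. The sketch below assumes clsGB is the fitted gradient boosting classifier and df_model_feature is the feature DataFrame used for training:

# Sketch: rank features by importance from the fitted gradient boosting model
feature_importance = pd.Series(clsGB.feature_importances_,
                               index=df_model_feature.columns).sort_values(ascending=False)

feature_importance.head(10).plot(kind='barh')
plt.title("Top 10 important features", fontsize=14)
plt.show()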

5. Retention plan

We have built a model on churned and existing customers that can classify the two groups. We can now apply the same model to existing customers to estimate each one's probability of churning.

existing_customer_churn_prob = clsGB.predict_proba(existing_cust_feature)
# Probability of the positive (churn) class
existing_cust['Churn_proba'] = existing_customer_churn_prob[:, 1]

existing_cust.loc[existing_cust['Churn_proba'] >= 0.8, 'Risk_type'] = 'Very high'
existing_cust.loc[(existing_cust['Churn_proba'] >= 0.6) & (existing_cust['Churn_proba'] < 0.8), 'Risk_type'] = 'High'
existing_cust.loc[(existing_cust['Churn_proba'] >= 0.4) & (existing_cust['Churn_proba'] < 0.6), 'Risk_type'] = 'Medium'
existing_cust.loc[(existing_cust['Churn_proba'] >= 0.2) & (existing_cust['Churn_proba'] < 0.4), 'Risk_type'] = 'Low'
existing_cust.loc[(existing_cust['Churn_proba'] > 0.0) & (existing_cust['Churn_proba'] < 0.2), 'Risk_type'] = 'Very low'
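The same banding can also be written more compactly with pd.cut; this is a sketch of that alternative (note that pd.cut uses right-closed bins, so edge behaviour differs slightly from the explicit conditions above):

# Equivalent binning with pd.cut (bin edges match the thresholds above)
existing_cust['Risk_type'] = pd.cut(existing_cust['Churn_proba'],
                                    bins=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
                                    labels=['Very low', 'Low', 'Medium', 'High', 'Very high'],
                                    include_lowest=True)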

Distribution of Existing customer by risk type

existing_cust['Risk_type'].value_counts().plot(kind = 'barh')
plt.title("Existing customer risk type distribution", fontsize=14)
plt.ylabel("Risk type", fontsize = 13)
plt.xlabel("Customers", fontsize = 13)

Once we determine which customers have a very high or high churn probability, we can apply appropriate retention plans.

Conclusion

In this project, I have divided the customer churn prediction problem into steps: exploration, profiling, clustering, model selection & evaluation, and retention planning. Based on this analysis, we can help the retention team identify high-risk customers before they leave the company.

Moreover, we can aggregate additional data sources, such as customer inquiries, seasonality in sales, and richer demographic information, to make the prediction more accurate.
