Telecom Churn Prediction

Shivali · Published in Analytics Vidhya · 11 min read · Apr 6, 2020
Photo credit: Superoffice.com

Customers are the most important resource for any company or business. What if these customers leave due to high charges, better competitor offers, poor customer service, or something unknown? This is why customer churn rate is one of the key metrics companies use to evaluate their performance.

Customer churn rate is a KPI that measures how many customers are leaving. It represents the percentage of customers the company lost during an interval, relative to the number of customers it had at the beginning of that interval.

For example, if a company had 400 customers at the beginning of the month and only 360 at the end, its churn rate is 10%, because it lost 10% of its customer base (40 out of 400). Companies always try to keep the churn rate as close to 0% as possible.
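To make the arithmetic concrete, here is a quick illustrative calculation in Python (the numbers are just the ones from the example above):

# Illustrative churn-rate calculation for the example above
customers_at_start = 400
customers_at_end = 360

churned = customers_at_start - customers_at_end   # 40 customers lost
churn_rate = churned / customers_at_start         # 0.10

print(f"Churn rate: {churn_rate:.0%}")            # Churn rate: 10%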

Table of contents

1) Introduction

  • Dataset, features and target value
  • Problem description

2) Descriptive analysis and EDA (Exploratory Data Analysis)

  • Churn rate and Correlation between features
  • Profile of Churn vs Existing customers
  • Tenure and Monthly charges distribution

3) Cluster analysis

  • Churn cluster detection
  • Churn customer cluster analysis — by Demographic, Account type and Service Usage

4) Churn customer prediction model

  • Prediction model process
  • Model evaluation

5) Retention plan

Source code — Notebook

Link — Github

1. Introduction

Dataset, Features and Target value

Source: https://www.kaggle.com/blastchar/telco-customer-churn (IBM sample dataset)

Here, IBM has provided telecom customer data for predicting churn based on demographic, usage, and account information. The main objective is to analyze the behavior of churned customers and develop strategies to increase customer retention.

Assumption — The data source does not provide any time-related information, so I have assumed that all records refer to the same particular month.

Dataset has information related to,

Demographic:

  • Gender — Male / Female
  • Age range — captured through the Senior Citizen, Partner, and Dependents indicators

Services:

  • Phone service — whether the customer has phone service and, if so, phone-related options such as multiple lines
  • Internet service — whether the customer has internet service and, if so, internet-related options such as Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, and Streaming Movies

Account type:

  • Tenure — how long the customer has been with the company
  • Contract type — what kind of contract the customer has: month-to-month, or a longer commitment (one-year or two-year contract)
  • Paperless billing — whether the customer has opted for paperless billing
  • Payment method — how the customer pays: mailed check, electronic check, credit card (automatic), or bank transfer (automatic)

Usage:

  • Monthly charges
  • Total charges

Target:

  • Churn — whether the customer left the company or is still with the company

Problem Description

Why are customers leaving the company?

The reasons behind a customer leaving could be:

  • High charges
  • Better offer from competitor
  • Poor customer service
  • Some unknown reasons

How to detect churn customers?

  • Monitoring usage
  • Analyzing complaints
  • Analyzing competitors' offers

How to prevent customers from leaving the company?

Once you detect high-risk customers, apply:

  • Retention plans
  • Improve customer service

2. Descriptive analysis and EDA (Exploratory Data Analysis)

Calculate Churn Rate

Churn rate = # of Churn customers / # of total customers

Churn_rate = df_cal['Churn'].value_counts() / df_cal.shape[0]

Generate_bar_graph(Churn_rate.index.map({0: 'Existing', 1: 'Churn'})
                   , Churn_rate.values
                   , 'Customers'
                   , 'Percentage'
                   , 'Customer Distribution')

print(Churn_rate)

The analysis shows that the churn rate of this telecom company is around 26%.

Correlation between features

def Generate_heatmap_graph(corr, chart_title, mask_uppertri=False):
    """ Based on features, generate a correlation heatmap """
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = mask_uppertri

    fig, ax = plt.subplots(figsize=(12, 12))
    sns.heatmap(corr
                , mask=mask
                , square=True
                , annot=True
                , annot_kws={'size': 10.5, 'weight': 'bold'}
                , cmap=plt.get_cmap("YlOrBr")
                , linewidths=.1)
    plt.title(chart_title, fontsize=14)
    plt.show()

var_corr = round(df_cal.corr(), 2)
Generate_heatmap_graph(var_corr
                       , chart_title='Correlation Heatmap'
                       , mask_uppertri=True)

From the correlation matrix, features like Tenure, Monthly Charges, and Total Charges are highly correlated with service features such as Multiple Lines and the internet-based services (Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, and Streaming Movies).

Distribution of Categorical and Binary variables by target (Churn vs not churn)

def Create_data_label(ax):
    """ Display data labels for the given axis """
    for bar in ax.patches:
        ax.text(bar.get_x() + bar.get_width() / 2
                , bar.get_height() + 0.01
                , str(round(100 * bar.get_height(), 2)) + '%'
                , ha='center'
                , fontsize=13)


def Categorical_var_churn_dist(data, cols, distribution_col):
    """ Distribution of categorical variables based on the target variable """

    for i, feature in enumerate(cols):

        feature_summary = data[feature].value_counts(normalize=True).reset_index(name='Percentage')

        plt_cat = sns.catplot(x=feature
                              , y='Percentage'
                              , data=feature_summary
                              , col=distribution_col
                              , kind='bar'
                              , aspect=0.8
                              , palette=plotColor
                              , alpha=0.6)

        if feature == 'PaymentMethod':
            plt_cat.set_xticklabels(rotation=65, horizontalalignment='right')

        for ax1, ax2 in plt_cat.axes:
            Create_data_label(ax1)
            Create_data_label(ax2)

        plt.ylim(top=1)
        plt.subplots_adjust(top=0.9)
        plt.gcf().suptitle(feature + " distribution", fontsize=14)
        plt.show()

churn_summary = df_cal.groupby('Churn')
Categorical_var_churn_dist(churn_summary, cat_cols, 'Churn')

Profile of Churn vs Existing customers based on above analysis

Churn customers are more likely to:

  • not have a partner or dependents, meaning they are likely to be single
  • have internet service, specifically fiber optic
  • not have online security, online backup, device protection, or tech support services
  • have Streaming TV and Streaming Movies services
  • be on a month-to-month plan
  • have paperless billing
  • pay by electronic check

Distribution of Tenure, Monthly Charges and Total Charges

# Mean summary of customers (Churn vs Non-churn)
print(churn_summary[['Tenure', 'MonthlyCharges', 'TotalCharges']].mean())
           Tenure  MonthlyCharges  TotalCharges
Churn
0       37.569965       61.265124   2549.911442
1       17.979133       74.441332   1531.796094

The result shows that churned customers have higher monthly charges than existing customers. There is also a drastic difference in tenure and total charges between churned and existing customers.

Let's check the distribution of each feature against the target variable.

def Numerical_distribution(df_cal, feature):
    """ Distribution of a numerical variable based on the target variable """
    fig = plt.figure(figsize=(15, 10))

    plt.subplot(2, 1, 1)
    ax = sns.kdeplot(df_cal[feature]
                     , color='g'
                     , shade=True)

    title_str = "Original " + feature + " Distribution"
    plt.title(title_str)

    plt.subplot(2, 1, 2)
    ax = sns.kdeplot(df_cal.loc[(df_cal['Churn'] == 1), feature]
                     , color='g'
                     , shade=True
                     , label='Churn')
    ax = sns.kdeplot(df_cal.loc[(df_cal['Churn'] == 0), feature]
                     , color='b'
                     , shade=True
                     , label='No churn')

    title_str = feature + " Distribution: Churn vs No churn"
    plt.title(title_str)
    plt.show()

Tenure vs Churn Distribution

Numerical_distribution(df_cal,'Tenure')

MonthlyCharges vs Churn Distribution

Numerical_distribution(df_cal,'MonthlyCharges')

From these distributions, churned customers are:

  • more likely to have a tenure of less than a year
  • more likely to have monthly charges above $65

3. Cluster analysis

Let’s check if there is any relationship between Tenure and MonthlyCharges.

sns.lmplot(x='Tenure'
           , y='MonthlyCharges'
           , data=df_cal
           , hue='Churn'
           , fit_reg=False
           , markers=["o", "x"]
           , palette=plotColor)
plt.show()

From the plot, there appear to be a few clusters based on Tenure and Monthly Charges.

Let's apply the K-means clustering algorithm to find them. Before passing the data to K-means, we need to normalize Tenure and Monthly Charges.

def Normalize_feature(feature):
    """ Return the normalized feature """
    return prepro.StandardScaler().fit_transform(feature)

# Normalized Tenure and MonthlyCharges
df_cal['Tenure_norm'] = Normalize_feature(df_cal[['Tenure']])
df_cal['MonthlyCharges_norm'] = Normalize_feature(df_cal[['MonthlyCharges']])

def Create_elbow_curve(data):
    """ Display the elbow curve of the K-means algorithm for the given data """
    df_kmeans_data = data
    k = range(1, 10)
    kmeans = [KMeans(n_clusters=i) for i in k]
    score = [kmeans[i].fit(df_kmeans_data).score(df_kmeans_data) for i in range(len(kmeans))]

    plt.figure(figsize=(10, 6))
    plt.plot(k, score)
    plt.xlabel("Clusters")
    plt.ylabel("Score")
    plt.title("Elbow curve", fontsize=15)
    plt.show()

# Checking the number of clusters
Create_elbow_curve(df_cal[df_cal.Churn == 1][['Tenure_norm', 'MonthlyCharges_norm']])

From the elbow curve, 3 clusters seems the most efficient choice.

def Create_kmeans_cluster_graph(df_cal, data, n_clusters, x_title, y_title, chart_title):
    """ Display K-means clusters for the given data """

    kmeans = KMeans(n_clusters=n_clusters        # number of clusters in the data
                    , random_state=random_state  # reproducible cluster assignment
                    )

    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]

    fig = plt.figure(figsize=(12, 8))
    plt.scatter(x=x_title + '_norm'
                , y=y_title + '_norm'
                , data=data
                , color=kmean_colors   # color of data points
                , alpha=0.25           # transparency of data points
                )

    plt.xlabel(x_title)
    plt.ylabel(y_title)

    plt.scatter(x=kmeans.cluster_centers_[:, 0]
                , y=kmeans.cluster_centers_[:, 1]
                , color='black'
                , marker='X'   # marker for cluster centers
                , s=100        # marker size
                )

    plt.title(chart_title, fontsize=15)
    plt.show()

    return kmeans.fit_predict(df_cal[df_cal.Churn == 1][[x_title + '_norm', y_title + '_norm']])

df_cal['Cluster'] = -1   # by default, set Cluster to -1
df_cal.loc[(df_cal.Churn == 1), 'Cluster'] = Create_kmeans_cluster_graph(df_cal
                                                                         , df_cal[df_cal.Churn == 1][['Tenure_norm', 'MonthlyCharges_norm']]
                                                                         , 3
                                                                         , 'Tenure'
                                                                         , 'MonthlyCharges'
                                                                         , "Tenure vs Monthlycharges : Churn customer cluster")

df_cal['Cluster'].unique()

Based on Monthly Charges and Tenure, there are three types of clusters:

  • Low Tenure and Low Monthly Charges (Blue)
  • Low Tenure and High Monthly Charges (Green)
  • High Tenure and High Monthly Charges (Red)

# Distribution of clusters
churn_distribution = df_cal[df_cal['Churn'] == 1].Cluster.value_counts(normalize=True).sort_index()

Generate_bar_graph(x=churn_distribution.index
                   , y=churn_distribution.values
                   , x_title='Clusters'
                   , y_title='Percentage'
                   , chart_title='Cluster distribution'
                   , color=plotColor)

Around 50% of churned customers belong to the Low Tenure and High Monthly Charges cluster.

When I dove deeper into each group, I found some interesting patterns.
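To sanity-check such per-group findings, one option is to compare feature averages across the three clusters. The snippet below is a minimal sketch; it assumes the demographic and service columns (names as in the IBM dataset) have already been numerically encoded in df_cal:

# Minimal sketch: compare feature averages across the three churn clusters.
# Assumes the demographic/service columns are already label/one-hot encoded.
churn_clusters = df_cal[df_cal['Churn'] == 1].groupby('Cluster')

profile_cols = ['SeniorCitizen', 'Partner', 'Dependents',
                'Contract', 'PaperlessBilling', 'InternetService']

# Higher mean values indicate the trait is more common in that cluster
# (e.g. senior citizens, paperless billing).
print(churn_clusters[profile_cols].mean())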

Based on Demographic information,

Low Tenure and Low Monthly Charges customers

  • Male, Dependents

Low Tenure and High Monthly Charges customers

  • Senior citizens, Female

High Tenure and High Monthly Charges customers

  • Male, Partner, Dependents and Senior Citizen

Based on Account information,

Low Tenure and Low Monthly Charges customers

  • Month-to-month contract plan

Low Tenure and High Monthly Charges customers

  • Paperless billing, Month-to-month contract plan

High Tenure and High Monthly Charges customers

  • Paperless billing, One/Two year contract type

Based on Usage information,

Low Tenure and Low Monthly Charges customers

  • Have DSL internet service

Low Tenure and High Monthly Charges customers

  • Have Streaming TV / Streaming Movies, Fiber optic internet service

High Tenure and High Monthly Charges customers

  • Online services like Online Backup, Device Protection and Tech Support, Fiber optic internet service, Have Streaming TV / Streaming Movies

4. Churn customer prediction model

Data Preprocessing

  • Splitting dataset into two groups — Training & Testing

def Train_test_df(feature, target, test_size):
    """ Split data into train and test sets """
    return train_test_split(feature
                            , target
                            , test_size=test_size
                            , random_state=random_state)

x_train, x_test, y_train, y_test = Train_test_df(df_model_feature
                                                 , df_model_target
                                                 , test_size=0.2)
----------------------------
Original features shape, (7043, 28)
Original target shape, (7043,)
x train shape, (5634, 28)
y train shape, (5634,)
x test shape, (1409, 28)
y test shape, (1409,)
----------------------------
  • Class imbalance issue due to the unequal distribution of existing and churned customers

# Upsampling the minority class with SMOTE
# (newer imbalanced-learn versions use sampling_strategy / fit_resample
#  instead of the older ratio / fit_sample)
sm = SMOTE(random_state=random_state
           , sampling_strategy=1.0)
x_train_sm, y_train_sm = sm.fit_resample(x_train, y_train)

print("----------------------------")
print("Original x train shape, ", x_train.shape)
print("Resample x train shape, ", x_train_sm.shape)
print("----------------------------")

Hyper-parameter tuning

  • Using the GridSearchCV() method, find the best parameters for the respective models

def Parameter_tunning(x, y, models, clsModelsNm, parameters, score):
    """ Tune the hyper-parameters of each model with GridSearchCV """
    tuned_params = {}
    for i, model in enumerate(models):
        print(clsModelsNm[i])
        grid = GridSearchCV(estimator=model
                            , cv=5
                            , param_grid=parameters[clsModelsNm[i]]
                            , scoring=score
                            , n_jobs=3)
        grid.fit(x, y)
        print(grid.best_score_)
        print(grid.best_params_)
        tuned_params[clsModelsNm[i]] = {'params': grid.best_params_}

    return tuned_params
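A call to this helper might look like the sketch below. The model list, parameter grids, and scoring choice are illustrative, not the exact grids used in the notebook:

# Illustrative usage of Parameter_tunning (models and grids are examples only)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

clsModelsNm = ['LogisticRegression', 'RandomForest', 'GradientBoosting']
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=random_state),
          GradientBoostingClassifier(random_state=random_state)]

parameters = {'LogisticRegression': {'C': [0.1, 1, 10]},
              'RandomForest': {'n_estimators': [100, 300], 'max_depth': [5, 10]},
              'GradientBoosting': {'n_estimators': [100, 300], 'learning_rate': [0.05, 0.1]}}

tuned_params = Parameter_tunning(x_train, y_train, models, clsModelsNm,
                                 parameters, score='f1')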

Model Comparison

  • Comparing models like logistic regression, random forest & gradient boosting using the cross_val_score() method
  • Measuring metrics like accuracy, precision, recall, and F1

# Graph of precision & recall against threshold
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], label="Precision")
    plt.plot(thresholds, recalls[:-1], label="Recall")
    plt.plot(thresholds, 2 * (precisions[:-1] * recalls[:-1]) / (precisions[:-1] + recalls[:-1]), label="F1")
    plt.title("Precision, recall & F1 vs threshold")
    plt.xlabel("Threshold")
    plt.legend(loc='lower right')
    plt.show()

def Cross_validation_score(clsModels, clsModelsNm, clsSample, scoreMatrix):
    """ Cross validation using the cross_val_score method """
    for i, model in enumerate(clsModels):
        print("===============================================")
        print(clsModelsNm[i])

        for j, sample in enumerate(clsSample):
            print("************************************************")
            print(sample[2])

            for score in scoreMatrix:
                scores = cross_val_score(model, sample[0], sample[1], cv=5, scoring=score)
                print(score, " score:", scores.mean())

            y_scores = cross_val_predict(model, sample[0], sample[1], cv=5, method="predict_proba")

            precisions, recalls, thresholds = metrics.precision_recall_curve(sample[1], y_scores[:, 1])
            plot_precision_recall_vs_threshold(precisions, recalls, thresholds)

            score_matrix = pd.DataFrame({'Precisions': precisions[:-1]
                                         , 'Recalls': recalls[:-1]
                                         , 'F1': 2 * (precisions[:-1] * recalls[:-1]) / (precisions[:-1] + recalls[:-1])
                                         , 'Threshold': thresholds
                                         })
            # print("When precision and recall are the same \n", score_matrix[score_matrix['Precisions'] == score_matrix['Recalls']])
            print("When F1 score is max \n", score_matrix[score_matrix['F1'] == max(score_matrix['F1'])])

Model Evaluation

  • Using the classification report & log loss score, determine the best model for our data

def Cus_log_loss(target, predicted):
    """ Custom implementation of the log loss score """
    if len(predicted) != len(target):
        print("Lengths of target and predicted do not match")
        return

    target = [float(x) for x in target]                                  # converting target into float
    predicted = [min([max([x, 1e-15]), 1 - 1e-15]) for x in predicted]   # clipping probabilities away from 0 and 1

    return -1.0 / len(target) * sum([target[i] * math.log(predicted[i]) + (1.0 - target[i]) * math.log(1.0 - predicted[i])
                                     for i in range(len(predicted))])

def Model_evaluation(models, clsModelsNm, x_train, y_train, x_test, y_test, threshold):
    """ Fit each model and report classification metrics on the test set """
    predicted_val = {}
    for i, model in enumerate(clsModelsNm):
        models[i].fit(x_train, y_train)
        predicted_proba = models[i].predict_proba(x_test)

        predicted = predicted_proba[:, 1].copy()
        predicted[predicted >= threshold[i]] = 1
        predicted[predicted < threshold[i]] = 0

        confusion_matrix_matrix = metrics.confusion_matrix(y_true=y_test
                                                           , y_pred=predicted
                                                           # , normalize='true'
                                                           )

        print("***********", clsModelsNm[i], "*************")
        print(metrics.classification_report(y_test, predicted))
        print("*******************************************")
        # print("Log loss score", round(metrics.log_loss(y_test, models[i].predict_proba(x_test)[:, 1]), 2))
        print("Log loss score", round(Cus_log_loss(y_test, predicted_proba[:, 1]), 2))
        print("*******************************************")
        print("Confusion matrix")
        sns.heatmap(confusion_matrix_matrix
                    , annot=True
                    , fmt="d"
                    )
        plt.xlabel("Predicted label")
        plt.ylabel("Actual label")
        plt.show()
        print("*******************************************")

        predicted_val[model] = predicted
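An illustrative call to this evaluation helper might look as follows; the per-model thresholds here are example values read off the precision/recall curves, not the notebook's exact numbers:

# Illustrative call (thresholds are example values, one per model)
Model_evaluation(models, clsModelsNm,
                 x_train, y_train, x_test, y_test,
                 threshold=[0.5, 0.5, 0.4])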

Model conclusion

Based on the model comparison and evaluation process, upsampled data works better during training, but not on unseen data (based on the log loss score). One of the reasons could be data leakage in the cross_val_score step, since SMOTE was applied before cross-validation.
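One way to avoid that leakage is to perform SMOTE inside each cross-validation fold, for example by wrapping the resampler and the classifier in an imbalanced-learn Pipeline. The snippet below is a minimal sketch of that idea (the estimator choice is illustrative):

# Sketch: keep SMOTE inside the CV loop so validation folds are never resampled
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

smote_pipeline = ImbPipeline(steps=[
    ('smote', SMOTE(random_state=random_state)),           # resamples training folds only
    ('model', GradientBoostingClassifier(random_state=random_state))
])

scores = cross_val_score(smote_pipeline, x_train, y_train, cv=5, scoring='f1')
print("F1 with fold-wise SMOTE:", scores.mean())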

The log loss score for the original (non-resampled) dataset, however, remains consistent between the training and testing datasets.

From the above analysis, gradient boosting on the original dataset gives the most stable and best score, so I have used it for the prediction process.

The gradient boosting model suggested important features such as:

  • Total charges, Tenure, Monthly charges, Contract type, Payment method, Internet service type, Paperless billing

Most of these we already analyzed during the EDA process.
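For reference, these importances can be pulled directly from the fitted model. The sketch below assumes clsGB is the fitted gradient boosting classifier and df_model_feature is the feature DataFrame used for training:

# Sketch: rank features by importance from the fitted gradient boosting model
feature_importance = pd.Series(clsGB.feature_importances_,
                               index=df_model_feature.columns).sort_values(ascending=False)

feature_importance.head(10).plot(kind='barh')
plt.title("Top 10 important features", fontsize=14)
plt.show()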

5. Retention plan

We have built a model on churned and existing customers that can classify the two groups. We can now apply the same model to existing customers to estimate each one's probability of churning.

existing_customer_churn_prob = clsGB.predict_proba(existing_cust_feature)
# Probability of the positive (churn) class
existing_cust['Churn_proba'] = existing_customer_churn_prob[:, 1]

existing_cust.loc[existing_cust['Churn_proba'] >= 0.8, 'Risk_type'] = 'Very high'
existing_cust.loc[(existing_cust['Churn_proba'] >= 0.6) & (existing_cust['Churn_proba'] < 0.8), 'Risk_type'] = 'High'
existing_cust.loc[(existing_cust['Churn_proba'] >= 0.4) & (existing_cust['Churn_proba'] < 0.6), 'Risk_type'] = 'Medium'
existing_cust.loc[(existing_cust['Churn_proba'] >= 0.2) & (existing_cust['Churn_proba'] < 0.4), 'Risk_type'] = 'Low'
existing_cust.loc[(existing_cust['Churn_proba'] > 0.0) & (existing_cust['Churn_proba'] < 0.2), 'Risk_type'] = 'Very low'
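The same banding can also be written more compactly with pd.cut; this is a sketch of that alternative (note that pd.cut uses right-closed bins, so edge behaviour differs slightly from the explicit conditions above):

# Equivalent binning with pd.cut (bin edges match the thresholds above)
existing_cust['Risk_type'] = pd.cut(existing_cust['Churn_proba'],
                                    bins=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
                                    labels=['Very low', 'Low', 'Medium', 'High', 'Very high'],
                                    include_lowest=True)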

Distribution of Existing customer by risk type

existing_cust['Risk_type'].value_counts().plot(kind = 'barh')
plt.title("Existing customer risk type distribution", fontsize=14)
plt.ylabel("Risk type", fontsize = 13)
plt.xlabel("Customers", fontsize = 13)

Once we determine which customers have a very high or high churn probability, we can apply appropriate retention plans.

Conclusion

In this project, I have divided the customer churn prediction problem into steps: exploration, profiling, clustering, model selection & evaluation, and retention planning. Based on this analysis, we can help the retention team identify high-risk customers before they leave the company.

Moreover, we can aggregate additional data sources, such as customer inquiries, seasonality in sales, and richer demographic information, to make the prediction more accurate.
