Customer Churn Prediction for Telecommunication Company With Decision Tree Using Python.

Emphasizing customer retention as much as acquiring new customers supports business sustainability and growth. This article builds a churn prediction model to identify the underlying churn factors and the customers inclined to leave. Choosing the right evaluation metrics helps to build a more practically useful model.

Hshan.T
The Startup
5 min read · Dec 6, 2020


The telecommunication industry has been growing rapidly, in line with rising demand driven by technological advancement. Competition among service providers is so fierce that they pursue different strategies to meet customers’ needs. Retaining existing customers is now as important as searching for new ones.

Exploratory Analysis

The dataset has 7032 instances and 21 columns: an ID column, 3 numerical attributes, 16 categorical attributes and the target column (‘Churn’). There are no missing values.

Figure 1: Bar chart of target variable, class labels

A bar chart of the target column shows that the two classes are unevenly distributed. This raises the question of whether accuracy is an appropriate evaluation metric: a model that simply predicts ‘No’ for every customer already attains an accuracy above 70%. A useful model should therefore achieve an accuracy above 75% at the very least.
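To make the baseline argument concrete, here is a minimal sketch of the accuracy obtained by always predicting the most frequent class. The label counts below are illustrative assumptions, not the article's actual class counts.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Illustrative labels only (assumed 73% 'No', 27% 'Yes'), not the real dataset.
labels = ["No"] * 73 + ["Yes"] * 27

baseline_label, baseline_accuracy = majority_baseline_accuracy(labels)
print(baseline_label, baseline_accuracy)
```

Any trained model has to beat this number before its accuracy says anything about what it has learned.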

Figure 2: Heatmap of correlation between numeric variables.
Figure 3: Matrix of Scatter Plots.

In the heatmap in Figure 2, ‘TotalCharges’ is strongly positively correlated with ‘tenure’ and comparatively weakly correlated with ‘MonthlyCharges’. Further analysis is needed to decide whether to omit any attribute. In Figure 3, the scatter plots of the two classes overlap over a large area, so it is difficult to find relationships or distinguishing characteristics that determine the propensity to leave.

Figure 4.1: Boxplots for tenure.
Figure 4.2: Boxplots for TotalCharges.

In Figure 4, the boxplots of ‘TotalCharges’ and ‘tenure’ show outliers among churned customers. Since about 5% of the churned samples are outliers, they are not removed but capped at the upper bound (third quartile + 1.5 × interquartile range). For the categorical attributes, a chi-square test is used to determine whether there is a relationship between each attribute and the class label.
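The capping rule can be sketched as follows, using only the standard library. The sample values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; the article does not say how its quartiles were computed.

```python
import statistics


def cap_upper_outliers(values):
    """Cap values above Q3 + 1.5 * IQR at that upper fence."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    capped = [min(v, upper_fence) for v in values]
    return capped, upper_fence


# Hypothetical tenure values (in months) with one extreme observation.
tenure = [1, 2, 2, 3, 4, 5, 6, 8, 10, 70]

capped_tenure, fence = cap_upper_outliers(tenure)
print(fence)
print(capped_tenure)
```

Capping keeps every row in the dataset while limiting the influence of extreme values, which matches the decision above not to drop the affected churned customers.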

Figure 5.1: Count plots of gender.
Figure 5.2: Count plots of PhoneService.
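The chi-square screening of categorical attributes mentioned above can be sketched as below. This computes the Pearson statistic by hand for a contingency table of attribute level against churn outcome. Both tables are hypothetical counts, and the 5% significance level is an assumption, as the article does not state the threshold it used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of attribute and class.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Critical value of the chi-square distribution for 1 degree of freedom, alpha = 0.05.
CRITICAL_5PCT_DOF1 = 3.841

# Hypothetical counts: rows = attribute level, columns = (stayed, churned).
unrelated = [[50, 50], [50, 50]]
related = [[80, 20], [40, 60]]

stat_unrelated, dof_unrelated = chi_square_statistic(unrelated)
stat_related, dof_related = chi_square_statistic(related)

print(stat_unrelated > CRITICAL_5PCT_DOF1)  # False: no evidence of a relationship
print(stat_related > CRITICAL_5PCT_DOF1)    # True: attribute and churn are related
```

No continuity correction is applied here, so the statistic for a 2 × 2 table may differ slightly from what a statistics library reports by default.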

Preprocessing

The attributes that remain after omitting those that failed the chi-square test in the previous section are processed further. The task is a classification problem: predicting whether a customer churns, using a decision tree model. Since the model is non-parametric, makes no distributional assumption about its inputs and splits on thresholds that are unaffected by rescaling, no standardization is performed on the numerical variables. One-hot encoding is the more suitable choice for the categorical attributes: a tree splits on whether a variable's value falls above or below a threshold, so it would treat label-encoded categories as ordinal. The preprocessed data frame now has dimensions (7032, 28).
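As a minimal sketch of the encoding step, assuming pandas is used: the column names and values below are hypothetical examples, not the article's actual preprocessing code.

```python
import pandas as pd

# Hypothetical mini-frame: one categorical and one numerical attribute.
df = pd.DataFrame(
    {
        "Contract": ["Month-to-month", "One year", "Two year", "One year"],
        "tenure": [1, 24, 60, 12],
    }
)

# Each category becomes its own indicator column, so no artificial
# ordering between categories is introduced.
encoded = pd.get_dummies(df, columns=["Contract"])
print(encoded.columns.tolist())
print(encoded.shape)
```

With label encoding, a split such as `Contract <= 1` would group categories by an arbitrary integer order. With indicator columns, each split isolates a single category.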

Decision Tree

All models are trained over multiple combinations of hyperparameters using a 7-fold cross-validated grid search. The aim is to find the set of hyperparameters that performs best while remaining computationally efficient, dropping those that contribute no improvement.
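A 7-fold cross-validated grid search of this kind might look like the sketch below, assuming scikit-learn. The synthetic data, the parameter grid and the `f1` scoring choice are all assumptions for illustration; the article does not list the grid or the scoring it used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn data (not the real dataset).
X, y = make_classification(
    n_samples=700, n_features=10, weights=[0.73, 0.27], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: the article does not state which values were searched.
param_grid = {
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 10, 50],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=7,          # 7-fold cross-validation, as described in the text
    scoring="f1",  # assumed scoring; accuracy is misleading on imbalanced labels
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # F1 of the refitted best model on held-out data
```

Holding out a test split that the search never sees keeps the final score from being inflated by the hyperparameter selection itself.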

Figure 6: Confusion matrix for decision tree model.
Figure 7: Final features selection (Decision Tree).
Figure 8: Resulting Decision Tree.
Example of rules extracted.

The best decision tree model for this task attains an accuracy of 0.86, a recall of 0.68, a precision of 0.79 and an F1-score of 0.73.
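For reference, here is how these metrics are derived from confusion-matrix counts, along with a check that the reported F1-score is consistent with the reported precision and recall. The counts passed to `classification_metrics` are hypothetical, not the values in Figure 6.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check using the precision and recall reported in the text.
reported_f1 = f1_from(0.79, 0.68)
print(round(reported_f1, 2))  # 0.73

# Hypothetical counts, for illustration only.
accuracy, precision, recall, f1 = classification_metrics(tp=68, fp=18, fn=32, tn=282)
print(accuracy, precision, recall, f1)
```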

Evaluation and Conclusion

The confusion matrix is plotted to display the performance of the binary classifier, but on its own it is not a convenient basis for comparing models when you are experimenting with different model types. The choice of evaluation metric is therefore crucial for selecting the best model for churn prediction. The uneven label distribution biases accuracy. In this case, commonly used metrics such as recall, precision and F1-score are worth considering, depending on the company's requirements. There is always a trade-off when we try to maximize performance: precision tends to drop as recall increases.

The metric should be selected on the basis of the associated cost function. In this report, the cost in workforce and money of trying to retain potential churners is assumed to be higher than the cost of losing customers, so precision is emphasized because of the higher cost of a false positive. In other words, predicting a non-churning customer as churning (false positive) and offering them a promotional price or a free upgrade is assumed to cost more than classifying a churning customer as non-churning (false negative), taking no action and losing the revenue.
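The cost argument can be illustrated with a small sketch. The unit costs and error counts below are hypothetical; only the assumption that a false positive costs more than a false negative comes from the text.

```python
def total_cost(fp, fn, cost_fp, cost_fn):
    """Total misclassification cost for given error counts and unit costs."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical unit costs reflecting the article's assumption that a
# false positive (unneeded retention offer) costs more than a false negative.
COST_FP = 5.0
COST_FN = 1.0

# Two hypothetical models with the same total number of errors.
high_precision_model = {"fp": 20, "fn": 60}
high_recall_model = {"fp": 60, "fn": 20}

cost_high_precision = total_cost(**high_precision_model, cost_fp=COST_FP, cost_fn=COST_FN)
cost_high_recall = total_cost(**high_recall_model, cost_fp=COST_FP, cost_fn=COST_FN)

print(cost_high_precision)  # 160.0
print(cost_high_recall)     # 320.0
```

Under the opposite assumption, where losing a customer costs more than an unneeded offer, the same calculation would favor the high-recall model.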
