Customer retention (Part 1 of 2): Understanding the fundamentals

Published in Data Science at Microsoft · 12 min read · Jan 30, 2024

Customers are the basis for the revenue and success of any business, so building a strong Customer Relationship Management (CRM) system is a priority for many organizations. CRM is the process of understanding customer behavior in ways that help an organization improve customer acquisition, retention, and profitability. Because acquiring a new customer can cost much more than retaining an existing one, management and marketing teams pay close attention to customer retention [1, 2]. In addition, retained customers can help an organization by sharing their positive experiences [3], which tends to reduce the marketing costs of acquiring new customers [4]. For all these reasons, retaining customers is important for building a sustainable and profitable business.

A powerful technique for analyzing and improving retention rate is churn prediction, which helps identify the customers who are most likely to churn (i.e., stop doing business) in a given period. Increasing competition has pushed organizations to focus not only on the systematic prediction of churning customers but also on the causes of churning behavior [5]. Knowing the reasons for customer churn supports the creation of a profile of customers who are at risk of churning and helps to foster effective, proactive customer retention campaigns [6].

In our team at Microsoft, we often encounter business customer churn prediction use cases. Most churn models are quite similar in their problem-solving approach and share multiple features among them. To address the challenges we face, reduce development time, enable easier maintenance, deal with migration challenges, and avoid wasting resources, we developed Customer Retention as a Service (CRaaS) — a unified and generic framework for both churn features and model building.

CRaaS can be extended to new customer retention tasks with minimal effort. This service leverages multiple data sources and provides multiple alternatives within a Machine Learning churn analysis pipeline. CRaaS produces completely automated churn predictions, determines the causes of churning behavior, and generates churn explanatory text that marketing teams can leverage to reduce customer churn and improve retention rate. In addition, this service can be tuned easily, reduces delivery time, enables quality monitoring of large volumes of data, amortizes costs, reduces maintenance efforts, and optimizes resources.

The definition of churn varies across industries, although most definitions share the idea of an extended period of customer inactivity [8]. However, the criteria that separate brief inactivity from prolonged inactivity differ by research domain. This variability often stems from the contemporary landscape of service provision, especially in sectors such as the internet and retail, where competition has spurred the adoption of flexible subscription models. In these modern service contexts, customer churn is prominent because customers face minimal investment costs [9,10,11], a consequence of the low switching costs associated with changing service providers [12].

From an operational standpoint, identifying churn starts with defining specific criteria against which customers are categorized as churned or not. This categorization hinges on tracking changes in customer behavior, with the moment of such a change serving as the reference point [11]. Once the duration of inactivity or behavioral change exceeds a predetermined threshold, the customer is deemed to have churned. This threshold, known as the time window, plays a pivotal role in the churn definition and makes it possible to predict the likelihood of customers churning within a defined timeframe. Time windows are widely used in the analysis of customer activity logs, where periods of non-usage signify churn events. Notably, the criteria for establishing time windows can vary across distinct service features [13].
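As a minimal sketch of the time-window idea, the following Python snippet labels a customer as churned when the gap between their last recorded activity and a reference date exceeds a threshold. The function name `label_churn`, the 30-day window, and the sample data are illustrative assumptions and do not describe the CRaaS labeling mechanism itself.

```python
from datetime import date, timedelta


def label_churn(last_activity, reference_date, window_days=30):
    """Map each customer to True if inactive for longer than the time window."""
    window = timedelta(days=window_days)
    return {
        customer: (reference_date - last_seen) > window
        for customer, last_seen in last_activity.items()
    }


# Hypothetical activity log: the last date each customer was active.
last_activity = {
    "a": date(2024, 1, 25),
    "b": date(2023, 11, 2),
    "c": date(2023, 12, 31),
}

labels = label_churn(last_activity, reference_date=date(2024, 1, 30), window_days=30)
print(labels)
```

Changing `window_days` changes who counts as churned, which is why the choice of threshold matters so much in practice.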

CRaaS offers users the flexibility to define multiple churn prediction windows and specify the amount of historical data they wish to utilize for each of these prediction windows. Based on their unique needs, users can also use CRaaS to establish a minimum historical data requirement for customers, ensuring that adequate data is available to make informed assessments of customer behavior.
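To make these options concrete, here is a hedged sketch of how such settings might be represented. The `PredictionWindow` class, its field names, and the `eligible_customers` helper are hypothetical and are not the actual CRaaS configuration interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionWindow:
    """Hypothetical settings for one churn prediction window."""
    horizon_days: int   # how far ahead churn is predicted
    history_days: int   # how much past data feeds the features


def eligible_customers(tenure_days, min_history_days):
    """Keep customers whose observed history meets the minimum requirement."""
    return sorted(c for c, days in tenure_days.items() if days >= min_history_days)


# Two illustrative windows: a short-term and a long-term prediction.
windows = [
    PredictionWindow(horizon_days=30, history_days=90),
    PredictionWindow(horizon_days=90, history_days=365),
]

# Hypothetical number of days of history available per customer.
tenure_days = {"a": 400, "b": 120, "c": 45}
eligible = eligible_customers(tenure_days, min_history_days=60)
print(eligible)
```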

The threshold to use for the time window, which identifies churned customers and frames the investigation of changes in activity or behavior, depends on both the inherent nature of the data and the specific use cases addressed. This is elaborated in the next section, as part of our labeling mechanism description.

Customer Retention as a Service

The increasing diversity and competition among organizations has encouraged businesses to use CRM systems that allow them to acquire new customers, establish a continuous relationship with them, and increase retention for greater profitability. CRM systems can leverage Machine Learning (ML) churn models to analyze customers’ personal and behavioral data, providing organizations with a competitive edge by increasing customer retention rates. These predictive models identify customers who are likely to churn and elucidate the reasons for their potential churn. These predictions are used to design marketing strategies and service offerings [7]. Because most of the churn models for business use cases that we have encountered within Microsoft are similar, instead of developing separate feature engineering and modeling code for each churn model, we proposed CRaaS as a unified and generic framework for both churn features and model building. The prominent advantages of using CRaaS instead of a separate Machine Learning churn analysis pipeline for each business use case include the following:

Reduced delivery time: By leveraging multiple data sources and providing multiple alternatives within a Machine Learning churn analysis pipeline, CRaaS allows teams to get down to business faster and avoid longer development cycles. Data scientists can significantly reduce the effort required in developing and testing each stage in the Machine Learning churn analysis pipeline, tasks that are very time consuming and sensitive to error propagation.
Optimized resources: CRaaS avoids production pipeline redundancy by automating one central pipeline, rapidly collecting and processing a huge amount of data from multiple sources and producing metrics at a specific granularity level. By consolidating multiple pipelines producing the same metrics at the same grain into one, CRaaS avoids wasting resources and reduces failures for the production support team.
Quality monitoring of large volumes of data: Feature engineering can be error prone in terms of data accuracy, and in the business use cases we encountered, churn models shared a high percentage of common features. Therefore, we added data validation checks to CRaaS that validate basic metrics, such as counts, on all datasets to ensure data accuracy. Adding these validation checks reduces the need for separate data quality monitoring efforts.
Amortized maintenance efforts and costs: CRaaS provides multiple alternatives within one main Machine Learning churn analysis pipeline. Therefore, it saves effort on feature engineering and model building, data quality and model run monitoring, preventing failures for the production support team, data platform migration, and compliance and security work, among other areas. In addition, by amortizing maintenance efforts, CRaaS avoids the repetitive costs associated with hosting models in different engineering environments.
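The data validation checks mentioned above could look something like the following sketch, which verifies a minimum row count and a maximum share of missing values per column. The function name, thresholds, and column names are assumptions for illustration; the article does not specify which checks CRaaS actually runs beyond basic metrics such as counts.

```python
def validate_dataset(rows, required_columns, min_rows=1, max_null_rate=0.1):
    """Return a list of human-readable failures (empty if all checks pass)."""
    failures = []
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} is below minimum {min_rows}")
    for column in required_columns:
        nulls = sum(1 for row in rows if row.get(column) is None)
        # An empty dataset is treated as fully missing for every column.
        null_rate = nulls / len(rows) if rows else 1.0
        if null_rate > max_null_rate:
            failures.append(
                f"column '{column}' null rate {null_rate:.2f} exceeds {max_null_rate:.2f}"
            )
    return failures


# Hypothetical feature table with one missing usage value.
rows = [
    {"customer_id": 1, "monthly_usage": 12.5},
    {"customer_id": 2, "monthly_usage": None},
    {"customer_id": 3, "monthly_usage": 7.0},
    {"customer_id": 4, "monthly_usage": 3.2},
]

failures = validate_dataset(
    rows, ["customer_id", "monthly_usage"], min_rows=3, max_null_rate=0.2
)
print(failures)
```

Running checks like these once in a shared pipeline is what lets several churn models rely on the same validated features.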

Feature modification

Raw data logs contain the complete and actual set of information about customer actions. For accurate churn prediction, however, raw log data must be processed first. That’s because service-related indicators typically exhibit sparsity, numerous outliers, and a skewed data distribution. Consequently, suitable feature engineering techniques are essential for churn prediction [14]. These techniques include feature encoding, outlier detection, and sampling, which help address imbalanced data and extract vital insights from the raw log data to enhance the accuracy of churn predictions.
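To illustrate how outliers and skew in a usage feature might be tamed, the sketch below clips values using the interquartile range (IQR) rule and applies a log transform. Both are common general-purpose techniques chosen here as assumptions; the article does not state which outlier method CRaaS uses.

```python
import math
import statistics


def clip_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical sparse, skewed usage counts with one extreme value.
usage = [0, 0, 1, 2, 3, 4, 5, 500]

clipped = clip_outliers_iqr(usage)
transformed = log_transform(clipped)
print(clipped)
print(transformed)
```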

To offer our end users greater flexibility in feature modification, CRaaS provides several options, which we selected based on their popularity in the existing literature or their efficiency under the constraints of the business cases we encountered. Among these options are sampling methods, including Synthetic Minority Oversampling Technique (SMOTE) [15] and Adaptive Synthetic Sampling (ADASYN) [16], as well as feature encoding techniques such as One-Hot Encoding and Target Encoding.
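For readers unfamiliar with SMOTE, the following simplified sketch shows its core idea: each synthetic minority sample is created by interpolating between a minority point and one of its nearest minority neighbors. This is a teaching version written with the standard library only, with hypothetical names and data; it is not the CRaaS implementation and it omits ADASYN's density-based weighting.

```python
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical feature vectors for churned (minority class) customers.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_synthetic=5)
print(new_points)
```

In practice, libraries such as imbalanced-learn provide tested implementations of both SMOTE and ADASYN, and resampling should be applied to the training split only.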

Churn prediction models

Research on customer churn spans various service sectors, with efforts focused on identifying or predicting the likelihood of customer churn using diverse indicators. The significance and intuitive relevance of churn have led to its integration into different service domains, adapting to each field’s specific characteristics. Consequently, churn analysis research has become fragmented, resulting in different measurement criteria for each research domain.

Churn prediction models can be categorized into four main domains: traditional Machine Learning, statistical approaches, graph theory, and Deep Learning. Compared with statistical methods, Deep Learning techniques excel at capturing robust non-linear relationships among features and handling heterogeneous effects arising from diverse feature sets. Graph theory, on the other hand, treats churn as a mathematical relationship: it configures graph attributes by feature and by customer and expresses their relationships as edges. Once a graph is built, churning customers are identified through graph correlation analysis. Deep Learning methods, particularly in scenarios with sparse customer data, involve training on data that has been densified, or employing fully connected neural networks that combine latent vectors extracted from autoencoders with static attributes [17].

A literature review reveals that, in most business domains, traditional Machine Learning models are more prevalent [17]. Several studies have demonstrated the effectiveness of traditional Machine Learning algorithms such as XGBoost [18,19,20,21], which, in some instances, have even outperformed Neural Networks (NN) [22,23,24,25,26,27]. Businesses with abundant log data and easy access to customer information, like those in the gaming and telecommunications industries, tend to leverage Deep Learning techniques. These variations can be attributed mainly to differences in data size, types, and data lifecycle among different business sectors [17].

Performance evaluation

Typically, the performance evaluation of churn ML model predictions relies on metrics such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, the Cumulative Gain Curve, and the classification report. The ROC curve is constructed by plotting sensitivity (the true positive rate) on the y-axis against the false positive rate on the x-axis. It serves as a robust evaluation criterion, offering an assessment of classifiers that is independent of class distribution and misclassification error costs. In the context of churn, the x-axis represents the proportion of non-churn cases incorrectly classified as churn, while the y-axis represents the proportion of churn cases that were correctly classified [30,31]. Consequently, an AUC value approaching 1 indicates that the churn prediction model effectively discriminates between the characteristics of churn and active customers [32,33,34,35].
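One way to build intuition for AUC is through its pairwise interpretation: it equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The sketch below computes AUC this way on made-up labels and scores; the function name and data are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of churner and non-churner scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one churner and one non-churner")
    # Count a win when the churner outranks the non-churner, half for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth (1 = churned) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of customers, so for large datasets a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.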

In addition, churn studies have often used the Cumulative Gain Curve, a graphical representation of the performance of a predictive model that is typically deployed in marketing. The Cumulative Gain Curve is created by ranking the predicted probabilities or scores produced by the model for a binary classification task and then plotting the cumulative proportion of positive outcomes (e.g., churning customers) against the cumulative proportion of the total sample. This curve helps visualize how good a model is at identifying positive cases and how much improvement it provides compared to random selection [36,37].
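The construction described above can be sketched in a few lines: sort customers by score in descending order, then track what share of all churners has been captured after each additional customer. The function name and sample data below are assumptions for illustration.

```python
def cumulative_gain(labels, scores):
    """Return (share of customers contacted, share of churners captured) points."""
    ranked = [
        y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ]
    total_positives = sum(ranked)
    if total_positives == 0:
        raise ValueError("need at least one churner")
    points = [(0.0, 0.0)]
    captured = 0
    for i, y in enumerate(ranked, start=1):
        captured += y
        points.append((i / len(ranked), captured / total_positives))
    return points


# Hypothetical ground truth (1 = churned) and model scores for ten customers.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8, 0.1, 0.4, 0.5, 0.05]

curve = cumulative_gain(labels, scores)
for contacted, gained in curve:
    print(f"{contacted:.0%} contacted -> {gained:.0%} of churners captured")
```

In this toy example, contacting the top 20% of customers captures two of the three churners, whereas random selection would be expected to capture about 20% of them.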

The Cumulative Gain Curve can help a marketing team make decisions about how to allocate resources for customer retention efforts. For example, it can guide them in determining how many of the top-ranked customers the team should reach to maximize the number of churn cases identified. In churn analysis, you may also want to consider other evaluation metrics such as precision, recall, and F1 score to get a more comprehensive view of your model’s performance.
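As a small worked example of these additional metrics, the sketch below turns scores into churn predictions with a cutoff and computes precision, recall, and F1 from the resulting counts. The 0.65 cutoff and the data are arbitrary assumptions chosen only to make the arithmetic easy to follow.

```python
def precision_recall_f1(labels, predictions):
    """Compute precision, recall, and F1 for the positive (churn) class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth (1 = churned) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]

# Flag a customer as a predicted churner when the score reaches the cutoff.
predictions = [1 if s >= 0.65 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(labels, predictions)
print(precision, recall, f1)
```

Lowering the cutoff generally raises recall at the expense of precision, so the right operating point depends on the cost of a retention offer relative to the cost of a lost customer.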

Conclusion

In this article, we reviewed the fundamentals of customer retention and discussed the main domains of churn prediction ML models and popular performance evaluation metrics.

In the next article in this two-part series, we discuss details of the CRaaS methodology, including how Microsoft Azure can be used in the feature engineering stage and how the framework is wrapped as a REST endpoint. In addition, we share evaluation results obtained with real customer data. We hope this series provides you with guidelines to help you conquer your own business problems.

We’d like to thank Casey Doyle for helping review the work.

See the next article in this two-part series:

Customer retention (Part 2 of 2): Framework architecture

By Yasmin Bokobza, Sharath Kumar Rangappa, Swarnim Narayan, and Kiran R


References

1. Reinartz, W. J., & Kumar, V. (2003). The impact of customer relationship characteristics on profitable lifetime duration. Journal of marketing, 67(1), 77–99.

2. Yang, Z., & Peterson, R. T. (2004). Customer perceived value, satisfaction, and loyalty: The role of switching costs. Psychology & Marketing, 21(10), 799–822.

3. Reichheld, F. F., & Sasser, W. E. (1990). Zero defections: Quality comes to services. Harvard business review, 68(5), 105–111.

4. Bolton, R. N., & Bronkhorst, T. M. (1995). The relationship between customer complaints to the firm and subsequent exit behavior. ACR North American Advances, 22, 94–100.

5. Geiler, L., Affeldt, S., & Nadif, M. (2022). A survey on machine learning methods for churn prediction. International Journal of Data Science and Analytics, 1–26.

6. Leung, C. K., Pazdor, A. G., & Souza, J. (2021). Explainable artificial intelligence for data science on customer churn. In 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) (pp. 1–10). IEEE.

7. Sabbeh, S. F. (2018). Machine-learning techniques for customer retention: A comparative study. International Journal of Advanced Computer Science and Applications, 9(2).

8. Periáñez, Á., Saas, A., Guitart, A., & Magne, C. (2016, October). Churn prediction in mobile social games: Towards a complete assessment using survival ensembles. In 2016 IEEE international conference on data science and advanced analytics (DSAA) (pp. 564–573). IEEE.

9. Ma, S. (2009). On optimal time for customer retention in non-contractual setting. Available at SSRN 1529284.

10. Tamaddoni Jahromi, A., Sepehri, M. M., Teimourpour, B., & Choobdar, S. (2010). Modeling customer churn in a non-contractual setting: the case of telecommunications service providers. Journal of Strategic Marketing, 18(7), 587–598.

11. Buckinx, W., & Van den Poel, D. (2005). Customer base analysis: partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. European journal of operational research, 164(1), 252–268.

12. Lejeune, M. A. (2001). Measuring the impact of data mining on churn management. Internet Research, 11(5), 375–387.

13. Lee, E., Kim, B., Kang, S., Kang, B., Jang, Y., & Kim, H. K. (2018). Profit optimizing churn prediction for long-term loyal customers in online games. IEEE Transactions on Games, 12(1), 41–53.

14. Zhang, R., Li, W., Tan, W., & Mo, T. (2017, June). Deep and shallow model for insurance churn prediction service. In 2017 IEEE International Conference on Services Computing (SCC) (pp. 346–353). IEEE.

15. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321–357.

16. He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1322–1328). IEEE.

17. Ahn, J., Hwang, J., Kim, D., Choi, H., & Kang, S. (2020). A survey on churn analysis in various business domains. IEEE Access, 8, 220816–220839.

18. Chen, T., & Guestrin, C. (2016, August). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp. 785–794).

19. Ge, Y., He, S., Xiong, J., & Brown, D. E. (2017, April). Customer churn analysis for a software-as-a-service company. In 2017 Systems and Information Engineering Design Symposium (SIEDS) (pp. 106–111). IEEE.

20. Ahmad, A. K., Jafar, A., & Aljoumaa, K. (2019). Customer churn prediction in telecom using machine learning in big data platform. Journal of Big Data, 6(1), 1–24.

21. Lalwani, P., Mishra, M. K., Chadha, J. S., & Sethi, P. (2022). Customer churn prediction system: a machine learning approach. Computing, 1–24.

22. Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2008). Churn prediction: Does technology matter?. International Journal of Industrial and Manufacturing Engineering, 2(4), 524–536.

23. Vafeiadis, T., Diamantaras, K. I., Sarigiannidis, G., & Chatzisavvas, K. C. (2015). A comparison of machine learning techniques for customer churn prediction. Simulation Modelling Practice and Theory, 55, 1–9.

24. Glady, N., Baesens, B., & Croux, C. (2009). Modeling churn using customer lifetime value. European journal of operational research, 197(1), 402–411.

25. Yu, X., Guo, S., Guo, J., & Huang, X. (2011). An extended support vector machine forecasting framework for customer churn in e-commerce. Expert Systems with Applications, 38(3), 1425–1430.

26. Buckinx, W., & Van den Poel, D. (2005). Customer base analysis: partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting. European journal of operational research, 164(1), 252–268.

27. Nimmagadda, S., Subramaniam, A., & Wong, M. L. (2017). Churn prediction of subscription user for a music streaming service. CS229 Lecture, Stanford Univ., Stanford, CA, USA, Project Rep., Fall.

28. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., … & Liu, T. Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30.

29. Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363.

30. Huang, B., Kechadi, M. T., & Buckley, B. (2012). Customer churn prediction in telecommunications. Expert Systems with Applications, 39(1), 1414–1425.

31. Pendharkar, P. C. (2009). Genetic algorithm based neural network approaches for predicting churn in cellular wireless network services. Expert Systems with Applications, 36(3), 6714–6720.

32. Hassouna, M., Tarhini, A., Elyas, T., & AbouTrab, M. S. (2016). Customer churn in mobile markets a comparison of techniques. arXiv preprint arXiv:1607.07792.

33. Runge, J., Gao, P., Garcin, F., & Faltings, B. (2014, August). Churn prediction for high-value players in casual social games. In 2014 IEEE conference on Computational Intelligence and Games (pp. 1–8). IEEE.

34. Verbraken, T., Verbeke, W., & Baesens, B. (2012). A novel profit maximizing metric for measuring classification performance of customer churn prediction models. IEEE transactions on knowledge and data engineering, 25(5), 961–973.

35. Burez, J., & Van den Poel, D. (2009). Handling class imbalance in customer churn prediction. Expert Systems with Applications, 36(3), 4626–4636.

36. Brandusoiu, I., & Toderean, G. (2013). Churn prediction in the telecommunications sector using support vector machines. Margin, 1(1).

37. Lu, J. (2002). Predicting customer churn in the telecommunications industry: An application of survival analysis modeling using SAS. SAS User Group International (SUGI27) Online Proceedings, 114, 27.