Utilizing Alteryx’s Machine Learning Tools To Reduce Churn

Combat Customer Churn with Alteryx

Andrew W. Pearson
Published in CodeX
5 min read · Feb 4, 2024


Alteryx enables organizations to automate analytics, improve revenue performance, and manage costs. Its solution offers a range of features and capabilities, including advanced analysis techniques, geospatial analysis, automated machine learning, and embedded AI. These features are designed to empower organizations with automated data preparation, AI-powered analytics, and approachable machine learning, all with embedded governance and security. In this article, I will discuss Alteryx’s customer churn capabilities.

The Data Set

The dataset used as input for this model is a list of bank credit card customers from Kaggle, which contains an attrition flag indicating whether each customer has churned or is still an existing customer. We are going to apply logistic regression to this dataset to estimate the probability of each customer churning. Knowing this is crucial because it lets a business proactively provide better services and change a customer’s decision to churn. It also allows companies to create loyalty programs and increase retention rates. Keeping existing customers costs less and delivers more value than acquiring new ones.

Data Preparation

To start, the data needs to be prepared. Since the customer data is in a CSV, we need to convert the default string fields to numeric data types. To do so, we used two Multi-Field Formula tools: one for converting to INT32 and another for converting to DOUBLE. The strings with no decimal values were converted in the first Multi-Field Formula tool using the expression “tonumber([_CurrentField_])”. The same was done in the second copy of the tool, but for strings that have a decimal value.

Figure 1: Converting the default string to a numeric data type. Image created by author.
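For readers more comfortable with code, here is a rough Python/pandas equivalent of this conversion step. It is a sketch, not the Alteryx workflow itself; the file name and the column lists are assumptions based on the Kaggle credit card dataset.

```python
# Minimal sketch of the two Multi-Field Formula tools in pandas (assumptions noted above).
import pandas as pd

df = pd.read_csv("BankChurners.csv", dtype=str)  # read every field as a string, as in the raw CSV

int_cols = ["Customer_Age", "Dependent_count", "Months_on_book"]   # example whole-number fields -> INT32
dbl_cols = ["Credit_Limit", "Avg_Utilization_Ratio"]               # example decimal fields -> DOUBLE

# tonumber([_CurrentField_]) applied across each group of fields
df[int_cols] = df[int_cols].apply(pd.to_numeric).astype("int32")
df[dbl_cols] = df[dbl_cols].apply(pd.to_numeric).astype("float64")
```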

Before the next prep step, we add a Field Summary tool to get the number of distinct values for each categorical variable. Since there are only a few, we will just use a Formula tool to apply one hot encoding.

Next, we need to apply one hot encoding to the categorical variables. One hot encoding is used in machine learning models to convert categorical variables into numerical values. This allows categorical fields to be used even in models that require numerical input, improves model performance, and helps avoid the problem of ordinality. In this Formula tool, we assigned a unique column to each value. An IIF statement checks the contents of the original field: if it contains the specific value, the new column is tagged “1”, and “0” if not. This is repeated for the rest of the values. Aside from the one hot encoding, we also simplified the attrition flag into a “Churn” field that contains 1 if the customer churned or 0 if they are still an existing customer.

Then we removed the unnecessary fields by adding a Select tool. Here we unchecked the original columns that have now been one hot encoded, as well as other columns that will not be used as predictor variables, or features, for the model.

Figure 2: Removing the unnecessary fields by adding a Select Tool. Image created by author.
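A rough pandas sketch of these two steps (the one hot encoding in the Formula tool and the column removal in the Select tool) might look like the following. The column names, the “Attrited Customer” value, and the CLIENTNUM ID field are assumptions based on the Kaggle file.

```python
# Minimal sketch of the IIF-style one hot encoding and the Select tool (assumptions noted above).
import pandas as pd

df = pd.read_csv("BankChurners.csv")

categorical_cols = ["Gender", "Education_Level", "Marital_Status",
                    "Income_Category", "Card_Category"]

# IIF([Field] = "value", 1, 0): one indicator column per value of each categorical field
for col in categorical_cols:
    for value in df[col].unique():
        df[f"{col}_{value}"] = (df[col] == value).astype(int)

# Simplified attrition flag: 1 if the customer churned, 0 if still an existing customer
df["Churn"] = (df["Attrition_Flag"] == "Attrited Customer").astype(int)

# Select tool: drop the original categorical fields and columns not used as predictors
df = df.drop(columns=categorical_cols + ["Attrition_Flag", "CLIENTNUM"])
```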

Next, we can build the model. We used the Logistic Regression tool under the Predictive tool palette. The Logistic Regression tool creates a model that relates a binary target variable (like True or False) to one or more predictor variables and estimates the probability of each of the two possible outcomes. Logistic regression differs from other types of regression because it produces predictions within the range 0–1 and does not assume that the predictor variables have a constant marginal effect on the target variable, making it appropriate for dichotomous target variables (those with only two possible values).

In configuring the tool, we need to type in a model name. In this example we set it to “Logistic_Regression_Churn”, which identifies the model when it is used in other tools. Next, we set the target variable to “Churn”, since this is the field to be predicted. For the predictor variables, we select all the remaining fields, from Customer_Age to Avg_Utilization_Ratio. Further configuration can be done by clicking the Customize button.
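For illustration, a scikit-learn sketch of the same setup is shown below: the target is “Churn” and every remaining field is a predictor. This mirrors the tool configuration rather than reproducing Alteryx’s exact model; the prepared-data file name is an assumption.

```python
# Minimal sketch of fitting a logistic regression on the prepared data (assumptions noted above).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("bank_churners_prepared.csv")   # assumed export of the prepared dataset

y = df["Churn"]                                  # target variable
X = df.drop(columns=["Churn"])                   # all remaining fields as predictors

# "Logistic_Regression_Churn": relate the binary target to the predictors
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
```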

In the Customize window, we can set additional model options for sampling weights, regularized regression, and model type. The second tab lets you use cross-validation and set a specific number of folds and trials. The final tab sets the graph resolution. Connect a Browse tool to each output anchor, then run the workflow. We will focus on the I anchor, which holds the interactive dashboard. The first report tab in the output contains the summary and shows the accuracy as well as the confusion matrix of the model. The second tab shows the conditional-density plot for each predictor variable, and the third tab shows the overall performance of the model using an ROC chart and a precision vs. recall chart.

Figure 3: Adding additional model settings regarding sampling weight, regularized regression, and model type. Image created by author.
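To give a sense of what those report tabs summarize, here is a rough scikit-learn sketch that computes the same kinds of metrics: accuracy, a confusion matrix, ROC AUC, and cross-validated accuracy over five folds. It evaluates on the training data for brevity and assumes the same prepared-data file as above.

```python
# Minimal sketch of the evaluation reported on the I anchor (assumptions noted above).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

df = pd.read_csv("bank_churners_prepared.csv")   # assumed export of the prepared dataset
y = df["Churn"]
X = df.drop(columns=["Churn"])

model = LogisticRegression(max_iter=1000).fit(X, y)
pred = model.predict(X)

print("Accuracy:", accuracy_score(y, pred))
print("Confusion matrix:\n", confusion_matrix(y, pred))
print("ROC AUC:", roc_auc_score(y, model.predict_proba(X)[:, 1]))

# Cross-validation, similar to the folds/trials setting on the second tab
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```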

To append the columns that show the probability of churn for each customer, we add a Score tool. The Score tool generates a predicted value (score) from a separate data stream. We configured the tool to score with a local model and typed “Bank_Churner” as the field name for the target. After running the workflow, we now have two fields appended: Bank_Churner_0 for the likelihood of sticking with the credit card service, and Bank_Churner_1 for the likelihood of churning.

Figure 4: Adding the Score Tool, which generates a predicted value (score) from a separate data stream. Image created by author.
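The scoring step can be sketched the same way: predict_proba returns one probability per class, which map onto the appended Bank_Churner_0 and Bank_Churner_1 fields. Again, the file names are assumptions, and this is an illustrative equivalent rather than the Score tool itself.

```python
# Minimal sketch of appending the churn probabilities (assumptions noted above).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("bank_churners_prepared.csv")   # assumed export of the prepared dataset
y = df["Churn"]
X = df.drop(columns=["Churn"])

model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X)                   # column 0 = stays, column 1 = churns

df["Bank_Churner_0"] = proba[:, 0]               # likelihood of keeping the credit card service
df["Bank_Churner_1"] = proba[:, 1]               # likelihood of churning

df.to_excel("churn_scores.xlsx", index=False)    # export for the Excel views below
```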

Here are the results in Excel:

Figure 5: High-level view of Excel spreadsheet. Image created by Author.
Figure 6: Drilldown into the first few rows. Image created by Author.
Figure 7: Excel chart revealing the churn scores. Image created by Author.

Conclusion

There you have it: a pretty simple process which will, hopefully, help you keep your customers from taking their business elsewhere. Alteryx is a powerful tool for your analytics needs, and its AI and ML journey is just getting started.



Andrew Pearson is the MD of Intelligencia, an AI company based in Asia. A speaker, author, and columnist, Pearson writes about IT issues like AI, CI, and analytics.