Algorithm is Free; But, Nothing’s Free, After All

Sidney @HEARTCOUNT
Published in HEARTCOUNT
3 min read · Apr 16, 2018

A machine learning (or statistical) algorithm is free in the sense that it is a commodity anyone can freely use to train a model. However, how much one can get out of the algorithm (and the model it trains) depends largely on a few factors:

  • the right choice of algorithm for a given problem (prediction vs. explanation; regression vs. classification)
  • (hyper-)parameter optimization (a trade-off exists between prediction accuracy and model interpretability; not in scope for this article)
  • interpretability of the trained model (an intuitive model summary and visualization so an average business user can interpret the pattern and put it to practical use)

Given Problem: How to Minimize Employee Churn

Let’s assume that we want to understand the patterns (structure) of early employee attrition (turnover before 90 days) so that we can design and execute a more targeted retention campaign.

Or, if we translate the above into a more machine-friendly data problem definition, the statement can be re-phrased as “how to classify between [class A: employee with tenure < 90 days] and [class B: employee with tenure ≥ 90 days] using human-interpretable algorithms.”
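This framing boils down to deriving a binary label from raw tenure data. A minimal sketch in Python, assuming a hypothetical `tenure_days` field (the field name is illustrative, not from the article’s dataset):

```python
# Sketch: deriving the binary target for the churn classification problem.
# "tenure_days" is an assumed field name, used here for illustration only.

def label_employee(tenure_days: int) -> str:
    """Class A: left before 90 days; class B: tenure of 90 days or more."""
    return "A" if tenure_days < 90 else "B"

tenures = [45, 120, 89, 365, 90]
labels = [label_employee(t) for t in tenures]
print(labels)  # ['A', 'B', 'A', 'B', 'B']
```

With the target defined this way, any binary classifier can be trained on the remaining employee attributes.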

Choice of Algorithm: Decision Tree Algorithm for Interpretable Classification Model

Logistic regression is one of the most popular interpretable classification algorithms. However, logistic regression models fail, just like linear regression models, in situations where the relationship between a feature (salary) and the target variable (has the employee left? yes/no) is non-linear, or where the features interact with each other, which is quite the norm in real-world situations. To make matters worse, both the logistic regression algorithm and its models are difficult to understand (interpret) and visualize.
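The interaction problem can be made concrete with a toy example. In the sketch below, two hypothetical binary features (think “low salary” and “high workload”, chosen purely for illustration) interact in an XOR pattern: employees leave only when exactly one of the two is true. A brute-force search confirms that no linear boundary of the form `w1*x1 + w2*x2 > b` separates the classes, while a single tree split per feature handles it easily:

```python
# Sketch: an interaction effect (XOR) that no linear decision boundary can fit.
# Features and labels are synthetic; the pattern is what matters.

points = [((0, 0), "stayed"), ((0, 1), "left"),
          ((1, 0), "left"), ((1, 1), "stayed")]

def linearly_separable(points):
    """Brute-force a grid of linear boundaries w1*x1 + w2*x2 > b."""
    grid = [x / 2 for x in range(-4, 5)]  # weights/bias in {-2.0, ..., 2.0}
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                preds = ["left" if w1 * x1 + w2 * x2 > b else "stayed"
                         for (x1, x2), _ in points]
                if preds == [label for _, label in points]:
                    return True
    return False

print(linearly_separable(points))  # False: a linear model cannot fit XOR
```

A decision tree, by contrast, can express this pattern with two nested splits, which is exactly the strength the next section describes.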

Decision Tree algorithm to the rescue!

Decision tree models can learn non-linear patterns and interactions among features. A tree model basically splits the data so that the resulting subsets of the dataset are as homogeneous as possible. In the case of employee churn analysis, the tree model tries to create subsets whose members (records) are mainly comprised of a single class (i.e., either [class A: employees who have left] or [class B: employees who have NOT left]). The more homogeneous (pure) each subset is in terms of the target variable class (yes or no), the more accurate the trained tree model becomes.
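“Homogeneity” has a standard numeric form: Gini impurity, one common purity measure tree algorithms minimize when choosing splits (the article does not name Watson’s exact criterion, so treat this as a representative sketch). A pure subset scores 0.0 and a 50/50 mix scores 0.5:

```python
# Sketch: measuring subset homogeneity (purity) with Gini impurity,
# one common split criterion for decision trees.

from collections import Counter

def gini(labels):
    """1 - sum of squared class proportions; 0.0 means perfectly pure."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

mixed = ["left", "stayed", "left", "stayed"]  # 50/50 split
pure = ["left", "left", "left", "left"]       # all one class

print(gini(mixed))  # 0.5
print(gini(pure))   # 0.0
```

At each node the algorithm picks the split that reduces this impurity the most across the child subsets, which is what drives the tree toward pure leaves.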

Tree Model Interpretation: IBM Watson Analytics

The following is the classification rule-set created by IBM Watson Analytics for a sample employee churn dataset. The first rule indicates that 96% of the employees matching the rule (464 employees in total) have left the company.

Click here to download the dataset.

The visualization of IBM Watson’s tree model is as follows:

The interpretation of a decision tree visualization is simple (although IBM Watson makes it rather difficult, because the rather deep tree does not fit into the screen real estate): starting from the root node, you move down through the nodes and build up a churn prediction rule-set by combining each split rule using ‘AND’ logic (e.g., satisfaction_level <= 0.4 & number_project > 5 & average_montly_hours > 253 & salary = low).
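The root-to-leaf walk described above can be sketched directly. The tiny hand-coded tree below is illustrative only (a real model, e.g. a trained scikit-learn `DecisionTreeClassifier`, would supply the splits and leaf classes); the split condition reuses feature names from the article’s example rule:

```python
# Sketch: turning a decision tree into human-readable AND rule-sets by
# walking every root-to-leaf path. The tree structure here is hand-coded
# for illustration; a fitted model would provide it in practice.

def extract_rules(node, path=()):
    """Collect (rule, outcome) pairs, joining split conditions with AND."""
    if "class" in node:  # leaf node: emit the accumulated rule
        return [(" AND ".join(path), node["class"])]
    cond = node["split"]
    rules = extract_rules(node["left"], path + (cond,))
    rules += extract_rules(node["right"], path + ("NOT " + cond,))
    return rules

tree = {
    "split": "satisfaction_level <= 0.4",
    "left": {
        "split": "number_project > 5",
        "left": {"class": "left"},
        "right": {"class": "stayed"},
    },
    "right": {"class": "stayed"},
}

for rule, outcome in extract_rules(tree):
    print(f"IF {rule} THEN {outcome}")
```

Each printed line is one complete rule of the kind Watson’s rule-set view shows, which is why a shallow, readable tree matters so much for interpretation.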

Tree Model Interpretation: HEARTCOUNT

The major problems with IBM Watson (or any other commercial analytics product that claims to be self-service) are:

  • It is very difficult to visually discover and understand the pattern, because the model summary and visual representation lack clarity and interactivity.
  • In other words, usability s*cks.

Here is how HeartCount does the same thing differently using the same algorithm.

The algorithm is free, but model interpretability and usability depend on how you optimize the algorithm (for clarity) and visualize the model (for intuitive visual discovery) in an effort to augment human decision-making.
