How to Explain and Affect Individual Decisions with ICE Curves

Examples of using Individual Conditional Expectation Plots in Credit Application and Employee Retention

Wai On
Towards Data Science


Now that we have a way to highlight an individual instance of interest in an ICE plot (see part 1), let’s illustrate its use in a couple of more realistic examples: Credit Application and Employee Retention. These examples show the different contexts in which local explanations might be needed, and allow us to test whether the design decisions made earlier enable ICE plots to help explain and affect individual decisions. The code for this article can be found on GitHub.

Example 1: Credit Application

Scenario

Photo by Andrea Piacquadio on Pexels

Ron is a retiree on a fixed income. He recently applied for a car loan but was rejected. Rather annoyed, he is demanding an explanation for his rejection.

You are the loan specialist and are responsible for reviewing the automated decision with Ron. You are eager to help him since getting what he wants also benefits your business. You meet with Ron to review his rejection in detail and to see what advice you can provide.

Data: Loan Default

The dataset for this example is the Loan Default Data available from Kaggle. The data consists of 10 features and a target output of “serious delinquency past 90 days”. We will use the target as a recommendation for loan approval: if the prediction for “serious delinquency” is true, the loan is rejected; otherwise, it is approved.

A Snippet of the Loan Default Dataset

Data Explorations

Per our discussion in part 1, we should first see if there are any features that are highly correlated.
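As a minimal sketch, the correlation matrix can be computed and visualized along these lines (the file name assumes the Kaggle training file for this dataset):

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Load the Kaggle "Give Me Some Credit" training data
    # (file name assumed; adjust to your local copy)
    df = pd.read_csv("cs-training.csv", index_col=0)

    # Pairwise Pearson correlations between the numeric features
    corr = df.corr(numeric_only=True)

    # A heatmap makes highly correlated pairs easy to spot
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
    plt.tight_layout()
    plt.show()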

Correlation Matrix of all variables

The data shows three instances of high correlation:

  • 30–59DaysLate and 90DaysLate: 0.98
  • 30–59DaysLate and 60–89DaysPastDue: 0.98
  • 90DaysLate and 60–89DaysPastDue: 0.99

In other words, the data shows that late debt payments are highly correlated between all the time frames (i.e., 30–59 days, 60–89 days, or 90 days).

In order to deal with these correlations, both grouping and elimination strategies were tried (i.e., aggregating all three variables into one, and dropping one or more of the highly correlated variables). However, no significant differences were found in how they affected the predictions. To simplify the interpretation of the explanation downstream, I eliminated the features with lower correlation to the target variable, which resulted in retaining the feature “30–59DaysLate”.
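Both strategies are straightforward to express in pandas. A minimal sketch, using the dataset’s original column names (the aggregated column name “PastDue30-90Days” is my own):

    # The three highly correlated late-payment features,
    # as named in the Kaggle dataset
    late_cols = [
        "NumberOfTime30-59DaysPastDueNotWorse",
        "NumberOfTime60-89DaysPastDueNotWorse",
        "NumberOfTimes90DaysLate",
    ]

    # Strategy 1: aggregate the three counts into a single feature
    df_grouped = df.drop(columns=late_cols)
    df_grouped["PastDue30-90Days"] = df[late_cols].sum(axis=1)

    # Strategy 2: keep the first feature and drop the other two
    df_dropped = df.drop(columns=late_cols[1:])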

The Random Forest algorithm from scikit-learn was used to fit a model with these features for a prediction of loan approval (accuracy = 0.93). The relative importance of the features is as follows.

Feature Importance for Loan Approval
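A sketch of how this model and ranking might be produced, assuming the df_dropped frame from the strategy sketch above (“SeriousDlqin2yrs” is the dataset’s target column, recoded here so that 1 means approval):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Recode the target: 1 = approve (no serious delinquency), 0 = decline
    X = df_dropped.drop(columns=["SeriousDlqin2yrs"]).fillna(0)  # crude NaN handling
    y = 1 - df_dropped["SeriousDlqin2yrs"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    print(accuracy_score(y_test, rf.predict(X_test)))

    # Rank the features by importance
    for name, imp in sorted(zip(X.columns, rf.feature_importances_),
                            key=lambda t: -t[1]):
        print(f"{name}: {imp:.3f}")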

Feature importance analysis shows that the most important feature for our model is “Utilization”. This feature is defined as the “total balance on credit cards and personal lines of credit except real estate and no installment debt like car loans divided by the sum of credit limits”; in other words, the ratio of credit balance to credit limit. It is closely followed in importance by debt ratio and monthly income. Let’s plot these features using our modified ICE plot to see what is going on at a more detailed level for our individual instance.

ICE Plots

The top 6 features visualized using our modified ICE plot are as follows. I’m using the pyCEbox package in Python to calculate the ICE values, with modifications to the visualization functions as described in part 1. The code for this can be found on GitHub.
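pyCEbox provides the ice and ice_plot functions for the calculation and the basic rendering; the highlighting is the custom layer described in part 1. As a self-contained sketch of the same idea (ICE computed by hand, with the PD curve, the highlighted instance, and its current feature value overlaid), assuming the model from the previous sketch; the row position i standing in for Ron is hypothetical:

    import numpy as np
    import matplotlib.pyplot as plt

    def ice_curves(model, X, feature, n_points=50):
        """Minimal ICE: sweep one feature over a grid for every instance."""
        grid = np.linspace(X[feature].min(), X[feature].max(), n_points)
        curves = np.empty((len(X), n_points))
        for j, v in enumerate(grid):
            X_mod = X.copy()
            X_mod[feature] = v                     # set the feature everywhere
            curves[:, j] = model.predict_proba(X_mod)[:, 1]
        return grid, curves

    # The dataset's original name for "Utilization"
    feature = "RevolvingUtilizationOfUnsecuredLines"
    grid, curves = ice_curves(rf, X_test, feature)

    fig, ax = plt.subplots()
    ax.plot(grid, curves.T, color="lightgray", linewidth=0.5)       # ICE curves
    ax.plot(grid, curves.mean(axis=0), color="black", linewidth=3)  # PD curve

    i = 0                                    # hypothetical: Ron's row position
    ax.plot(grid, curves[i], "o-", color="tab:blue")    # instance of interest
    x0 = X_test[feature].iloc[i]
    j = int(np.abs(grid - x0).argmin())
    ax.plot(grid[j], curves[i, j], "D", color="gold", ms=10)  # current value
    ax.set_xlabel("Utilization")
    ax.set_ylabel("P(approval)")
    plt.show()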

Recall from part 1 that:

  • The thicker black line is the PD curve (recall this is the average of all the instances).
  • All other lines are ICE curves for individual instances. Each sub-plot here shows 602 individual instances from the dataset.
  • The blue line with markers is the instance of interest.
  • The yellow diamond shaped marker on the blue line shows the current feature value and the prediction for the instance we are interested in.

Let’s examine these sub-plots in turn. As a reminder, a prediction of 1 is good (loan approved) and 0 is bad (loan declined).

Utilization: The ICE plot shows substantial turbulence up to a value of 1.0 on the x-axis, after which there is more volatility at a lower level of probability, between 1.0 and 1.1, before the curves stabilize. The PD curve reflects this, but shows clearly that there is a drop around a value of 1.0.

The ICE curve for Ron (in blue with round markers) follows the PD curve pattern more or less up to a value of 1.0 on the x-axis, after which there is a precipitous drop to a level much lower than the PD curve. Moreover, as shown by the yellow diamond marker, the prediction for Ron is near the bottom of his ICE curve for this feature. Since this is the most important feature, it is an important partial explanation of why Ron’s loan application was declined.

Modified ICE Plot for the Utilization Feature

According to Ron’s ICE curve, to increase his chance of getting a loan, he could either increase or decrease his Utilization ratio. The visualization shows that decreasing the ratio looks more effective (either by decreasing his credit balance or by increasing his credit limit): if “Utilization” is reduced, the precipitous drop in the curve is reversed immediately, whereas increasing “Utilization” results in only a gradual improvement.

DebtRatio: Apart from a brief initial flutter, the PD curve and the individual ICE curve show the prediction to be relatively constant.

MonthlyIncome: Ron’s ICE curve shows he is near the bottom of the curve. If his monthly income were to increase, his chance of getting a loan would also increase.

Ron currently has a fixed monthly income of 5,804 from his retirement. The ICE curve shows that the optimal level for Ron is above 18,000, a large increase from where he is currently. However, the ICE curve also shows that even a small increase in monthly income (shifting to the right of the yellow diamond marker on the curve) would improve his chance of getting a loan. After discussing this with Ron, you conclude that this is not something he can do immediately, but the conversation did cause him to think more about taking on a part-time job.

Age: Although the curves have some interesting fluctuations, this is one of those variables that we unfortunately cannot control. However, the good news is that Ron’s ICE curve suggests that his likelihood of getting a loan remains consistent as he gets older.

OpenCreditLinesAndLoans: Both the PD curve and the ICE curve are relatively flat. For Ron, there are some minor variations for values under 20, after which the curve stabilizes.

30–59DaysPastDue: This feature is defined to be the “number of times the borrower has been 30–59 days past due but no worse in the last 2 years”.

The ICE curve shows that if Ron is able to reduce his late payments, his chance of getting approved would dramatically increase. The PD curve also shows a downward trend in loan approval probability as the number of payments past due increases.

Even though Ron has been late only once over the past two years, this nevertheless had a dramatic effect on his chances of loan approval. Although it is not something that Ron can do anything about immediately, it is an important takeaway for the future.

Aggregating Correlated Variables into One Variable: 30–90 Days Past Due

Recall that earlier I also tried the strategy of combining the highly correlated variables (i.e., “30–59DaysPastDue”, “60–89DaysPastDue”, and “90DaysLate”). The result shows that the general shape of the ICE curve is similar to the one above (a steep downward drop before a shallow recovery) and corroborates the conclusion that any delay in payments will likely increase the probability of the loan being declined.

Summary

The analysis shows that “Utilization” is the most important feature that explains Ron’s loan application rejection. To maximize his chance of getting a loan in the future, there are three things that Ron should consider:

  1. Decrease his “Utilization” ratio.
  2. If possible, increase his monthly income in the future.
  3. Make sure he is not late on debt payments in the future.

Having reviewed the visualizations and discussed them with you, Ron sees some concrete actions worth considering and feels much better about his loan rejection.

Example 2: Employee Retention

Scenario

Photo by loly galina on Unsplash

Sourabh is a star employee on your Research and Development team. He is thirty-something and has been with the company for 5 years.

As his manager, in addition to rewarding him for his performance, you want to get a better picture of what you can do to ensure that he is motivated to stay with the company. You meet with your HR specialist, who uses ML models to help predict employee attrition, to discuss this issue.

The meeting did not start well. Your HR specialist tells you that, according to the ML model, Sourabh is likely to leave. With urgency, you and your HR specialist examine the results closely to understand them further and to see what steps you need to take to retain Sourabh.

Data: Employee Attrition

The dataset for this example is the popular IBM employee attrition dataset. After removing a number of redundant features and reducing the data to the Research and Development team only, we end up with 1470 records, 24 features, and a target variable, “Attrition”. Note that, in contrast to the earlier example, there are substantially more features to deal with in our analysis and visualizations.
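A sketch of the data preparation, assuming the standard Kaggle file name and a typical choice of redundant columns (the exact set dropped in the original code is not shown here):

    import pandas as pd

    df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

    # Restrict to the Research & Development team
    rd = df[df["Department"] == "Research & Development"].copy()

    # Drop redundant columns: constants, the employee identifier,
    # and Department (constant after the filter)
    rd = rd.drop(columns=["EmployeeCount", "Over18", "StandardHours",
                          "EmployeeNumber", "Department"])

    # Encode the target: 1 = leaves, 0 = stays
    rd["Attrition"] = (rd["Attrition"] == "Yes").astype(int)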

A Snippet of IBM Employee Attrition Dataset

Data Explorations

There are so many features in this dataset that the correlation matrix is difficult to examine in detail here. Nevertheless, it is possible to make out a number of variables that are highly correlated.

Correlation Matrix for 25 Variables

In particular, there are two sets of features that are highly correlated:

  • MonthlyIncome and JobLevel: 0.99
  • TotalWorkingYears and JobLevel: 0.79

Not surprisingly, there is a correlation between an individual’s job level and compensation (“JobLevel” and “MonthlyIncome”), since a higher job level typically means higher monetary compensation in many companies. Similarly, it typically takes time to get promoted to a higher job level, so individuals with high job levels are also likely to have worked at a particular company for many years (“TotalWorkingYears” and “JobLevel”).

For reasons that will be discussed later, in order to deal with these correlations I elected to remove one of the features (“JobLevel”) rather than combine them. As in the previous example, a Random Forest algorithm was used to create the prediction model. The relative importance of the features is as follows:

Feature Importance for Remaining Variables

As shown above, “MonthlyIncome” stands out as the most important feature in the resulting model. The gap between “MonthlyIncome” and the next most important feature is relatively large; after that, importance declines steadily.
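A sketch of the corresponding model fit and ranking, assuming the rd frame from the preparation sketch above (categorical features are one-hot encoded for scikit-learn):

    from sklearn.ensemble import RandomForestClassifier
    import pandas as pd

    # Drop the correlated "JobLevel" and separate the target
    X = pd.get_dummies(rd.drop(columns=["Attrition", "JobLevel"]))
    y = rd["Attrition"]

    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Importance ranking; the top 16 feed the composite plot below
    importances = pd.Series(model.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(16))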

ICE Plots

Since there are so many features, I tried various ways of displaying as many ICE plots as possible to provide a high-level view. I settled on a 4-by-4 grid; the result is an iconized rendition of the ICE plots, starting with the most important feature at the top left and proceeding in descending order of importance from left to right.
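A sketch of how such a grid could be assembled, reusing the hypothetical ice_curves helper from the first example (i again stands in for the row position of the instance of interest, here Sourabh):

    import matplotlib.pyplot as plt

    top16 = importances.sort_values(ascending=False).head(16).index
    i = 0                              # hypothetical: Sourabh's row position

    fig, axes = plt.subplots(4, 4, figsize=(14, 12), sharey=True)
    for ax, feature in zip(axes.ravel(), top16):
        grid, curves = ice_curves(model, X, feature)
        ax.plot(grid, curves.T, color="lightgray", linewidth=0.3)       # population
        ax.plot(grid, curves.mean(axis=0), color="black", linewidth=2)  # PD curve
        ax.plot(grid, curves[i], color="tab:blue")                      # Sourabh
        ax.set_title(feature, fontsize=8)
    plt.tight_layout()
    plt.show()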

Composite ICE Plot for Top 16 Features

This composite ICE plot provides an effective overview of where our instance of interest sits with respect to each of the top 16 features. Although some of these features have relatively low influence on the prediction, it is nevertheless helpful to see their trends. Recall that a probability approaching 1.0 means Sourabh is more likely to leave, so we want to see Sourabh’s ICE curve lower on the y-axis.

Let’s examine these features one row at a time and focus on the more interesting features in detail.

Monthly Income

On the top row, most of the features are relatively flat with the exception of “MonthlyIncome”.

The PD curve for “MonthlyIncome” (the thicker black line) declines to a probability of attrition of around 0.15 (y-axis) at a monthly income of 2,500 (x-axis) before leveling off. Interestingly, the likelihood of attrition increases again once monthly income reaches around 12,500, suggesting that an increase in monthly income is only effective up to a certain point, after which other factors perhaps become more important.

Sourabh’s ICE curve reflects this pattern, except that it sits at a higher level of probability of attrition overall. His current monthly income is 2,313; bringing it closer to the 2,500–3,000 range is likely to decrease the probability of his leaving. Beyond that, the return on further compensation increases is negligible for a significant stretch.

Recall that in order to deal with the high correlation between “MonthlyIncome” and “JobLevel”, we decided earlier to remove one of the variables rather than combine them. This turns out to be the right decision; otherwise, it would have been more difficult to know precisely how much of an increase in monthly income is required.

Distance from Home

On the second row of the composite ICE plot, two features stand out. The first is “DistanceFromHome”:

The PD curve suggests that the effect of “DistanceFromHome” stays relatively stable within a distance of 25 miles, beyond which the probability of attrition starts to increase. The ICE curve shows that Sourabh is at the edge of the tolerance for distance from home. If this distance were to increase, the likelihood of Sourabh leaving would also increase.

The fact that employees will consider leaving if their commute gets worse is not surprising. However, the benefit of the ICE plot and the highlighting of individual instances is that it allows us to predict the effect of such a change for an individual employee with a higher level of fidelity.

Stock Options Level

Another feature in this row that is of interest is “StockOptionLevel”.

Sourabh currently does not receive any stock options. His ICE curve shows that if “StockOptionLevel” were to increase, the likelihood of attrition would be lower. The curve suggests that an increase of “StockOptionLevel” to 1.0 is optimal, since this is the lowest inflection point on Sourabh’s ICE curve.

On the third row, there doesn’t appear to be anything worth adjusting. Sourabh has been with you, his manager, since he joined the company, and the model suggests that this is a good thing from an attrition point of view. It also looks like getting a promotion (perhaps in job level only) wouldn’t affect his attrition likelihood.

Finally, as we move to the lower-importance features in the fourth row, the curves are relatively flat. As in the other plots, Sourabh’s ICE curve is above the PD curve and near the top of the stack with respect to the rest of the population, supporting the hypothesis that he is likely to leave given his current circumstances.

Summary

Your conclusion from meeting with your HR specialist is that you need to take remedial actions as soon as possible. The analysis shows the following should be considered:

  1. Increase “MonthlyIncome” to the 2,500–3,000 range.
  2. Provide stock options to a “StockOptionLevel” of 1.0.

A further factor to consider for the future: if the distance between the work location and employees’ homes were to increase, the likelihood of attrition for employees like Sourabh would also be affected.

At your request, your HR specialist runs the prediction again with the new numbers. You find that adjusting “MonthlyIncome” (to 3,000) brings the probability of attrition to 0.27. Making the additional adjustment of increasing “StockOptionLevel” (to 1.0) brings the probability of attrition even lower, to 0.22.
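The re-run itself is a simple what-if: copy Sourabh’s row, adjust the feature values from the summary, and re-score. A sketch, assuming the model, X, and row position i from above:

    # What-if analysis for the instance of interest
    sourabh = X.iloc[[i]].copy()

    sourabh["MonthlyIncome"] = 3000
    print(model.predict_proba(sourabh)[0, 1])   # probability of attrition

    sourabh["StockOptionLevel"] = 1
    print(model.predict_proba(sourabh)[0, 1])   # with both adjustments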

The analysis has provided you with some concrete ideas of the parameters to work with. You come away from your meeting more optimistic than you began. To the extent that your budget allows, you are eager to have a conversation with Sourabh and take the appropriate remedial actions at the earliest opportunity.

Discussion and Conclusion

The goal of this article was to explore the potential of using ICE plots to help explain and affect individual decisions. In terms of the design decisions that we have made to support this, I’ve found that:

  • The combination of highlighting the instance of interest (via a blue ICE curve with round markers) and highlighting its current feature value (via a yellow diamond marker) allows us to see clearly the current prediction and how it may change with varying feature values.
  • A composite iconic visualization of ICE plots provides a useful high level view of how different values of the most important features influence the predicted outcome. This visualization helps us to identify which features we should focus on to be effective in affecting change; typically, where the current feature value is near a noticeable change of direction on the curve. Conversely, if the ICE curve is relatively flat, then it suggests there is unlikely to be any significant change in prediction by varying the value of that feature.
  • Highlighting an individual instance in the context of other instances and the PD curve does not cause unmanageable complexity. Recall from part 1 that visual complexity is an area of uncertainty that we wanted to test using more realistic examples. The analysis here shows that the design decisions made earlier were appropriate. The remaining instances in the ICE plot provide a contrast where we can see where our instance lies in the population. The PD curve also serves as a reference to see if the current instance is deviating or conforming to the average of the dataset.
    It is important to point out, however, that this visual clarity may be due to the fact that the PD curve and the ICE curves in our examples do not overlap (with the exception of the “Utilization” feature in the Credit Application example). Otherwise, the detailed differences between the curves would likely be more difficult to discern. In addition to exploring the ordering and transparency levels of the curves, a possible enhancement for the future could be a way to switch these curves on and off interactively, which would allow better discrimination of the curves within a plot where there are significant overlaps (see the sketch after this list).
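As a sketch of how such a toggle could be prototyped with matplotlib’s pick events (the stand-in curves here are random walks, not real ICE data):

    import numpy as np
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    x = np.linspace(0, 1, 50)
    for _ in range(5):
        # picker/pickradius make each curve clickable within 5 points
        ax.plot(x, np.random.rand(50).cumsum(), picker=True, pickradius=5)

    def on_pick(event):
        # Clicking a curve toggles its visibility
        event.artist.set_visible(not event.artist.get_visible())
        fig.canvas.draw_idle()

    fig.canvas.mpl_connect("pick_event", on_pick)
    plt.show()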

With respect to the analysis process, as shown in the examples above, the interpretation of ICE plots requires careful navigation. In particular, we need to be mindful that:

  • Interpretation and inference of potential actions require general knowledge about the characteristics of the features. In the Credit Application example, knowing that we are able to control some features (e.g., “Utilization”) but not others (e.g., “Age”) allows us to focus on the features we can do something about. In general, thinking carefully about the characteristics of a feature is critical when determining what potential actions to take; without this, we could end up making erroneous inferences.
  • Interpretation of highlighted ICE curves may require specialized domain knowledge. Most people are able to grasp domains such as credit applications and personnel retention without great difficulty. However, if the features involved are in an area that requires extensive learning and training, then domain experts may be needed to interpret the results and to infer what potential actions can be taken. Moreover, the task of explaining the predicted outcome to the individual affected, as depicted in our scenarios, may also be more difficult.
  • Feature engineering and figuring out the best way to deal with highly correlated features are critical when using PDPs and ICE plots. In the Employee Retention example, the potential difficulty with attribution means we have to be judicious about which feature to drop and which to keep. In many cases, a trial-and-error approach of testing different methods for dealing with highly correlated features may be necessary.
    It is also worth pointing out that in this article we have only used simple methods for detecting and dealing with linear relationships. Suffice it to say that for more complex relationships (e.g., non-linear correlations), a wider array of techniques may be needed. If teasing out correlations between features becomes unmanageable, we may have to consider alternative local explanation visualization techniques. For example, Accumulated Local Effects¹ (ALE) plots avoid the correlation problem, and support for their creation is becoming more prevalent. In any case, the visualization approach shown here, that of highlighting the individual instance of interest, should be beneficial for explaining and affecting individual decisions in this class of explanation tools.

In summary, the exploration has shown that when used with care, a modified version of the ICE Plot can be tremendously helpful when trying to explain individual decisions as well as help us understand what is needed to affect them.

Visualizations are important tools for providing explanations to ML predictions and making black box models more transparent. As such, this is an area that deserves more attention. With the findings from this exploration in mind, the ICE Plot with the appropriate visualization can be a valuable addition to the ML individual decision analysis toolkit.

References

[1] Apley, D.W. and Zhu, J. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. arXiv:1612.08468 (2016)
