Achieving User Trust for AI/ML Solutions

Published in

IBM Data Science in Practice

4 min read · Jul 2, 2019

To receive the expected value from AI/ML solutions you’ll first need to gain the user’s trust. You only achieve value with adoption and you only achieve adoption through trust.

The challenge of gaining user trust applies not just to AI and ML, but to traditional analytics as well. Most of us can recall meetings where half the time is spent debating the accuracy of a number. In fact, defusing such debates is the best way to gain trust with traditional analytics: provide a single source of truth that everyone uses to manage the business. But that alone will not lead to trust in AI/ML. With AI/ML, users need to trust more than the data; they also need to trust the algorithms and their outputs.

Consider these three stages that build user trust in your AI/ML solutions:

Stage 1: Quality

Confronted with a new model, a user’s first question is, “How accurate is the model?” Accuracy is a standard measure of model performance, but it is not always the best one. How to assess a model depends on the type of model (binary classification, multiclass classification, regression, etc.) and the goal of the model. What users really want to know is the quality of the model for their particular purpose. For example, accuracy is unlikely to be the best way to assess a binary classification model that identifies clients at risk of churn. If users will act only on the specific clients who are predicted to churn, then the target prediction is ‘at risk’. Let’s assume users won’t take any action on a prediction of ‘no risk’.

Users want either to be efficient in taking action on risk (that is, they want precision) or to be thorough enough not to miss any risk (that is, they want recall). If they want a balance of the two, they can optimize the F1 score, which is the harmonic mean of precision and recall.
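To make the difference between these metrics concrete, here is a minimal Python sketch that computes accuracy, precision, recall, and F1 from confusion-matrix counts. The function name and the client counts are hypothetical, chosen only for illustration; they do not come from OpenScale or from any real churn data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive ('at risk') class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the clients flagged 'at risk', how many actually churned?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the clients who actually churned, how many were flagged?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: 1,000 clients, 50 of whom churn.
# The model flags 20 clients as 'at risk' and 10 of those flags are correct.
metrics = classification_metrics(tp=10, fp=10, fn=40, tn=940)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is 0.95 even though the model finds only one in five of the clients who churn (recall 0.2), which is why accuracy alone can mislead when users act only on the ‘at risk’ prediction.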

There isn’t a single quality metric that will work for every business goal, so you’ll need access to multiple quality metrics that you can use as appropriate. Out of the box, IBM Watson OpenScale provides nine different quality metrics as well as the ability to include your own custom metrics. Helpfully, OpenScale displays only those metrics that apply to the specific type of model. These metrics provide the information you’ll need to collaborate with users to determine which quality metric works best for the specific business goal.

Stage 2: Explanation

Convincing users of a model’s quality for their purposes is just the beginning of building their trust. They’ll be open to reviewing the output of the model, but they probably won’t yet be ready to adopt it. They’ll first ask you to explain how the model arrived at each recommendation or prediction. For most models, that’s not an easy task.

Thankfully, IBM Watson OpenScale helps by providing two kinds of explanation. First is a list of model features with the weightings that contributed positively or negatively to the prediction. In addition, contrastive explanations will provide the minimum feature value changes to change the prediction and the maximum changes that will maintain the same prediction. Together, these offer an unprecedented level of transparency. You can learn more about this functionality in the ‘Explaining AI Model Behavior with IBM Watson OpenScale’ blog.
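The two kinds of explanation can be illustrated with a toy example. The Python sketch below is not OpenScale’s algorithm or API; it assumes a hypothetical linear churn score with made-up feature names and weights, where the explanation is easy to compute by hand. It lists each feature’s signed contribution to the score and then finds how far a single feature would have to move to reach the decision boundary, which is the idea behind a contrastive explanation.

```python
# Hypothetical linear churn model: predict 'at risk' when score >= 0.
# Feature names, weights, and the bias term are illustrative assumptions.
WEIGHTS = {"support_tickets": 0.8, "tenure_years": -0.5, "monthly_spend": -0.01}
BIAS = -1.0


def score(features):
    """Linear score: bias plus the weighted sum of the feature values."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def contributions(features):
    """Signed contribution of each feature to the score."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


def change_to_boundary(features, name):
    """Change in one feature that moves the score exactly to the boundary (score == 0)."""
    return -score(features) / WEIGHTS[name]


client = {"support_tickets": 4, "tenure_years": 2, "monthly_spend": 50}
print("score:", round(score(client), 3))
print("contributions:", contributions(client))
print("change in support_tickets to reach the boundary:",
      round(change_to_boundary(client, "support_tickets"), 3))
```

For a real, non-linear model these quantities cannot be read off the weights, which is where a dedicated explanation tool earns its keep.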

Stage 3: Bias Detection & Removal

In many circumstances, driven either by regulation or by plain good business sense, users need to trust that a model isn’t biased. We train models with data, and that data can sometimes contain inherent biases. Obviously, we want our AI/ML solutions to mitigate our human bias, not perpetuate it. Certain model features like gender, age, and ethnicity are obvious areas to protect against bias. Others are less obvious, such as strong correlations between ethnicity and zip codes or between gender and names.

Bias detection matters beyond protecting vulnerable communities. Consider the example from the Masters golf tournament, where IBM used AI to correct the over-emphasis on the most popular golfers in order to highlight the most impressive golf: ‘The Masters exceptional AI highlights: A round in 3 minutes’. IBM Watson OpenScale can detect bias at runtime by monitoring the data sent to the model and the model’s output. For more information about bias detection, check out the ‘Bias Detection in IBM Watson OpenScale’ blog.
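As a rough illustration of what monitoring model output for bias can look like, the Python sketch below computes a disparate impact ratio: the rate of favorable outcomes for a monitored group divided by the rate for a reference group. This is one common fairness measure and is not necessarily what OpenScale computes. The outcome lists are invented, and the 0.8 alert threshold is an assumption borrowed from the widely cited "four-fifths" rule of thumb.

```python
def favorable_rate(outcomes):
    """Fraction of predictions that are favorable (1 = favorable, 0 = unfavorable)."""
    return sum(outcomes) / len(outcomes)


def disparate_impact(monitored, reference):
    """Ratio of favorable-outcome rates: monitored group over reference group."""
    return favorable_rate(monitored) / favorable_rate(reference)


# Hypothetical model outputs collected at runtime for two groups.
monitored_group = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 3 of 10 favorable
reference_group = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]   # 7 of 10 favorable

ratio = disparate_impact(monitored_group, reference_group)
THRESHOLD = 0.8  # assumed alert threshold (four-fifths rule of thumb)
print(f"disparate impact: {ratio:.3f}")
print("possible bias" if ratio < THRESHOLD else "within threshold")
```

A ratio well below 1.0 means the monitored group receives favorable outcomes less often than the reference group, which is a signal to investigate rather than proof of bias.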

Knowing that bias exists is essential; however, users will only trust the model when they know the results can be bias-free. In addition to detecting bias, OpenScale can automatically de-bias the results. To ensure that users trust the de-biased results, they can view passive de-biased results before turning on active de-biasing, which sends the de-biased results to users on the fly. For details, see ‘De-Biasing in IBM Watson OpenScale’.

These three stages form a continuous cycle, building and earning trust with every transaction. Only one solution available today can help you achieve and maintain all three stages of user trust for your AI/ML solution: IBM Watson OpenScale.

Written by Chad Marston