Techniques for explanation retrieval with Watson OpenScale

Co-authored by Courtney Branson and Rakshith Dasenahalli Lingaraju.

Watson OpenScale is a tool that monitors AI models to determine whether a model is performing in a reliable, transparent, and expected manner. One facet of its monitoring capabilities is called ‘Explainability’. Historically, AI models have been a black box: information is given to the model and a result comes out, but the inner workings of the model are never truly seen or understood. While this may have been acceptable in the past, we now hold our models to a higher standard. If humans cannot comprehend how a model came to its decision, then its results are considered unreliable. Hence, model transparency is necessary, especially for government and public service industries, as their AI applications can substantially impact someone’s life.

In this article, we will walk you through four different methods you can use to access explanations with Watson OpenScale. As an example, we will look at a credit risk use case. Specifically, we will generate explanations for a machine learning model that predicts whether a bank member is at risk of defaulting on their loan. The predictions of this model have the power to greatly impact these bank members’ lives, so it is very important for everyone involved to be able to clearly understand the reasons behind the model’s predictions. To get us started, we have already created this model, deployed it, and set up Watson OpenScale for continuous monitoring on Cloud Pak for Data v4.5, IBM’s Data & AI platform.

Accessing Explanations Through the UI

The first technique we will cover is the simplest and the only ‘no-code’ solution. It simply involves pulling up the OpenScale dashboard and navigating to the explanations page using the second button on the left-hand navigation bar.

List of all transactions scored on the credit risk pipeline

Here you can see all of the transactions scored by this model along with their predictions and confidence scores. To view the explanation for how the model came to its decision for any of the transactions, you can click on the ‘Explain’ button on the far right. If the explanation hasn’t already been generated through other means such as the API, clicking this button will trigger the explanation generation and will navigate you to the screen where the explanation can be viewed. The explanations are generated using LIME. At a high level, LIME explains a prediction by quantifying the influence each individual feature of the dataset had on it.

LIME explanation of the model’s predicted outcome for a single bank member

In the figure above, you can see the features in blue (over the relative weight % of 0) which are contributing positively towards the predicted outcome as well as the features in purple (below the relative weight % of 0) which are contributing negatively. You can also hover over the bars to see the value of the feature in question.
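To build intuition for how LIME arrives at these weights, here is a toy sketch. It is not LIME itself (real LIME fits a weighted linear surrogate model over many random perturbations of the input), but it illustrates the core idea of measuring how the model's score shifts when a feature changes. The `toy_score` model and the feature values below are entirely hypothetical.

```python
# Toy sketch of the intuition behind LIME: perturb each feature of the
# member being explained and measure how much the model's score moves.
# Real LIME fits a weighted linear surrogate over many random perturbations;
# this crude one-feature-at-a-time version is only illustrative.

def toy_score(member):
    """Hypothetical risk score in [0, 1] for a bank member."""
    score = 0.2
    if member["checking_status"] == "no_checking":
        score += 0.5
    if member["loan_duration"] > 24:
        score += 0.2
    return score

def crude_influence(member, perturbations, score_fn):
    """Approximate each feature's influence as the score shift under a perturbation."""
    base = score_fn(member)
    return {
        feature: round(score_fn(dict(member, **{feature: value})) - base, 3)
        for feature, value in perturbations.items()
    }

member = {"checking_status": "no_checking", "loan_duration": 36}
perturbations = {"checking_status": "0_to_200", "loan_duration": 12}
print(crude_influence(member, perturbations, toy_score))
# → {'checking_status': -0.5, 'loan_duration': -0.2}
```

A large negative shift here plays the role of a strongly risk-increasing feature in the chart above: removing it moves the score the most.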

Contrastive explanations can also be generated by navigating to the Inspect tab and clicking on ‘Run analysis’. Contrastive explanations look for the minimum number of features that need to be changed — and by how much — to get a different predicted outcome.
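The search that a contrastive explanation performs can be illustrated with a small sketch. This is not OpenScale's actual algorithm; it is a brute-force pass over single-feature substitutions for a hypothetical rule-based model, just to show what "minimum change that flips the outcome" means.

```python
# Toy illustration of a contrastive explanation: find the smallest feature
# change that flips the model's prediction. NOT OpenScale's algorithm;
# a brute-force sketch over single-feature substitutions.

def toy_model(member):
    """Hypothetical credit-risk rule: flag members with no checking account."""
    return "Risk" if member["checking_status"] == "no_checking" else "No Risk"

def single_feature_contrast(member, candidate_values, model):
    """Try changing one feature at a time; return the first change that flips the outcome."""
    original = model(member)
    for feature, values in candidate_values.items():
        for value in values:
            if value == member[feature]:
                continue
            changed = dict(member, **{feature: value})
            if model(changed) != original:
                return feature, value, model(changed)
    return None  # no single-feature change flips the prediction

member = {"checking_status": "no_checking", "loan_amount": 5000}
candidates = {"checking_status": ["0_to_200", "no_checking", "greater_200"]}
print(single_feature_contrast(member, candidates, toy_model))
# → ('checking_status', '0_to_200', 'No Risk')
```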

In this case, if the bank member’s checking status had been 0_to_200 instead of no_checking, they would have been labeled as ‘No Risk’ instead of ‘Risk’.

Using the UI to generate and retrieve explanations has many benefits, the primary ones being how quickly you can get started and its ease of use. However, there are limitations as well: you are confined to the IBM Cloud Pak for Data (CPD) platform, and there is no room for customization.

The following three methods are advantageous if you want to access explanations outside the CPD platform, say to customize them and integrate them with an application.

Accessing Explanations Through the API

An API is a set of definitions and protocols for building and integrating application software. It acts as an interface that facilitates communication between two platforms. Watson OpenScale offers a REST API that allows outside applications to retrieve the data it generates.

The first API endpoint we want to highlight generates a new explanation for a given transaction id.

import requests

data_mart_id = '<your datamart id>'
subscription_id = '<your subscription id>'
transaction_id = '<your transaction id>'

# Query parameters used later when listing explanation tasks
parms = {}
parms['subscription_id'] = subscription_id
parms['scoring_id'] = transaction_id

headers = {}
headers["Accept"] = "application/json"
headers["Authorization"] = "Bearer " + token  # token obtained from your CPD authentication flow

data = {}
data['scoring_ids'] = [transaction_id]
data['explanation_types'] = ['lime', 'contrastive']

generate_url = '<your cpd url>' + '/openscale/' + data_mart_id + '/v2/explanation_tasks'
gen_resp = requests.post(generate_url, headers = headers, json = data, verify = False)
gen_json = gen_resp.json()
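The examples above assume you already hold a bearer `token`. On Cloud Pak for Data, a token is commonly obtained from the platform authorization endpoint; the path below is an assumption based on CPD 4.x, so verify it against your cluster's documentation. The credentials are placeholders.

```python
# Sketch of obtaining a CPD bearer token. The /icp4d-api/v1/authorize path
# and the "token" response field are assumptions for CPD 4.x; confirm them
# for your deployment before relying on this.

def auth_url(cpd_url):
    """Build the CPD authorization endpoint URL (assumed path)."""
    return cpd_url.rstrip("/") + "/icp4d-api/v1/authorize"

def get_token(cpd_url, username, password):
    import requests  # same HTTP client as the examples above
    resp = requests.post(
        auth_url(cpd_url),
        json={"username": username, "password": password},
        verify=False,  # matches the examples above; use proper certificates in production
    )
    resp.raise_for_status()
    return resp.json()["token"]

# token = get_token('<your cpd url>', '<your username>', '<your password>')
print(auth_url("https://cpd.example.com/"))
```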

If you already have an explanation generated and you would like to simply access it, you can use the following two API endpoints. The first retrieves the unique id identifying the explanation you are trying to retrieve. The second uses that unique id to retrieve the explanation itself. It’s important to note that this unique id, which we save as a variable called explanation_task_id, is not the same as the transaction_id.

url = '<your cpd url>' + '/openscale/' + data_mart_id + '/v2/explanation_tasks'
resp = requests.get(url, headers = headers, params = parms, verify=False)
resp_json = resp.json()
explanation_task_id = resp_json['explanation_values'][0][0]
expl_url = '<your cpd url>' + '/openscale/' + data_mart_id + '/v2/explanation_tasks/' + explanation_task_id
response = requests.get(expl_url, headers = headers, params = parms, verify=False)
response_json = response.json()

The explanation retrieved will be a JSON object, which can then be used to access both the LIME and contrastive explanations by selecting the desired index. Here, we show how to retrieve the section of the JSON relating to contrastive explanations and save it as con_expl_api. We also show an example of how you can access and display LIME explanations using pandas.

import pandas as pd

con_expl_api = response_json['entity']['explanations'][1]
lime_explanation_api = pd.DataFrame.from_dict(response_json['entity']['explanations'][0]['predictions'][0]['explanation_features'])
lime_explanation_api = lime_explanation_api.reindex(lime_explanation_api.weight.abs().sort_values().index)
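Once the weights are out of OpenScale, you can render them however your application needs. Here is a minimal sketch that mimics the UI's bar chart as plain text; the sample payload is mocked (the feature names and weights are invented), with field names matching the parsing code above.

```python
# Render LIME weights as a simple text bar chart, largest magnitude first.
# sample_features is a mocked, minimal version of the explanation_features
# list returned by the explanation_tasks endpoint; values are invented.
sample_features = [
    {"feature_name": "CheckingStatus", "weight": 0.42},
    {"feature_name": "LoanDuration", "weight": -0.17},
    {"feature_name": "CreditHistory", "weight": 0.08},
]

def text_bar_chart(features, width=40):
    """One line per feature: name, sign of contribution, bar scaled by |weight|."""
    ordered = sorted(features, key=lambda f: abs(f["weight"]), reverse=True)
    max_w = max(abs(f["weight"]) for f in ordered)
    lines = []
    for f in ordered:
        bar = "#" * round(abs(f["weight"]) / max_w * width)
        sign = "+" if f["weight"] >= 0 else "-"
        lines.append(f"{f['feature_name']:>15} {sign} {bar}")
    return "\n".join(lines)

print(text_bar_chart(sample_features))
```

In a real application you would pass `response_json['entity']['explanations'][0]['predictions'][0]['explanation_features']` in place of the mocked list, or swap the text chart for a matplotlib horizontal bar plot.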

Using the API method requires authentication as well as, at a minimum, two API calls. Using APIs also requires some knowledge of the system, as well as the ability to code. In addition, any customization requires code to be written and tested before use.

Accessing Explanations Through the Python SDK

A Software Development Kit (SDK) is a set of pre-built software building tools for a specific platform. The Watson OpenScale Python SDK is easily installed and provides access to similar, if not identical, methods as the API. The SDK package for this purpose is called ibm_watson_openscale.

The advantage of an SDK is that it abstracts away the complexity of dealing with an API, while still using the API on the backend to interact with the Watson OpenScale instance. This makes development simple, fast, and easy to learn.

Once you are authenticated and the connection to the Watson OpenScale client is set up, you can begin to interact with the Watson OpenScale system. To trigger an explanation, you can use the following code.

import ibm_watson_openscale
from ibm_cloud_sdk_core.authenticators import BearerTokenAuthenticator

authenticator = BearerTokenAuthenticator(bearer_token='<your authentication token>')
wos_client = ibm_watson_openscale.APIClient(authenticator=authenticator, service_url='<your cpd url>', service_instance_id=data_mart_id)
explanation_types = ["lime", "contrastive"]
expl_task_trigger = wos_client.monitor_instances.explanation_tasks(scoring_ids = [transaction_id], explanation_types = explanation_types).result

Once the explanation is generated, you can extract the explanation at any time using the code below.

expl_tasks = wos_client.monitor_instances.get_all_explaination_tasks(subscription_id = subscription_id, scoring_ids=[transaction_id])
explanation_task_id = expl_tasks.result.explanation_values[0][0]
explanation_detail = wos_client.monitor_instances.get_explanation_tasks(explanation_task_id = explanation_task_id, subscription_id = subscription_id)
lime_explanation_sdk = pd.DataFrame.from_dict(explanation_detail.result.entity.explanations[0]['predictions'][0]['explanation_features'])
lime_explanation_sdk = lime_explanation_sdk.reindex(lime_explanation_sdk.weight.abs().sort_values().index)
con_expl_sdk = explanation_detail.result.entity.explanations[1]

The ease of using the Python SDK instead of the API comes at the cost of speed. Since the SDK is simply a wrapper around the API, it takes slightly more time than calling the API directly.

Accessing Explanations Through SQL

If speed of accessing an explanation is of utmost importance, then querying the database directly with SQL is the best method to use. This method is only possible if you have proper credentials for your Db2 instance, so verify this before attempting to use this technique. We have written our example code here in Python, but you can use whatever language you are most comfortable with.

import json
import pandas as pd
import sqlalchemy as sa

schema_name = '<your schema name>'
table = 'ExplanationsV2'

db2_creds = {}
db2_creds['username'] = '<your db2 username>'
db2_creds['password'] = '<your db2 password>'
db2_creds['host'] = '<your db2 host>'
db2_creds['port'] = '<your db2 port associated with your host>'
db2_creds['database'] = '<your db2 database>'

connxn_string = f"db2+ibm_db://{db2_creds['username']}:{db2_creds['password']}@{db2_creds['host']}:{db2_creds['port']}/{db2_creds['database']}"
engine = sa.create_engine(connxn_string)

query = 'select * from "{schema_name}"."{table}" where "scoring_id" = \'{transaction_id}\''.format(schema_name = schema_name, table = table, transaction_id = transaction_id)
expl_data = pd.read_sql(query, con=engine)

# The stored explanation is an encoded blob; decode it into JSON
expl = expl_data["explanation"][0]
explain_run = json.loads(str(expl, encoding = 'utf-8'))
contrastive_explanation = explain_run['entity']['contrastive_explanation']['pertinent_negative_features']
lime_explanation = pd.DataFrame.from_dict(explain_run['entity']['predictions'][0]['explanation_features'])
lime_explanation = lime_explanation.reindex(lime_explanation.weight.abs().sort_values().index)

Once you run the code, it queries the database for the given transaction id. When the transaction is found, the explanation is returned as an encoded blob; once decoded, you can see all of the explanation information, including both the LIME and contrastive explanations, assuming both have previously been generated.

One major advantage to using SQL for explanation retrieval is that it is extremely fast, sometimes 5–10x faster than the other methods covered in this article. It also lets you search for a transaction across multiple subscriptions at the same time, which is not possible with the other methods. This can be very useful if you are trying to compare a challenger and champion model as you can see both of their explanations with a single SQL query.
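A champion-versus-challenger lookup can be sketched as a single query over the explanations table. The column names here ("subscription_id", "explanation", "scoring_id") are assumptions based on the query shown earlier; inspect your ExplanationsV2 table to confirm them before relying on this, and note that string-formatted SQL like this should only be used with trusted inputs.

```python
# Sketch: one query that pulls a transaction's explanations across several
# subscriptions at once, e.g. a champion and a challenger model.
# Column names are assumptions; verify against your ExplanationsV2 table.

def cross_subscription_query(schema_name, transaction_id, subscription_ids):
    """Build a Db2 query matching one scoring id across multiple subscriptions."""
    subs = ", ".join("'{}'".format(s) for s in subscription_ids)
    return (
        'select "subscription_id", "explanation" '
        'from "{schema}"."ExplanationsV2" '
        'where "scoring_id" = \'{tid}\' '
        'and "subscription_id" in ({subs})'
    ).format(schema=schema_name, tid=transaction_id, subs=subs)

query = cross_subscription_query("MYSCHEMA", "abc-123", ["champion-sub", "challenger-sub"])
# expl_data = pd.read_sql(query, con=engine)  # reuse the engine from the example above
print(query)
```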

The overwhelming disadvantage is that users have direct access to the data, which can be not only deleted but also manipulated, a major security concern. This method is also not officially supported: if anything breaks, support will not be given to resolve the issue. Finally, this method only allows you to access an explanation that has already been generated; if no explanation exists for a given transaction, there is no SQL method to trigger its generation.

Conclusion

In conclusion, we understand that everyone’s requirements are different. What is best for one system or use case may not work for another. We hope that by laying out the pros and cons as well as providing sample code, it becomes easier to see which option for explanation retrieval will work best for you.

Thank you for reading and please leave your feedback and questions in the comments!
