Explaining your Watson Assistant chatbot using Watson OpenScale

Prem Piyush Goyal
IBM Data Science in Practice
7 min read · Apr 17, 2020

A core artefact of Watson Assistant is an intent, which, according to the Watson Assistant documentation, “represents the purpose of a user’s input, such as a question about business locations or a bill payment. You define an intent for each type of user request you want your application to support”. For example, the intent for the input “I would like to speak to someone” would be “Asking for Supervisor”, and the intent for “Send me the location details to your nearest store” could be “Asking for Location”. These user-defined intents in your Watson Assistant application decide how your chatbot will branch into a conversation.

The purpose of this post is to help a Watson Assistant user understand why Watson Assistant selected a specific intent for a customer input, with the help of the Explainability feature in Watson OpenScale.

Before we begin, a few points to note:

  1. For every transaction that needs to be explained, Watson OpenScale creates 5000 perturbations around that transaction. In our case, this means OpenScale will create 5000 variations of a single customer input, see which intent Watson Assistant classifies for each of them, and then produce an explanation.
  2. It is recommended that you have a Standard plan or above for Watson Assistant, because the Lite and Plus Trial plans have monthly limits on the number of messages you can send. Given the number of perturbations Watson OpenScale generates, a plan with limited messages is not advised.
  3. If your assistant deals with more than 10 intents, this post is not for you. Watson Assistant currently returns confidence scores for only the top 10 intents, whereas Watson OpenScale needs the underlying application to send confidences/probabilities for all the intents/labels/classes it is classifying. There are ways to handle more than 10 intents in an assistant, but for this post we will focus on the simpler scenario. Please write to me at prempiyush@in.ibm.com if you want to use Watson OpenScale and have an assistant with more than 10 intents.

Dataset Used

There is a great dataset on Kaggle, Amazon Reviews: Unlocked Mobile Phones. We use only two columns of this dataset: Reviews and Ratings. The Reviews act as the customer input in our training data. Before importing the training examples into Watson Assistant, we applied the following transformations (a small preprocessing sketch follows the list):

  1. The Ratings range from 1 to 5 in the dataset. We assigned an intent to each Rating: 1 is Very_Bad, 2 is Bad, 3 is OK, 4 is Good, 5 is Very_Good.
  2. Dropped duplicates from the Reviews column.
  3. Selected Reviews that had at least 15 words to weed out simple reviews like “Good phone.”
  4. Selected Reviews that are less than 1024 characters long, as Watson Assistant enforces that character limit on examples.
  5. Selected 1000 examples for each of the intents to make it a balanced classifier.
Imported the intent examples into Watson Assistant
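For reference, here is a minimal sketch of that preparation in Node.js. It assumes the CSV has already been parsed into an array of objects with Reviews and Rating properties; the property names, the ratingToIntent mapping, and the helper name are illustrative, since the actual preparation happened before the import into Watson Assistant.

// Minimal preprocessing sketch (illustrative, not the exact code used).
// Assumes `rows` is an array of { Reviews, Rating } objects parsed from the CSV.
const ratingToIntent = { 1: 'Very_Bad', 2: 'Bad', 3: 'OK', 4: 'Good', 5: 'Very_Good' };

function prepareExamples(rows) {
  const seen = new Set();
  const byIntent = { Very_Bad: [], Bad: [], OK: [], Good: [], Very_Good: [] };

  for (const row of rows) {
    const text = (row.Reviews || '').trim();
    const intent = ratingToIntent[row.Rating];
    if (!intent || seen.has(text)) continue;      // drop duplicates
    if (text.split(/\s+/).length < 15) continue;  // keep reviews with at least 15 words
    if (text.length >= 1024) continue;            // Watson Assistant example character limit
    seen.add(text);
    if (byIntent[intent].length < 1000) {         // 1000 examples per intent for balance
      byIntent[intent].push({ text, intent });
    }
  }
  return byIntent;
}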

Custom Wrapper Application

Since Watson OpenScale does not yet support Watson Assistant for monitoring purposes, we will have to write a Custom Wrapper Application that can act as a go-between for us.

The custom wrapper acts as a go-between for Watson Assistant and Watson OpenScale

This custom wrapper application can be written in any language and framework of your choice and can be deployed anywhere accessible. The only constraints the custom wrapper needs to adhere to are the input and output formats of the API, which are explained nicely in the Watson OpenScale documentation.

For our work, we chose to write this custom wrapper as a NodeJS+Express service and deploy it to IBM Cloud.
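For orientation, here is a minimal sketch of what such a wrapper can look like. The route path, the ./score module, and the scoreAll helper (sketched a little later in this post) are illustrative names of our own; the exact code lives in the repository linked at the end of the post.

// Minimal Express skeleton for the wrapper (illustrative, not the exact code used).
const express = require('express');
const { scoreAll } = require('./score');  // hypothetical module holding the fan-out logic shown below

const app = express();

// OpenScale can send thousands of perturbations in one request, so raise the body limit.
app.use(express.json({ limit: '10mb' }));

// Scoring endpoint that Watson OpenScale calls with { fields, values }.
app.post('/v1/deployments/assistant/online', async (req, res) => {
  try {
    const { values } = req.body;                     // [[ "some review text" ], ...]
    const utterances = values.map((row) => row[0]);  // single "Reviews" column
    const results = await scoreAll(utterances);      // fan-out to Watson Assistant
    res.json({ fields: ['Reviews', 'top_intent', 'confidence'], values: results });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(process.env.PORT || 8080);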

What goes in Custom Wrapper Application

Besides the usual plumbing of NodeJS+Express routes, there are a couple of things that go in our application. Based on the diagram above, these are:

  1. A method that converts Watson OpenScale request to Watson Assistant request
  2. A method that converts Watson Assistant response to Watson OpenScale response.

Before we dive in, recall that Watson OpenScale scores 5000 perturbations. However, Watson Assistant does not support classifying intents for multiple inputs in a single request. So, we also added a way to convert the 5000 perturbations that Watson OpenScale sends into 5000 individual requests to Watson Assistant. Since issuing these 5000 requests sequentially would take a long time, we used mapLimit from async to parallelise them, with a limit of 500 concurrent requests. We used the ibm-watson npm package to talk to our Watson Assistant instance; the details on how to use it can be found in the Watson Assistant documentation.
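A sketch of that fan-out, assuming an individual_score function like the one described in the next section and a manipulate function like the one described after that:

// Fan-out sketch: score every perturbation individually, at most 500 in flight at a time.
const async = require('async');

async function scoreAll(utterances) {
  // async v3 mapLimit returns a promise when the iteratee is an async function,
  // and the results come back in the same order as the inputs.
  const assistantResults = await async.mapLimit(utterances, 500, individual_score);
  // Convert each Watson Assistant result into one OpenScale output row.
  return assistantResults.map((result, i) => manipulate(utterances[i], result));
}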

Converting Watson OpenScale Input to Watson Assistant Input

Intent classification for individual input with retries.

The individual_score method is quite simple. We take an utterance and send it to Watson Assistant to get an intent back. The alternate_intents option is set to true so that Watson Assistant returns a confidence for all 5 intents. Without this option, Watson Assistant only returns the confidence for the top intent.
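A rough sketch of individual_score using the ibm-watson Node SDK's Assistant v2 API is shown below. The credentials, the environment variable names, and the use of the stateless message call are assumptions, and the retry handling is reduced to a single naive retry here.

// individual_score sketch (illustrative): send one utterance, get back all intents.
const AssistantV2 = require('ibm-watson/assistant/v2');
const { IamAuthenticator } = require('ibm-watson/auth');

const assistant = new AssistantV2({
  version: '2020-04-01',
  authenticator: new IamAuthenticator({ apikey: process.env.ASSISTANT_APIKEY }),
  serviceUrl: process.env.ASSISTANT_URL,
});

async function individual_score(utterance, attempt = 0) {
  try {
    const response = await assistant.messageStateless({
      assistantId: process.env.ASSISTANT_ID,
      input: {
        text: utterance,
        options: { alternate_intents: true },  // confidences for all intents, not just the top one
      },
    });
    return response.result;                    // { output: { intents: [...], ... } }
  } catch (err) {
    if (attempt < 1) return individual_score(utterance, attempt + 1);  // one naive retry
    throw err;
  }
}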

As a result, this Watson OpenScale input…

{
  "fields": [
    "Reviews"
  ],
  "values": [
    [
      "Very disappointed. Screen barely responds to touch, Battery is taking forever to charge and dies very fast. Useless device."
    ]
  ]
}

… gets converted to this Watson Assistant input …

{
  "input": {
    "text": "Very disappointed. Screen barely responds to touch, Battery is taking forever to charge and dies very fast. Useless device.",
    "options": {
      "alternate_intents": true
    }
  }
}

Converting Watson Assistant Output to Watson OpenScale Output

Converting Watson Assistant Output to Watson OpenScale format

For every Watson Assistant result, the manipulate method returns an array containing the original input, the top intent, and an array of probabilities/confidences for each of the 5 possible intents. These confidences are not the same as the ones returned by Watson Assistant: Watson Assistant returns an independent confidence in the range [0, 1] for each intent, whereas Watson OpenScale expects the values to behave like probabilities and sum to 1. So, our manipulate method L1-normalises the confidences by dividing each one by their total sum.
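A sketch of that manipulation; the intent order used for the probability array is an assumption (in practice it has to match the order Watson OpenScale was configured with):

// manipulate sketch: turn one Watson Assistant result into one OpenScale output row.
const INTENTS = ['Very_Bad', 'Bad', 'OK', 'Good', 'Very_Good'];  // assumed fixed order

function manipulate(utterance, assistantResult) {
  const intents = assistantResult.output.intents;  // [{ intent, confidence }, ...]
  const byName = {};
  intents.forEach((i) => { byName[i.intent] = i.confidence; });

  // L1-normalise: divide each confidence by the total so the values sum to 1.
  const total = INTENTS.reduce((sum, name) => sum + (byName[name] || 0), 0);
  const probabilities = INTENTS.map((name) => (byName[name] || 0) / (total || 1));

  // Watson Assistant returns intents sorted by confidence, so the first one is the top intent.
  const topIntent = intents.length ? intents[0].intent : null;
  return [utterance, topIntent, probabilities];
}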

As a result, this Watson Assistant output …

{
  "output": {
    "generic": [
      {
        "response_type": "text",
        "text": "I didn't understand. You can try rephrasing."
      }
    ],
    "intents": [
      {
        "intent": "Very_Bad",
        "confidence": 0.9443320274353029
      },
      {
        "intent": "OK",
        "confidence": 0.23559726774692535
      },
      {
        "intent": "Bad",
        "confidence": 0.2239454984664917
      },
      {
        "intent": "Good",
        "confidence": 0.21402768790721893
      },
      {
        "intent": "Very_Good",
        "confidence": 0.20079457936808467
      }
    ],
    "entities": []
  }
}

… finally gets converted to this Watson OpenScale output …

{
  "fields": [
    "Reviews",
    "top_intent",
    "confidence"
  ],
  "values": [
    [
      "Very disappointed. Screen barely responds to touch, Battery is taking forever to charge and dies very fast. Useless device.",
      "Very_Bad",
      [
        0.12313512969153419,
        0.11768187924517681,
        0.12954178725467652,
        0.519235472319682,
        0.11040573148893042
      ]
    ]
  ]
}

Configuring Custom Wrapper Application with Watson OpenScale

The Watson OpenScale documentation describes two ways to configure your custom application. We opted to do it via the individual scoring endpoint (the endpoint that your custom application exposes for scoring) as there is only a single deployment in our application. For larger applications or applications with multiple such assistants, you will have to provide another endpoint to return a list of such deployments.

Provide your basic auth details to connect to Custom Application.
Adding the custom deployment using scoring endpoint from the Insights Dashboard

After adding the custom deployment, we need to score some sample text and add the request-response pairs to the Payload Records of Watson OpenScale. This step is needed so that Watson OpenScale can learn the structure of the custom application's model output. It can be done via the REST API or a client in the language of your choice; a sample notebook is provided in the GitHub repository for this task.
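Conceptually, each payload record simply pairs a scoring request with the wrapper's response; here is a sketch using the example from above. The exact logging call and record schema are covered by the sample notebook and the Watson OpenScale documentation.

// Conceptual sketch of one payload record: the scoring request paired with the response.
const payloadRecord = {
  request: {
    fields: ['Reviews'],
    values: [['Very disappointed. Screen barely responds to touch, Battery is taking forever to charge and dies very fast. Useless device.']],
  },
  response: {
    fields: ['Reviews', 'top_intent', 'confidence'],
    values: [[
      'Very disappointed. Screen barely responds to touch, Battery is taking forever to charge and dies very fast. Useless device.',
      'Very_Bad',
      [0.12313512969153419, 0.11768187924517681, 0.12954178725467652, 0.519235472319682, 0.11040573148893042],
    ]],
  },
};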

Now, we need to add more information about the input-output of the application in Watson OpenScale.

Adding the model input type and algorithm

In our case, the data type is Unstructured text and the algorithm type is Multi-class classification as there are 5 intents.

Specifying the model output details.

As you may remember from above, our custom application output has "Reviews", "top_intent", and "confidence" as its fields. We mark top_intent as the model prediction and confidence as the model probability output.

After the model configuration is successful

Generating Explanations!!!

Now, for the dessert, let's generate explanations for a couple of user inputs. The sample notebook has already scored these transactions and recorded a scoring_id for each one of them. We will use these IDs to generate the explanations.

The first input is:

Great phone available at an affordable price. Amazing performance and battery life make it an excellent choice.

A Watson OpenScale Explanation for a Very_Good intent

Watson OpenScale has generated an explanation stating that the top factors, Amazing (26.15%), affordable (16.81%), excellent (13.44%), Great (11.71%), etc., contribute the most towards the Very_Good intent for this input. The other two factors contribute towards the other intents.

The second input is:

Very disappointed. Screen barely responds to touch, Battery is taking forever to charge and dies very fast. Useless device.

A Watson OpenScale Explanation for a Very_Bad intent

In this case, the Watson OpenScale explanation justifies the Very_Bad intent given to this user input by listing the top factors as Useless (21.77%), disappointed (14.65%), touch (8.16%), dies (8.12%), etc. The factor barely (15.09%) contributes towards the other 4 intents.

That’s all, folks!

PS: All the code for the custom application, the sample notebooks and training examples for Watson Assistant are in this GitHub repository: prempiyush/custom-wa-wrapper.
