Analyzing and Improving a Watson Assistant Solution, Part 2: Foundational Components of Watson Assistant Analysis

Andrew R. Freed
IBM Data Science in Practice
6 min read · Mar 11, 2020

Search for key insights in your log data. Photo by Darwin Vegher on Unsplash

In Part 1 of this series we explored the personas who analyze virtual assistants and some existing tools that help them. In this post we will review the foundational components in building your own analysis pipeline. Building your own pipeline lets you fully customize an analysis to both your specific personas and your specific virtual assistant.

There are four key steps in developing Watson Assistant log analysis:

· Gather the logs

· Extract the fields of interest

· Define the analytic goal

· Filter data for analytic goal

In the remainder of this blog post we will explore these steps in detail. Part 3 of this series will demonstrate recipes for common analytic goals.

Gather the logs

The first step is to identify where your log data is located. Depending on your application architecture you may be storing full or partial log data on-premises or in your own cloud data store. Watson Assistant also stores logs with a retention policy that varies based on your service plan.

Watson Assistant exposes an API to get logs. In the logs API each “log” is a request-response pair and we will explore later how to correlate these logs into conversations. The log API requires query parameters (such as a date range) and returns paginated results. If you are analyzing large data sets you will want to build a wrapper to handle the pagination and rate-limiting enforced by the API. Python users can reference the getAllLogs script from WA-Testing-Tool.

The log API has a rich query syntax allowing you to specify exactly which logs you are interested in. The filter I use most often is a response_timestamp range (response_timestamp>=2017-07-01,response_timestamp<2017-08-01), which selects all events in a given time period.
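
If you are calling the API directly from Python, a minimal sketch of such a wrapper might look like the following. It assumes the ibm-watson SDK; the API key, service URL, and workspace ID are placeholders, and the getAllLogs script mentioned above is a more complete implementation:

```python
import json
import time

from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and endpoint -- substitute your own.
assistant = AssistantV1(version="2020-02-05",
                        authenticator=IAMAuthenticator("YOUR_APIKEY"))
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

def get_all_logs(workspace_id, log_filter, page_limit=500, delay_secs=1.0):
    """Page through the logs API, collecting the raw JSON log events."""
    logs, cursor = [], None
    while True:
        result = assistant.list_logs(workspace_id=workspace_id,
                                     filter=log_filter,
                                     page_limit=page_limit,
                                     cursor=cursor).get_result()
        logs.extend(result["logs"])
        cursor = result.get("pagination", {}).get("next_cursor")
        if not cursor:
            return logs
        time.sleep(delay_secs)  # crude protection against rate limiting

# All events for July 2017, stored as raw JSON for later analysis.
logs = get_all_logs(
    "YOUR_WORKSPACE_ID",
    "response_timestamp>=2017-07-01,response_timestamp<2017-08-01")
with open("raw_logs.json", "w") as f:
    json.dump(logs, f)
```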

I recommend downloading and storing the full "raw" (JSON) log events until you are practiced enough at analytics to know which fields you need. It is much easier to grab more fields than you need and filter them later than to grab fewer fields than you need! In the following section I will list the most commonly used log fields.

Extract fields of interest

The logs API documents each of the fields in the "Log" object, including the "MessageRequest" and "MessageResponse" child objects and their descendant objects as well. Myriad fields are available, but I will document here the most useful fields for analysis and why you should use them. Fields starting with "request." originate from the user and those with "response." originate from Watson Assistant. Restricting the fields analyzed greatly reduces the amount of data stored in memory while running analysis.

· request.context.conversation_id: You need a unique identifier to correlate log events into a conversation to perform conversation-level analytics. The conversation_id is unique to a user session with a single dialog skill. If your assistant uses multiple dialog skills to service a single conversation you must use a different field as the conversation correlator. (In most multi-dialog-skill assistants, an orchestration layer provides this correlation ID and stores it in the request.context object).

· request_timestamp and response_timestamp: These fields allow you to sort and filter events by time. In multi-skill assistants these are the best sort fields within a conversation.

· response.context.system.dialog_turn_counter: For single skill assistants this is a convenient key for referencing the nth turn of a conversation. The code referenced by this blog series updates the dialog_turn_counter for use in multi-skill assistants as well.

· request.input.text: What the user actually typed (or spoke) to the assistant. This is used in most turn-based analytics including gathering user utterances to create new blind sets and ground truth.

· response.output.text: What the system responded to the user with. While not strictly useful for analytics, it does help analysts follow the flow of the conversation. The simplest manual analysis of a conversation is to print all of the request.input.text and response.output.text pairs sorted by request_timestamp (see the sketch after this list).

· response.intents[0].intent and response.intents[0].confidence: These represent the top intent selected by the classifier and its associated confidence score. Although the dialog skill may not use the intents in every dialog node these fields are critical for the intent analyst and useful to the other analytical personas. These fields may be used in bootstrapping new ground truth.

· response.entities: Entities are frequently used to augment intents in satisfying a user goal. Analyzing the entities helps the intent analyst refine conversational branching and provides insight to the business analyst as well.

· response.output.nodes_visited: This field lists all dialog nodes executed by Watson Assistant in delivering the response to the user. Whenever a dialog skill uses "Jump to Node", "Skip and Evaluate Child Nodes", or multi-condition responses, it will have multiple entries in nodes_visited. The intent analyst can use this field to identify how the user responds (input.text) after a given dialog node is visited. The code referenced in this blog series also creates a "prev_nodes_visited" field which aligns the nodes_visited after the previous message into the same row as the current message.

· workspace_id: Particularly useful for multi-skill assistants, this field tells you which skill received the user request.

· Application-specific context variables: You can store arbitrary data in the Watson Assistant context object. This data will be stored in request.context by an orchestration layer or in response.context if generated by Watson Assistant. These fields will be useful to the business analyst.

· request.context.vgwSessionID (for Voice Gateway users): In speech applications the Voice Gateway Session ID is a superior correlation ID for conversations since it is unique per conversation and it can also be used in evaluating Voice Gateway logs.

· Speech Confidence (for Speech to Text users): In speech applications the transcription confidence from Speech to Text is useful for evaluating Watson Assistant responses. I recommend having your orchestration layer store the "results[0].alternatives[0].confidence" value from Speech to Text in a "request.context.STT_CONFIDENCE" field in Watson Assistant. This is critical for the speech analyst.
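
To make the manual analysis mentioned above concrete, here is a minimal sketch over the raw log events. It assumes the single-skill case where conversation_id is the correlator and uses the field paths documented in the logs API:

```python
def print_conversation(logs, conversation_id):
    """Print one conversation's user/assistant exchange in time order."""
    events = [log for log in logs
              if log["request"].get("context", {}).get("conversation_id")
              == conversation_id]
    for log in sorted(events, key=lambda e: e["request_timestamp"]):
        print("User:     ", log["request"].get("input", {}).get("text", ""))
        # response.output.text is a list of strings
        print("Assistant:", " ".join(log["response"]["output"].get("text", [])))
```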

Extracting these fields from a JSON log event is a relatively simple parsing exercise. Python users can reference the extractConversations script from WA-Testing-Tool which parses JSON log data and builds both a Pandas dataframe and a CSV file containing only the fields of interest.
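
A simplified version of that extraction, pulling a handful of the fields above into one row per log event, might look like this (field paths follow the logs API; anything missing simply becomes None):

```python
import pandas as pd

def extract_fields(logs):
    """Flatten raw log events into one row per request-response pair."""
    rows = []
    for log in logs:
        request, response = log["request"], log["response"]
        intents = response.get("intents", [])
        rows.append({
            "conversation_id": request.get("context", {}).get("conversation_id"),
            "request_timestamp": log["request_timestamp"],
            "response_timestamp": log["response_timestamp"],
            "input_text": request.get("input", {}).get("text"),
            "output_text": " ".join(response.get("output", {}).get("text", [])),
            "intent": intents[0]["intent"] if intents else None,
            "confidence": intents[0]["confidence"] if intents else None,
            "nodes_visited": response.get("output", {}).get("nodes_visited", []),
            "workspace_id": log.get("workspace_id"),
        })
    return pd.DataFrame(rows)

df = extract_fields(logs)
df.to_csv("log_fields.csv", index=False)
```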

Define the analytic goal

As discussed in Part 1, each persona has different areas of focus. The intent analyst wants to find user responses to open-ended dialog questions and to see which nodes generate the most unexpected responses. The business analyst wants to see how many calls escalate.

The fields of interest in the preceding section can be combined in recipes supporting each of these goals and more. Analysis works best when you know what your target is; however, I will discuss some common patterns below to get you started.

Filter data for analytic goal

Depending on the analytic goal you may need to apply a “conversation-aware” filter to the log events to organize them by conversation or you can stay conversation-agnostic.

For instance, "get all responses to dialog node X" is conversation-agnostic, while "get all first responses to dialog node X" is conversation-aware (a user may visit the same dialog node multiple times in a conversation). Many "counting" analytics can be done either way: "how many escalations occurred" can be answered in a conversation-agnostic manner by counting the number of events with the escalation node in the nodes_visited list, but a conversation-aware filter can also tell you what happened before or after the escalation. Begin with conversation-agnostic analytics where possible, as those are slightly easier to implement.
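
As an illustration, with the dataframe built earlier both styles of escalation analysis are a few lines of Pandas. The node name "Escalate" is a placeholder for whatever your dialog skill actually uses:

```python
# Conversation-agnostic: count every event that visited the escalation node.
escalated = df[df["nodes_visited"].apply(lambda nodes: "Escalate" in nodes)]
print("Escalation events:", len(escalated))

# Conversation-aware: count distinct conversations that escalated at least once.
print("Conversations with an escalation:",
      escalated["conversation_id"].nunique())

# Conversation-aware: isolate the first escalation in each conversation so you
# can line it up against earlier turns in df to see what preceded it.
first_escalations = (escalated.sort_values("request_timestamp")
                              .groupby("conversation_id", as_index=False)
                              .first())
```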

In the next post I will demonstrate recipes serving common analytic patterns.

Thanks to the following reviewers of this post: Eric Wayne, Aishwarya Hariharan, Audrey Holloman, Mohammad Gorji-Sefidmazgi, and Daniel Zyska.

For more help in analyzing and improving your Watson Assistant, reach out to IBM Data and AI Expert Labs and Learning.
