Understanding the Watson Assistant for Voice Interaction Solution

co-authored by Andrew Pang & Leo Mazzoli

Andrew Pang
IBM watsonx Assistant
Oct 29, 2019


Photo by Taylor Grote on Unsplash

In August 2019, IBM released a new offering called Watson Assistant for Voice Interaction (WAVI). It is positioned as a bundled set of Watson services that supports caller input via voice and SMS, allowing a business to convert its old-fashioned Interactive Voice Response (IVR) solution into one that features the conversational dialogue of a chatbot. A nice overview of the feature set is referenced here.

From my personal experience with IVRs, I welcome this new technology! I recall countless times dialing an organization's customer call center about a service issue, only to encounter a long menu of choices selectable by keypad. This usually left me frustrated and impatient. By the time I was forwarded to a human operator, I had often spent a good deal of time without even addressing the reason I called.

The introduction of WAVI will change all of this! When users call in, they are greeted by a human-like voice that asks how it can help. Their utterances are taken as input and interpreted to find the best response. In the end, they reach a resolution for the issue they called about and have a much better experience.

A colleague of mine described the new solution and its components very well; refer to his blog here for a primer. In this article, I want to delve deeper into the technology and how the services work together.

Diagram illustrating the Watson Assistant for Voice Interaction solution

Here’s a quick description of the components that make up the WAVI solution:

Watson Voice Gateway (VG) – Accepts incoming voice calls and SMS messages as input and brokers the conversation between the caller and the following offerings:

1. Watson Speech to Text (STT) – Accepts an audio transmission and returns a text transcription.

2. Watson Assistant (WA) – Receives the transcribed text and processes it as a chatbot request. An appropriate response is generated and returned to the Watson Voice Gateway (or to the Service Orchestration Engine, if one is deployed).

3. Watson Text to Speech (TTS) – Accepts a text string and returns the synthesized audio stream.

4. Service Orchestration Engine (SOE, optional) – A custom piece of code that sits between the Voice Gateway and Watson Assistant. It commonly serves as a middleware layer that interfaces with third-party services and internal back-end APIs to apply business-specific requirements to the conversation data (e.g., translation, date/time formatting, sensitive data masking).
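To make the data flow above concrete, here is a minimal Python sketch of a single caller turn. It is an illustration under stated assumptions, not how the Voice Gateway is actually implemented: the `stt`, `assistant`, and `tts` callables are hypothetical stand-ins for the three Watson services, and `mask_card_numbers` is a made-up example of the kind of sensitive-data rule an SOE might apply. Only the inbound direction is hooked here; an SOE could equally post-process Watson Assistant's responses.

```python
import re
from typing import Callable, Optional


def mask_card_numbers(text: str) -> str:
    """Hypothetical SOE rule: mask 13-16 digit runs, keeping the last 4 digits."""
    return re.sub(
        r"\b\d{13,16}\b",
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:],
        text,
    )


def handle_turn(
    audio: bytes,
    stt: Callable[[bytes], str],          # stand-in for Watson Speech to Text
    assistant: Callable[[str], str],      # stand-in for Watson Assistant
    tts: Callable[[str], bytes],          # stand-in for Watson Text to Speech
    soe: Optional[Callable[[str], str]] = None,  # optional orchestration hook
) -> bytes:
    """Route one caller turn: audio -> text -> (SOE) -> response -> audio."""
    text = stt(audio)            # STT: caller audio becomes a transcript
    if soe is not None:
        text = soe(text)         # SOE: apply business rules before WA sees it
    reply = assistant(text)      # WA: transcript becomes a response string
    return tts(reply)            # TTS: response string becomes audio


# Usage with stub services (no real Watson calls are made):
audio_out = handle_turn(
    b"<caller audio>",
    stt=lambda audio: "my card number is 4111111111111111",
    assistant=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
    soe=mask_card_numbers,
)
print(audio_out)
```

Injecting the services as plain callables keeps the sketch testable offline; in a real deployment each would wrap the corresponding service's SDK or REST call.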

Getting Started

For the initial setup, the services are bound together while creating the Voice Agent within the Watson Voice Gateway offering. Walk through the form to configure each service. For a more detailed walkthrough of the entries, take a look at this great video to get up and running. Note that when binding the Watson services, you can either select a pre-existing service instance or have the configuration create one for you.

With the initial setup complete, the solution can take voice input and process responses. But what happens when something fails along the way? How can we follow the transaction journey of a call session?

Attempted SMS dialogue with Watson Assistant — Something didn’t work!

Working with clients who have begun to replace their IVRs with the WAVI solution, we have introduced a few practices and utilities to help them with the transformation.

Explore the Voice Gateway Usage Dashboard

In the Voice Gateway offering, locate the Usage menu item in the left sidebar. This dashboard provides start and end details for every session. For failed sessions, an additional drill-in view lets you examine each error and warning.

Watson Voice Gateway Usage Dashboard
Watson Voice Gateway Usage Dashboard — Deeper analysis on failed sessions
Watson Voice Gateway Call Log Details — sessionID referenced

Key Element(s):

sessionID – A unique ID generated for each conversation session. If the cause of an issue is not apparent in the Usage logs, this ID can be used to cross-reference other records for deeper tracing.

Event Forwarding

In addition to the log viewer dashboard, the Voice Gateway generates events that record the call timeline and the activity conducted within the session. These events can be stored in a NoSQL database such as Cloudant.

Setup

The setup is accessed via the Manage section within the Edit Agent submenu.

Watson Voice Gateway — Managing the Voice Agents

Within the Voice Agent configuration, there is a section to Enable Event Forwarding. Enter the details and credentials of the target NoSQL database, then check the types of events to store and the target database for each.

Pro-tip: IBM Cloud offers a Cloudant service that can be provisioned directly from the configuration screen. Using it keeps your entire WAVI solution in a single location.

Watson Voice Gateway / Voice Agent edit — Configuring Event Forwarding

Once setup succeeds, events begin to appear as documents in the database.

Note: Event forwarding does not capture historical conversation sessions that occurred before setup. Only events generated after event forwarding has been enabled are stored.

Cloudant Database Dashboard — Event Forwarding in action!

Event Types

The Voice Gateway offering publishes the following three categories of events:

  • Call Detail Record (CDR)
  • Transcription
  • Turn

Note: We will touch on a few key elements that are helpful for tracing and troubleshooting a session. For a more in-depth description of all the reported metadata, see the record event format section in the Voice Gateway documentation.

1. Call Detail Record (CDR) Events

Provides specific details on the entire call session. This includes the following:

  • Start/End time of call session
  • Customer phone number
  • Services utilized
  • Watson Speech to Text/Text to Speech transformation details
  • Watson Assistant interactions

Key Elements:

  • globalSessionID – This ID can be cross-referenced with the sessionID from the Voice Gateway Usage logs
  • workspaceID – Identifies the Watson Assistant skill/workspace used to respond to user utterances during the call session
  • allIntents – Lists all intents identified by Watson Assistant during the recorded session

This illustration shows two excerpts from a single CDR event. Take note of the highlighted key elements, as they can be used to correlate with the subsequent illustrations in this article.

Watson Voice Gateway Event Forwarding — Excerpts from a single CDR event

By referencing the globalSessionID and workspaceID, we can trace the session from its origination point to its destination in Watson Assistant. The intents behind Watson Assistant's responses to the user are also listed within this event (allIntents).
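One way to automate this correlation is sketched below in Python. It assumes the CDR events have already been exported from the database as parsed JSON documents. Because this article does not reproduce the full record schema, the sketch searches for globalSessionID, workspaceID, and allIntents at any nesting depth rather than assuming a fixed path. It also assumes the globalSessionID value matches the sessionID shown in the Usage dashboard. The sample document and the helper names (`find_values`, `find_cdr_events`, `summarize_cdr`) are hypothetical.

```python
from typing import Any, Iterator, List


def find_values(doc: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON document."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(doc, list):
        for item in doc:
            yield from find_values(item, key)


def find_cdr_events(cdr_events: List[dict], session_id: str) -> List[dict]:
    """Return the CDR events whose globalSessionID equals the given session ID."""
    return [e for e in cdr_events if session_id in find_values(e, "globalSessionID")]


def summarize_cdr(event: dict) -> dict:
    """Pull out the key tracing elements discussed above."""
    return {
        "globalSessionID": next(find_values(event, "globalSessionID"), None),
        "workspaceIDs": sorted(set(find_values(event, "workspaceID"))),
        "allIntents": next(find_values(event, "allIntents"), []),
    }


# Hypothetical, heavily simplified CDR document (NOT the real record format):
cdr = {
    "record": {
        "globalSessionID": "abc-123",
        "assistant": {"workspaceID": "ws-1", "allIntents": ["greeting", "billing"]},
    }
}
for event in find_cdr_events([cdr], "abc-123"):
    print(summarize_cdr(event))
```

The recursive search trades a little speed for robustness: it keeps working if the fields sit at a different depth than this sample assumes.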

2. Transcription Events

Publishes details for every user utterance and every Watson Assistant response.

Note: Each published event is a single transaction that will detail either an utterance from the caller or a response generated from Watson Assistant.

Key Elements:

  • globalSessionID – This ID can be cross-referenced with the sessionID from the Voice Gateway Usage logs
  • sourceType & destinationType – These elements indicate whether the event recorded a user utterance or a Watson Assistant response. A sourceType with a value of “conversationID” indicates a Watson Assistant response. Conversely, a value of “sipURI” indicates a user utterance.
  • source & destination – These elements provide additional details associated with the sourceType & destinationType elements. If the value of the <source/destination>Type element is “conversationID”, this element holds the value of the conversationID. Alternatively, if the <source/destination>Type is “sipURI”, this element holds the incoming caller ID of the user.
  • transcription – The text of either the user utterance or the Watson Assistant response.
  • conversationID – This ID links the transcription events for a given session together.
  • workspaceID – Identifies the Watson Assistant skill/workspace used during the session.
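The sourceType rule above is easy to apply programmatically. The following sketch rebuilds a readable transcript for one conversationID. For simplicity it assumes each Transcription event has been flattened into a dict with top-level sourceType, conversationID, and transcription keys, and that the events are already in chronological order; real event documents may nest these fields differently, so check the record event format in the Voice Gateway documentation. The function names and sample data are hypothetical.

```python
from typing import Iterable, List


def speaker(event: dict) -> str:
    """Classify a Transcription event using its sourceType value."""
    source_type = event.get("sourceType")
    if source_type == "sipURI":
        return "caller"       # utterance coming from the user's phone
    if source_type == "conversationID":
        return "assistant"    # response generated by Watson Assistant
    return "unknown"


def transcript(events: Iterable[dict], conversation_id: str) -> List[str]:
    """Return 'speaker: text' lines for one conversationID, in input order."""
    return [
        f"{speaker(e)}: {e.get('transcription', '')}"
        for e in events
        if e.get("conversationID") == conversation_id
    ]


# Hypothetical, flattened sample events (NOT the real record format):
events = [
    {"sourceType": "sipURI", "conversationID": "conv-42",
     "transcription": "I have a question about my bill"},
    {"sourceType": "conversationID", "conversationID": "conv-42",
     "transcription": "Sure, I can help with billing."},
    {"sourceType": "sipURI", "conversationID": "conv-99",
     "transcription": "Unrelated call"},
]
for line in transcript(events, "conv-42"):
    print(line)
```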

To illustrate, the following example shows a single caller utterance followed by a response from Watson Assistant.

Here’s what is happening behind the scenes within the following Transcription events:

Excerpt from Watson Voice Gateway Transcription — Caller utterance to Watson Assistant

Event #1 is a caller utterance: its source/sourceType originates from the caller's phone number (sipURI), and its destination/destinationType is Watson Assistant (identified by the conversationID).

Excerpt from Watson Voice Gateway Transcription — Watson Assistant Intent response

Likewise, Event #2 is a response from Watson Assistant (conversationID) going outbound back to the caller.

The workspaceID and globalSessionID are also represented to allow for correlation back to the CDR events.

3. Turn Events

Turn events detail a complete transaction: a user utterance and the corresponding Watson Assistant response.

In the following example, the user requests information on the Voice Gateway (Input section) and Watson Assistant replies with its response (Output section). The conversationID is also included for traceability.
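A Turn event can be condensed into a one-line summary for quick triage. This sketch assumes a simplified shape in which the Input section carries the user text under `input.text` and the Output section carries a list of response strings under `output.text`, loosely mirroring a Watson Assistant message response. The actual Turn event layout may differ, and `summarize_turn` is a hypothetical helper.

```python
def summarize_turn(turn: dict) -> str:
    """Condense a (simplified) Turn event into one line: conversation, input, output."""
    conversation_id = turn.get("conversationID", "?")
    user_text = turn.get("input", {}).get("text", "")
    reply = turn.get("output", {}).get("text", [])
    if isinstance(reply, str):        # tolerate a single string as well as a list
        reply = [reply]
    reply_text = " ".join(reply)
    return f'[{conversation_id}] user: "{user_text}" -> assistant: "{reply_text}"'


# Hypothetical Turn event in the assumed shape:
turn = {
    "conversationID": "c1",
    "input": {"text": "What is the Voice Gateway?"},
    "output": {"text": ["It connects phone calls", "to Watson services."]},
}
print(summarize_turn(turn))
```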

Watson Voice Gateway Event Forwarding — Turn event

Investigate Watson Assistant Logs

Let's now look at what happens when the conversation session crosses into Watson Assistant. The logs recorded by Watson Assistant are a robust record of all transactions (caller utterances and WA responses) across the life of the service.

The best way to review the logs is via the List log events in a workspace REST API.

Given the appropriate input parameters (apikey, workspace_id, and API version), the API returns its results in JSON format.

Watson Assistant List Workspaces API

Calling the API this way returns the entire log history, which can be unwieldy once the Voice Gateway/Watson Assistant has been in service for a while. Fortunately, a filtering capability can narrow the returned results to a more manageable level, for example to a single conversationID.

Watson Assistant List Workspaces API filtering on conversationID
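For reference, here is a Python sketch that builds and sends a List log events request using only the standard library. It assumes the v1 endpoint path `/v1/workspaces/{workspace_id}/logs`, basic authentication with the literal username `apikey` and your API key as the password, and a `filter` value of the form `response.context.conversation_id::<id>` as described in the Watson Assistant filter query reference. The base URL is a placeholder for your instance's service URL, and `list_logs_url` and `fetch_logs` are hypothetical helper names. Verify these details against the current API reference before relying on them.

```python
import base64
import json
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode


def list_logs_url(
    base_url: str,
    workspace_id: str,
    version: str,
    conversation_id: Optional[str] = None,
    page_limit: int = 100,
) -> str:
    """Build a List log events URL, optionally filtered to one conversation."""
    params = {"version": version, "page_limit": page_limit}
    if conversation_id:
        # Assumed filter syntax (field::value) from the filter query reference.
        params["filter"] = f"response.context.conversation_id::{conversation_id}"
    path = f"/v1/workspaces/{quote(workspace_id, safe='')}/logs"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


def fetch_logs(url: str, apikey: str) -> dict:
    """GET the URL with basic auth (username 'apikey'); performs a network call."""
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (placeholders; uncomment fetch_logs to call a real instance):
url = list_logs_url(
    "https://example.com/assistant/api",
    workspace_id="ws-1",
    version="2019-02-28",
    conversation_id="conv-42",
)
print(url)
# logs = fetch_logs(url, apikey="<apikey>")
```

Results are paginated, so a long-running session may require following the pagination information returned with each response to retrieve every log event.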

Watson Voice Agent Insights Dashboard — Beta

The previous methods should be sufficient for tracing sessions. In addition, a beta utility called Voice Insights provides a browser-based interface that mines all of the collected events and displays them as a consolidated list of conversation sessions. Each session can then be explored to reveal all of the elements mentioned above and their relationships to each other. A REST API is also exposed for integration into a custom solution.

Although the utility is currently in beta, it is definitely worth checking out, and it will continue to evolve and improve over the course of the year. The instructions and a link to the Docker image can be referenced here.

Voice Agent Insights Dashboard — Consolidated view of your WAVI call sessions

Our clients have incorporated these techniques into their support and operations methodology; hopefully they will also help you gain a deeper understanding of how WAVI works, so you can do the same for your business!

About Us

Leo and Andy are Cognitive Engineers in the IBM Watson Expert Services group. We specialize in educating and enabling clients with IBM's wide catalog of Watson products and solutions.

Was this content helpful for you? Please provide feedback in the comments below. If you have any questions, let us know how we can help.

Want to Know More?

Here are a few related topics authored by fellow colleagues that will deepen your understanding as you begin your WAVI implementation:

· Introducing Watson Assistant for Voice Interaction

· Watson Assistant syntax validation for Voice Gateway and SOE integration
