Fast Ramp-Up on IBM Voice Agent with Watson

Scott Graham
IBM Data Science in Practice
5 min read · Mar 25, 2020

We’ve all been frustrated trying to get an answer on the phone. Whether it’s long hold times, complicated number trees, or the inability to reach the people you need, this is not how anyone wants to spend their time.

One successful strategy we at IBM Watson have seen deployed is to expose intelligent assistants over the phone network. These assistants can triage and answer initial questions, greatly increasing the capacity of existing staff.

Let's Fix That

Especially in these times of crisis, it’s important to have clear, capable AI assistants in order to provide high-quality information as quickly as possible.

Many of our customers are already using IBM Watson Assistant and have built specific skills. If you haven't had a chance to learn how to build digital assistants, check out the video below:

The remainder of this post is focused on providing a few quick tips and links to enable existing digital assistants to be connected via phone.

Beyond Setup

After you have created your Watson Assistant and set up IBM Voice Agent to converse and respond with the desired answers and information, you can tune your assistant responses to enable a few phone-specific features.

  1. Enable call transfer to a live agent, so callers can reach a real human on the phone.
  2. Enable DTMF (so callers can use the phone keypad to enter numbers).
  3. Use SMS/text messaging for long responses (from a Discovery Search Skill, for example). These don’t work well over the phone, but you can offer to text a link to the user!
  4. Hang up the call.
  5. Add some custom speech training, particularly for crisis-related vernacular. You can do that in a couple of ways: using custom grammars or creating custom language models.

Pro-Tip

All programming of the Voice Agent (using the vgwAction commands in the following sections) happens by editing JSON responses in the Watson Assistant Skill/Dialog editor. To get to this feature, select JSON Editor in the drop-down menu, as shown in the image below:

Opening the JSON Editor in Watson Assistant

Call Transfer

For a digital assistant to maintain customer satisfaction, you need to ensure a customer can get to a real human if the assistant cannot answer their question. With a voice-enabled assistant, that means issuing a “Call Transfer” using SIP REFERs. You need to ensure your SIP trunk is set up correctly to receive the transfer action. Then you can program the Voice Agent using the vgwActTransfer action, which looks like:

{
  "output": {
    "text": {
      "values": [ "Please hold on while I connect you with a live agent." ],
      "selection_policy": "sequential"
    },
    "vgwAction": {
      "command": "vgwActTransfer",
      "parameters": {
        "transferTarget": "tel:+18889990000"
      }
    }
  }
}

DTMF

In use cases where you need to collect numbers as input, it is often useful to use dual-tone multi-frequency signaling (DTMF) from the phone keypad. To collect DTMF, you can use the vgwActCollectDtmf action as follows:

{
  "output": {
    "vgwAction": {
      "command": "vgwActCollectDtmf",
      "parameters": {
        "dtmfTermKey": "#"
      }
    },
    "text": {
      "values": [
        "Enter your phone number."
      ]
    }
  }
}

Enhancing your Assistant with SMS

Interacting with a digital assistant by voice is different from typing. People phrase questions differently, and long responses are easier to read than to listen to on the phone. When you have a long response, possibly because you are using the Search Skill, Voice Agent can send a text via SMS to the calling phone number. If the response fits within SMS limits, you can send it in full; alternatively, you can send a link to the response that the user can open.

In order to do that, you need to set up SMS with your Voice Agent. You can then use vgwActSendSMS to send a response over text during your voice conversation. Using that action looks like:

{
  "output": {
    "text": {
      "values": [
        "Okay. I will send you a text message now."
      ],
      "selection_policy": "sequential"
    },
    "vgwAction": {
      "command": "vgwActSendSMS",
      "parameters": {
        "message": "This is a test message from Watson Assistant"
      }
    }
  }
}
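
As a rough sketch of the "send it in full or send a link" choice, the Python below builds the same vgwActSendSMS node programmatically. The helper name build_sms_node, the 160-character threshold (a common single-segment SMS limit), and the example URL are all assumptions made for illustration; none of them are part of Voice Agent itself.

```python
import json

# Assumption: 160 characters is a common single-segment SMS limit;
# adjust to whatever your SMS provider actually allows.
SMS_CHAR_LIMIT = 160


def build_sms_node(answer, link, limit=SMS_CHAR_LIMIT):
    """Return a dialog-node output dict that texts the full answer or a link.

    Hypothetical helper: the JSON shape mirrors the vgwActSendSMS example above.
    """
    if len(answer) <= limit:
        # Short enough to send as-is.
        message = answer
    else:
        # Too long to read comfortably in a text, so send a link instead.
        message = "Here is the information you asked for: " + link
    return {
        "output": {
            "text": {
                "values": ["Okay. I will send you a text message now."],
                "selection_policy": "sequential",
            },
            "vgwAction": {
                "command": "vgwActSendSMS",
                "parameters": {"message": message},
            },
        }
    }


if __name__ == "__main__":
    node = build_sms_node("Our office is open 9 to 5.", "https://example.com/hours")
    print(json.dumps(node, indent=2))
```

The printed JSON can be pasted into the JSON Editor for the relevant dialog node. Whether you generate nodes this way or edit them by hand is a matter of preference.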

Hang Up The Call

Of course, the phone is also different in that both ends usually hang up when the call is over. You can hang up a call using the vgwActHangup action:

{
  "output": {
    "vgwAction": {
      "command": "vgwActHangup"
    }
  }
}

Speech Training

If you are using English, you should use the ‘Shortform-Narrowband’ model, which is tuned for the shorter utterances typical of a phone/assistant interaction. Other languages should probably use the Narrowband models when available, because narrowband audio is the default quality for telephone calls. For most applications, this is enough. However, if the use case requires particular phrases or vernacular, it is useful to enable customization by creating custom language models and grammars.

A great option for generating a corpus for training a custom language model is to export the intent examples from Watson Assistant and use them as your initial corpus. The following command is an example of how to do that:

curl -u "apikey:<apikey>" "https://gateway.watsonplatform.net/assistant/api/v1/workspaces/<workspaceid>/intents?version=2020-02-05&export=true" | tr ',' '\n' | tr -d '{}[]\"' | awk -F: '/text/ {print $2}' > corpora.txt

Once you have the corpora.txt file, you can create a custom language model from it by following the directions in the Speech to Text (STT) documentation.
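
The shell pipeline above splits on commas and colons, so example utterances that contain either character get cut short. As a hedged alternative, here is a minimal Python sketch that parses the exported JSON instead. It assumes you have saved the curl response to a file and that the payload has the shape {"intents": [{"examples": [{"text": ...}]}]}; check that against your own export. The function names are illustrative only.

```python
import json


def extract_examples(export):
    """Collect the text of every intent example from an exported intents payload.

    Assumed shape: {"intents": [{"intent": ..., "examples": [{"text": ...}]}]}
    """
    lines = []
    for intent in export.get("intents", []):
        for example in intent.get("examples", []):
            text = example.get("text", "").strip()
            if text:
                lines.append(text)
    return lines


def write_corpus(export_path, corpus_path):
    """Read a saved export file and write one utterance per line."""
    with open(export_path, encoding="utf-8") as f:
        export = json.load(f)
    lines = extract_examples(export)
    with open(corpus_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real export.
    sample = {
        "intents": [
            {"intent": "hours", "examples": [{"text": "When are you open, today?"}]}
        ]
    }
    print("\n".join(extract_examples(sample)))
```

To build the real corpus, save the curl output to a file (for example with -o) and call write_corpus with that path and corpora.txt as the destination.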

Third-Party Integrations and Service Orchestration Engines (SOEs)

Watson Assistant has moved away from using SOEs to integrate with external applications and populate responses, in favor of webhooks called from within skill responses. However, SOEs are still often used and may be necessary if you choose to redact data inbound to Watson Assistant. You can read more about them and their use with Voice Agent here and explore some sample code in our GitHub repo.

Additional Links

The names Voice Gateway and Voice Agent are often used interchangeably. As a point of information, Voice Gateway is the on-premises solution and Voice Agent is the cloud solution. Programming both via Watson Assistant skills is the same.

Voice Agent/Voice Gateway GitHub Sample Repo
Understanding the Watson Assistant for Voice Interaction Solution
Controlling Speech to Text background noise
Voice Agent Documentation
Watson Assistant Documentation
Speech to Text Documentation
Text to Speech Documentation

Scott works on the Watson Assistant for Voice Interaction (WAVI) team which includes the IBM Voice Agent. He considers himself more of a customer advocate, full-stack developer, and Watson architect. His experience working with the Watson APIs building applications spans 5+ years.

Watson Wizard @IBM - I’m an engineer & architect working on the Voice Integration for Watson Assistant. I have a love/hate relationship with computers.