I’m Watson, let me transfer you to… Watson.

How to route between multiple assistants with Watson Assistant

Brian Pulito
IBM watsonx Assistant
12 min read · Sep 21, 2021



A common question we hear at Watson Assistant is “How do you route between multiple skills (or assistants) within a single user conversation?” This is typically required by large organizations that wish to align their development teams along Watson Assistant skill boundaries. Oftentimes these development teams are associated with different lines of business within the organization, all of which play a role in supporting end users. These patterns are also often used to route between assistants that each implement the same set of use cases, but in different languages.

You’ll notice this article is about routing between multiple assistants. In our experience, there are two core ways to combine chatbot knowledge bases:

  1. Multi-skill assistant

This is a single user-facing assistant that contains multiple skills. This approach brings several challenges, such as handling digressions and disambiguation across skills. Classification is also harder because the training data resides in the individual skills, which can overlap and cause messages to be routed to the wrong skill. At this point, Watson Assistant does not support a multi-skill assistant.

  2. Multi-assistant routing

This is more about routing and transferring between different expert skills from a root or primary assistant that is purposely built for the task. This sets the precedent that a customer is there to talk about a single topic, and if a complete topic change is required, the customer is clearly and transparently transferred to another assistant to handle those problems. This sets clear boundaries around what each assistant is supposed to do, and in many cases leads to a cleaner experience.

Multi-assistant routing is enabled through webhooks configured on an assistant (not a skill), and it enables all kinds of interesting use cases. The use case I will discuss here is the ability to route between a single primary assistant and multiple secondary assistants within the flow of a single conversation. Just remember that a skill is embedded inside an assistant, so even though you are routing to an assistant, you are also routing to the skill contained within that assistant.

Getting Started

The sample code associated with this article can be downloaded from this link. The readme there provides the details on how to set up the demo. The git repo contains the following files:

  • Exported primary (root) skill
  • Exported secondary payment skill
  • Exported secondary support skill
  • Exported Node-RED flow for pre-webhook
  • Exported Node-RED flow for post-webhook

I used Node-RED to build the webhooks. Of course, you could use any runtime and programming language (Python, Java, etc.) to implement the same webhook behavior. I personally like Node-RED for demos because it allows you to visually lay out the processing flows of the message requests and responses, and it makes it easy to get something running quickly. Go here to read more about Node-RED. Also note that it’s easy to deploy a Node-RED app in the IBM public cloud by going here.

The rest of this article assumes you have some prior knowledge of how Watson Assistant works. For instance, this article talks a lot about message requests and responses. In Watson Assistant, every conversation turn involves a message request and response (meaning the user asks Watson a question in a message request and Watson responds in a message response). These message requests are sent by “channels” — channels include a phone integration, web chat client, or even a custom channel developed by a Watson Assistant developer using the APIs. The channel initiates the message request and processes the message response. What’s great about the pre- and post-message webhooks is that they work on all channels!
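To make the turn structure concrete, here is a minimal sketch of one conversation turn in the V2 Stateful API: the channel builds a message request carrying the user’s text, and the assistant answers with a message response containing output to render. The field names follow the public V2 API; the text values and the response shape shown are illustrative placeholders.

```javascript
// One conversation turn in the Watson Assistant V2 Stateful API.
// The channel POSTs a message request to the assistant's
// .../sessions/{session_id}/message endpoint and renders the response.

// Build the message request body for one turn (text is a placeholder):
function buildTurnRequest(text) {
  return {
    input: {
      message_type: "text",
      text: text
    }
  };
}

// Minimal shape of the message response the channel receives back:
const exampleResponse = {
  output: {
    generic: [{ response_type: "text", text: "How can I help you?" }]
  }
};
```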

Multi-Assistant Capabilities

The diagram below describes how the sample code associated with this article is structured.

Solution component diagram

In this diagram you can see that there is a single primary assistant (sometimes referred to as a root or concierge assistant) and two secondary assistants (payment and support). To keep the code as simple as possible, the skills provided are very light on intents. The goal is to show how to implement support for Multi-Assistant Routing, not how to build a complex skill using Watson Assistant. The capabilities supported by the sample include:

  • Ability to route from a primary to a secondary assistant
  • Ability to route back from a secondary to a primary assistant
  • Ability to route between two secondary assistants

This sample was initially built for the phone channel but works well with any of the other existing Watson Assistant channels including web chat, SMS, WhatsApp, etc. When calling into the primary assistant over the phone, each assistant has a different voice associated with it, which makes it easy to hear when an assistant switch occurs. It’s like you’re actually talking to different people!

Webhooks

Watson Assistant supports many different types of webhooks, including the two used in the sample application:

  • Pre-message filtering webhooks (referred to as pre-webhook)
  • Post-message filtering webhooks (referred to as post-webhook)

A pre-webhook is triggered on a message_received event, and lets you intercept the user input before it is processed by Watson Assistant. A post-webhook is triggered on a message_processed event, and lets you intercept the output from Watson Assistant before it is sent back to the user.

The basic responsibilities of the two webhooks built for this sample are described below.

Pre-webhook: The pre-webhook in the multi-assistant sample is basically a message router. It sees every inbound message request sent to the primary assistant prior to the associated skill processing it. It makes the decision on whether to forward the inbound message request to one of the secondary assistants or on to the primary assistant skill. It makes this decision based on user_defined context maintained on a turn-by-turn basis (this context is described below). When a message request is forwarded to a secondary assistant, the pre-webhook also has the logic to decide what to do with the response. It can either send the response directly back to the channel, which circumvents any further processing by the primary assistant (including the post-webhook) or it can forward the original request on to the primary assistant. For instance, the secondary assistant can signal the pre-webhook that no reasonable intent was found and that it should forward the original message request on to the primary assistant.
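The routing decision itself reduces to a small check on the user_defined context. The sketch below uses the context variable names described later in this article; the function signature and return shape are illustrative, not the sample’s actual code.

```javascript
// Sketch of the pre-webhook's routing decision: if an earlier turn
// established a secondary assistant (URL plus session ID in user_defined
// context), the turn is forwarded there; otherwise it flows on to the
// primary assistant's skill.
function decideRoute(userDefined = {}) {
  // An active secondary session means this turn belongs to that assistant.
  if (userDefined.multi_assist_url && userDefined.multi_assist_session_id) {
    return { target: "secondary", url: userDefined.multi_assist_url };
  }
  // No secondary session: let the primary assistant's skill process it.
  return { target: "primary" };
}
```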

Post-webhook: The post-webhook in the multi-assistant sample handles all responses sent directly from the primary assistant skill. It has the ability to create new sessions with secondary assistants or pass the responses directly through to the channel with no changes. The primary assistant skill signals the post-webhook using user_defined context when the thread of conversation control needs to be handed off to a secondary assistant, which results in a new session being created. If this signaling doesn’t occur, the thread of control stays with the primary assistant.
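The handoff check in the post-webhook follows the same signaling pattern. The sketch below only computes whether a transfer was requested and, if so, the V2 create-session endpoint the webhook would POST to; the version date is a placeholder, and the function is illustrative rather than the sample’s actual node code.

```javascript
// Sketch of the post-webhook handoff check: the primary skill signals a
// transfer by setting multi_assist_url in user_defined context. When the
// signal is present, the webhook creates a session on the secondary
// assistant; otherwise the response passes through unchanged.
function handoffTarget(userDefined = {}) {
  if (!userDefined.multi_assist_url) return null; // no transfer requested
  // V2 create-session endpoint relative to the assistant URL
  // (the version date below is a placeholder).
  return `${userDefined.multi_assist_url}/sessions?version=2021-06-14`;
}
```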

If you wish to learn more about Watson Assistant webhooks go here.

Implementation Details

To implement this, the following user_defined context variables are used:

  1. multi_assist_orig_session_id: The original session ID associated with the primary assistant.
  2. multi_assist_url: Set in the primary assistant to initiate an assistant switch to the specified URL.
  3. multi_assist_session_id: The active secondary assistant skill’s session ID.
  4. multi_assist_no_match: Set in the secondary assistant skill’s anything_else when no intent is found.

These context variables are stored in the user_defined section of all the Watson Assistant message requests and responses and they are manipulated by both the webhooks and the various sample skills.
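For reference, here is an illustrative message-request fragment showing where these variables live in a V2 payload. The "main skill" key is how the V2 API scopes dialog-skill context; all ID and URL values below are placeholders.

```javascript
// Illustrative V2 message-request fragment showing the user_defined
// context variables used by the webhooks. All values are placeholders.
const messageRequest = {
  input: { message_type: "text", text: "I want to pay my bill" },
  context: {
    skills: {
      "main skill": {
        user_defined: {
          multi_assist_orig_session_id: "original-session-id",
          multi_assist_url: "https://example.com/secondary-assistant-url",
          multi_assist_session_id: "secondary-session-id"
        }
      }
    }
  }
};
```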

multi_assist_orig_session_id is used to maintain the original session ID between the channel and the primary assistant. From the channel’s perspective, it is only ever talking to the primary assistant and uses a single session for the life of the conversation. It’s therefore important to always pass this original session ID in every response sent back to the channel. Note that this context variable is set in the post-webhook, in the Post Response Router node of the Node-RED flow. It is set on the first conversation turn and keeps the same value for the life of the conversation. This context variable is then used to set context.global.session_id to the original session ID before any response goes back to the channel.

multi_assist_url is used by the primary assistant skill to notify the post-webhook when it’s time to create a session with a new secondary assistant. For instance, in this sample the support and payment intent nodes defined in the primary assistant set this context variable to the associated assistant URL, which can be pulled from the assistant settings panel:

Example showing how the multi_assist_url is set in the Dialog panel.

The assistant URL is copied from this panel:

Example showing how to access the Assistant URL from settings.

multi_assist_session_id is used to track the session ID associated with the secondary assistant. This context variable is initially set in the post-webhook after the secondary assistant session is created. This happens in the Post Session Create Processor node of the post-webhook.

multi_assist_no_match is used by the secondary assistants to inform the pre-webhook that the assistant could not handle the requested intent and that the request should be forwarded on to the primary assistant. This is done simply by setting this context variable to true in the anything_else node of the secondary assistant:

Example showing how to set the multi_assist_no_match variable.

When this context variable is received, the pre-webhook clears the previous multi_assist_url and multi_assist_session_id context variables and then forwards a request with the input text received on the initial request to the primary assistant (instead of the pre-webhook sending the response back to the channel). This code resides in the Secondary Response Processor node in the pre-webhook.
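The clearing step can be sketched as a small pure function over the user_defined context. The variable names come from this article; the function itself is illustrative, not the sample’s actual node code.

```javascript
// Sketch of the no-match fallback in the pre-webhook: when a secondary
// assistant sets multi_assist_no_match, the routing context is cleared so
// the original input text can be re-sent to the primary assistant.
function handleNoMatch(userDefined = {}) {
  if (!userDefined.multi_assist_no_match) return userDefined; // nothing to do
  const cleared = { ...userDefined };
  delete cleared.multi_assist_url;        // drop the secondary routing target
  delete cleared.multi_assist_session_id; // drop the secondary session
  delete cleared.multi_assist_no_match;   // consume the signal itself
  return cleared;
}
```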

New Watson Assistant Pre-Webhook Feature

The Watson Assistant feature that makes multi-assistant routing possible is the ability for the pre-webhook to choose one of the following ways to respond when it receives an inbound message request (on the message_received event):

  1. Return a message request that goes to the skill associated with the primary assistant. This is the default behavior.
  2. Return a message response that goes directly back to the channel or client.

This makes it possible for the pre-webhook to route message requests on to an assistant other than the primary. This is accomplished by setting the following HTTP response header when the pre-webhook needs to return a Watson Assistant message response that goes directly back to the channel:
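A sketch of how a Node-RED function node might set that header is shown below. The header name is taken from the Watson Assistant pre-webhook documentation; wrapping the logic in a helper function and the `secondaryResponse` argument are illustrative.

```javascript
// Sketch of marking a webhook response as a direct-to-channel message
// response. Setting this header tells Watson Assistant that the payload is
// a complete V2 message response, so skill processing is bypassed.
function markAsDirectResponse(msg, secondaryResponse) {
  msg.headers = msg.headers || {};
  msg.headers["x-watson-assistant-webhook-return"] = "true";
  msg.payload = secondaryResponse; // already a complete V2 message response
  return msg;
}
```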

If this HTTP header is present in the response from the pre-webhook (regardless of whether it is set to true or false), Watson Assistant assumes that the payload returned in the response is an actual Watson Assistant message response rather than a request. It then bypasses any further processing of the message by the skill and immediately returns the response to the channel or client.

The code that sets this header resides in the Secondary Response Processor node in the pre-webhook. The sample code shows how this header is being set along with the original session ID before the response is sent back to the client.

When a Watson Assistant channel integration like phone or WhatsApp is used, the content of the HTTP response follows the message response as described by the Watson Assistant V2 API. All the Watson Assistant channels (phone, SMS, WebChat, etc.) use the Watson Assistant V2 Stateful API. That means pre-webhook developers should use the V2 Stateful API end-to-end (from within the pre-webhook) as well. In other words, any response sent back from the pre-webhook needs to be in the V2 Stateful API response format.

In this sample, the post-webhook handles all session creates to the secondary assistants. That means the post-webhook uses the Watson Assistant V2 create session API to create new sessions. The pre-webhook in the sample also uses the Watson Assistant V2 Stateful API when making message requests to any secondary assistant.

It’s important that any message request sent by a webhook to the secondary assistants over the Watson Assistant V2 Stateful protocol include the following input option, which signals the runtime to return all context (including user_defined context) in the message response:
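In the V2 message API this is the `return_context` option on `input.options`. The helper below is an illustrative sketch of building such a request; the text is a placeholder.

```javascript
// Sketch of a message request to a secondary assistant. Setting
// input.options.return_context asks the runtime to echo the full context
// (including user_defined context) back in the message response, which
// the webhooks rely on for routing state.
function buildSecondaryRequest(text) {
  return {
    input: {
      message_type: "text",
      text: text,
      options: { return_context: true }
    }
  };
}
```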

Assistant Training

At this point you are probably wondering, “How do I train the classifiers across all these different assistants?” The primary assistant can be as complex or as simple as needed. For instance, the primary assistant could be built like a very simple, top-level Interactive Voice Response (IVR) node that directs the conversation flow based on answers to very directed questions like “Select one for support, select two for payments.” This is essentially how the sample described in this article was created. You could also train the primary assistant to handle a larger number of intents that are mapped directly to their associated secondary assistants. Obviously there is a trade-off here between simplicity and usability. Note, though, that if you decide to train the primary assistant on all the intents supported by your secondary assistants, you need to ensure that there are no overlapping intents that could cause the conversation to be misrouted. This can be a maintenance nightmare, especially if your solution is continually evolving over time. However, a well-trained primary assistant built on truly smart routing can lead to a very user-friendly solution that flows naturally from one skill to the next.

Future Enhancements

If you set up and test this sample you’ll notice that whenever the conversation is redirected to a secondary assistant, the new skill always responds with the greeting. This is because the session create code in the post-webhook is followed by a message request that is hard-coded with an empty text input string. One fairly simple enhancement would be to pass the original input text that came in on the conversation turn (in the initial message request), instead of just an empty text string. This has the potential to immediately get the user to the right intent without having to go through another greeting. The post-webhook can get to the input text in the response object and cache it until the initial message request is sent to the new session.
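A minimal sketch of that enhancement, assuming the cache is keyed by the original session ID (an assumption; any stable conversation key would work):

```javascript
// Sketch of the suggested enhancement: cache the user's original input
// when the handoff is detected, then use it (instead of an empty string)
// for the first message sent on the new secondary session.
const turnCache = new Map(); // keyed by the original session ID (assumption)

function cacheOriginalInput(sessionId, text) {
  turnCache.set(sessionId, text);
}

function firstSecondaryInput(sessionId) {
  // Fall back to the empty string the sample currently hard-codes.
  return turnCache.get(sessionId) ?? "";
}
```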

Another enhancement to consider is that the secondary assistants can also have their own associated webhooks. A best practice would be to use the webhooks demonstrated in this article to control routing from the primary assistant while using webhooks at the secondary assistants to handle business logic such as callouts to third-party APIs. This keeps the webhook logic isolated to where the functions are needed. It also allows each team managing their individual assistants to build and maintain everything on their own.

Lastly, implementors of this pattern may wish to insert other types of user_defined context that must be maintained as the conversation is redirected from one assistant to another. For instance, it may be important for an assistant that handles authentication to share user identity information with other assistants. This design certainly allows for that and can be customized to the specific needs of the use case.

Limitations

Watson Assistant has some great analytic tools, but those tools are currently scoped to each individual skill. That means any solution that relies on the multi-assistant routing described in this article will potentially have a single conversation spread across multiple skills, and will therefore need to pull analytics from multiple skills when analyzing a single conversation. In many cases this may not be a big issue, but it is a limitation that should be considered before moving forward with this pattern. On the other hand, this separation also allows each team to improve their assistant individually, without mixing in data from other skills.

One last limitation is that this solution relies on the Watson Assistant V2 protocol. This is important because the only way to send a message to an assistant is through V2 (skills support V1). That means anyone using this solution needs to use the V2 protocol as well. If you are using a channel integration in Watson Assistant (phone, WebChat, SMS, etc.), note that those channels use the V2 Stateful protocol. Finally, since this sample relies on webhooks and the Watson Assistant V2 protocol, the “Try It Now” panel will not work with this demo. Instead, if you wish to run a quick test with a digital channel, you can use the Assistant Preview, which utilizes the WebChat channel.

Conclusion

This article outlines all the details needed to build a solution on Watson Assistant that can process a single user conversation across multiple assistants and their associated skills. The assistant routing technique described here relies on both a pre- and post-webhook, along with a brand-new feature that gives Watson Assistant developers a powerful tool to control how message requests and responses are routed. Please download the sample code and try it out for yourself!


Brian Pulito is a Senior Technical Staff Member at IBM and works on the Watson Assistant team.