Getting Started with Watson Assistant: Testing

It’s vital to test your assistant before setting it live. In this post, we show you how to spot issues early and share best practices for solving them. (4/5)

Blake McGregor
IBM watsonx Assistant
7 min read · Apr 8, 2021

--

Photo by JESHOOTS.COM on Unsplash

Now that you’ve planned, built, and completed your assistant, you’ll want to test and make some incremental improvements before setting it live to your customers. So how do you effectively do this? And when do you know that the assistant is ready to be launched?

We’ll get to all of that, but before we do, here’s a reminder of where you are in your first-assistant journey:

First-Assistant Getting Started Steps

  1. Plan it out — a few hours (you already did this!)
  2. Build it — half of a day (you already did this!)
  3. Complete it — a few hours (you already did this!)
  4. Test and improve it — half of a day (you are here)
  5. Launch it — a few hours

First Round of Testing

Ok, now let’s get moving. There are a few tips and tricks that we usually recommend to our customers when they start testing.

For starters, you should use the “preview link” integration to send the assistant to about 10 to 15 colleagues (ideally, they would be folks who weren’t involved in building). Ask them to spend 10 minutes interacting with and asking questions to your assistant in a few separate conversations. Have them log and categorize any issues they experience so that you can review them later (use the categories below as options).

Preview link integration with sharable URL

Your testers should understand the domain that you’ve built the assistant to operate within, but not all of the specific topics it can and cannot handle (your users won’t have that awareness either). We also recommend having your test users do what they would normally do when they get frustrated—ask for a human agent! This specific trigger shows up in the analytics dashboard and will help you identify problems faster.
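If you’d like to supplement this manual testing with a quick scripted pass, here’s a minimal sketch using the Watson Assistant V1 Python SDK (the `ibm-watson` package). It sends a handful of test utterances through the message API and prints the intent Watson detected for each; the API key, service URL, workspace ID, and utterances are all placeholders to replace with your own.

```python
# pip install ibm-watson
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials -- substitute your own service details.
assistant = AssistantV1(version='2020-04-01',
                        authenticator=IAMAuthenticator('YOUR_API_KEY'))
assistant.set_service_url('YOUR_SERVICE_URL')

WORKSPACE_ID = 'YOUR_WORKSPACE_ID'  # the ID of your dialog skill

# Utterances your testers might try, including an agent request.
test_utterances = [
    'What are your store hours?',
    'How do I reset my password?',
    'I want to talk to a human agent',
]

for text in test_utterances:
    response = assistant.message(
        workspace_id=WORKSPACE_ID,
        input={'text': text},
    ).get_result()
    intents = response.get('intents', [])
    top = intents[0] if intents else {'intent': '(none)', 'confidence': 0.0}
    print(f"{text!r} -> #{top['intent']} ({top['confidence']:.2f})")
```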

Once you have 20–30 conversations, it’s time to analyze and improve your assistant. We recommend you think about problem areas in the following buckets:

Understanding

  • Is my assistant properly understanding on-topic requests?
  • Are there important topics that aren’t yet handled by my assistant?

Resolution

  • Once the assistant has understood the request, are people making it to the end of the dialog successfully? If not, why?
  • Is the assistant escalating conversations to agents more frequently for certain topics? If so, why?

Pro tip: The root cause of a “drop-out” in the middle of a conversation could be any of the following, so you’ll want to watch out for them:

  • misunderstandings of specific clarifications (e.g., issues with entity training)
  • the content you’ve built isn’t useful to the user
  • the user was pursuing the wrong topic in the first place (sometimes users realize this multiple steps into a flow)
  • general fatigue / frustration with longer flows

Start with Understanding

Before we check that the specific responses and flows you’ve built are working, we should first make sure the assistant understands what your test users are asking. For that, head to the overview page of your dialog skill and scroll down to the coverage trend.

Here you can see the breadth of topics your assistant can handle (we recommend shooting for a 40–70% coverage rate for your first launch). From this chart view, you can navigate into the user conversations, specifically looking at messages that your assistant was unable to answer.
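If you’d like to sanity-check that number yourself, the sketch below estimates coverage directly from the conversation logs. It reuses the same placeholder client setup as the earlier snippet, and it treats a message as uncovered when Watson returned no intents for it; the dashboard’s exact definition of coverage may differ slightly.

```python
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Same placeholder client setup as in the earlier sketch.
assistant = AssistantV1(version='2020-04-01',
                        authenticator=IAMAuthenticator('YOUR_API_KEY'))
assistant.set_service_url('YOUR_SERVICE_URL')

# Pull a page of recent message logs for the dialog skill.
result = assistant.list_logs(workspace_id='YOUR_WORKSPACE_ID',
                             page_limit=500).get_result()
logs = result['logs']

# Treat a message as "uncovered" when no intent was detected.
uncovered = [log for log in logs if not log['response'].get('intents')]

if logs:
    coverage = 1 - len(uncovered) / len(logs)
    print(f'Coverage: {coverage:.0%} across {len(logs)} messages')
for log in uncovered[:10]:
    print('Uncovered:', log['request']['input'].get('text', ''))
```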

Is my assistant properly understanding “on-topic” requests?

Once you have a list of uncovered questions, first ask yourself if any of the messages should’ve been handled by existing intents within the assistant. If so, simply select the message and then choose the appropriate intent to add the message to.
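You can make the same fix programmatically. Here’s a small sketch that adds an uncovered message as a training example of an existing intent; `store_hours` and the example text are hypothetical. Watson retrains the skill automatically once its training data changes, so there’s no separate retraining step.

```python
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Same placeholder client setup as before.
assistant = AssistantV1(version='2020-04-01',
                        authenticator=IAMAuthenticator('YOUR_API_KEY'))
assistant.set_service_url('YOUR_SERVICE_URL')

# Add an uncovered message as a new example of an existing intent
# (the API equivalent of selecting the message and choosing an intent).
assistant.create_example(
    workspace_id='YOUR_WORKSPACE_ID',
    intent='store_hours',                  # hypothetical intent name
    text='When do you open on weekends?',  # the uncovered message
)
```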

In addition to the dashboard, check the issue log to make sure that users didn’t have other issues with topic understanding.

Over time, this process of manually checking logs for misunderstandings can become fairly tedious—which is why we’ve recently launched a feature called Autolearning that can do the manual work for you! More on this later in the post; for now, let’s keep moving through the test conversations.

Are there important topics that aren’t yet handled by my assistant?

Next, you’ll want to check that the assistant is covering the right topics. Since your testers aren’t your actual users, you might want to take this with a grain of salt.

To do a quick check, use the same tools as before, but with a new lens. Check the “uncovered” messages view to see if any topics need to be built, and look through your issue logs to see if any of your testers listed similar issues. For any related messages that are uncovered, create a new intent/dialog flow and add these examples to it.
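If you’d rather script this step, the sketch below creates a new intent from a cluster of related uncovered messages. The intent name, description, and examples are illustrative; you’d still build the matching dialog flow in the tooling afterward.

```python
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Same placeholder client setup as before.
assistant = AssistantV1(version='2020-04-01',
                        authenticator=IAMAuthenticator('YOUR_API_KEY'))
assistant.set_service_url('YOUR_SERVICE_URL')

# Create a brand-new intent from related uncovered messages.
assistant.create_intent(
    workspace_id='YOUR_WORKSPACE_ID',
    intent='cancel_order',  # illustrative intent name
    description='User wants to cancel an existing order',
    examples=[
        {'text': 'I need to cancel my order'},
        {'text': 'How do I stop a purchase I just made?'},
        {'text': 'Cancel the order I placed yesterday'},
    ],
)
```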

This process of finding new topics to build can also become a fairly tedious and painful task after you’ve launched to hundreds or thousands of users. That’s why we have a feature called Intent Recommendations, where Watson automatically groups related messages, resulting in a list of suggested intents prioritized by how often they occur. More on this feature later.

Now focus on resolution

Now that you’ve worked through your assistant’s understanding, it’s time to focus on the desired outcome. Here, the question you should be asking yourself is, “Were my users able to complete their requests successfully?”

Start by reviewing the issues your colleagues logged to identify those that kept them from completing their requests. Use the “search user statements” feature to cross-reference your colleagues’ notes with the conversation view inside Watson Assistant to pinpoint the problems. Then go fix the underlying issue within your dialog.
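If you’re pulling logs through the API anyway, you can do the same cross-referencing in a few lines of Python. This sketch searches user statements client-side as a simple stand-in for the search box in the analytics UI; the search term is just an example from a hypothetical tester note.

```python
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Same placeholder client setup as before.
assistant = AssistantV1(version='2020-04-01',
                        authenticator=IAMAuthenticator('YOUR_API_KEY'))
assistant.set_service_url('YOUR_SERVICE_URL')

result = assistant.list_logs(workspace_id='YOUR_WORKSPACE_ID',
                             page_limit=500).get_result()

search_term = 'refund'  # phrase taken from a colleague's issue log
for log in result['logs']:
    text = log['request']['input'].get('text', '')
    if search_term.lower() in text.lower():
        conv_id = log['response'].get('context', {}).get('conversation_id', '?')
        print(f'[{conv_id}] {text}')
```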

Reviewing each and every conversation is a great way to pinpoint issues early in your assistant’s life. But once you go live and start scaling to thousands of users, it’s not realistic to manually review tens of thousands of conversations!

This is where you’ll want to shift your attention to a success metric that narrows your focus to only those conversations that were unsuccessful. More specifically, we believe that containment, or the percentage of conversations handled without human agent involvement, is a great success metric.

To get a feel for how to use the containment rate to find problems, start by reviewing the containment charts to identify whether there was a specific time period where containment issues were particularly noticeable. From here, you can filter the user conversations to those not resolved during the problematic time period. This allows you to home in on the problem conversations and keep the review volume to a manageable level.
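For a scripted version of that same check, here’s a sketch that estimates containment for a one-week window. The timestamp filter follows Watson Assistant’s log filter syntax, and a conversation counts as escalated if any turn matched the agent-request intent; `General_Connect_to_Agent` is a placeholder for whatever intent your assistant uses for agent handoff.

```python
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Same placeholder client setup as before.
assistant = AssistantV1(version='2020-04-01',
                        authenticator=IAMAuthenticator('YOUR_API_KEY'))
assistant.set_service_url('YOUR_SERVICE_URL')

# Pull logs for the problematic week only.
result = assistant.list_logs(
    workspace_id='YOUR_WORKSPACE_ID',
    filter='response_timestamp>=2021-04-01,response_timestamp<2021-04-08',
    page_limit=500,
).get_result()

# Mark a conversation as escalated if any turn hit the agent intent.
escalated = {}
for log in result['logs']:
    conv_id = log['response'].get('context', {}).get('conversation_id')
    intents = log['response'].get('intents', [])
    hit = bool(intents) and intents[0]['intent'] == 'General_Connect_to_Agent'
    escalated[conv_id] = escalated.get(conv_id, False) or hit

total = len(escalated)
if total:
    contained = sum(1 for hit in escalated.values() if not hit)
    print(f'Containment: {contained / total:.0%} of {total} conversations')
```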

Prepare to Launch

Now that you have real usage data and have identified some improvements to make, you’re almost ready to set your assistant live to your customers. Just before you do that, there are two features we recommend enabling so you’ll be able to quickly improve your assistant once it’s live.

Autolearning: improve your existing intents automatically

The first feature to enable is Autolearning, which observes and learns from your customers’ behavior and will work to provide the most accurate responses to their questions. This will save you a ton of time fixing issues with intent understanding.

To enable Autolearning, go into the Autolearning tab, and select your live assistant as the data source for observation. Then simply toggle the switch on and let your assistant improve itself. If you don’t yet have an assistant set up for live traffic, we will remind you to set this up in the next post.

Intent Recommendations: find new intents faster

The second feature is Intent Recommendations, which groups messages from your users and gives you a list of suggested intents sorted by how often the messages come up. Using this feature will save you time uncovering which topics to tackle!

To enable Intent Recommendations, go to the main Intents page and select “Recommendation sources” at the top of the page.

We suggest selecting your live assistant under “Connected assistant log” for your data source. Once there is enough traffic flowing through your live assistant, you can leverage Intent Recommendations based on real inquiries.

Again, if you don’t have an assistant set up for live traffic, we will remind you to set this up in the next post.

Wrapping Up

At this point, you should feel confident setting your first assistant live! And once you go live, you can rest assured that you’re also set up to rapidly identify problems and make improvements using the Overview Dashboard, User Conversations view, Autolearning, and Intent Recommendations.

It’s important to remember that you won’t be able to think of every single thing your customers might ask about. And you can’t possibly imagine every single problem that might arise during a conversation. You’re not omniscient, but that’s okay! Why? Because your assistant is already set up with the appropriate fallbacks if things go wrong and because you have all the tools you need to improve your assistant rapidly after you launch!

➡️ Next Up: Launch it (5/5) ➡️

--

Blake McGregor
IBM watsonx Assistant

I love to ski, eat, travel, and eat. And in my spare time, I work in the natural language/AI space as the product management lead on IBM Watson Assistant.