Working with IBM Watson Conversation

A developer’s approach to the workspaces

Our team recently worked on the development of a chatbot, using Watson Conversation for natural language understanding (NLU) and a portion of dialogue management (not all of it, because our application needs custom code on our end to generate some dialogue elements). The chatbot is a proof of concept we are currently developing in our Omnichannel Innovations Lab at Nu Echo.

The tools provided by Watson let us get a working prototype up and running very quickly, but it wasn’t long before we hit a wall in terms of productivity once confronted with some of our project’s core requirements. Let me quickly explain the issues we ran into before presenting the approach we took to solve them.

Working with Watson Conversation

Here are some of the problems we encountered with Watson Conversation.

1) Workspace proliferation quickly became a problem

Watson Conversation uses workspaces to organize configurations for a chatbot. The workspace is where the user defines categories and examples for intents and entities. It’s also where the conversation model is built, using a tree-like structure of nodes.

It’s very straightforward, and Watson provides a simple web interface to modify the workspace.

It works fine for single-language applications, but when we wanted to support a second language for our chatbot, we realized this method has a problem: it binds each workspace tightly to a given language. There is no easy way to change the language of a workspace, because doing so means modifying both the dialogue texts and the intent/entity examples. While the dialogue texts could easily be replaced with ids matched application-side against properly localized strings, the same cannot be done with the examples. The easiest way to alleviate the problem was to upload two workspaces, one per language.

Likewise, we wanted to support different versions of our chatbot for a number of flavors, or brands. Each brand would use the same basic conversation structure (with the possibility of skipping certain unneeded dialogue nodes) and the exact same intents/entities, but with a different set of dialogue prompts. Again, the easiest solution was to duplicate workspaces. Supporting multiple languages for the different flavors of the conversation meant adding even more workspaces.

While this approach allows multiple concurrent versions of a chatbot conversation to coexist, it also makes maintenance a nightmare.

You can very quickly end up with a lot of similar, yet distinct conversation workspaces. Adding a new conversation path or fixing a critical problem requires manually modifying each variant of the main workspace, which is both error-prone and time consuming.

2) Workspaces are neither development- nor team-friendly

Even if you’re perfectly happy with only one workspace, you may still encounter problems with the way Watson wants you to edit the workspace: the web interface.

Figure I — Watson Conversation’s web interface

It may be pleasing to the eye, but it’s not a very efficient way to build or modify anything beyond a very simple conversation. The visual representation of the conversation flow is not easy to navigate (though, to be fair, newer versions of the interface did improve the user experience). The tree-like representation of the conversation also makes it very difficult to implement generic responses and error handling without cluttering the workspace with callbacks and repeated conditional logic.

Watson does offer an alternative: a REST API to create or modify workspaces programmatically. This is a vast improvement over the web interface (even though there is no way to validate the workspace before uploading it to Watson), but it doesn’t solve the workspace proliferation problem. Even with the REST API, supporting multiple variations of a workspace means that different people working on different versions at the same time will end up with diverging conversations, with no easy way to merge or unify them.

3) No direct control over the source code

By default, all the workspace code is kept on a Watson server. This means no backups, no version control, nothing. This is far from ideal. Plus, since the custom code we wrote on our end to extend dialogue management was already properly committed to our git repository, it felt weird to only version control a partial implementation of our solution. We wanted to ensure that the workspace was always in sync with our custom code, and keeping the code on the Watson server gave us no way to do that. We also wanted to be able to tag versions of the workspaces, for easy rollback in case of regression.

We needed to rethink the way we were handling workspaces. One of the things we did was to take a look at how they were implemented.

Taking a workspace apart…

At its core, a Watson workspace is simply a JSON file containing all the information required for the NLU and Conversation parts to work. And good news: it’s possible to export this JSON file from an existing workspace (and import a JSON file to create a new workspace).

Our immediate first step after exporting our main workspace was to add it to our git repository, in order to have some kind of backup in case of catastrophic failure and/or regression. But it didn’t solve our main problem: supporting different versions of the same conversation was still a chore, even if we could now, in theory, completely bypass the web interface by manually editing the JSON file.

From there, it became clear that we wanted to programmatically build the workspace and upload it before each release of our chatbot, freeing ourselves from the need to maintain multiple versions of what were essentially variations of the same conversation. But that was quite a big step: we would need to find a way to model the dialogue of our chatbot without the limitations of the tree-like structure found in the workspaces, and then convert that representation into something Watson could parse. That meant a lot of trial and error, since we’d also need to develop some way to validate the converted structure before sending it to Watson. We decided it might be best to start with a less ideal but easier-to-implement solution: gut all the variable elements from a workspace, and reconstruct the workspace when needed.

That process might even give us some insight about how workspaces could be generated.

We spent some time studying the structure of a workspace exported from Watson. It quickly became apparent that the main obstacle to an automatic repopulation of the workspace was that its content was tightly coupled to its structure. The only way to change content was either to alter the structure to accommodate modifications, or to duplicate the whole thing — which is exactly what we were doing by spawning a new workspace for each variation of the conversation. If we could separate the content from the structure, we could isolate elements in a way that would greatly reduce the consequences of a change, thus facilitating development, easing maintenance and ensuring overall greater stability (with simple changes having, hopefully, a lot fewer chances to break the system).

A workspace JSON file is mainly composed of three lists: one for the intents, one for the entities and one for the dialogue nodes.

Intent and entity lists are structured similarly: each entry consists of a value (the name of the intent or entity) and a list of examples (plus optional metadata, like creation and modification timestamps).

Nodes are a little more complex, and contain references to a parent node, a go_to node, a set of conditions to check if the node needs to be activated, the expected conversation context and, finally, the text that the node will output upon activation.
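To make this concrete, here is a heavily trimmed sketch of what a workspace export looks like. The field names follow what Watson exported at the time, but treat the exact shape as illustrative, since metadata fields vary between API versions:

```json
{
  "name": "my_workspace",
  "language": "en",
  "intents": [
    {
      "intent": "greeting",
      "examples": [ { "text": "hello" }, { "text": "good morning" } ]
    }
  ],
  "entities": [
    {
      "entity": "color",
      "values": [ { "value": "blue", "synonyms": [ "navy", "azure" ] } ]
    }
  ],
  "dialog_nodes": [
    {
      "dialog_node": "node_A",
      "parent": null,
      "go_to": null,
      "conditions": "#greeting",
      "context": null,
      "output": { "text": "Hi there!" }
    }
  ]
}
```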

So where exactly is the delimitation between content and structure here? On the content side, we identified the example lists for the intents and entities, and the text output for the nodes. Everything else was considered as structure.

From there, all we had to do was to extract the content from a workspace JSON file. Each example list and output text (for the intents/entities and nodes, respectively) was neatly stored in its appropriate text document inside our git repository, using the following structure:

Figure II — Workspace deconstruction (1)

This way, adding a new intent, for example, has no repercussions on the entities or the nodes.
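The extraction step can be sketched roughly as follows. This is a simplification: in our repository each example list and output text goes to its own text file, but here the content is collected into plain dicts to keep the idea visible (the function name and content shapes are our own convention, not anything Watson prescribes):

```python
def split_workspace(workspace):
    """Separate a workspace export (a parsed JSON dict) into structure and content.

    Returns (structure, content): `content` maps intent names to their example
    texts and node ids to their output text; `structure` is the same workspace
    with those fields emptied out. Note that the input dict is modified in place.
    """
    content = {"intents": {}, "nodes": {}}
    for intent in workspace["intents"]:
        # Pull the example texts out, leaving an empty slot in the structure.
        content["intents"][intent["intent"]] = [e["text"] for e in intent["examples"]]
        intent["examples"] = []
    for node in workspace["dialog_nodes"]:
        # Same idea for the node output text.
        content["nodes"][node["dialog_node"]] = node["output"].get("text", "")
        node["output"]["text"] = ""
    return workspace, content
```

Running this once per language export gives one content set per language on top of a single shared structure.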

We can also easily make different sets of examples and output, without touching the structure. As shown in Figure III, we support both English and French versions of the conversation:

Figure III — Workspace deconstruction (2)

We also added an overrides subsection, which lets us easily define alternative flavors for a given dialogue, without having to redefine everything.

Figure IV — Workspace deconstruction (3)

As we can see in Figure IV, the override “flavor_1”, if activated, keeps all the original node outputs, but redefines “node_B”.
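Resolving an override boils down to a shallow merge: start from the base outputs and let the flavor replace only the nodes it mentions. A minimal sketch (function and parameter names are illustrative):

```python
def resolve_outputs(base_outputs, overrides, flavor=None):
    """Pick the output text for each node.

    base_outputs: dict mapping node ids to their default output text.
    overrides: dict mapping flavor names to partial {node id: text} dicts.
    A node not mentioned by the flavor keeps its base output.
    """
    merged = dict(base_outputs)  # copy so the base set is never mutated
    if flavor is not None:
        merged.update(overrides.get(flavor, {}))
    return merged
```

With the Figure IV layout, activating “flavor_1” replaces only the “node_B” text and leaves every other node untouched.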

… and rebuilding it

Of course, at this point, the task of rebuilding a workspace from all the dispersed pieces is quite daunting, and involves copying all the content of the extracted files back to the JSON file, at its appropriate place. This is why we automated the process with a script that takes as input parameters the desired language (and, optionally, the flavor override) and outputs a fully rebuilt workspace, which can then be imported back into Watson, either via the web interface or their REST API.
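The core of such a rebuild script can be sketched like this, assuming the content has already been loaded (per language) into dicts mapping intent names to example lists and node ids to output texts; those shapes, and the function name, are our own convention:

```python
import json

def rebuild(structure, content, overrides, lang, flavor=None):
    """Re-inject language-specific content (plus optional flavor overrides)
    into the structure skeleton, producing an importable workspace dict."""
    ws = json.loads(json.dumps(structure))  # deep copy; keep the skeleton pristine
    ws["language"] = lang
    # Restore the example lists for each intent.
    for intent in ws["intents"]:
        intent["examples"] = [{"text": t} for t in content["intents"][intent["intent"]]]
    # Restore node outputs, letting the flavor override selected nodes.
    outputs = dict(content["nodes"])
    if flavor is not None:
        outputs.update(overrides.get(flavor, {}))
    for node in ws["dialog_nodes"]:
        node["output"]["text"] = outputs[node["dialog_node"]]
    return ws
```

In practice this is wrapped in a small command-line script that reads the content files for the requested language and flavor, calls the rebuild, and writes out a JSON file ready to be imported into Watson.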

At the end of the day

By breaking the Watson workspace into smaller, independent and more manageable pieces, we ended up avoiding most of the pitfalls that plagued our initial usage of Watson Conversation, thus improving our productivity, even if it meant adding another step (rebuilding and importing the updated workspace) to our release process.

The next logical step would have been to completely abstract the dialogue from the workspace and automate the process of converting it to a Watson-friendly form, effectively making the final workspace a kind of bytecode (even though it’s actually JSON). But as we worked on modelling our dialogue, we realized that we wanted to experiment with other dialogue management technologies (rule-based, neural network-based) before returning to Conversation.

This is where we’re at now. We still use Watson for the NLU, but are working on an experimental in-house solution for the dialogue management. This might be the topic of another blog post!
