Multi-Channel Chat Bots

There’s a lot of talk about Conversational Commerce at the moment…even I have blogged about it. A key part of the vision around this is that a conversation can be started from a variety of places — apps, the web, messaging apps, etc. Although some people are building chat systems for a single channel (e.g. one buried inside Facebook Messenger), the really exciting part is when you expose that same conversation over a variety of different interfaces. I call this “multi-channel chat”.

Multi-channel seems important. It avoids having to duplicate AI training in each channel and allows a single investment to be exploited multiple times. Because of this, my team in Watson decided to explore the possibilities and build some prototypes. In the process, we’ve made some discoveries that I thought worth sharing.

What we immediately discovered is that much like other “write once, run anywhere” technology myths, this vision isn’t quite as simple as it first appears.

Because chat is (mostly) textual today, you might think that the differences between technology channels would be trivial. Unfortunately they aren’t. Mainly that’s because chat is already much more than just a text message. And it’s only going to get more so. People are already talking about ‘micro-apps’ within message systems. And the representation and behaviour of buttons, links, images, commands, and other capabilities already varies between channel technologies.

These differences mean there’s a need for a multi-channel chat layer in the end-to-end architecture that’s aware of, and reacts to, the capabilities of the channel being used. But it’s more than just a technology layer; the differences can also impact the training and dialog you create in your AI engine, as we shall see. As with other “write once, run anywhere” solutions, you need knowledge and experience of the underlying channel technologies if you want to build a good solution.

So, what are these differences?

Buttons & Links

The first thing I’d like to cover is buttons. The first technology channel we built was an iOS app (native, of course). The Watson Dialog service represents buttons with special HTML-style tags within the text. When we implemented support for this it quickly resulted in dialogs like the following:

Would you like to do this thing? Please choose: <mct:input>Yes</mct:input> or <mct:input>No</mct:input>?

The Yes and No words became touchable links, which was cool — the user experience was much more efficient than asking the user to type out a response. Even more so with more complex questions.
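
To make this concrete, here’s roughly the kind of parsing involved. It’s a minimal sketch in Python rather than the app’s native code: the <mct:input> tag format is the one shown above, but the function name and return shape are purely illustrative.

import re
from typing import List, Tuple

# Matches the <mct:input>...</mct:input> tags embedded in the Watson Dialog text.
INPUT_TAG = re.compile(r"<mct:input>(.*?)</mct:input>")

def parse_dialog(dialog_text: str) -> Tuple[str, List[str]]:
    """Return the text with each tag replaced by its label, plus the list of labels.

    In the iOS app the labels stay inline and are rendered as touchable links.
    """
    labels = INPUT_TAG.findall(dialog_text)
    display_text = INPUT_TAG.sub(r"\1", dialog_text)
    return display_text, labels

text, options = parse_dialog(
    "Would you like to do this thing? Please choose: "
    "<mct:input>Yes</mct:input> or <mct:input>No</mct:input>?"
)
# text    -> "Would you like to do this thing? Please choose: Yes or No?"
# options -> ["Yes", "No"]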

Then we implemented our Telegram bot, which integrated that same Watson system into a messaging interface. And our next integration was with Facebook Messenger. Yes, just two weeks after the release of Facebook’s Messenger API we had a Watson bot running in the service. But not just any bot; it’s the same Watson bot that you can access over Telegram and our iOS app. Cool — multi-channel chat!

Except…

It was obvious that neither Telegram nor Facebook were going to understand our special link tags. So, our bot integration layer transformed those tags into buttons on both Telegram and Messenger.
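
The transformation itself is fairly mechanical. Here’s a sketch of the idea, again in Python; text here is the Watson sentence with the tags stripped out and buttons is the list of labels, much like the parse_dialog() sketch earlier but with the labels removed from the text rather than left inline. The payload shapes reflect my reading of the Telegram Bot API (reply keyboards) and Facebook’s Send API (quick replies), so treat them as an approximation rather than gospel.

# Messenger's limit on button titles is short (around 20 characters as far as
# I can tell -- treat the exact number as an assumption).
MESSENGER_TITLE_LIMIT = 20

def to_telegram(text: str, buttons: list) -> dict:
    """Telegram sendMessage payload with a one-row reply keyboard."""
    return {
        "text": text,
        "reply_markup": {
            "keyboard": [[{"text": label} for label in buttons]],
            "one_time_keyboard": True,
        },
    }

def to_messenger(text: str, buttons: list) -> dict:
    """Facebook Messenger Send API message using quick replies."""
    for label in buttons:
        if len(label) > MESSENGER_TITLE_LIMIT:
            # Exactly where our early dialogs fell over: anything longer than
            # the limit gets truncated by Messenger.
            raise ValueError(f"Button label too long for Messenger: {label!r}")
    return {
        "text": text,
        "quick_replies": [
            {"content_type": "text", "title": label, "payload": label.upper()}
            for label in buttons
        ],
    }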

Telegram’s buttons are quite flexible, but Facebook’s idea of buttons is peculiarly limited — the character limit on a Messenger button is very tight. Inevitably we had buttons longer than was allowed (why would I think any different?!). Here’s what our initial dialog in Messenger looked like — notice the truncated text on the buttons:

So, we were forced to refactor some of our dialog so that the buttons worked better in Messenger. The good news is that we now have buttons that work efficiently across our iOS app, Telegram and Facebook Messenger. This is what it looked like at this stage:

This was great, but with the tags stripped out of the sentence we were left with a weird text body on the message. The labels had moved onto the buttons, but the text between them remained. So, we had to refactor the body of the message to fix this. This is what it ended up looking like in Messenger:

So, the very first thing we did on our second channel resulted in us refactoring our dialog. What next?

Commands

Because we wanted our initial iOS app to have more function than just text-based messaging, we built a command protocol that allows Watson to instruct the app to do something. For example, the command

@SHOWMAP?LAT=xxx&LON=yyy

instructs the app to show a speech bubble with a map of that latitude/longitude.
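
The parsing lives in the native app, but the logic is simple enough to sketch in a few lines of Python. Only the @NAME?key=value syntax comes from our protocol; the function name, return shape and the example coordinates are just for illustration.

from urllib.parse import parse_qs

def parse_command(message: str):
    """Split a command like '@SHOWMAP?LAT=51.5&LON=-0.1' into a name and arguments.

    Returns None for ordinary text, so normal chat messages pass straight through.
    """
    if not message.startswith("@"):
        return None
    name, _, query = message[1:].partition("?")
    args = {key: values[0] for key, values in parse_qs(query).items()}
    return name, args

# parse_command("@SHOWMAP?LAT=51.5&LON=-0.1") -> ("SHOWMAP", {"LAT": "51.5", "LON": "-0.1"})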

When we implemented our Telegram/Facebook bots we didn’t build in the command function; it’s planned for version 2.0. But we can also imagine that we’ll have commands that a full-function iOS app can support, but that the more limited bots embedded in messaging apps never will. As a result, we’re now informing Watson of the channel type, so that it can adapt the dialog appropriately; today, if it’s Telegram/Messenger then Watson falls back to a simpler dialog that doesn’t use command features.
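
In practice that just means the integration layer hands Watson a couple of profile variables when a conversation starts, and the dialog branches on them. A sketch of the idea; the variable and channel names are illustrative and the actual Watson Dialog API calls aren’t shown.

# Channels our integration layer currently knows about (names are illustrative).
CHANNEL_IOS = "ios"
CHANNEL_TELEGRAM = "telegram"
CHANNEL_MESSENGER = "messenger"

COMMAND_CAPABLE = {CHANNEL_IOS}

def profile_for(channel: str) -> dict:
    """Profile variables handed to Watson at the start of a conversation."""
    return {
        "CHANNEL": channel,
        # Dialog nodes check this flag before emitting commands such as @SHOWMAP;
        # on Telegram/Messenger the dialog falls back to a plain-text alternative.
        "SUPPORTS_COMMANDS": "yes" if channel in COMMAND_CAPABLE else "no",
    }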

Speech/Text

We haven’t yet integrated Amazon Echo as a channel, but it’s on the to-do list. Mainly that’s because we’re UK-based and the Echo’s not yet available over here. But we will do it soon. I’m conscious that, because Echo is speech-based, this might imply some more differences. Buttons aren’t buttons anymore; they become spoken options.

I’m sure that when we get to work on this, we’ll discover ways to optimise the dialog so that it feels natural across link, button and spoken interface styles. But if you don’t think about speech when you construct the dialog, it’s quite possible you’ll end up with things that don’t make sense.
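
Since we haven’t built the Echo channel yet this is pure speculation, but one obvious piece is turning a button-style question into something that reads well when spoken. Something along these lines, reusing the stripped text and option labels from the earlier sketches:

def to_spoken_prompt(text: str, buttons: list) -> str:
    """Render button options as a spoken sentence instead of tappable choices."""
    if not buttons:
        return text
    # "Yes or No", or "Small, Medium or Large" for longer lists.
    if len(buttons) == 1:
        options = buttons[0]
    else:
        options = ", ".join(buttons[:-1]) + f" or {buttons[-1]}"
    return f"{text} You can say {options}."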

1:1 or Group

We’ve just implemented a Slack infrastructure in Watson; it’s great! So, a Slack-bot is obviously an interesting option.

In contrast to apps like Telegram or Facebook Messenger, Slack is a team-based chat system. So, a Slack-bot interacts in a more public forum. As a result, the nature of a Slack-bot is a bit different from that of a Facebook bot; a personal banking bot makes sense in Facebook Messenger, but not so much in Slack (who wants their team to see their account balance?).

Although we could view Slack as “just another channel for the same chat”, in reality this isn’t the case. The function that makes sense in Messenger is different to the function that makes sense in Slack. Maybe our integration layer needs to be told what interface we’re operating over and use that to decide which subset of functionality to offer.
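
One way to model that is a simple capability table that the integration layer consults before offering or routing a function. The channel and feature names below are only illustrative:

# Which bot capabilities we expose on each channel -- the sort of table a
# channel integration layer might consult. Feature names are illustrative.
CHANNEL_FEATURES = {
    "ios":       {"buttons", "commands", "personal_finance"},
    "telegram":  {"buttons", "personal_finance"},
    "messenger": {"buttons", "personal_finance"},
    "slack":     {"buttons"},  # team channel: keep personal functions out of it
}

def feature_enabled(channel: str, feature: str) -> bool:
    return feature in CHANNEL_FEATURES.get(channel, set())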

Security

There’s a lot of interest in bots amongst banks, and apps like Cloe and Penny are paving the way for #ConversationalBanking. But how does this work if the interface is a third-party messaging system? In such scenarios potentially sensitive information would be flowing through Facebook’s or Telegram’s infrastructure. There’s work to do here before people are comfortable with this idea. None of us want our personal information getting into a log database at Facebook by accident.

Bank of America have announced that they plan a Facebook Messenger bot. It’ll be interesting to see what function this supports and how they address the privacy issue.

Again, an integration layer in a multi-channel chat system probably needs to take into account security when deciding what subset of capability should be exposed over a particular channel.
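
That check could be as simple as tagging the intents the dialog considers sensitive and only letting them through on channels we treat as private. Again, just a sketch with made-up intent and channel names:

SENSITIVE_INTENTS = {"account_balance", "recent_transactions"}
PRIVATE_CHANNELS = {"ios"}  # e.g. our own app

def allow_intent(channel: str, intent: str) -> bool:
    """Block sensitive intents on third-party messaging channels."""
    return intent not in SENSITIVE_INTENTS or channel in PRIVATE_CHANNELS

# allow_intent("messenger", "account_balance") -> False
# allow_intent("ios", "account_balance")       -> True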

Summary

In summary it’s clear that ‘train once, access anywhere’ is a little simplistic. We need to understand the capabilities and limitations of the interfaces we want to support before we build any AI dialog, not after. The function we support may vary across channels. And the type of function that makes sense in different messaging apps might vary considerably. Overall I see two key points from these experiences:

  1. Alistair Croll wrote about the need for a BotOS. He might be right; there’s certainly a role for infrastructure that understands the issues I’ve discussed, helps to manage them, and adapts to the channel interface being used. At a minimum there’s a need for a fairly flexible “Cognitive Channel Integration” layer in the architecture.
  2. In graphical interfaces we’re used to the idea of a style guide. It’s becoming increasingly clear that there’s a need for a style guide for conversational systems. Such a guide would take into account some of the things I’ve mentioned here and give guidance to Dialog writers that ensures their prose will work across the technology channels we want to support. Just diving in and building for one channel, without thinking about the wider possibilities, will almost certainly make integrating those channels more difficult in the future.