If you have just 5 minutes to spare, 
go make a sandwich, not a chatbot!

Our journey in building a great chatbot, part 1

(Many thanks to my colleague and friend David Morand for suggesting this funny title!)

As my (prolific!) colleague Linda explained in a recent post, we have been working on a chatbot proof-of-concept at Nu Echo for a few months now. The chatbot allows a user to complete an address change/moving task. While the task seems quite simple from the outset, many challenges made us realize how limited most technologies are for building complex dialogues. We have learned a lot from this exercise and even built our own dialogue manager (DM) as a result!

In this post, I would like to go through some of the challenges we faced.

But first, let’s describe the chatbot itself. It’s a self-service, task-oriented type of chatbot. It automates the process of moving a service at a fictitious utility company. A typical conversation with the bot goes like this:

  • After the initial greeting, the bot enters a dialogue to obtain all the pieces of information relevant to the moving process (date of move, new address, unit)
  • The bot asks for the date of the technician’s visit
  • The bot asks for a channel on which the user will be notified the day of the technician visit
  • The user is asked for a final confirmation

At any time, the user can check or modify any piece of information. The conversation can also be handed off to a human agent on the Genesys platform, either because of some difficulty completing the task, or on the user’s initiative. The agent then receives the whole conversation on the desktop, with all relevant attached data.

Non-linear (aka mixed-initiative) dialogues

Chatbots, as opposed to GUI-based applications, promise more conversational and natural dialogues. But do they really deliver? A true conversational system should support mixed-initiative dialogues, which is not the case with most chatbots we can see on the market. These types of dialogues allow the user to take the lead and drive the conversation instead of just answering questions. The system then reacts properly by adapting to the user’s requests. (Otherwise, we’re stuck with a text-based version of a dreadful IVR system.)

Providing a natural, mixed-initiative type of dialogue was thus one of our main goals. Here is an example. Our chatbot asks the user to provide a channel for later notifications. It offers a choice between phone (voice call), SMS, and email. At this point, it is reasonable to think the user may want to know what email address the organization has in its records. In this case, the system needs to give the requested information and properly continue the conversation.

That’s great. But wait! We can do even better and go one step further in this case. The system could use the fact that the user wants some information from his account to drive the dialogue more efficiently:

You see? Simple, effective dialogue!

This pattern of dialogue can be useful in other contexts as well. Suppose I build a bot for a financial institution to notify a customer that a pre-authorized payment would fail due to insufficient funds in his account. The conversation could go like this:

Bot: Hi Dominique. I just noticed you don’t have enough money in your checking account for your next pre-authorized payment of 45.00$ due tomorrow. Would you like to transfer some money to avoid fees for insufficient funds?
Me: Yes
Bot: From which account? Your choices are “Savings Account” and “Retirement Account”.
Me: How much do I have in my savings account?
Bot: You have 314.15$ in your savings account. Would you like me to transfer money from your savings account?
Me: Yes, please.
Bot: Alright. I’m transferring 45.00$ from your savings account to your checking account.

This kind of dialogue is very difficult to implement effectively using most dialogue engines. You need to handle some intents in a global way and get back to where you were in the dialogue. For instance, all requests to check some information are defined globally in our dialogue manager. Sometimes, they require disambiguation. If the user enters “I would like to know the date”, what date is it? The moving date or the technician visit date? If a single one is set, then the chatbot just gives it to the user. But if both are in the context, the chatbot needs to ask which one the user is interested in.
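The disambiguation logic described above can be sketched in a few lines. This is a hypothetical illustration, not our actual DM code; the context keys (`moving_date`, `technician_visit_date`) are made-up names for the two dates our chatbot tracks:

```python
# Hypothetical handler for a global "check the date" intent. It inspects the
# dialogue context to decide whether the request is ambiguous: if both dates
# are set, the bot must ask which one the user means.

def handle_check_date(context: dict) -> str:
    """Resolve a generic 'I would like to know the date' request."""
    moving = context.get("moving_date")
    visit = context.get("technician_visit_date")
    if moving and visit:
        # Both dates are in the context: disambiguate.
        return "Which date do you mean: the moving date or the technician visit date?"
    if moving:
        return f"Your moving date is {moving}."
    if visit:
        return f"The technician visit is scheduled for {visit}."
    return "No date has been set yet."

# Only the moving date is set, so no disambiguation is needed.
print(handle_check_date({"moving_date": "December 3rd"}))
```

The key point is that the handler is defined once, globally, and consults the current context instead of being wired to a specific place in the flow.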

That being said, even though we can implement these types of dialogues with our own DM, they are still a bit cumbersome to define. We haven’t yet found the optimal way to express them. It’s still a work in progress.

Slot-filling

Another goal we had was to let the user express himself in a natural way, give as many pieces of information as possible in the very first sentence, and let the system ask for any missing information required to complete the task. This is another aspect of a good mixed-initiative dialogue.

Consider the following 3 conversations:

They all provide the same information to the system. They only differ in how and when the information is provided. In the leftmost one, the user does not provide any information in the first utterance, but provides 2 pieces of information (address and unit) in the second, and is asked for the date in the last interaction. In the rightmost conversation, the user provides both the date and the address in the first interaction, with the unit missing.

One problem we face when we try to code this in a typical rule engine (or dialogue flow engine, or whatever tool you use to build a chatbot) is the need to express all combinations. In the case of our opening question, we must consider 6 possibilities (3 optional entities give 8 combinations in theory, but only 6 of them are valid here):

  • “I want to move my service” (no entities)
  • “I want to move my service on December 3rd” (date entity only)
  • “I want to move my service on December 3rd to 1435 Saint-Alexandre street, Montreal” (date and address entities)
  • “I want to move my service on December 3rd to 1435 Saint-Alexandre street, Montreal, suite 200” (date, address, and unit entities)
  • “I want to move my service to 1435 Saint-Alexandre street, Montreal” (address entity only)
  • “I want to move my service to 1435 Saint-Alexandre street, Montreal, suite 200” (address and unit entities)

(It doesn’t make sense for the unit to appear if the address is not present.)

You don’t want to have to specify 6 different rules, one per combination. It’s so obvious I’m a little ashamed to mention it, but some conversation platforms force us to explicitly code everything.
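Instead of one rule per combination, a single slot-filling loop can handle all 6 cases. Here is a minimal sketch, assuming a hypothetical NLU result that maps entity names to extracted values (the slot names and prompts are illustrative, not our actual implementation):

```python
# Minimal slot-filling sketch: whatever combination of entities the first
# utterance contains, the same loop asks for the first missing slot until
# the task is complete.

REQUIRED_SLOTS = ["date", "address", "unit"]
PROMPTS = {
    "date": "When would you like to move?",
    "address": "What is your new address?",
    "unit": "Is there a unit or suite number?",
}

def next_prompt(filled):
    """Return the question for the first missing slot, or None when all are filled."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot]
    return None

# "I want to move my service on December 3rd to 1435 Saint-Alexandre street, Montreal"
entities = {"date": "2017-12-03", "address": "1435 Saint-Alexandre street, Montreal"}
print(next_prompt(entities))  # asks for the missing unit
```

The conversation simply loops on `next_prompt`, merging each new answer into `filled`, until it returns `None`.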

Platforms like Amazon Alexa, Google DialogFlow and Recast.ai all implement a form of slot-filling algorithm: when a piece of information is missing, a question can be sent to the user to provide it. (Interestingly, this is reminiscent of the Form Interpretation Algorithm, the form-filling mechanism found in VoiceXML, a markup language for IVR applications.)

In the case of our moving chatbot, we also wanted to include a visual confirmation of the address (with the help of Google Maps) before asking for the missing unit or date, making it more difficult to use a pure form-filling type of algorithm.

Contextual interpretation of entities

Another interesting difficulty came from the ambiguous nature of natural language and the need to interpret some entities contextually. Ordinal numbers, for example, need some context to be interpreted properly. If the chatbot offers a set of choices and the user answers with “the second”, this clearly means the 2nd element of the list.

However, consider the following conversation with our moving chatbot:

In the answer “no, the 2nd”, the entity “2nd” most probably refers to a date, not an element in a list. Moreover, “2nd” refers to a date relative to the one in the question, not the second day of the upcoming month...

By the way, a really annoying feature (!) of many NLU systems is their tendency to interpret dates as absolute dates without considering the context. You probably noticed in the example above that the interpretation of “2nd” resulted in “December 2nd, 2017” and not the expected “January 2nd, 2018”. It’s the NLU engine that returned this interpretation; there is no post-processing applied afterwards. (Looks like NLU engine providers don’t want chatbot developers to code, so they make arbitrary decisions for them.)
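The post-processing we would have liked can be sketched as follows: resolve a bare ordinal like “2nd” against the date the bot just proposed, rather than letting the NLU engine guess an absolute date. The resolution rule (same month if the day is still ahead of the reference, otherwise the next month) is one reasonable heuristic, not the only possible one:

```python
# Sketch: interpret a bare day number ("the 2nd") relative to the date
# mentioned in the bot's previous question, instead of as an absolute date.

from datetime import date

def resolve_ordinal(day: int, reference: date) -> date:
    """Pick the same month as the reference if the day is not yet past,
    otherwise roll over to the next month (and year if needed)."""
    if day >= reference.day:
        return reference.replace(day=day)
    month = reference.month % 12 + 1
    year = reference.year + (1 if month == 1 else 0)
    return date(year, month, day)

# Bot proposed January 1st, 2018; user answers "no, the 2nd".
print(resolve_ordinal(2, date(2018, 1, 1)))  # 2018-01-02
```

With a reference of December 28th, 2017, the same call would correctly yield January 2nd, 2018 instead of the absolute “December 2nd, 2017” the NLU engine gave us.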

But honestly, when NLU engines try to be too clever, they’re in our way when we want to make great chatbots!

In the case of dates, the NLU engine should always return a structure for dates, with slots for the various parts: “September 2nd, 2018” would result in {"day": 2, "month": 9, "year": 2018}, while “dec 5th” would only return {"day": 5, "month": 12}. It would then be the chatbot developer’s responsibility to figure out the missing part from the context. Or ask for it! (And ordinal numbers would be returned as such, not as dates, of course!)
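Figuring out the missing part from the context could then look like this. The partial-structure format matches the one described above; the "pick the next future occurrence" policy is an assumption for illustration:

```python
# Sketch: complete a partial date structure (e.g. {"day": 5, "month": 12},
# no year) by choosing the next occurrence at or after a reference "today".

from datetime import date

def complete_date(partial: dict, today: date) -> date:
    """Fill in a missing year so the resulting date is not in the past."""
    day, month = partial["day"], partial["month"]
    year = partial.get("year", today.year)
    candidate = date(year, month, day)
    if "year" not in partial and candidate < today:
        # The day/month already passed this year: assume next year.
        candidate = date(year + 1, month, day)
    return candidate

# "dec 5th" uttered on December 20th, 2017 most likely means December 5th, 2018.
print(complete_date({"day": 5, "month": 12}, date(2017, 12, 20)))  # 2018-12-05
```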

Likewise for relative dates (e.g. “tomorrow”, “the next day”, or “two weeks from now”). The NLU engine should return a structure with appropriate information so the application can contextualize the answer itself. Doing so would also simplify the task of benchmarking the NLU part of the application. For now, running a benchmark is such a pain when dates are involved! (More on that in a future post.)
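A hypothetical relative-date structure along those lines: instead of an absolute date, the NLU engine returns an offset, and the application resolves it against its own notion of “now” (the `offset_days` key is an invented name for this sketch):

```python
# Sketch: resolve a hypothetical relative-date entity like {"offset_days": 14}
# ("two weeks from now") against the application's reference date.

from datetime import date, timedelta

def resolve_relative(entity: dict, today: date) -> date:
    """Turn a relative-date structure into an absolute date."""
    return today + timedelta(days=entity["offset_days"])

# "two weeks from now", resolved on January 3rd, 2018:
print(resolve_relative({"offset_days": 14}, date(2018, 1, 3)))  # 2018-01-17
```

Because the offset, not the resolved date, is what the NLU engine returns, a benchmark can compare offsets directly without pinning test utterances to a specific run date.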

That’s it for now! Stay tuned for my next post, where I will cover other challenges, namely error handling, and the need to express recurring dialogue patterns.

Like what you read? Give Dominique Boucher a round of applause.
