LESSONS LEARNED

Make Your Voice-First Dialogue Robust

Using Variables to Train Smarter Conversational AI

Jan Sedivy

Published in PromethistAI, May 28, 2021 (5 min read)

Even in this digital era, voice and spoken language remain the most natural channels of human communication. The demand for human-like (if not precisely humanoid) conversational AI is increasing. But the developers and designers of voice apps can’t design sophisticated interactions without the right tools. That’s where PromethistAI comes in.

Combining AI and Human Creativity

Our Promethist Platform combines the best of two worlds: AI and human creativity. We use state-of-the-art AI algorithms to process ambiguous human language and to conduct the conversation. At the same time, human creativity is used to build reusable communication structures with personality and a unique ability to carry on engaging conversations.

Our goal is to help conversational AI developers design rich and robust dialogues efficiently. Human communication often follows repetitive patterns, and we reuse them across different applications. In this article, we will focus on some of these scenarios. Certain cases are quite straightforward: every application should be able to assist users asking for help, repeat the message if the user didn’t understand, or turn off on a “stop” request. But interaction can get much more complex. For example: What if the user doesn’t reply at all and stays silent? What if, by contrast, they speak in very long and complex sentences, which are still problematic for current NLP algorithms? And what if they utter something irrelevant?

Variables Enable Complex, Personalized Interaction

A robust application needs to handle all these and many other situations flawlessly. This requires some light programming. Let’s look, for example, at the “help” command. A novice user might appreciate much more detailed help than a returning, experienced user. So we might want to design different wordings for different user categories and then, at runtime, identify the category of each user. How do we do it? We maintain a user profile in which we define a variable counting how many times the user has returned. Then it becomes easy to choose the correct version of the help message.
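The help selection described above can be sketched in a few lines of Python. This is only an illustration, not Promethist Platform code: the `UserProfile` class, the `visit_count` field, the threshold of three visits, and both help texts are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical help texts; a real application would supply its own wording.
DETAILED_HELP = (
    "You can ask me a question, say 'repeat' to hear the last message again, "
    "or say 'stop' to end the conversation."
)
BRIEF_HELP = "Say 'repeat' or 'stop', or just ask your question."

# Assumed threshold: users with fewer visits than this are treated as novices.
NOVICE_VISIT_LIMIT = 3


@dataclass
class UserProfile:
    """Illustrative user profile holding a persistent visit counter."""
    visit_count: int = 0


def choose_help(profile: UserProfile) -> str:
    """Return detailed help for novices and brief help for experienced users."""
    if profile.visit_count < NOVICE_VISIT_LIMIT:
        return DETAILED_HELP
    return BRIEF_HELP
```

The counter would be incremented once at the start of each session, so the choice itself stays a single comparison.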

But what if the user has returned to our application after a long time? We may reasonably expect that they have forgotten how to get around, so we may choose a somewhat more detailed kind of help. To find out when the user last interacted with the application, we can use another variable in the user profile that stores the last usage date. Selecting the right help is then just a matter of a simple calculation.
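The calculation can be as small as one date subtraction. The sketch below is a hypothetical illustration: the function name, the 30-day absence limit, and the novice threshold are assumptions, not values taken from the platform.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed limits, chosen only for illustration.
NOVICE_VISIT_LIMIT = 3
LONG_ABSENCE = timedelta(days=30)


def needs_detailed_help(
    visit_count: int, last_used: Optional[datetime], now: datetime
) -> bool:
    """Decide whether the user should hear the detailed help message.

    Detailed help is chosen for novices, for users with no recorded
    last usage, and for users returning after a long absence.
    """
    if visit_count < NOVICE_VISIT_LIMIT:
        return True
    if last_used is None:
        return True
    return now - last_used > LONG_ABSENCE
```

Passing `now` as a parameter, rather than reading the clock inside the function, keeps the rule easy to test.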

We have to approach the “repeat” scenario very similarly. The user may ask the app several times in a row to repeat the last message. In that case, we might again want to adapt the wording. To repeat the last response, the application needs to remember the previous response and potentially reformulate it. For this purpose, we can make use of the properties of the last conversational turn.
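A minimal sketch of the “repeat” handling might look as follows. All names here (`SessionState`, `respond`, `repeat`) are hypothetical, and the reformulation is reduced to a fixed prefix as a stand-in for whatever rewording the designer would actually provide.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Illustrative state remembered between turns of one session."""
    last_response: str = ""
    repeat_count: int = 0  # consecutive "repeat" requests


def respond(state: SessionState, text: str) -> str:
    """Send a regular response and remember it for a later 'repeat'."""
    state.last_response = text
    state.repeat_count = 0  # a new response ends the run of repeats
    return text


def repeat(state: SessionState) -> str:
    """Repeat the last response, changing the wording on later requests."""
    state.repeat_count += 1
    if state.repeat_count == 1:
        return state.last_response
    # Stand-in for a real reformulation of the message.
    return "Let me put it another way: " + state.last_response
```

For example, after `respond(state, "Your train leaves at nine.")`, the first `repeat(state)` returns the sentence unchanged and the second one adds the prefix.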

Another tricky case is the “silent user” scenario: the voice app is not receiving any user input, so we have to re-prompt the user. The first time, we may just use a simple re-prompt (something like “Hello?” or “Are you there?”), but if the silence continues, we might prefer to alter the wording or even switch to help. The human designer defines the logic and the re-prompts, which can differ between interaction contexts. This means that you can even create a kind of help hierarchy: at some points a universal help message is used, while in other situations a turn-specific help message is activated. This is not a complicated decision, but it again requires setting up some variables. Selecting the proper utterance is a matter of a few “if-then-else” statements.
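The escalation from re-prompt to help really is just a few branches. The following sketch assumes a counter of consecutive silent turns and an optional turn-specific help text; the function name, the re-prompt list, and the universal help wording are invented for the example.

```python
from typing import Optional

# Hypothetical re-prompts, tried in order while the user stays silent.
REPROMPTS = ["Hello?", "Are you there?"]

# Hypothetical fallback used when a turn defines no help of its own.
UNIVERSAL_HELP = "You can say 'help', 'repeat', or 'stop' at any time."


def on_silence(silence_count: int, turn_help: Optional[str] = None) -> str:
    """Choose what to say after `silence_count` consecutive silent turns.

    `silence_count` starts at 1 for the first silence. Once the simple
    re-prompts are used up, turn-specific help is preferred and the
    universal help serves as the fallback.
    """
    if silence_count <= len(REPROMPTS):
        return REPROMPTS[silence_count - 1]
    if turn_help is not None:
        return turn_help
    return UNIVERSAL_HELP
```

The counter itself would be reset to zero as soon as the user says anything.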

From the previous examples, it is clear that the voice application needs to work with variables. Some of these variables are valid only temporarily, within one turn. Other variables, such as the last usage timestamp, need to be persistent: the application must retain them between successive sessions. And there are also other types of information that need to be retained only throughout the current session, such as the current session length (how many turns have taken place so far).

To summarize, we have seen here three types of variables with different scopes:

Turn — remembers the value within one turn (message — reply)
Session — remembers the value within one session
User — remembers the value across sessions
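The three scopes listed above can be pictured as three stores that are cleared at different moments. This is a simplified sketch under that assumption; the `Variables` class and its method names are hypothetical, and a real application would persist the user store to a database rather than keep it in memory.

```python
class Variables:
    """Illustrative container for turn, session, and user scoped values."""

    def __init__(self) -> None:
        self.turn = {}     # cleared after every turn
        self.session = {}  # cleared when the session ends
        self.user = {}     # kept across sessions (persisted in a real app)

    def end_turn(self) -> None:
        """Drop values that are valid only within one turn."""
        self.turn.clear()

    def end_session(self) -> None:
        """Drop turn and session values; user values survive."""
        self.turn.clear()
        self.session.clear()
```

A session length counter would live in `session`, while the last usage timestamp would live in `user`.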

Making AI Contextual

We have seen how simple commands make every application more robust and how variables with simple processing provide much better usability. Application variables play a significant role in more complex applications. The “user” variables are an essential tool for personalization, as they continuously build up the user profile. In the user profile, we can remember all the user’s parameters, e.g. the name, last usage timestamp, personal preferences, and much more. In addition to this user data, we often need to work with other information such as the user’s location, client, current time, etc. We call all this context.
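One way to picture context is as a single object that bundles the user profile with the surrounding information. The sketch below is an assumption-laden illustration: the `Context` class, its field names, and the time-of-day boundaries are invented here and do not describe the platform's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Context:
    """Illustrative bundle of user data and situational information."""
    user: dict = field(default_factory=dict)  # e.g. name, preferences
    location: Optional[str] = None
    client: Optional[str] = None              # e.g. "car", "mobile"
    now: datetime = field(default_factory=datetime.now)


def greeting(ctx: Context) -> str:
    """Build a greeting that uses the current time and, if known, the name."""
    if ctx.now.hour < 12:
        part = "Good morning"
    elif ctx.now.hour < 18:
        part = "Good afternoon"
    else:
        part = "Good evening"
    name = ctx.user.get("name")
    return f"{part}, {name}!" if name else f"{part}!"
```

The same object could then drive other decisions, such as skipping a question whose answer is already stored in `ctx.user`.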

Working with context is essential for building user-friendly applications. We can always call the user by their name. We can avoid asking the same questions repeatedly. We can make personalized recommendations. We can react flexibly based on the user’s location. Just imagine your intelligent car assistant telling you: “Last time you drove to visit your aunt Jessica, you bought her a bunch of flowers. Do you want to stop at the florist’s again?” Here we can work with user preferences, the destination from the navigation system, and so on. The reasoning, i.e. the logic behind the dialogue, will require creating multiple variables, and it won’t always be simple. But the user will be impressed, and that’s what counts!

The Promethist Platform implements all these variables in its basic programming model. What’s more, designers can use Promethist’s reusable assets or even create their own to ensure their voice apps work smoothly and reliably.

Check out the Promethist Platform for creating smart conversational AI applications and virtual personas.