Voice First Hybrid Model — Built on Promethist Platform

PromethistAI announces new hybrid conversational AI architecture supporting both advanced social conversation and goal-oriented tasks.

Published in PromethistAI

6 min read · Mar 5, 2021

PromethistAI announces a new hybrid conversational AI architecture supporting both advanced social conversation and goal-oriented tasks. We first categorize chatbots and explain how we define a conversational bot. We then introduce the hybrid model, which combines rule-based and generative models.

Bots’ History

The oldest and simplest bots are text bots. We see them on the web pages of banks, insurance companies, retailers, and other businesses, where they help customers. The interaction starts with an opening message, and the user continues by selecting from a set of the most frequently asked questions. Slightly smarter bots let the user type a query: AI algorithms extract the intent and slots, and the dialog manager chooses the next step from a still-simple dialog graph. Text bots can handle only restricted domains. Some of them are voice-enabled or multimodal, or they operate over the plain old telephone service (POTS) as an interactive voice response (IVR) system.

Conversational bots allow richer conversation across more topics. They can chat about the latest news, movies, celebrities, and other social issues. The NLP needs to be smarter: in addition to identifying intents and slots, it must recognize topics and decide how to switch between them. For example, users might be engaged in a conversation about the latest movie, but in the middle of the dialog they may ask, “What is the weather in London?” and then quickly return to films. Developing a social bot that converses about several topics is quite a complicated task. For each subject, the bot needs to access the internet and knowledge databases to provide the latest information.

Task Oriented Versus Conversational

People want to use bots to accomplish tasks and control devices quickly and efficiently. At the same time, they want to engage in entertaining dialogs. For example, a car bot provides voice control for the air conditioner, adjusts the cruise control, finds the destination, and arranges a maintenance appointment. In addition, the passengers might want to learn about a nearby castle or chat about the destination city’s history. This means the car bot must be capable not only of executing commands but also of carrying on a social conversation. Similarly, we can envision a video conferencing package that starts the session by greeting the user and promptly informing them that a meeting with Seattle starts in 15 minutes. In the meantime, the user may ask about the population of Seattle, the weather, or the latest cultural events.

Knowledge of general topics, together with social skills, brings the application much closer to the way people communicate. Step by step, users will learn to talk to applications in the most natural way, as if they were talking with humans. We also expect that the social part will be very similar for most applications, whereas the command-control or task-oriented part will be application-specific. Our goal is to offer developers, on the Promethist Platform, a complete social conversation with extensive capabilities, along with tools to extend it with specific goal-oriented and command-control tasks.

Rule-Based and Generative Approach

Currently, most task-oriented and command-control bots use the rule-based approach for controlling dialog flow. The dialog manager uses intents and slots extracted from the user’s responses to select the correct reply. We can visualize the dialog as a graph. For example, the Promethist Platform allows developers with no programming skills to build dialog graphs for rule-based dialogs in a drag-and-drop graphical interface.
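As a rough illustration of the dialog-graph idea, the following Python sketch stores each node as a prompt with intent-labeled edges. The node names, intents, and helper functions are hypothetical and are not the Promethist Platform API; intent and slot extraction is assumed to happen upstream.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state in the dialog graph: a prompt plus intent-labeled edges."""
    prompt: str
    edges: dict = field(default_factory=dict)  # intent -> name of next node


# Hypothetical graph for booking a maintenance appointment.
GRAPH = {
    "start": Node("Would you like to book a maintenance appointment?",
                  {"yes": "ask_day", "no": "goodbye"}),
    "ask_day": Node("Which day suits you?", {"give_day": "confirm"}),
    "confirm": Node("Your appointment is booked for {day}.", {}),
    "goodbye": Node("Okay, maybe next time.", {}),
}


def next_node(current: str, intent: str, slots: dict, memory: dict) -> str:
    """Follow the edge matching the intent; stay put if no edge matches."""
    memory.update(slots)  # remember slot values such as the chosen day
    return GRAPH[current].edges.get(intent, current)


def render(node_name: str, memory: dict) -> str:
    """Fill remembered slot values into the node's prompt template."""
    return GRAPH[node_name].prompt.format(**memory)


memory = {}
state = "start"
state = next_node(state, "yes", {}, memory)
state = next_node(state, "give_day", {"day": "Friday"}, memory)
reply = render(state, memory)
print(reply)
```

Keeping the graph as plain data is one possible design choice: it mirrors what a visual editor would produce and keeps the traversal logic independent of any particular dialog.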

The social bot part is a much more challenging task. The user may want to chat about almost anything. Imagine how many topics Wikipedia contains. Handling social chat with only a rule-based dialog manager is impossible. You may focus on some topics and provide a good experience, but you can never cover everything.

Recently, researchers have made tremendous progress in the development of end-to-end generative models. These models use neural networks to learn a language model, and they can generate a reasonable dialog response on almost any topic. One of the most advanced recent generative models, GPT-3 from OpenAI, creates stories, emails, and even poetry. Its training data was filtered from about 45 TB of raw web text. It is a fantastic machine: the replies it generates follow English grammar, and the sentences make sense. It works like this: we enter the previous sentences of a dialog (the context), and the system generates a reply. These models are the future.

However, generative models alone do not provide a complete solution yet. For example, imagine you use the following context: “We expect strong winds tomorrow. They will break…” How do we want the dialog to continue? With “record speeds” or with “branches”? The model has no way of knowing which continuation we intend. We are still working on a more advanced level of control over the generative model and are looking for a way to inject the pragmatic level of language into the dialog. This is still an open research problem. Nevertheless, we have found a way to combine the creative power of generative models with the fine-grained control of rule-based models.

Hybrid Architecture

The figure below shows a block diagram of the runtime hybrid architecture, which combines rule-based processing with a generative model. Automatic speech recognition (ASR) converts an utterance into text, which is tokenized and pre-processed in an NLP pipeline. The text then continues to a topic switch (Topic SW), which routes the processing to either the generative neural network (GNN) or the rule-based dialog manager. Natural language generation (NLG) prepares the reply, and text-to-speech (TTS) converts the text to voice.

Block diagram of the hybrid Promethist architecture.
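The flow in the block diagram can be sketched in Python roughly as follows. Every function here is a toy stand-in with a hypothetical name: the "audio" is simply UTF-8 text, the switch is a keyword check, and the generative branch returns a canned sentence. The sketch only shows how the stages connect, not how Promethist implements them.

```python
def asr(audio: bytes) -> str:
    """Stand-in for speech recognition: here the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def nlp_pipeline(text: str) -> list:
    """Toy pre-processing: lowercase, strip punctuation, split into tokens."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


# Hypothetical keywords that the rule-based dialog manager can handle.
RULE_BASED_KEYWORDS = {"appointment", "temperature", "navigate"}


def topic_switch(tokens: list) -> str:
    """Route to the rule-based branch if any known task keyword appears."""
    return "rule_based" if RULE_BASED_KEYWORDS & set(tokens) else "generative"


def rule_based_dm(tokens: list) -> str:
    """Placeholder for the rule-based dialog manager."""
    return "Sure, let's set that up. "


def generative_model(tokens: list) -> str:
    """Placeholder for a call to a neural language model."""
    return "That's interesting, tell me more. "


def nlg(draft: str) -> str:
    """Toy reply preparation: trim surrounding whitespace."""
    return draft.strip()


def tts(text: str) -> bytes:
    """Stand-in for speech synthesis: return the text as bytes."""
    return text.encode("utf-8")


def handle_turn(audio: bytes):
    """Run one utterance through ASR, NLP, the switch, a branch, NLG, and TTS."""
    tokens = nlp_pipeline(asr(audio))
    route = topic_switch(tokens)
    handler = rule_based_dm if route == "rule_based" else generative_model
    return route, tts(nlg(handler(tokens)))


route, speech = handle_turn(b"Please book an appointment for Friday.")
print(route, speech.decode("utf-8"))
```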

The critical component of the system is the Topic SW. It has to be smart enough to keep the dialog consistent, which means switching properly between generative and rule-based processing. For example, it has to keep a rule-based dialog running until it finishes. But if the user asks a question during the rule-based processing, the generative part will answer. After answering, the switch needs to make sure the conversation continues in the previous dialog. The switch also acts as an association manager: the generative system outputs a sentence, and the switch extracts entities from it to expand on them. For example, the system generates the sentence “Electric cars are getting popular.” The switch can use the word “car” to move to a rule-based dialog about cars. The rule-based dialog is better equipped to ask, “Speaking about cars, what is your favorite?” The answer may be just one word, and we need to validate the car name and store it in the user’s profile.
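The switching behavior described above can be sketched as a small state holder. This is only an illustration under strong assumptions: the class, its method names, and the association table are hypothetical, a trailing question mark is used as a crude stand-in for real question detection, and entity extraction is reduced to a word lookup.

```python
class TopicSwitch:
    """Toy switch: keeps a rule-based dialog active across generative detours."""

    # Hypothetical mapping from entity words to rule-based dialogs.
    ASSOCIATIONS = {"car": "cars_dialog", "cars": "cars_dialog"}

    def __init__(self):
        self.active_dialog = None  # name of the rule-based dialog in progress

    def route(self, user_text: str) -> str:
        """Decide which branch answers this turn."""
        if self.active_dialog and user_text.strip().endswith("?"):
            # Side question: the generative branch answers, but the
            # rule-based dialog stays active so it resumes afterwards.
            return "generative"
        if self.active_dialog:
            return self.active_dialog
        return "generative"

    def finish_dialog(self):
        """Called when the rule-based dialog has run to completion."""
        self.active_dialog = None

    def associate(self, generated_reply: str):
        """Scan a generated sentence for an entity that opens a rule-based dialog."""
        for word in generated_reply.lower().split():
            dialog = self.ASSOCIATIONS.get(word.strip(".,!?"))
            if dialog:
                self.active_dialog = dialog
                return dialog
        return None


switch = TopicSwitch()
first = switch.route("Tell me something new.")
opened = switch.associate("Electric cars are getting popular.")
side = switch.route("What is the weather in London?")
resumed = switch.route("Tesla")
switch.finish_dialog()
after = switch.route("Tesla")
print(first, opened, side, resumed, after)
```

In this sketch the one-word answer "Tesla" is routed back to the cars dialog, which is where validating the car name and storing it in the user's profile would take place.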

Future conversational AI will come ever closer to human-like dialog, letting us enjoy entertaining and engaging chats. Applications will remember our preferences and adapt to our needs. Voice will become one of the most powerful communication channels. We at PromethistAI are excited to be part of this never-ending journey toward better communication.

Check out the Promethist Platform for creating smart conversational AI applications and virtual personas.
