5 Models for Conversational AI

Srini Janarthanam
Published in Analytics Vidhya · 7 min read · Sep 2, 2020

How can chatbots become truly intelligent by combining five different models of conversation?

Conversational AI is all about making machines communicate with us in natural language. These agents go by various names — chatbots, voice bots, virtual assistants, etc. In practice, they may differ slightly from each other. However, one key feature ties them all together: their ability to understand natural language commands and requests from us, the human users.

Behind the scenes, these agents have to carry out the request and engage in a conversation. Based on how an agent processes the incoming natural language (NL) request and maps it to a response, we can identify five classes of conversational AI models:

  1. Interactive FAQ
  2. Form filling
  3. Question Answering
  4. NL interface for databases
  5. Dialogue Planning

Interactive FAQ

Frequently Asked Questions (FAQ) pages are a common part of business websites, listing and answering the questions customers ask most often. Instead of having customers scan the list to find answers themselves, the Interactive FAQ model lets users ask a question in their own words; the chatbot matches it against the list of known questions and serves the prepared answer for the matched question. This lets customers find answers quickly instead of wading through a long list of questions.

Single vs. multi-turn — In this model, a simple customer query can be answered immediately within a single turn. For other queries, the chatbot may need to ask a few follow-up questions to gather more information from the user before answering.

Intent vs. pattern recognition — The question being asked can be identified in several ways. Intent classification is a popular approach. Here, each question for which we know the answer is labelled with an intent name (i.e. what the user is intending to say or ask). Each intent is then given a number of example variations of the same question. These examples are fed into a machine learning algorithm that learns to classify a new, unseen question from the user as one of the intents. Once the intent is identified, the answer can be served.
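Here is a minimal sketch of intent classification, assuming a small set of hand-written example utterances per intent; it uses scikit-learn's TF-IDF vectoriser with a logistic regression classifier, and the intent names, example utterances and answers are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: example utterances labelled with intent names
examples = [
    ("what are your opening hours", "opening_hours"),
    ("when do you open", "opening_hours"),
    ("how do I reset my password", "reset_password"),
    ("I forgot my password", "reset_password"),
]
answers = {
    "opening_hours": "We are open 9am-5pm, Monday to Friday.",
    "reset_password": "Use the 'Forgot password' link on the login page.",
}

texts, intents = zip(*examples)

# TF-IDF features + logistic regression classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, intents)

# Classify an unseen user question and serve the prepared answer
user_question = "what time do you open in the morning?"
predicted_intent = model.predict([user_question])[0]
print(answers[predicted_intent])
```

In a production system each intent would have many more training examples, but the shape of the solution stays the same: classify, then serve the curated answer.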


The other approach is one that has existed since the very first chatbots (e.g. ELIZA). The user utterance is pattern-matched against pre-defined patterns, and pre-defined answers/responses are served. Several tools are available in the market to implement pattern-based conversation management (e.g. Pandorabots).
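A minimal sketch of this ELIZA-style pattern matching using regular expressions follows; the patterns and responses are illustrative and not taken from any particular toolkit.

```python
import re

# Pre-defined (pattern, response) pairs; the first matching pattern wins
patterns = [
    (r"\b(hi|hello|hey)\b", "Hello! How can I help you today?"),
    (r"\bopening hours\b", "We are open 9am-5pm, Monday to Friday."),
    (r"\brefund\b", "You can request a refund from the Orders page."),
]

def respond(utterance: str) -> str:
    for pattern, response in patterns:
        if re.search(pattern, utterance, re.IGNORECASE):
            return response
    return "Sorry, I didn't understand that."

print(respond("What are your opening hours?"))
```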

Recent advances in deep learning can also be used to build seq2seq models, which take a sequence of words as input and output another sequence of words. This approach can be used to build single-turn interactive FAQ models. This model of conversational AI suits use cases like FAQs, troubleshooting, small talk, etc.
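As a small illustration of the seq2seq idea, the sketch below runs a pre-trained encoder-decoder model through the Hugging Face transformers library; the model name and prompt are illustrative, and a real FAQ bot would be fine-tuned on its own question-answer pairs.

```python
# pip install transformers
from transformers import pipeline

# Load a pre-trained seq2seq (text-to-text) model; the model name is an
# assumption for illustration only.
seq2seq = pipeline("text2text-generation", model="google/flan-t5-small")

user_question = "How do I reset my password?"
result = seq2seq("Answer the customer question: " + user_question)
print(result[0]["generated_text"])
```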

Form-filling

Form-filling, as the name suggests, is a model of conversation that involves filling in a form. A user request is mapped to an intent or a pattern that triggers a form to be filled, and in order to fill it, the chatbot asks a number of questions. Once filled, the form can be used either to run a database search or to make a database update.

Take a travel agent chatbot, for instance. It will ask a series of questions to fill in fields like origin, destination, date of travel, etc., in order to run a database search for flights. Once you choose a flight, its details are added to a larger form to make a booking (i.e. a database update). Both the search and the update needed information that was gathered by asking questions driven by the form. The downside, however, is that intents need to be created and the conversation needs to be defined meticulously at every step of the way — filling in the form, submitting it, and handling the database results.
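At its core, form-filling is a slot-filling loop: keep asking until every required field has a value, then hand the completed form to a back-end call. A minimal sketch, where the slot names and the search_flights() back-end function are hypothetical stand-ins:

```python
# Required slots and the question used to elicit each one
SLOTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to go?",
    "date": "What date would you like to travel?",
}

def search_flights(form):
    # Stand-in for a real database or API search
    return f"Searching flights {form['origin']} -> {form['destination']} on {form['date']}..."

def run_form():
    form = {}
    for slot, question in SLOTS.items():
        # Keep asking until the user provides a non-empty value for the slot
        while not form.get(slot):
            form[slot] = input(question + " ").strip()
    return search_flights(form)

if __name__ == "__main__":
    print(run_form())
```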

Form-filling and FAQ models are currently the most popular, as they take care of the mundane, repetitive conversations customers tend to have. Platforms like IBM Watson and Dialogflow provide tools to handle these models.

Question Answering

Open-domain question answering has been a sub-field of Natural Language Processing research, with the objective of understanding user questions in natural language and extracting answers from a large corpus of text. This, as you can see, is a way of reducing the human effort involved in curating answers to the questions customers ask. It may be nearly impossible to create an exhaustive list of prepared questions and answers. To address this problem, chatbots can use QA models that extract answers from a large corpus of text on the fly.

The QA model of conversation can be used where there is a large body of text that customers might query, and creating an intent and a curated answer for every question-answer pair would be an expensive proposition.

Recent advances in transformer-based models like BERT and GPT-3 have made robust QA models for conversational AI possible. The following is an example of a QA model (from the DeepPavlov.ai toolkit) in action, followed by a small extractive-QA sketch.

[Screenshot: DeepPavlov — TextQA demo]
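Extractive QA can also be tried with the Hugging Face transformers library, which ships a question-answering pipeline that finds an answer span inside a passage of text. The passage and question below are made up for illustration; the pipeline downloads a default BERT-family model fine-tuned for extractive QA.

```python
# pip install transformers
from transformers import pipeline

# Extractive QA: find the answer span inside a given context passage
qa = pipeline("question-answering")

context = (
    "Our standard delivery takes 3 to 5 working days. "
    "Express delivery is available for an extra fee and arrives the next working day."
)

result = qa(question="How long does standard delivery take?", context=context)
print(result["answer"], f"(score: {result['score']:.2f})")
```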

NL Database Interfaces

The fourth type of conversational model is one where the user utterance is mapped directly onto a database query. For instance, assume a relational database containing customer transaction data. To let customers interact with this database using natural language, the form-filling model can be used. However, there are many ways to query a relational database, and with a form-filling model you may have to design many conversational forms to cover your customers' needs. If, instead, you can translate a customer's natural language request into a database query, you can run the query and respond appropriately without creating forms and intents.

[Screenshot: Translating an NL query into SQL]

Query language — Depending on the type of database, the target query language will vary. For relational databases, NL queries may need to be translated into SQL; for graph databases like Neo4j and for RDF triple stores, they may need to be translated into Cypher and SPARQL respectively.

How? — There are deep learning approaches — seq2seq models — that can translate NL queries into a query language. Recently GPT-3, the largest pre-trained language model so far, has been used to translate NL into SQL queries using few-shot learning.
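A minimal sketch of the few-shot idea: build a prompt containing a couple of NL-to-SQL examples and send it to a large language model. The complete() function below is a hypothetical stand-in for whatever LLM completion API you use, and the table schema and examples are made up.

```python
FEW_SHOT_PROMPT = """Translate the question into SQL for the table
transactions(customer_id, amount, date, category).

Q: How much did customer 42 spend in total?
SQL: SELECT SUM(amount) FROM transactions WHERE customer_id = 42;

Q: How many transactions were made in January 2020?
SQL: SELECT COUNT(*) FROM transactions WHERE date BETWEEN '2020-01-01' AND '2020-01-31';

Q: {question}
SQL:"""

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM completion API (e.g. GPT-3)."""
    raise NotImplementedError

def nl_to_sql(question: str) -> str:
    # The few-shot examples steer the model towards producing a SQL query
    return complete(FEW_SHOT_PROMPT.format(question=question)).strip()

# sql = nl_to_sql("What is the average spend per category?")
# rows = db.execute(sql)   # run the generated query against the database
```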

This model allows customers to pose any number of questions about the data in natural language, without constraining them to pre-defined forms.

Dialogue Planning

The final model on my list is dialogue planning. This model uses AI planning to drive the conversation. AI planning is an Artificial Intelligence approach to intelligent problem solving. In the dialogue planning model, we treat the conversation as a planning problem with an initial state and a final goal state. The AI planner's task is to find an optimal sequence of steps from the initial state to the goal state. In a conversation, these steps include asking the customer specific questions, fetching information from or updating information in a back-end system, and so on.

For instance, to book a flight ticket, the agent will come up with a plan to ask a series of questions (destination, date, etc.), search for flights, summarise them, help the user choose one, ask further questions (passenger name, age, meals, etc.), make the booking and send a confirmation email. While in a form-filling model this sequence has to be authored by hand, in a planning model only the set of actions needs to be provided. The agent can use the same set of actions to build a different sequence to achieve a different goal. To use an analogy, the agent is given a number of LEGO bricks that it can put together in various ways to build different things.
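To make the idea concrete, here is a tiny forward-search planner in the spirit of classical AI planning: each action has preconditions (facts that must already hold) and effects (facts it adds), and the planner searches for a sequence of actions that reaches the goal. The actions and facts are made up for illustration and a real planner would use a proper planning language and heuristics.

```python
from collections import deque

# Dialogue actions with preconditions ("pre") and effects ("add")
ACTIONS = {
    "ask_destination":   {"pre": set(),                                 "add": {"destination"}},
    "ask_date":          {"pre": set(),                                 "add": {"date"}},
    "search_flights":    {"pre": {"destination", "date"},               "add": {"flight_options"}},
    "ask_choice":        {"pre": {"flight_options"},                    "add": {"chosen_flight"}},
    "ask_passenger":     {"pre": set(),                                 "add": {"passenger_details"}},
    "make_booking":      {"pre": {"chosen_flight", "passenger_details"},"add": {"booking"}},
    "send_confirmation": {"pre": {"booking"},                           "add": {"confirmation_sent"}},
}

def plan(initial, goal):
    """Breadth-first search for a sequence of actions that reaches the goal facts."""
    frontier = deque([(frozenset(initial), [])])
    visited = {frozenset(initial)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, action in ACTIONS.items():
            if action["pre"] <= state:
                new_state = frozenset(state | action["add"])
                if new_state not in visited:
                    visited.add(new_state)
                    frontier.append((new_state, steps + [name]))
    return None

print(plan(initial=set(), goal={"confirmation_sent"}))
```

Given a different goal (say, only searching for flights without booking), the same action library yields a different, shorter plan — the LEGO-brick reuse described above.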


Like NL database interfaces and QA models, it allows users to define the initial and goal states in natural language without being constrained by pre-defined conversational pathways. Instead, using AI planning, new pathways are created from a library of planning operators (or dialogue actions). Dialogue planning is still largely a research area, and the lack of off-the-shelf toolkits makes it hard to implement this model in a production environment.

Furthermore, planning approaches can be combined with deep reinforcement learning to optimise the generated plans based on experience and reward from the environment. This turns them into learning agents as well.

Hybrid Assistants

Truly intelligent conversational agents will need to combine the above models in a meaningful way. Such an assistant will be a hybrid, with the skill to combine the various conversational models based on the customer's needs and on the relative success and cost of each model competing to solve the same problem. Combining these approaches will bring its own set of problems — the need for unified knowledge representation mechanisms, explainability and control, etc. But with the problems, solutions will come too.

While FAQ and form-filling models are particularly popular now, the need for models like open-domain QA, NL database interfaces and dialogue planning is becoming more prominent, as not every conversational pathway can be pre-determined, planned and scripted by human content developers. Developments in NLP and machine/deep learning over recent years — transformers like BERT, GPT-3 and T5, reinforcement learning systems like AlphaGo, etc. — show promising traits and, I believe, will help us achieve our goal of building truly intelligent conversational AI.

Hope you enjoyed this write-up. Please do share your comments.


Chatbots, Conversational AI, and Natural Language Processing Expert. Book author — Hands On Chatbots and Conversational UI.