Dialog Management

Thomas Packer, Ph.D.
TP on CAI
8 min read · Nov 6, 2019

This story is a rough draft. Check back later for the fully polished story, or post a comment telling me what you’d like me to research and write for you.

Dialog management is one of the most important parts of a chatbot. It consists of the data structures and algorithms (some more efficient than others) that enable a chatbot to understand the high-level features of a conversation and to participate in it, ideally without running into dead ends or constraining the user with rigid dialog rules. That said, as long as the chatbot achieves its purpose, it does not matter how simple or constrained the dialog management strategy is.


How can a chatbot reason about a conversation on a level of abstraction that allows it to take turns, accumulate conversation state in a meaningful way, and generally fulfill its purpose? First I define a few terms needed to understand dialog management. Then I list different styles of dialog management in order of increasing complexity and sophistication and decreasing level of control. These are roughly grouped into handcrafted and data-driven approaches according to the taxonomy in this survey. For each one, I list its pros and cons.


Understanding Dialogs

A dialog ideally consists of a sequence of utterances, alternating between the two (or more) dialog participants. During each turn, a participant typically utters a sentence, but a turn can be as short as a single word or as long as multiple sentences. Turns ideally do not overlap, but sometimes they do.

Dialog Management Terms

Acknowledgment: a speech act used to express the speaker’s attitude regarding the hearer with respect to some social action (apologizing, greeting, thanking, accepting an acknowledgment).

Clarifying question: a question the bot asks to fill in an empty slot in the template that belongs to the current goal.

Commissive: a speech act a speaker uses to commit himself to some future course of action (promising, planning, vowing, betting, opposing).

Constative: a speech act that commits the speaker to something being the case (answering, claiming, confirming, denying, disagreeing, stating). It could place a constraint on the speaker’s perception of the state of the world.

Directive: a request. A speech act used to attempt to get the addressee to do something (advising, asking, forbidding, inviting, ordering, requesting).

Endpointing or endpoint detection: detecting the end of a dialog turn of the other participant. This can be difficult because of noise and because people often pause in the middle of turns.

Grounding statement: a statement the bot makes to firmly establish some part of the shared context, such as the user’s intent.

Initiative: At any given moment in a conversation, one of the participants has initiative, meaning it is their turn to say something. Initiative also distinguishes different types of dialog agents. In “user initiative”, the user must take initiative during the entire dialog and drive it from start to finish. In “system initiative”, the dialog agent drives the conversation. In “mixed initiative”, both participants are capable of taking and relinquishing initiative as needed.

Speech acts or dialog acts are the utterances each participant in a dialog makes, framed as actions. Types of speech acts include constatives, directives, commissives, and acknowledgments. The ability to discriminate among these types is necessary for conducting a conversation effectively, because each type has a specific relationship to other speech acts.

Turn: the time allotted to a dialog participant in which to construct and execute a speech act. Each participant has a chance to take a turn after the other participant has done so.

Switch Statement

Simple if-then rules. User always has initiative.

Pros:

  • Easy to implement and maintain.

Cons:

  • Not very engaging for the user. Stateless: no conversation context is preserved.
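As a minimal sketch of this style (the keywords and replies below are invented for illustration):

```python
def switch_bot(utterance):
    """A minimal if-then (switch-style) dialog manager.

    Stateless: each utterance is handled in isolation, so no
    conversation context carries over between turns.
    """
    text = utterance.lower()
    if "hello" in text or "hi " in text:
        return "Hello! How can I help?"
    elif "hours" in text:
        return "We are open 9am-5pm, Monday through Friday."
    elif "bye" in text:
        return "Goodbye!"
    else:
        return "Sorry, I didn't understand that."
```

Because the rules are checked in a fixed order and share no state, adding behavior means adding branches, which is easy at first and unmanageable at scale.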

Finite State Machine

Possibly the most common dialog management framework: the dialog is constrained by a finite state machine (FSM), which can include decision-tree structures. Each dialog turn cycle is often structured as the bot requesting and receiving information from the user, then following an edge in the FSM graph determined by the state the bot has accumulated up to the end of that turn.

Pros:

  • Common and well-understood
  • Easy to understand and predict for humans, as long as the state machine is small enough
  • More flexible than Switch Statement
  • In theory, any conversation could be defined by a state machine

Cons:

  • Less flexible than other approaches below
  • Redundant steps if the bot does not remember information between conversations
  • The state machine may become overly complex when extending it to handle all situations even for a narrow domain

Examples:
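As a minimal sketch of the turn cycle described above (the states, user inputs, and prompts are invented for illustration; a real bot would use an NLU component instead of exact keyword matching):

```python
# Edges of the FSM: (current state, recognized user input) -> next state.
TRANSITIONS = {
    ("start", "order"): "ask_size",
    ("ask_size", "small"): "confirm",
    ("ask_size", "large"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_size",
}

# What the bot says on arriving in each state.
PROMPTS = {
    "ask_size": "What size would you like?",
    "confirm": "Shall I place the order?",
    "done": "Order placed. Thanks!",
}

def step(state, user_input):
    """Follow one edge of the FSM; stay put on unrecognized input."""
    next_state = TRANSITIONS.get((state, user_input), state)
    if next_state == state:
        return state, "Sorry, I didn't catch that."
    return next_state, PROMPTS[next_state]
```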

Binary Decision Tree

This is a simple type of finite state machine dialog design. Strategies for making a simple yet natural and general chatbot:

  1. The user may start the conversation with a description of what he/she wants done.
  2. The bot takes control of the conversation by asking questions.
  3. The high-level plan of the dialog is a series of yes-no questions used to traverse a simple binary decision tree.
  4. Each question is a simple yes-no question, so the language understanding required of the bot is minimal.
  5. The language generation can also be simplified by automatically generating questions from templates and slot-fillers.

This approach has built-in redundancy: the same information might be obtained from either the user’s initial request or the user’s answers to yes-no questions. This provides training data the chatbot can use to learn to interpret the initial request better over time. Questions the chatbot becomes confident it can answer from the user’s initial request can be skipped during the decision-tree traversal, making the dialog shorter and more efficient over time.

Variations of this approach may allow for the chatbot to ask questions with answers drawn from an unambiguous, finite vocabulary. If there are ambiguities in the answer, these can be resolved by additional yes-no questions.
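The decision-tree traversal above can be sketched with nested dicts; the questions and outcomes here are invented for illustration:

```python
# Internal nodes hold a yes-no question plus "yes"/"no" subtrees;
# leaves hold the resulting bot action.
TREE = {
    "question": "Is the problem with hardware?",
    "yes": {
        "question": "Does the device power on?",
        "yes": "Route to hardware support.",
        "no": "Suggest checking the power cable.",
    },
    "no": "Route to software support.",
}

def traverse(tree, answers):
    """Walk the tree using a sequence of 'yes'/'no' answers."""
    node = tree
    for ans in answers:
        if isinstance(node, str):  # already at a leaf
            break
        node = node[ans]
    return node
```

Skipping a question the bot can already answer from the initial request amounts to descending the corresponding subtree before the dialog starts.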

Slot-Filling or Form-Filling

This is a common shortcut applied to the finite state machine approach. The chatbot developer specifies a set of slots that need to be filled, and the chatbot automatically generates enough of a dialog plan to ask the user questions and fill in those slots. This approach is usually not used by itself, but it is a step toward the full AI Planning approach discussed below.

Pros:

  • An easy way to introduce system initiative elements of a dialog.

Cons:

  • Not as complete as AI Planning
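A minimal sketch of the slot-filling loop, with hypothetical slot names and prompts:

```python
# The "form": each slot the bot must fill, with the question it asks.
SLOT_PROMPTS = {
    "destination": "Where would you like to fly?",
    "date": "What day do you want to travel?",
    "passengers": "How many passengers?",
}

def next_prompt(slots):
    """Return the question for the first unfilled slot, or None
    when every slot is filled and the form is complete."""
    for name, prompt in SLOT_PROMPTS.items():
        if slots.get(name) is None:
            return prompt
    return None
```

The bot calls `next_prompt` each turn, so the dialog plan is generated implicitly from the slot list rather than drawn out as an explicit state machine.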

Goal-Oriented AI Planning

Whereas the above methods of dialog management require the chatbot designer to explicitly specify (either in code or in a visual flow-chart UI) every state and transition that might be needed inside a dialog, AI Planning is a big step toward freeing the chatbot developer from those burdensome details without losing an understandable, declarative, and verifiable dialog plan. In this approach, the chatbot developer need only specify the features of a state, the kinds of actions the chatbot can take, and what a goal state looks like. The chatbot’s planning algorithm then generates the full dialog state machine. AI Planning can resolve the goal dependencies and find the likely most efficient paths through the state machine, given a non-deterministic set of potential user responses at each state.

The amount of work required of the chatbot designer is much smaller than with the above approaches.

Other thoughts: the dialog plan might contain a DAG of goals and goal dependencies. Goal dependencies can be slots in a template containing the parameters of an API query, so this approach is a superset of form-filling. The chatbot should identify the user’s intent early and set up a goal, with dependencies and sub-goals, to reach it. It should also allow the user to specify additional goals along the way, which either replace the current goal, push the current goal lower in a goal stack, or become a hint toward a sub-goal of the current goal.
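The goal-DAG idea can be sketched as a depth-first resolution of goal dependencies; the goal names below are hypothetical:

```python
# Each goal lists the sub-goals it depends on, forming a DAG.
DEPENDENCIES = {
    "book_flight": ["know_destination", "know_date", "have_payment"],
    "have_payment": ["know_card_number"],
    "know_destination": [],
    "know_date": [],
    "know_card_number": [],
}

def plan(goal, deps, done=None):
    """Return goals in an order that satisfies all dependencies
    (a topological order of the sub-DAG rooted at `goal`)."""
    if done is None:
        done = set()
    order = []
    for sub in deps.get(goal, []):
        if sub not in done:
            order.extend(plan(sub, deps, done))
    if goal not in done:
        done.add(goal)
        order.append(goal)
    return order
```

Each leaf goal here is a slot to fill, which is why this subsumes the form-filling approach: the planner derives the questions rather than the designer enumerating them.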

Pros:

  • A natural and efficient way for the dialog agent to take more initiative in a conversation and guide the user toward the common goal
  • More efficient for the chatbot designer, who does not need to write the entire dialog state machine by hand
  • More natural and effective at driving goal-oriented or task-oriented chatbot applications
  • Potentially more flexible and intelligent than the flowchart approaches above, as it follows more sophisticated AI principles than a simple state machine
  • More precise and predictable than machine-learning-based dialog management

Cons:

  • There may not be any freely available tools that offer this kind of dialog management
  • Edge cases may be difficult to handle

Examples (papers):

Machine Learning: Supervised Learning

The next action the bot takes is decided by a dialog model trained on dialog data using standard machine learning principles. For example, the bot could be trained to report the current weather any time the user asks a question that looks more like “What’s the weather?” than any of the other utterances the bot has been trained on.
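As a toy illustration of this idea (a real system would train a classifier over utterance embeddings; the training pairs below are invented):

```python
# Labeled training data: (utterance, action the bot should take).
TRAINING = [
    ("what's the weather", "report_weather"),
    ("is it raining today", "report_weather"),
    ("set an alarm for me", "set_alarm"),
    ("wake me up tomorrow", "set_alarm"),
]

def predict_action(utterance):
    """Return the action of the training example whose words
    overlap most with the user's utterance (nearest neighbor)."""
    words = set(utterance.lower().split())
    def overlap(example):
        return len(words & set(example[0].split()))
    best = max(TRAINING, key=overlap)
    return best[1]
```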

Pros:

  • Leverages principles, algorithms, models, and best-practices of machine learning and data science, including modern NLP deep learning.
  • Flexible and less brittle than state machines.
  • Manually designing a complex state machine is eliminated, or reduced to providing examples of reasonable dialogs.

Cons:

  • Can sometimes become “too flexible” for its own good: it can be hard to force the bot to take the right path in some cases.
  • If the dialog structure is simple, machine learning adds a lot of overhead in the form of training data, feature engineering, and model training; it can be easier to specify a state machine by hand.

Examples:

Machine Learning: Reinforcement Learning

In reinforcement learning, we model a dialog as either a Markov decision process (MDP) or partially-observable Markov decision process (POMDP). The dialog model must track current state probabilistically and determine each action using a policy that is learned from rewards and punishments and designed to maximize expected discounted cumulative reward.
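A minimal tabular Q-learning sketch on an invented two-step dialog MDP (the states, actions, and rewards are made up for illustration; real systems learn from dialog corpora or user simulators):

```python
import random

STATES = ["no_info", "have_slot", "done"]
ACTIONS = ["ask_slot", "confirm"]

def step(state, action):
    """Toy environment: +1 reward for the right action, -1 otherwise.
    The agent should ask for the missing slot before confirming."""
    if state == "no_info":
        return ("have_slot", 1.0) if action == "ask_slot" else ("no_info", -1.0)
    if state == "have_slot":
        return ("done", 1.0) if action == "confirm" else ("have_slot", -1.0)
    return ("done", 0.0)

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over the toy dialog MDP."""
    random.seed(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "no_info"
        while state != "done":
            if random.random() < epsilon:
                action = random.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q
```

After training, the learned policy (greedy over the Q-table) asks for the slot first and confirms second, which is the behavior the reward function encodes.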

Probabilistic Belief Oriented

Dialog management that represents uncertainty explicitly, using Markov-based models such as MDPs or POMDPs.

Pros:

  • Directly manages uncertainty in a dialog

Cons:

  • Complex
  • Academic for now

Resulting Chatbot Styles

  1. Interactive FAQ
  2. Form filling
  3. Question answering
  4. NL interface for databases
  5. Dialog planning
