Interaction Logic #3: Schema-Driven Approach

Sean Wu
OpenCUI
Apr 4, 2023

Building a conversational experience using LLMs is easy, but the resulting solution is not as dependable as we want. What do we do? The flow-based approach, widely used in building user interface applications, models both user input and system response as turn-by-turn chains. Defining the system’s response in the chain gives developers complete control over interactions, making it easier to achieve business objectives and provide an excellent user experience.

However, defining the system’s response in this way does not reveal why a particular response is necessary. When the current interaction is not covered by the predefined flows, the system simply does not know what to do, leading to the infamous chatbot response, ‘I don’t understand’.

To prevent such a bad user experience, developers must carefully cover every possible interaction flow that a user can get into under this imperative setting. This is not too difficult for graphical user interface (GUI) applications since users can only interact with the system in predefined ways. However, since conversational user interfaces (CUIs) allow users to say anything at any turn, developers are forced to pick their poison: either attempt to enumerate an exponentially increasing number of conversational paths, leading to significant cost overruns, or risk providing a poor user experience by omitting some conversational paths.

So, are we doomed, or are there alternatives? To figure that out, let’s start with why businesses want to build chatbots in the first place.

Conversation is merely a means to expose service APIs

In order to provide consistent and reproducible service, most of the functionalities modern businesses provide to their users are digitized in the form of APIs, regardless of which frontends are used to expose them. Businesses build chatbots precisely so that they can expose their service APIs through a conversational user interface.

This means that instead of worrying about the infinite paths that users can lead the conversation into, businesses only need to worry about the conversation paths that are related to the services they provide. For example, when faced with a question about the best barber shop in the neighborhood, a dental front-desk chatbot can choose to politely disengage or redirect the conversation back to dental-related topics. None of these choices will generally be construed as negative user experiences.

Furthermore, as soon as the user’s intention is known, businesses, not users, are in the better position to steer the conversation toward delivering the service. This is not only because businesses have handled such requests many times before, but also because they have access to current business conditions, including inventory, which can be crucial for effective communication. Indeed, since users may not be aware of those conditions, the chatbot often needs to drive the conversation instead of passively responding to user input.

Interaction logic is about slot filling

When a user specifies a service they want and provides all the information needed by the service in one shot, and their choices are serviceable, we simply invoke the target API function, render the result back to the user, and we are done. No interaction logic is needed.

Often, however, users will miss something. So after we determine which service the user wants, we need to design the conversation for the chatbot to gather the information required by the corresponding API, or, in developer terms, to create a callable instance of that API function. This simply means we need to conduct conversations to recursively collect user preferences on slots, which include both the input parameters and their component attributes, for that API function. This process creates a value instance for each required slot, and is thus known as slot filling.
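The recursive collection described above can be sketched in a few lines. This is a minimal, illustrative sketch, not OpenCUI's actual implementation; the function and parameter names are assumptions made for the example.

```python
# A minimal sketch of slot filling: recursively collect values for an API
# function's parameters until the function is callable. All names here are
# illustrative, not part of any real OpenCUI API.

def slot_filling(schema: dict, filled: dict, ask_user) -> dict:
    """Fill each slot in `schema` that is still missing from `filled`."""
    for slot, slot_type in schema.items():
        if slot in filled:
            continue  # user already provided this one
        if isinstance(slot_type, dict):
            # Composite slot: recurse into its component attributes.
            filled[slot] = slot_filling(slot_type, {}, ask_user)
        else:
            # Atomic slot: conduct a conversation turn to get a value.
            filled[slot] = ask_user(slot, slot_type)
    return filled

# Example: a schema for a hypothetical buyMovieTicket function.
schema = {"movieTitle": "MovieTitle", "showTime": "LocalTime"}
answers = {"movieTitle": "Oppenheimer"}       # provided in the first utterance
result = slot_filling(schema, answers, lambda slot, t: "19:30")
# result == {"movieTitle": "Oppenheimer", "showTime": "19:30"}
```

Once every slot has a value instance, the target API function can be invoked with `result` and the outcome rendered back to the user.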

Clearly, up-to-date information from the production system can significantly improve slot filling efficiency. For example, when selling movie tickets, instead of asking the user for their preferred showtime, it is much better to provide them with a candidate list of showtimes that still have available seats. This way, we avoid scenarios where we repeatedly apologize for options they choose but we cannot fulfill.
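The showtime example above might look like the following sketch, where `inventory` stands in for a live lookup against the production system (the data and function names are assumptions for illustration):

```python
# Sketch: recommend only showtimes that still have seats, instead of asking
# an open-ended question and risking repeated apologies.

inventory = {
    ("Oppenheimer", "14:00"): 0,    # sold out: never offer this one
    ("Oppenheimer", "19:30"): 12,
    ("Oppenheimer", "22:00"): 5,
}

def recommend_showtimes(title: str) -> list[str]:
    """Return only the showtimes for `title` with available seats."""
    return [time for (t, time), seats in inventory.items()
            if t == title and seats > 0]

print(recommend_showtimes("Oppenheimer"))  # ['19:30', '22:00']
```

Because the sold-out 14:00 show never appears in the candidate list, the user can only pick options the business can actually fulfill.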

The question is, how do we inject such information into the chatbot and design its interaction logic so that it can take advantage of business conditions and effectively steer conversations to help users? We already know the flow-based approach does not work, as it requires builders to specify exponentially many paths.

Runtime with default behavior

Modern GUI applications are all developed using an event-driven approach. Instead of writing a series of step-by-step instructions that fully define the user experience, in event-driven programming developers only define events and their corresponding actions; the program responds to events as they happen based on the developer-defined event-action mapping.

In this event-driven approach, builders/developers only need to make decisions at a local level: determining the desired behavior if a particular event is triggered. With a smart runtime that provides good default behaviors for various events, it is possible for builders to only specify the behavior when they need to differentiate, which can reduce the effort on the builder’s side.
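The "customize only where you differentiate" idea can be sketched as a dispatch table with a fallback. The event names and handler shapes below are hypothetical, chosen only to illustrate the pattern:

```python
# Sketch of event-driven dispatch with default behavior: builders register
# handlers only for the events they want to customize; everything else
# falls back to the runtime's default behavior.

def default_handler(event: str) -> str:
    """The runtime's built-in behavior, used when nothing is registered."""
    return f"default handling of {event}"

handlers = {}  # developer-defined event -> action mapping

def on(event: str, action) -> None:
    handlers[event] = action

def dispatch(event: str) -> str:
    return handlers.get(event, default_handler)(event)

# The builder differentiates only one event; all others use the default.
on("slot_missing:showTime", lambda e: "Which showtime works best for you?")

dispatch("slot_missing:showTime")  # customized behavior
dispatch("slot_missing:format")    # runtime default kicks in
```

The smarter the defaults, the fewer handlers a builder has to write.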

The runtime for conversational user interfaces (CUIs) needs to handle more issues due to the flexibility CUIs allow. It needs to maintain the context for more than one topic so that a user can engage in conversation on multiple topics. When the user finishes or abandons a detour topic, the runtime needs to automatically bring the conversation back to the original topic without the developer’s attention. We further assume that the runtime will always ask the user only one question at a time, to make it easy for the user to follow. Given a runtime whose goal is to bring users the intended service as quickly as possible, we have the following schema-driven approach to define/build conversational user interfaces.
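The automatic return from a detour topic can be modeled as a stack of active topics. This is a simplified sketch of the idea, not OpenCUI's actual context mechanism; the class and topic names are assumptions:

```python
# Sketch: the runtime keeps a stack of active topics. When a detour topic
# finishes or is abandoned, the conversation automatically resumes the
# topic underneath, with no developer attention required.

class TopicStack:
    def __init__(self):
        self.stack = []

    def start(self, topic: str) -> None:
        """A new topic (possibly a detour) becomes the active one."""
        self.stack.append(topic)

    def finish(self):
        """Close the active topic and return the topic to resume, if any."""
        self.stack.pop()
        return self.stack[-1] if self.stack else None

topics = TopicStack()
topics.start("buyMovieTicket")
topics.start("askParkingInfo")   # user detours mid-purchase
resumed = topics.finish()        # detour done
# resumed == "buyMovieTicket": the runtime returns to the original topic
```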

Define/Build CUI in 3 Layers

When the user does not provide the preference required for a slot, designers or builders need to define the desired conversational behavior for the chatbot to gather that information. With the runtime defined above, we can define the CUI in three layers using a schema-driven approach:

  1. First, declare types for API and their parameters at the schema layer so that we know what type we need to create instances for via conversation.
  2. Then, attach dialog annotations onto these types at the interaction layer, so the chatbot knows what to do if the information is missing for a slot of that type, or when the information provided is not currently serviceable by the business.
  3. Finally, complete dialog annotations at the language layer by adding templates for how semantics should be rendered in natural language and exemplars for how utterances should be converted to structured semantics.

Dialog annotations specify the expected behavior of the chatbot while it builds instances of different types. Here are some example dialog annotations:

  • “Prompt” indicates how to prompt the user for their preferences on a slot.
  • “Value recommendation” provides a list of candidates to the user for them to choose one from. You need to specify how to get the list, what happens if the list is empty or has only one item, and what to do if there are more items than what you can fit in one turn, etc.
  • “Value check” defines what to do if the user’s initial preference is not serviceable, etc. Of course, you want to define how to check the value.
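One way to picture these annotations is as declarative data attached to a slot. The field names below (`prompt`, `recommend`, `check`) mirror the annotations just described, but the representation itself is an illustrative assumption, not OpenCUI's actual one:

```python
# Sketch: dialog annotations as declarative data attached to a slot.
# The runtime, not the builder, decides when to execute each one.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SlotAnnotation:
    prompt: str                                         # how to ask
    recommend: Optional[Callable[[], list]] = None      # value recommendation
    check: Optional[Callable[[object], bool]] = None    # value check

show_time = SlotAnnotation(
    prompt="Great! Which showtime works best for you?",
    recommend=lambda: ["19:30", "22:00"],
    check=lambda v: v in ["19:30", "22:00"],
)

show_time.check("19:30")  # True: the choice is serviceable
show_time.check("03:00")  # False: runtime takes the value-check branch
```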

Taking movie ticketing as an example, the decisions you have to make at each layer are:

  1. At schema layer, skill ‘buyMovieTicket’ should have the following slots with their types. We assume the conversation happens in a logged-in session, so the user is known:
    movieTitle: MovieTitle, the title of the movie.
    date: LocalDate, the date that the movie ticket is for.
    showTime: LocalTime, the time that the movie is showing.
    format: the format of the movie, e.g. IMAX, 3D, standard, etc.
  2. At interaction layer, you decide how to create an instance of each slot in a language-independent fashion.
    movieTitle: required, prompt, value recommendation, value check.
    date: if the user did not mention a particular date, assume today.
    showTime: required, prompt, value recommendation, value check.
    format: only prompt if the movie has more than one format available.
  3. At language layer, you define the template to render the dialog act into natural text and exemplar to showcase what semantics a natural language expression should be mapped into. For example, there is the prompt for each slot in English:
    movieTitle: ‘Which movie are you interested in?’
    showTime: ‘Great! Which showtime works best for you?’
    format: ‘Do you prefer IMAX or regular?’
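The three layers above can be pictured as one nested structure, where each layer refers to slots by name. The structure (and the `MovieFormat` type and annotation spellings in it) is a hypothetical illustration of the layering, not OpenCUI's actual format:

```python
# Sketch: the three layers of a hypothetical buyMovieTicket skill, kept as
# plain data. Schema says what to fill, interaction says how to fill it,
# language says how to say it (in English here).

buy_movie_ticket = {
    "schema": {
        "movieTitle": "MovieTitle",
        "date": "LocalDate",
        "showTime": "LocalTime",
        "format": "MovieFormat",
    },
    "interaction": {  # language-independent decisions
        "movieTitle": ["required", "prompt", "value_recommendation", "value_check"],
        "date": ["default:today"],
        "showTime": ["required", "prompt", "value_recommendation", "value_check"],
        "format": ["prompt_if_multiple"],
    },
    "language": {  # English templates for the prompt dialog act
        "movieTitle": "Which movie are you interested in?",
        "showTime": "Great! Which showtime works best for you?",
        "format": "Do you prefer IMAX or regular?",
    },
}
```

Because each layer only refers to slot names declared in the schema, the same interaction layer can be reused with a different language layer to ship the skill in another language.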

Benefits, benefits and benefits

This schema-driven approach comes with these characteristics that make it cost-effective to create exceptional conversational experiences.

  1. Declarative: Dialog annotations declaratively specify the desired conversational behavior for creating an instance of a type. Once defined, these annotations are executed by the OpenCUI runtime. At any given turn, the chatbot identifies the slot that is missing a value and then tries to fill it by executing the corresponding dialog instructions, regardless of how the conversation was led to that point. This eliminates the need to enumerate all potential interaction paths in order to provide a good user experience.
  2. Reusable component: One of the cardinal sins in software development is building everything from scratch. Dialog-annotated types, also known as components, are naturally composable. This allows developers to construct complex systems by assembling reusable and interchangeable components, thus alleviating the burden of maintenance and reducing development time and effort.
  3. Implicit context management: One of the defining characteristics of natural language is that the same word can have different meanings in different contexts. On the OpenCUI platform, all annotations, including templates and exemplars, are attached to a type. Annotations only come into play when the OpenCUI runtime attempts to create an object to fill the corresponding slot in their hosting type. Therefore, both response rendering and language understanding are inherently context-dependent.

Parting words

Instead of trying to enumerate all possible conversation paths that users could take, we propose focusing on the services that all constructive conversations lead to, since that is how businesses provide value for their users. By declaring the desired behavior for the conditions that matter, builders can provide behavior specifications only when needed and let the runtime do the rest. We hope the schema-driven OpenCUI can help you build the great conversational experience your users deserve, without trapping you in implementation details.
