Can Conversational AI ask “Just one more question…” like Lt. Columbo?

Antonio Origlia
URBAN/ECO Research
7 min read · Apr 25, 2024

Conversational AI, at present, is largely about answering users' questions. A significant amount of research focuses on generating content that satisfies user requests, but what about the machine's capability to ask smart questions? Sure, a Large Language Model (LLM) will ask what your favourite movie genre is if you tell it you would like a movie recommendation but, as discussed in our post about Raison d'exprimer, why does it do that? Is it trying to establish common ground and take a stance on that basis, or is it merely mimicking what people in the training set did when asked to do the same thing? Parroting causes many of the problems LLMs exhibit when they hallucinate or become inconsistent with previous exchanges after a few dialogue turns. Therefore, the ability to ask the right questions, aimed at accomplishing something, is as important as taking stances if Conversational AI is to behave in explainable ways.

A quote from Carl Jung highlighting the importance of the ability to ask questions

As discussed in the previous post in this series, questions are often used to improve communication quality by clarifying. In this post, we will describe the first pieces of a Hybrid Conversational AI architecture based on the Framework for Advanced Natural Tools and Applications with Social Interactive Agents (FANTASIA) [Origlia et al., 2019; Origlia et al., 2022], a plugin for the Unreal Engine designed to support the development of Virtual Humans with Conversational AI capabilities. A FANTASIA Conversational AI system is composed of different AI tools working together to implement linguistically motivated dialogue management strategies. Such a system follows these main principles:

  • Behaviour Trees (BTs) [Flórez-Puga et al., 2009] are used to organise and prioritise dialogue moves;
  • Graph Databases (i.e., Neo4j [Webber, 2012]) are used for knowledge representation and dialogue state tracking;
  • Probabilistic Graphical Models (PGMs), implemented using the aGrUM library [Ducamp et al., 2020], are used for decision making;
  • Large Language Models are used to verbalise the decisions taken by PGMs.

In a FANTASIA Conversational AI architecture, in general, dialogue is treated as a sort of physical system, represented as a graph, that the participants intervene on, using dialogical moves, to push it towards a desirable, stable state. Graph configurations, representing the Personal Common Ground, are then checked following a linguistically motivated priority order to decide the best utterance to produce to reach the desired configuration. Graph configuration types can be summarised as follows:

  • a graph is uninterpretable if any of the graph patterns describing communication problems is activated. In these cases, a clarification request is produced;
  • an interpretable graph is incomplete if information needed to respond to the user is missing. In these cases, a request for information is produced;
  • a complete graph is incoherent if logical conflicts are found in the belief graph, the set of facts the system believes are established. In these cases, the adequate disambiguation question is produced;
  • a coherent graph is unstable if there are open issues, like unanswered questions. In these cases, a question answering strategy is activated;
  • a stable graph is undesirable if it does not exhibit the goal pattern. In these cases, the most useful dialogue move to create the goal pattern is produced.
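The priority cascade above can be sketched as a simple check-and-select loop. This is an illustrative Python sketch, not FANTASIA's actual implementation: the check functions and move names are hypothetical placeholders standing in for graph-pattern queries against Neo4j.

```python
def choose_move(config, checks):
    """Return the dialogue move of the first (highest-priority) check that fires."""
    for check, move in checks:
        if check(config):
            return move
    return "no_action"  # the graph is stable and already shows the goal pattern

# Checks are ordered from uninterpretable (highest priority) to undesirable.
CHECKS = [
    (lambda g: g.get("uninterpretable", False), "clarification_request"),
    (lambda g: g.get("incomplete", False), "request_for_information"),
    (lambda g: g.get("incoherent", False), "disambiguation_question"),
    (lambda g: g.get("unstable", False), "question_answering"),
    (lambda g: not g.get("goal_pattern", False), "goal_directed_move"),
]

# Higher-priority problems win: an uninterpretable graph is clarified first,
# even if it also contains a logical clash.
print(choose_move({"uninterpretable": True, "incoherent": True}, CHECKS))
# clarification_request
```

The ordering of the list is what encodes the linguistic hierarchy: each configuration type is only considered once all higher-priority problems have been ruled out.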

In this post, we will summarize how a FANTASIA system is designed to handle problems from uninterpretability to incoherence. Further details will be provided in future posts, as we also continue presenting the linguistic background motivating our choices.

Interpretability

As discussed in the previous post, multiple problems can occur during communication, ranging from corrupted signals to unknown named entities. In the movie “Back to the Future”, for example, “Doc” Brown and Marty McFly cannot communicate because of the noise caused by the storm.

Doc cannot hear Marty shouting his warning during the storm in “Back to the Future”

Linguistic theory provides us with a hierarchy of problems that must be handled following a priority order. From a technological point of view, prioritisation mechanisms in real-time systems are efficiently handled using BTs, as in computer games. In summary, a BT evaluates nodes from left to right, activating meta-nodes or tasks. Both can either succeed or fail, and they report their result to the parent. Tasks execute actions, while meta-nodes compose complex operations with multiple sub-tasks. A Sequence meta-node fails as soon as one of its children fails, while a Selector meta-node succeeds as soon as one of its children succeeds. Nodes in a BT can also be equipped with Decorators, which condition the execution of the node they are attached to. Decorators can also act as Observers, interrupting the execution of lower-priority nodes as soon as the condition monitored by the Decorator becomes true. BTs can represent the linguistic hierarchy presented in the previous post in a flexible way, so that it can be updated as our research progresses.
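The Sequence/Selector semantics described above can be illustrated with a minimal sketch. The class and task names below are illustrative, not FANTASIA's actual API (FANTASIA relies on the Unreal Engine's built-in Behaviour Tree assets); note how Python's short-circuiting `all`/`any` naturally reproduce the fail-fast and succeed-fast behaviour of the two meta-nodes.

```python
class Task:
    """A leaf node executing an action that reports success (True) or failure (False)."""
    def __init__(self, name, action):
        self.name, self.action = name, action
    def tick(self):
        return self.action()

class Sequence:
    """Fails as soon as one child fails; succeeds only if all children succeed."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        return all(child.tick() for child in self.children)

class Selector:
    """Succeeds as soon as one child succeeds; fails only if all children fail."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        return any(child.tick() for child in self.children)

# Example: try clarification strategies in priority order, left to right.
tree = Selector([
    Task("handle_noise", lambda: False),         # no signal problem detected
    Task("handle_unknown_entity", lambda: True), # unknown entity: ask about it
])
print(tree.tick())  # True: the second child handled the problem
```

Because the generator expressions short-circuit, a Selector stops ticking children as soon as one reports success, which is exactly the left-to-right priority evaluation used to order clarification strategies.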

Interpretability problems can be handled by the BT shown in the following Figure.

A BT representing how Clarification requests are handled in a FANTASIA architecture. Following the reference theoretical principles, the graph is checked for problems and it returns as soon as one of these is detected, selecting the appropriate Clarification Request as a Task node output. Green nodes represent Tasks, cyan nodes are meta-nodes and purple nodes are compact sub-trees (expanded in the rest of this Medium series). The red block represents a Decorator.

Completeness

When someone is able to interpret what has been said, they may still need more information to actually complete their understanding. This is the case, for example, with a waiter asking for details about an order. In “The Blues Brothers”, Aretha Franklin asks what Jake and Elwood would like to eat and drink, also double-checking Elwood’s order, consisting of toasted white bread only.

Aretha Franklin taking Jake and Elwood’s order in “The Blues Brothers”

When information is missing, it may still be inferred from previous exchanges. Dialogue State Tracking strategies, using graphs, can be implemented as path search queries leading from an unfilled slot to a compatible candidate, found by following the dialogue acts chain backwards in the dialogue history. If no candidate is found, a request for information is generated. As completeness is, together with coherence, a problem categorised as Information Processing, its BT is shown in the following section.
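The backward search over the dialogue-act chain can be sketched as follows. In FANTASIA this would be a Cypher path query over the Neo4j dialogue state; here, purely for illustration, a Python list of hypothetical dialogue acts stands in for the graph.

```python
def fill_slot(history, slot_type):
    """Walk the dialogue history backwards looking for a compatible slot filler."""
    for act in reversed(history):
        for entity, entity_type in act.get("entities", []):
            if entity_type == slot_type:
                return entity
    return None  # not inferable: the system must ask for the information

# Toy dialogue history; act labels and entity types are illustrative.
history = [
    {"act": "inform", "entities": [("The Blues Brothers", "movie")]},
    {"act": "request", "entities": []},  # "When was it released?" leaves a slot open
]
print(fill_slot(history, "movie"))   # The Blues Brothers
print(fill_slot(history, "person"))  # None
```

The `None` case is what triggers the question-generation branch: only when no compatible candidate is reachable in the history does the system spend a dialogue turn asking the user.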

Coherence

When information collected from the interacting source is complete, it may still clash with previous beliefs the listener had. For example, Lt. Columbo identifies lies by checking for inconsistencies in statements from the involved people through targeted questions.

In the episode “A Deadly State of Mind”, Columbo asks questions about inconsistencies in the case he is investigating

The specific case of Clarification Requests for Information Processing also highlights the importance of linguistic studies in building an efficient Conversational AI system: the previous post described studies showing that specific kinds of incoherence problems call for the use of specific syntactic forms (e.g., a high negative polar question), which should inform the generation of the answer to maximise the effectiveness of communication. Moreover, as we solve problems of lower complexity, we move towards behaviours that are gradually more goal-directed, therefore requiring stronger degrees of illocutionary force. These escape the associative nature of neural networks, as discussed in another post.

From a technological point of view, we can simulate this process, in FANTASIA, by exploiting the transactional nature of the Neo4j database. When an utterance is interpreted, it may imply the creation of a new belief in the graph structure representing the dialogue state, as shown in the following example.

Utterance processing updates the dialogue state in the Neo4j graph database and creates new beliefs after the interpretation

New beliefs, however, are created as part of a transaction that is not immediately committed. A subsequent query runs to check for inconsistencies like, for example, a positive and a negative belief about the same subject/predicate existing after the update. If a clash is detected, a Clarification Request (using the appropriate syntactic form) is generated and the changes in the belief graph are rolled back. The BT implementing this mechanism is shown in the following Figure.

The BT managing Information Processing problems
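The update-check-rollback cycle described above can be sketched with plain Python objects simulating the Neo4j transaction semantics. The belief representation as (subject, predicate, polarity) triples is an illustrative assumption, not FANTASIA's actual schema.

```python
class BeliefGraph:
    """A toy belief store mimicking an uncommitted-transaction update."""
    def __init__(self):
        self.beliefs = set()  # (subject, predicate, polarity) triples

    def update(self, new_beliefs):
        """Stage new beliefs; commit only if no polarity clash is found."""
        staged = self.beliefs | set(new_beliefs)
        for subject, predicate, polarity in staged:
            if (subject, predicate, not polarity) in staged:
                # Clash detected: roll back, leaving the graph unchanged,
                # and let the dialogue manager emit a Clarification Request.
                return False
        self.beliefs = staged  # no clash: commit the transaction
        return True

graph = BeliefGraph()
graph.update([("suspect", "was_home", True)])
committed = graph.update([("suspect", "was_home", False)])  # contradicts the first
print(committed)           # False: the update was rolled back
print(len(graph.beliefs))  # 1: only the original belief survives
```

In the actual architecture the staged set corresponds to an open Neo4j transaction, the clash test to a consistency-checking query run inside it, and the early return to a transaction rollback, so the belief graph never contains a contradiction.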

In this first part of a FANTASIA architecture for Conversational AI, we have discussed how to link linguistic theories to technological design choices to build an interpretable dialogue management strategy that informs both researchers working on computer science and researchers working on humanities studies. Such an architecture represents a technological interpretation of linguistic concepts that can be easily inspected at runtime, supporting informed research and responsible AI.

References

[Ducamp et al., 2020] Ducamp, Gaspard, Christophe Gonzales, and Pierre-Henri Wuillemin. “aGrUM/pyAgrum: a toolbox to build models and algorithms for Probabilistic Graphical Models in Python.” International Conference on Probabilistic Graphical Models. PMLR, 2020.

[Flórez-Puga et al., 2009] Flórez-Puga, Gonzalo, et al. “Query-enabled behavior trees.” IEEE Transactions on Computational Intelligence and AI in Games 1.4 (2009): 298–308.

[Origlia et al., 2019] Origlia, Antonio, et al. “FANTASIA: a framework for advanced natural tools and applications in social, interactive approaches.” Multimedia Tools and Applications 78 (2019): 13613–13648.

[Origlia et al., 2022] Origlia, Antonio, et al. “Developing embodied conversational agents in the unreal engine: the FANTASIA Plugin.” Proceedings of the 30th ACM International Conference on Multimedia. 2022.

[Webber, 2012] Webber, Jim. “A programmatic introduction to Neo4j.” Proceedings of the 3rd annual conference on Systems, programming, and applications: software for humanity. 2012.


I am a researcher in the Human-Computer Interaction field and work on developing Dialogue Systems for Embodied Conversational Agents.