When AI answers “42” to our Ultimate Questions

Antonio Origlia
URBAN/ECO Research
May 10, 2024

In the previous post, we learned that answering questions is a difficult problem, not only because of the need to provide the actual answer, but also because of the need to make the answer itself satisfying from a conversational point of view. In “The Hitchhiker’s Guide to the Galaxy”, the supercomputer Deep Thought (and Deep Learning was not a thing when the book was written!) is instructed to provide an answer to the unspecified Ultimate Question of Life, the Universe and Everything. After thinking for 7.5 million years, it provides the answer. It even specifies that its creators are not going to like it and, in the end, it simply states… 42. Of course, everyone is disappointed and even angry at the answer, which is obviously useless to them. While the question was indeed even harder to compute than the answer (and we will get back to the problem of choosing what to ask in a future post), unsatisfying answers can be a problem where Conversational AI is concerned.

The supercomputer Deep Thought from “The Hitchhiker’s Guide to the Galaxy”, after thinking for 7.5 million years, states that the answer to the Ultimate Question of Life, the Universe, and Everything is… 42

This is a problem that even OpenAI appears to be aware of in ChatGPT. At the time of writing, a campaign aimed at understanding what the best answer length and style might be appears to be going on. Consider the example reported in the following Figure: two different possible answers are provided and the user is asked to select the most appropriate one.

An interaction with ChatGPT4. OpenAI appears to be investigating the perceived quality of the provided answers by using different lengths and styles

Candidate answers vary in length, style, and content but, in both cases, ChatGPT appears to be trying to get out of the dialogue, cutting it short with a general question about user preferences. We can easily agree that this is not really how a human-human conversation would go (see, for example, the conversations recorded in the Inspired corpus [Hayati et al., 2020]). The virtual agent does not seem to have an actual interest in the “conversation”, and both its answers may be perceived as uninteresting or disappointing because of the lack of interaction and personalisation.

There are, then, many ways of answering a question, but which is the most appropriate? Well, given what we learned from the previous post about Grice’s Maxims and Relevance Theory, it depends. In different situations, a longer answer may be better than a short one, or vice versa. How do we deal with this? On one hand, we can collect data and train GenAI models to provide answers whose length and style most probably fit the context, but is this even possible? As discussed in other posts in this series, the communicative intention is also important when deciding how to structure an answer. One may choose to provide more or fewer details depending on how important the information is believed to be for creating a personally satisfactory common ground. Satisfaction will then come, for the interlocutor, from recognising the speaker’s personal involvement in the conversation, independently of whether this is agreeable. So, depending on the question and on the speaker’s interests, we may have different lengths, different styles and different amounts of circumstantial context in the answer. Questions may, then, range from requests for simple data (dates, names, places…) to more complex information (movie plots, people’s biographical information…) to explanations for exhibited behaviours.
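To make this concrete, here is a purely illustrative Python sketch of how answer style and circumstantial context might be adapted to the kind of question being asked. The question categories, templates and the personal_interest parameter are invented for the example; they are not part of any framework discussed in this series.

```python
# Hypothetical sketch: adapting answer style and circumstantial context to the
# question type and to the speaker's personal interest. Categories and
# templates are illustrative assumptions only.

ANSWER_PROFILES = {
    "factoid":     {"style": "plain",      "add_context": False},  # dates, names, places...
    "complex":     {"style": "narrative",  "add_context": True},   # plots, biographies...
    "explanation": {"style": "reflective", "add_context": True},   # "Why would you say that?"
}

def shape_answer(question_type: str, core_content: str, personal_interest: float) -> str:
    """Wrap the core content differently depending on the question type and on
    how involved the speaker is (0 = detached, 1 = very involved)."""
    profile = ANSWER_PROFILES.get(question_type, ANSWER_PROFILES["factoid"])
    if profile["style"] == "reflective":
        answer = f"I said that because {core_content}"
    else:
        answer = core_content
    # A personally involved speaker adds circumstantial context instead of
    # cutting the dialogue short.
    if profile["add_context"] and personal_interest > 0.5:
        answer += " Incidentally, this is a topic I care about, so feel free to ask for more."
    return answer

print(shape_answer("factoid", "Douglas Adams published the novel in 1979.", 0.2))
print(shape_answer("explanation", "my goal was to keep the recommendation relevant to you.", 0.9))
```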

A Dalek from the Doctor Who TV show angrily asking for explanations

At present, researchers are very excited about the possibilities offered by Retrieval Augmented Generation (RAG), and with good reason. Selecting relevant contexts from which to derive an answer can have a significant impact in controlling hallucinations and made-up content. However, RAG alone cannot solve the problem and, as with the other tools at our disposal to develop AI, it should fit into a bigger picture as one of the resources to use when building Conversational AI. In the FANTASIA Interaction model, we locate RAG in between strategies designed to build answers from structured data, which can be reliably extracted from a set of known facts, and strategies dedicated to explaining the AI’s behaviour. RAG, then, is concerned with building answers from unstructured data only, when it is not possible to use more reliable, structured information and the answer must be estimated from textual content.

On the other hand, there are also questions that cannot be answered using a set of structured or unstructured information because they concern the interlocutor themselves. A question like “Why would you say that?” calls for the machine to reflect on itself, on its goals and action plan, and to make transparent to the user what its intentions were in executing an action, like speaking. This point, in particular, makes it clear why it is not enough for a GenAI approach to provide an answer to this question: any generated content would represent what most people would say given the context and the question, but it does not represent any kind of real internal state of the machine characterised by intentional, or at least goal-oriented, behaviour. The priority order for using structured, unstructured and self-reflection data to provide an answer is represented by the Behaviour Tree solving Instability problems (see the previous post) in the Dialogue State Tracking graph. After verifying that no inconsistencies are present and committing the acquired information to the belief graph, questions may be answered in different ways depending on the question itself, on the system’s goals in providing an answer, and on the technique used to generate the answer.

The Behaviour Tree managing Instability problems (Question Answering, mainly). After committing changes in the belief graph, the system checks for and handles questions that can be answered using structured data. If this is not possible, it attempts to answer using RAG or, in the end, uses self-reflection to disclose its internal state and its reasons for acting. If no Instability situations are detected, the dialogue graph is considered stable and the next set of problems is handled.
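As a minimal sketch of this priority order, one can imagine the three strategies as a plain fallback chain. A real Behaviour Tree implementation would be richer, and the toy knowledge base, document list and goal variable below are hypothetical stand-ins, not the FANTASIA APIs.

```python
# Illustrative sketch of the priority order described above: structured data
# first, then RAG over unstructured text, then self-reflection as a last resort.
# All data sources below are invented stand-ins for the example.
from typing import Optional

STRUCTURED_FACTS = {"release year of alien": "1979"}          # toy knowledge base
DOCUMENTS = ["Alien is a 1979 science fiction horror film directed by Ridley Scott."]
CURRENT_GOAL = "recommend a film the user might enjoy"        # toy internal state

def answer_from_structured(question: str) -> Optional[str]:
    """Highest priority: reliable answers extracted from known facts."""
    for fact, value in STRUCTURED_FACTS.items():
        if fact in question.lower():
            return value
    return None

def answer_from_rag(question: str) -> Optional[str]:
    """Second priority: estimate the answer from retrieved unstructured text."""
    hits = [d for d in DOCUMENTS if any(w in d.lower() for w in question.lower().split())]
    return hits[0] if hits else None  # a real system would generate over the retrieved context

def answer_from_self_reflection(question: str) -> Optional[str]:
    """Last resort: disclose the system's own goal and reasons for acting."""
    return f"I said that because my current goal is to {CURRENT_GOAL}."

def answer(question: str) -> str:
    for strategy in (answer_from_structured, answer_from_rag, answer_from_self_reflection):
        result = strategy(question)
        if result is not None:
            return result
    return "I do not know."

print(answer("What is the release year of Alien?"))   # answered from structured data
print(answer("Why would you say that?"))              # falls through to self-reflection
```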

Instability problems involve an increasing degree of personal interest in the way the system needs to provide an answer. With respect to the previous situations, which dealt with incomprehension, communication is now happening correctly. More advanced dialogue management strategies must be employed, then, as the system attempts to reach its goals by producing speech acts designed to alter the configuration of the dialogue graph. In the following posts, this topic will become more and more important as we deal with speech acts that are increasingly characterised by intentional behaviour.
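As a rough illustration of what “altering the configuration of the dialogue graph” may look like, a speech act can be modelled as an operation that rewrites part of the graph the system is tracking. The node and edge labels here are invented for the example, not the actual Dialogue State Tracking schema.

```python
# Toy illustration: a speech act as an edit on a dialogue-state graph.
# Node and edge names are invented for the example.
dialogue_graph = {
    "nodes": {"user", "system", "topic:sci-fi films"},
    "edges": {("user", "interested_in", "topic:sci-fi films")},
}

def perform_speech_act(graph: dict, act: str, content: str) -> None:
    """Apply a speech act by adding the nodes and edges it is designed to introduce."""
    if act == "recommend":
        graph["nodes"].add(content)
        graph["edges"].add(("system", "recommends", content))
        graph["edges"].add((content, "instance_of", "topic:sci-fi films"))

perform_speech_act(dialogue_graph, "recommend", "film:Alien")
print(sorted(dialogue_graph["edges"]))
```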

References

[Hayati et al., 2020] Hayati, Shirley Anugrah, et al. “INSPIRED: Toward Sociable Recommendation Dialog Systems.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 8142–8152.

Antonio Origlia
URBAN/ECO Research

I am a researcher in the Human-Computer Interaction field and work on developing Dialogue Systems for Embodied Conversational Agents.