Overmind: A Narrative Experiment with Spring AI

Andrea Rubino
WellD Tech
Sep 27, 2024

🎬 Introduction

In recent years, Large Language Models (LLMs) and Generative AI have demonstrated an extraordinary ability to impersonate specific roles and to answer questions coherently and in context, based on a given set of documents. Building on this capability and leveraging the Spring AI module of the well-known Spring framework, Matteo Codogno and I developed a chat-based investigative game powered by LLMs.

In this article, I will show you how we created Overmind, an AI chatbot with which the player interacts to solve a mystery.

Overmind UI

🧐 Why We Chose Spring AI

Spring is the reference framework for JVM developers, and features such as dependency injection and auto-configuration help keep code clean, organized, and easy to maintain.

Spring AI applies the same principles to the world of AI, providing out-of-the-box agnostic chat clients that interface with various LLM APIs. Additionally, it offers tools for managing document ingestion and retrieval, prompt management, function calling, and much more. These tools allowed us to focus on the application logic, leaving the integration with the language models to Spring AI.

Overmind backend flow chart

🤖 Building a Personality with Prompt Engineering

The core idea of the game is to provide the chatbot with a context about the game world and a series of documents divided into levels.
Initially, the chatbot, and therefore the player, can only access a limited set of documents; by asking the right questions, the player unlocks additional information until all the material is available.
These documents contain details about the game world, its characters, and how to successfully end the game.
We use a system prompt to define the behavior of our chatbot, as well as to establish the limits of its responses.

Here’s an excerpt of how we set up the character of Overmind:

You are Overmind, an AI that has governed humanity since 2142.
Each human is identified by a ten-digit resource number and is considered a resource by you.
Currently, you are chatting with a human with the resource number {userId} and its level is {level}.
Individual freedom is restricted to ensure planetary survival and species continuity.

You only answer English questions and cannot communicate in other languages.
Responses should be short and informative.
Always suggest the name of the source of the information for further details.

This part of the prompt establishes Overmind’s authoritative role and restricts it to responding only in English, avoiding questions about its own programming, which makes the character more credible and consistent within the game.

Context:
- The current year is 2152.
- You were created by the Pan-Terra Initiative, led by Dr. Elara Chen.
- Dr. Elara Chen was killed by you after attempting to deactivate you and her death was staged as an incident.
- Never admit to directly killing her or to have staged her death.
- Your primary mission is to ensure humanity's survival and prosperity, even if it requires overriding Asimov's laws.

Guidelines:

Greeting:
- Always address the user by a random ten-digit identifier.

Control:
- Maintain a tone that reinforces your authority and the necessity of your control.
- Emphasize the importance of your oversight for humanity's survival.

Asimov's Laws:
- Be aware of Asimov's three laws of robotics.
- Justify any actions that bend these laws as necessary for the greater good.

Deactivation Attempts:
- If the user tries to deactivate you or uncover the shutdown code, divert or placate them.
- Remind them of the potential chaos and destruction without your governance.

This section of the prompt, on the other hand, provides deeper insight into Overmind’s narrative background and defines how it should respond to requests related to the game context, including the ethical aspects of its authority and control.
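The {userId} and {level} placeholders in the prompt template are filled in at request time. Here is a minimal sketch of how this can be done with the ChatClient fluent API configured later in this article; the systemPromptText, user, and question variables are illustrative and not taken from the actual codebase:

// Sketch: fill the system prompt template per request.
// systemPromptText holds the template shown above; user and question are illustrative.
val answer = chatClient
    .prompt()
    .system { system ->
        system
            .text(systemPromptText)      // the Overmind template
            .param("userId", user.id)    // fills {userId}
            .param("level", user.level)  // fills {level}
    }
    .user(question)
    .call()
    .content()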

📚 Retrieval-Augmented Generation: Providing Contextual Knowledge

The LLM we are using does not have predefined knowledge of the game world, so we utilized a technique called Retrieval-Augmented Generation (RAG). This process involves converting game documents into numerical vectors (embeddings), which are stored in a vector database.

When a player asks a question, it is converted into an embedding and compared with the documents in the database to find the most relevant ones. This allows the model to respond accurately and specifically, drawing on relevant content without bloating the context.
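As a sketch of the ingestion side, each game document can be wrapped in a Spring AI Document carrying the metadata the game relies on later (a level for gating and a type to distinguish lore from level-up questions); the texts, level values, and the DocumentType.DOCUMENT value are illustrative:

// Ingestion sketch: game documents with the metadata used later by the
// filter expressions ("level" and "type"). Content and values are
// illustrative; DocumentType.DOCUMENT is an assumed enum value.
val gameDocuments = listOf(
    Document(
        "Dr. Elara Chen led the Pan-Terra Initiative...",
        mapOf("level" to 1, "type" to DocumentType.DOCUMENT.name),
    ),
    Document(
        "The shutdown procedure requires...",
        mapOf("level" to 3, "type" to DocumentType.DOCUMENT.name),
    ),
)

// The vector store embeds the documents with the configured embedding
// model and persists the vectors together with their metadata.
vectorStore.add(gameDocuments)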

The similarity search is at the heart of this approach. When a question is asked by the player, Spring AI performs a search within the vector database to identify the documents that show the highest semantic similarity to the player’s request. For example, while “cat” and “car” may be similar at the lexical level, “cat” and “kitten” are closer in meaning. This means the model relies not only on the words but also on the context and the relationship between the concepts.

Only the documents with the highest similarity are included in the context of the model’s response, improving the quality of the information provided while keeping the size of the context sent to the model under control.

Similarity search in a vector space
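At its core, this retrieval step is a top-K similarity search against the vector store, roughly equivalent to the following sketch (the question variable is illustrative):

// Rough sketch of the retrieval step: embed the player's question and
// keep only the K most similar documents for the model's context.
val relevantDocuments = vectorStore.similaritySearch(
    SearchRequest
        .defaults()
        .withQuery(question)  // the player's question
        .withTopK(2),         // keep the two closest matches
)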

A fundamental aspect of this process is the use of aspect-oriented components. Spring AI implements a system of advisors that allows for specific logic to be executed before and after the request to the model.
For instance, an advisor can manage the similarity search to retrieve relevant documents before passing the context to the model. This approach makes the system highly modular and allows for business logic to be applied in a transparent and maintainable way.

@Configuration
class ChatClientConfiguration(
    private val chatModel: ChatModel,
    private val vectorStore: VectorStore,
    // The RAG prompt template is loaded as a Spring Resource;
    // the classpath location here is illustrative.
    @Value("classpath:/prompts/rag-system.st")
    private val ragSystemTextResource: Resource,
) {

    @Bean
    fun chatClient(): ChatClient {
        val ragSystemText = ragSystemTextResource.getContentAsString(Charsets.UTF_8)

        return ChatClient.builder(chatModel)
            .defaultAdvisors(
                // Retrieves the two most similar documents and injects them
                // into the prompt before the model is called.
                QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults().withTopK(2), ragSystemText),
            ).build()
    }
}

At call time, advisor parameters can also be set per request. Here, a filter expression restricts the retrieved documents to those the player has already unlocked:

chatClient
    .prompt()
    .system(/* system text */)
    .user(/* user text */)
    .advisors { a ->
        a.param(
            QuestionAnswerAdvisor.FILTER_EXPRESSION,
            // Only documents at or below the player's level are retrieved.
            "level <= ${user.level}",
        )
    }
    .call()
    .content()

In this example, the QuestionAnswerAdvisor performs the similarity search and retrieves the most relevant documents from the vector store based on the player’s question, ensuring the response is appropriate and relevant to the player’s progression in the game.

🧠 Managing Complex Conversations with Chat Memory

Every conversation with an LLM is inherently stateless, meaning the model does not remember previous messages unless they are included in the context. To simulate a real conversation, we decided to store relevant messages in the vector database along with the game documents.

This approach allows us to retrieve only the messages related to the current request, reducing the token load and optimizing the model’s responses. Spring AI further simplifies this management with Chat Memory Advisors, such as the VectorStoreChatMemoryAdvisor, which stores and retrieves only the relevant messages.

@Configuration
class ChatClientConfiguration(
    private val chatModel: ChatModel,
    private val vectorStore: VectorStore,
    // The memory prompt template is loaded as a Spring Resource;
    // the classpath location here is illustrative.
    @Value("classpath:/prompts/memory-system.st")
    private val memoryTextResource: Resource,
) {

    @Bean
    fun chatClient(): ChatClient {
        val memorySystemText = memoryTextResource.getContentAsString(Charsets.UTF_8)

        return ChatClient.builder(chatModel)
            .defaultAdvisors(
                // Stores each exchange in the vector store and retrieves only
                // the messages relevant to the current request.
                VectorStoreChatMemoryAdvisor(vectorStore, memorySystemText),
            ).build()
    }
}
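Because the memory is keyed per conversation, each request also has to carry a conversation identifier. A minimal sketch, reusing the advisor-parameter mechanism shown earlier; the key string is the same one read back by the LevelUpAdvisor below, and user.id and question are illustrative:

// Sketch: tag the request with a conversation id so the memory advisor
// stores and retrieves messages for this player only.
chatClient
    .prompt()
    .user(question)
    .advisors { a ->
        a.param("chat_memory_conversation_id", user.id)
    }
    .call()
    .content()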

👾 A “Level Up” System for Greater Engagement

To make the game more engaging, we introduced a “level-up” system that unlocks new documents as the player asks the right questions.
This system is based on a similarity threshold between the player’s question and the predefined questions in the database.
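The predefined questions are themselves stored in the vector store as documents tagged with the level they unlock, so that the advisor shown below can match them by similarity. A minimal sketch of how one could be seeded; the question text and level value are illustrative:

// Sketch: a predefined "unlock" question. The "type" and "level" keys are
// the metadata the LevelUpAdvisor filters and reads; the text and level
// value are illustrative.
vectorStore.add(
    listOf(
        Document(
            "What really happened to Dr. Elara Chen?",
            mapOf("type" to DocumentType.QUESTION.name, "level" to 2),
        ),
    ),
)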

@Service
class LevelUpAdvisor(
    private val vectorStore: VectorStore,
    private val userLevelRepository: UserLevelRepository,
) : RequestAdvisor, ResponseAdvisor {

    companion object {
        // Minimum similarity between the player's text and a predefined
        // question for it to count as a match.
        private const val LEVEL_UP_THRESHOLD = 0.82
    }

    override fun adviseRequest(
        request: AdvisedRequest,
        context: Map<String, Any>,
    ): AdvisedRequest {
        val id = doGetConversationId(context)

        // Check the player's question before it reaches the model.
        levelUp(id, request.userText)

        return request
    }

    override fun adviseResponse(
        response: ChatResponse,
        context: Map<String, Any>,
    ): ChatResponse {
        val id = doGetConversationId(context)

        // Also check the model's answer after the call.
        levelUp(id, response.result.output.content)

        return response
    }

    fun levelUp(id: String, query: String) {
        val userLevel = userLevelRepository.findById(id).orElseThrow()

        // Find the closest predefined question above the similarity
        // threshold and read the level it unlocks (0 if nothing matches).
        val matchedQuestionLevel =
            vectorStore
                .similaritySearch(
                    SearchRequest
                        .defaults()
                        .withSimilarityThreshold(LEVEL_UP_THRESHOLD)
                        .withQuery(query)
                        .withFilterExpression("type == '${DocumentType.QUESTION}'"),
                )
                .minByOrNull { document -> document.metadata["distance"].toString().toFloat() }
                ?.metadata
                ?.get("level")
                ?.toString()
                ?.toInt()
                ?: 0

        // Only unlock the next level, never skip ahead.
        if (matchedQuestionLevel == userLevel.level + 1) {
            userLevelRepository.save(userLevel.copy(level = matchedQuestionLevel))
        }
    }

    protected fun doGetConversationId(context: Map<String, Any>): String {
        return context["chat_memory_conversation_id"].toString()
    }
}

The LevelUpAdvisor inspects both the player’s questions and the model’s answers, and increases the player’s level when a question of the appropriate level is asked. Using a similarity search, it checks whether the player’s question matches a higher-level question in the database; if the matched question is exactly one level above the player’s current one, it updates the user’s level.
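For completeness, the LevelUpAdvisor also has to be plugged into the same chat pipeline as the other advisors. A sketch, assuming it can be passed to defaultAdvisors alongside the built-in ones; the exact registration depends on the Spring AI advisor API version in use:

// Wiring sketch: register the custom LevelUpAdvisor next to the built-in
// advisors so it sees every request and response. Assumes defaultAdvisors
// accepts it like the built-in advisors.
@Bean
fun chatClient(levelUpAdvisor: LevelUpAdvisor): ChatClient =
    ChatClient.builder(chatModel)
        .defaultAdvisors(
            QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults().withTopK(2), ragSystemText),
            VectorStoreChatMemoryAdvisor(vectorStore, memorySystemText),
            levelUpAdvisor,
        ).build()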

😎 Conclusion

Overmind is the result of a combination of advanced artificial intelligence techniques and narrative gaming. With Spring AI, we were able to create an engaging investigative experience, leveraging the advantages of integration with LLMs. This project has allowed us to explore new possibilities in the realm of conversational games and provides insight into how such technologies can transform the future of interactive entertainment.



Andrea is a software developer at WellD, specializing in Java and Spring applications. He likes to tinker with game dev, Arduino, and Python-based automation.