Eudexia — An LLM-based AI Assistant

Aakash Agrawal

By: Surbhi Bhutani, Matthew Reed, Monisha Gnanaprakasam, Kishor Sundarraj, Harsh Patel, Ankita Patil, Anaghaa Londhe, Aakash Agrawal, Members of Scientific Staff, DISH Wireless

Information is difficult to maintain and keep accessible, and you have to bring the intelligence yourself.

Everything changes if we have complete and intelligent access to our information. But the reality is that our personal data is not transparent. It piles up. It takes many forms and structures, and gets stored in many places. Even when information is known to exist, it is often not known where it exists.

Often, it falls to knowledgeable individuals, working together, to dredge up and synthesize this information into useful forms for the inquiring party. This process is not only unscalable, but expensive in both time and capital. And sometimes, we know what questions we have, but have no idea what information, if any, could help us.

Background

LLMs give us incredible advantage and flexibility; they are even more powerful when they have access to relevant proprietary knowledge.

Over the past year, Large Language Models (LLMs) have provided unprecedented flexibility and capability in the natural language processing and understanding space. These models, trained on enormous and highly varied oceans of data, have gained surprising semantic powers, enabling sophisticated reasoning capacities over a previously unimaginable range of domains. When provided with new concepts, they can effectively synthesize and reason about the context they have been given, providing tailored and conceptually relevant responses. By coupling a powerful information retrieval system to the LLM, teams can create continuously updating synthetic advisors and subject matter experts, without having to train or update the model itself.

Use Case

Recent AI developments (LLMs, embeddings, and vector databases) have suddenly made it possible to unlock the information contained within our documents, in ways that can be custom fit to you and your needs.

It is news to no one that 2023 was a Cambrian explosion event for AI. Suddenly, a mass of new techniques and models became widely available, with dynamic and powerful capabilities.

By stringing together the semantic power of embedding models, the lightning fast information retrieval of Vector Databases, and Large Language Models as reasoning engines, the dream of transparent and frictionless information access is becoming real.

Securely and scalably interact with your private data using plain language.

To make relevant information easier to retrieve for new members during onboarding, our team built Eudexia (from Eunoia: beautiful thinking, good will; and Index: organized and accessible information), an AI-based system that makes private data accessible for exploration and use through a frictionless, plain-language interface. It is highly configurable and designed for enterprise use, providing secure and scalable insights into restricted or niche knowledge bases.

The primary objective of the project is to empower users to interact with their private data using plain language. The AI system allows users to ask questions, present hypotheticals, and request analysis as they would to a colleague, and in turn receive a detailed and tailored response, grounded in the relevant documentation, with accompanying sources from the documentation. This unlocks the knowledge captured in an organization’s documentation, and opens it to novel insights and application, in a highly available and differentiated manner.

Results

We have demonstrated the ability to ask a question of our system and have an intelligent response generated based on the context of the corpus, with accompanying sources. All of this is done securely, on our own private network.

Our first iteration of Eudexia allowed users to enter a query into a web page and receive back a thoughtful, relevant response (with accompanying source links) based on the domain-specific documentation it has access to. It is able to infer intent, understand the contextual implications of the question, and not only structure an effective information search across the corpus, but also synthesize the results into a useful conclusion. All of the technical processes are handled behind the scenes; the interaction between the user and the system is entirely in plain language.

In ~15 seconds, our team members can parse through thousands of Confluence documents and drill down to the content that matters for their specific question. What would otherwise be a 30-minute process of finding the right person, discussing the subject, and then finding the relevant information is done in less than a minute by a single person.

Methodology

We used the cloud (AWS) to provide secure and scalable infrastructure, and built a system for integrated large-scale corpus ingestion, information retrieval, question answering, and a convenient user interface.

The team deployed the project into AWS to provide high availability, scalability, and security.

  1. The system ingests a corpus of arbitrary size by automatically scaling Lambda function invocations, breaking the documents into chunks, and indexing the chunked data semantically as embeddings in a vector database (a minimal ingestion sketch follows the architecture figure below).
  2. User queries from the UI are enriched with relevant context using an LLM, and then used for similarity search within the vector database.
  3. Relevant text chunks retrieved from the vector database are then passed, along with the user query, to an LLM, which reasons about the question given the context and returns a response (with sources) to the user in the UI (sketched further below).
Fig: High-level RAG Architecture for Eudexia
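
To make step 1 concrete, below is a minimal Python sketch of what a single ingestion Lambda could look like. It is an illustration under assumptions, not the production code: the event shape, the embed_texts helper, and the vector_index client are hypothetical stand-ins for whichever embedding model and vector database a given deployment uses.

```python
import boto3
from bs4 import BeautifulSoup

# Hypothetical helpers, not a real package: an embedding call and a vector
# database client, standing in for whatever a given deployment uses.
from eudexia_clients import embed_texts, vector_index

s3 = boto3.client("s3")

CHUNK_SIZE = 1000      # characters per chunk
CHUNK_OVERLAP = 200    # characters shared between adjacent chunks


def chunk_text(text: str) -> list[str]:
    """Split text into overlapping, fixed-size character chunks."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, max(len(text), 1), step)]


def handler(event, context):
    """Ingest one HTML document from S3 (assumed event shape: bucket + key)."""
    bucket, key = event["bucket"], event["key"]

    # 1. Fetch the raw HTML from S3 and strip the markup.
    html = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    # 2. Break the document into overlapping chunks.
    chunks = chunk_text(text)

    # 3. Embed each chunk and upsert it, with its source, into the vector DB.
    vectors = embed_texts(chunks)
    vector_index.upsert(
        [
            {"id": f"{key}#{i}", "vector": vec, "text": chunk, "source": key}
            for i, (chunk, vec) in enumerate(zip(chunks, vectors))
        ]
    )
    return {"document": key, "chunks_indexed": len(chunks)}
```

Because each invocation handles a single document, fanning the work out across many concurrent Lambda invocations (for example via S3 event notifications or a queue) is what lets the ingestion step scale to a corpus of arbitrary size.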

Our PoC demonstrated the capability of this type of system to effectively retrieve and reason about our team's private data, even in the first iteration of the architecture. The data ingestion pipeline is highly scalable, capable of targeting any arbitrary collection of HTML documents in S3 storage, and the prompts are highly configurable, providing wide-ranging control over contextual understanding and style of output.
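
As a rough sketch of the query path (steps 2 and 3 above) and of what configurable prompts can look like in practice, consider the following Python outline. The llm_complete, embed_texts, and vector_index.query calls are hypothetical placeholders, and the two prompt templates are illustrative only; they are not the prompts Eudexia actually uses.

```python
# Hypothetical helpers, not a real package: an LLM completion call, an
# embedding call, and a vector database client.
from eudexia_clients import embed_texts, llm_complete, vector_index

# Configurable prompt templates: swapping these changes how queries are
# enriched and how answers are phrased, without touching the pipeline itself.
ENRICH_TEMPLATE = (
    "Rewrite the user's question as a self-contained search query, "
    "expanding abbreviations and adding likely domain terms.\n"
    "Question: {question}\nSearch query:"
)

ANSWER_TEMPLATE = (
    "Answer the question using only the context below, and cite the sources "
    "you used. If the context is insufficient, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def answer_question(question: str, top_k: int = 5) -> dict:
    # Step 2: enrich the raw query with an LLM, then run similarity search.
    enriched = llm_complete(ENRICH_TEMPLATE.format(question=question))
    query_vector = embed_texts([enriched])[0]
    hits = vector_index.query(vector=query_vector, top_k=top_k)  # assumed API

    # Step 3: pass the retrieved chunks plus the original question to the LLM.
    context = "\n\n".join(f"[{hit['source']}] {hit['text']}" for hit in hits)
    answer = llm_complete(ANSWER_TEMPLATE.format(context=context, question=question))

    return {"answer": answer, "sources": sorted({hit["source"] for hit in hits})}
```

Keeping the enrichment and answer templates as plain configuration is one way to get the wide-ranging control over contextual understanding and output style described above without changing any pipeline code.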

Conclusion

For our RAG (Retrieval Augmented Generation) implementation, there were several learnings along the way. The major challenges we discovered were hallucination, context window limitations, and overall system latency (from the moment a user asks a question to the delivery of the response). To handle these, we tried multiple approaches, such as context injection through prompt engineering, hybrid search, and careful architecture choices (the vector database, embedding models, LLM, and orchestration components), since the user experience depends heavily on them.
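
The article does not spell out how the hybrid search was implemented. Purely as an illustration, one common way to combine keyword and vector retrieval is reciprocal rank fusion: each retriever returns its own ranked list, and a document's final score is the sum of 1/(k + rank) over the lists it appears in.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists (e.g. keyword hits and vector hits)
    into one ranking: each document scores 1 / (k + rank) in every list that
    contains it, and higher total scores rank first."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: ids returned by a keyword search and by a vector similarity search.
keyword_hits = ["doc-7", "doc-2", "doc-9"]
vector_hits = ["doc-2", "doc-4", "doc-7"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc-2', 'doc-7', 'doc-4', 'doc-9']: both retrievers agree on doc-2 and doc-7,
# so those rise to the top of the merged ranking.
```

Agreement between the two retrievers pushes a document up the merged ranking, which can help ground the LLM in passages that match both the literal terms and the meaning of the question.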

The project successfully demonstrated the fundamental capabilities of this class of system, and provided the architectural underpinnings for similar AI-based systems to follow. We have covered the details of our implementation and next steps in the article: Discovering Insights: Exploring Information Retrieval in a RAG System.
