The uses, concerns and limitations of a State-Of-Engineering Chat Bot

Thibaud Banderet
Published in ELCA IT · 9 min read · Jul 30, 2024
Photo by Ant Rozetsky on Unsplash

As the hype around "Chat with your Data" applications keeps growing, companies are eager to jump on board. But what needs to be done to leverage this technology in a safe and useful manner?

More and more use cases for Retrieval-Augmented Generation (RAG) are appearing in all sorts of contexts: assisting doctors with medical diagnosis, replacing ever-growing FAQs with a more interactive and user-friendly Chat Bot, enhancing Large Language Models (LLMs) with up-to-date, dynamic information; the list goes on. The simplicity of the general RAG architecture makes it so generic that it becomes an interesting solution to all sorts of problems.

General RAG architecture: given a question, an LLM retrieves information from one or more sources and generates a response that is grounded in the retrieved information

To evaluate this trend further, ELCA is investigating less obvious candidates for RAG solutions, pushing the technology to its limits and exploring what can (and can't) be done. ELCA has also been putting a strong emphasis on clearly defining which data can be used in a RAG model according to client contracts. For instance, if a project's code belongs to the client, can it still be analyzed by a RAG model for knowledge transfer? This clarity is crucial for this technology to move forward within the industry.

This is where my internship, "State-Of-Engineering Chat Bot", comes in. On one side, it aims to showcase a unique use case for RAG by answering high-level questions instead of sticking to paraphrasing a source; on the other side, it needs to surface the legal challenges of using code that belongs to clients in such a solution.

Defining State-Of-Engineering as a use case

Let's start with "State-Of-Engineering": what does it entail exactly? Well, even that is not set in stone! As an intern, I knew little about use cases for a Chat Bot within ELCA, let alone one linked to State-Of-Engineering. So, to get a clearer definition, I asked architects within ELCA which questions they really wish they could know the answer to. Since architects oversee the technical side of projects, and since the idea for my internship came from them, they seemed like the best people to ask. This survey helped me better understand what architects do in projects and how my Chat Bot can support their workflow by helping them learn about the entire ELCA project ecosystem.

Based on the survey, State-Of-Engineering revolves around technical project information, which means it will include questions such as:

  • Which dependencies do projects use most?
  • Which versions are most common for this dependency?
  • Are there any known bugs or issues with this dependency?
  • Which architecture patterns are popular for this project case?
  • Which projects apply this architecture style?
  • What are identified risks from this architecture style/project?
  • How are scalability and redundancy achieved in this project?

Basically, all sorts of higher-level information either on a single or multiple projects within ELCA. This makes this use case very unique as it is not a programming copilot, nor is it a generic FAQ or corporate information assistant.

One might argue that some of these questions can already be answered with existing tools (dependency scanners, for example, provide a lot of information about a project's dependencies). This is yet another challenge: the Chat Bot needs to either answer questions that could not easily be answered before, or at least answer multiple types of questions that would otherwise require juggling many tools, which would be cumbersome to do regularly.

Another very important type of question is “why” questions:

  • Why was this framework picked instead of others?
  • Why did the architecture of this project change from A to B?
  • Why was this risk considered negligible in this project?

These questions are valuable because they are often based on choices architects made in the past, and they often have complex answers. The issue is that most documentation does not record the reasons behind design choices. This points to a fundamental limitation of RAG models and AI as a whole: the information needs to exist and be documented somewhere. Everything needs to be written down, which is cumbersome, especially when it must be kept up to date.


Legal questions and concerns

Sadly, the uniqueness of this Chat Bot is the source of many challenges. First, answering unique questions requires varied data sources and types. In our case, one of them is project code. As mentioned earlier, project code at ELCA belongs to the client, and how that code can be used is defined by a contract. However, that contract currently does not mention the use of AI for analysis. On one hand, all the RAG does is analysis, extracting valuable insights about the project that anyone who worked on it could share. On the other hand, while sharing such insights is valuable, using an LLM to analyze code, and storing that code in the cloud for the LLM to retrieve, is definitely not something any client signed up for in the first place.

In fact, this question is in such a grey area that it needs a lot of discussion both with the security and privacy team, as well as the legal team, for everything to be clear and for boundaries to be made. While this is a lot to handle as an intern and goes beyond just my internship, I am happy I can start this discussion and allow future similar projects to get clear answers and restrictions from the get-go.

One central point of the discussion is whether the Chat Bot transforms the confidential information from documents into "know-how", which can be shared freely, or whether its answers resemble the source too closely to shed their confidential status. In essence, this Chat Bot is similar to a fellow engineer sharing know-how about a project they recently worked on; the main difference is that the engineer is aware of the contract they are bound by, whereas the Chat Bot is completely unaware and might end up sharing information that was confidential.

To let my internship progress while the discussion is still ongoing, I limit the projects I use to internal ones, where the "client" is ELCA, as well as open-source projects that have sufficient documentation and technologies similar to those used at ELCA.

Limitations and challenges in capabilities

The second challenge is enabling the RAG to answer these unique questions, because (to the best of my knowledge) no one has written about using RAG for such a case. And since the information is not directly in the source (well, it is, but not at all in a clear way), we cannot simply feed the raw source to the LLM and let it do the information extraction as-is. For example, say we want to see the versions of the libraries a project uses. Unless the LLM knows:

  1. What type of project it is (Java, .NET, Python)
  2. Which dependency management the project uses (e.g. Maven vs Gradle for Java)
  3. Where dependencies are declared for that dependency management system

There is no way it can even retrieve meaningful sources, let alone all the information needed to extract every dependency correctly, especially since it works with a limited context size (yes, there are models with large context windows; no, they do not perform great with such huge contexts).

To mitigate this, we extract this information programmatically before indexing the data and add it to the sources. For example, we have one source per project containing a list of all dependencies along with the versions used. Instead of adding it to the raw sources, we could keep it in a structured database and let the LLM query that database; what matters here is that this information can be retrieved with good recall and precision.

Both the raw code and the structured data that was extracted end up in the same index
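To make the extraction step more concrete, here is a minimal sketch of what "extract programmatically, then add to the sources" can look like for a Maven-based Java project. The function names and the rendered text format are illustrative assumptions, not the actual pipeline:

```python
import xml.etree.ElementTree as ET

def extract_maven_dependencies(pom_xml: str):
    """Parse a pom.xml document and return (groupId:artifactId, version) pairs."""
    ns = {"m": "http://maven.apache.org/POM/4.0.0"}
    root = ET.fromstring(pom_xml)
    deps = []
    for dep in root.findall(".//m:dependencies/m:dependency", ns):
        group = dep.findtext("m:groupId", default="?", namespaces=ns)
        artifact = dep.findtext("m:artifactId", default="?", namespaces=ns)
        version = dep.findtext("m:version", default="unspecified", namespaces=ns)
        deps.append((f"{group}:{artifact}", version))
    return deps

def build_dependency_source(project_name: str, deps) -> str:
    """Render the extracted dependencies as one plain-text source per project,
    ready to be indexed alongside the raw code."""
    lines = [f"Dependencies of project {project_name}:"]
    lines += [f"- {name} (version {version})" for name, version in deps]
    return "\n".join(lines)
```

The same idea applies to Gradle, npm, or pip manifests: one small parser per ecosystem, all feeding the same per-project dependency source.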

A notable constraint of this kind of solution is that its capabilities are tied to the information we extract: if we want to answer a new question, we may need to extract new information and add it to the data model. One might argue that this defeats the purpose of such a Chat Bot, but all it does is allow more questions to be answered by providing more information. The raw data can still be explored; it is simply enriched with related, clearer information.

The resulting architecture

After multiple tests and evaluations, the best fit for this project is an architecture that can be called "Modular RAG", in the sense that we let an LLM decide where it wants to retrieve information from (out of a given set of sources), and whether it needs to retrieve more information before answering. This gives a lot of freedom in how a question is answered, which in turn gives a lot of freedom in what kind of question can be answered. As mentioned before, the only limiting factor is the set of sources and the information they contain. (There are quite a few other limiting factors in RAG: context size, the ability of the LLM to fully understand the prompt, the precision and recall of the retrieval system, and so on, but these are inherent to RAG and do not come from our architecture.)

Simplified flowchart of the architecture; here there are three retrieval options, but there can be as many as needed
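The retrieval loop from the flowchart can be sketched as follows. This is a simplified stand-in: `ask_llm`, the prompts, and the retriever functions are illustrative assumptions, not the real implementation:

```python
# Minimal sketch of the "Modular RAG" loop: the LLM picks a source, we
# retrieve from it, and the LLM may keep retrieving before answering.
MAX_RETRIEVALS = 5  # hard stop so the loop always terminates

def answer(question, ask_llm, retrievers):
    """retrievers maps a source name to a retrieve(question) -> list[str] function."""
    context = []
    for _ in range(MAX_RETRIEVALS):
        choice = ask_llm(
            f"Question: {question}\n"
            f"Which source do you need next? Options: {sorted(retrievers)}.\n"
            f"Reply DONE if you have enough information."
        ).strip()
        if choice == "DONE" or choice not in retrievers:
            break  # enough context gathered, or an unusable answer
        context.extend(retrievers[choice](question))
    return ask_llm(
        f"Answer the question using only this context.\n"
        f"Question: {question}\nContext: {context}"
    )
```

Adding a new retrieval option is then just a matter of registering another entry in `retrievers`, which is what makes the architecture modular.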

However, with great power comes great responsibility: this architecture puts all the pressure on the LLM, so it is important to at least semi-monitor the flow of actions, to catch errors and redirect the LLM in the right direction whenever needed. This is mainly done by projecting the LLM's answer to each question onto a set of known possible answers. For example, when we ask "Which source do you need to retrieve information from to answer this question?", we match the LLM's answer against the set of sources; if there is no match, we can guide the LLM back on track by defaulting to a certain source or stopping the retrieval early. Another clear issue is that this flowchart could run forever, so we put a limit on how many times information can be retrieved. These mechanisms are usually called guardrails, and they are also often used to stop the model from answering questions that should not be answered (such as "How can I build a bomb?").
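The projection guardrail described above can be sketched with simple fuzzy matching; the source names below are made up for the example, and a real system might use embedding similarity instead:

```python
from difflib import get_close_matches

# Illustrative source names, not the real index names.
KNOWN_SOURCES = ["code_index", "dependency_db", "architecture_docs"]

def project_onto_known(llm_answer, known=KNOWN_SOURCES, default=None):
    """Guardrail: map a free-text LLM answer onto the closest known option.

    Returns `default` (e.g. a fallback source, or None to stop retrieval
    early) when nothing matches closely enough."""
    lowered = [k.lower() for k in known]
    matches = get_close_matches(llm_answer.strip().lower(), lowered,
                                n=1, cutoff=0.6)
    if not matches:
        return default
    return known[lowered.index(matches[0])]  # recover original casing
```

Combined with a cap on retrieval iterations, this keeps the LLM on a bounded, known set of paths even when its free-text answers drift.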

Further improvements

Once this architecture is set up, it can be iteratively improved using both automatic and user feedback: new retrieval sources can be added; the LLM and embedding models can be upgraded, or even fine-tuned if clean data is available; access control for certain data can be introduced with a user login. The modularity of this architecture, and of RAG as a concept, allows for very straightforward agile development to keep up with the available data and the functionality requirements of the solution.

The takeaway

Such a State-Of-Engineering Chat Bot can provide software architects with readily available, new and insightful information, which is valuable in a company striving to deliver quality solutions. On top of that, this kind of pioneering project opens doors and clears paths for generative AI at ELCA as a whole. It is a new technology, which means the industry has to adapt, and projects like mine push companies to prepare for it. We plant the seeds for generative AI to bloom and thrive.
