Customizing Generative AI Solutions with Retrieval Augmented Generation

Clearwater Analytics Engineering · Published in cwan-engineering
May 24, 2024 · 6 min read

Overview

As ChatGPT burst onto the AI scene with Large Language Models (LLMs), companies scrambled to find ways to take advantage of the technology. Once enterprise-level LLM solutions became available, companies could use the models, but they needed ways to enhance them with their own proprietary knowledge bases and domain expertise. While out-of-the-box LLMs had good general knowledge, the data that could really move the needle for a company was not something the model understood. There are two main options for adding this data to an LLM: Retrieval-Augmented Generation (RAG) and fine-tuning the model.

At Clearwater, we dove into the forefront of generative AI solution development, leveraging the power of RAG to deliver customized solutions built on LLMs. By integrating our domain-specific data into LLMs, we have been able to answer Clearwater-specific queries with a high degree of accuracy and relevance. We have seen tremendous improvements in efficiency for our operations teams and customers as they use our LLM platform to ask questions and get answers quickly. The combination of RAG and prompt engineering has given us significant breadth and depth in applying LLMs across our various solutions.

Bringing in New Data

Understanding RAG begins with the concept of semantic searching using vectors. In essence, words are transformed into multi-dimensional mathematical vectors, stored in a vector database. Upon querying, the database identifies words with similar vectors, utilizing algorithms like nearest neighbor search. This integration with LLMs results in an enriched prompt consisting of the original query and the data retrieved from the vector database. Consequently, the LLM gains a context of data it didn’t previously possess, enabling it to respond with updated or domain-specific information. The graph below shows an example of how words can be represented as vectors and visualized in two dimensions to show the relationship.
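As a rough illustration of the idea (not our implementation), the sketch below compares toy word vectors using cosine similarity, the same notion of "closeness" a vector database relies on for nearest neighbor search. The vocabulary and vectors here are made up purely for illustration.

```python
import numpy as np

# Toy, hand-made 2-D "embeddings" -- real embedding models produce hundreds
# or thousands of dimensions, but the idea is the same.
vocab = {
    "bond":     np.array([0.9, 0.1]),
    "treasury": np.array([0.8, 0.2]),
    "equity":   np.array([0.2, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Higher values mean the vectors point in more similar directions."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.85, 0.15])  # pretend this is the embedded query

# Nearest-neighbor search: rank every stored vector by similarity to the query.
ranked = sorted(vocab.items(),
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)

for word, vec in ranked:
    print(word, round(cosine_similarity(query, vec), 3))
```

The words most related to the query surface at the top of the ranking, which is exactly the behavior RAG exploits when it pulls relevant content into the prompt.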

Our Approach

Clearwater built an overall generative AI architecture around private LLMs that are available only to us, together with RAG databases that are likewise private and protected inside our cloud VPCs. One example of what Clearwater has done is using RAG for an internal knowledge base used by our operations teams. As shown in the diagram below, each user prompt is wrapped with a system prompt that provides context and guidance to the LLM, along with the previous conversation from that session. That request goes to our LLM Service, which uses the prompt to query the vector database. The vector database returns content related to the prompt, and everything is combined into a single prompt that is sent to the LLM. The LLM uses the retrieved data together with the user prompt to understand what is being requested and generate an answer. With the RAG approach, the response is highly relevant and very repeatable.
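The sketch below walks through that request flow at a high level. It is a simplified illustration, not our LLM Service API; `embed`, `vector_db`, and `llm` are hypothetical stand-ins for an embedding model, a private vector database, and a private LLM endpoint.

```python
from typing import Any, Callable

def answer(user_prompt: str,
           history: list[str],
           embed: Callable[[str], list[float]],
           vector_db: Any,
           llm: Any) -> str:
    """Illustrative RAG request flow, loosely following the diagram above."""
    # 1. Retrieve content related to the user's question.
    query_vector = embed(user_prompt)
    chunks = vector_db.search(query_vector, top_k=5)

    # 2. Wrap everything in a system prompt plus the prior conversation.
    system_prompt = (
        "You are an internal assistant. Answer using only the context "
        "provided below. If the context is insufficient, say so."
    )
    context = "\n\n".join(chunk.text for chunk in chunks)
    messages = [
        {"role": "system", "content": system_prompt + "\n\nContext:\n" + context},
        *[{"role": "user", "content": turn} for turn in history],
        {"role": "user", "content": user_prompt},
    ]

    # 3. The LLM answers from the retrieved context rather than
    #    from its training data alone.
    return llm.complete(messages)
```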

In the following example, our internal applications and services are named after Greek mythology, and one of those applications is called Helios. If you ask our LLM service "What is Helios?" without a data source enabled, you will get the commonly known answer about who Helios was in Greek mythology. However, if you enable our internal knowledge base data source, you will get a much different answer, as seen in the comparison below. The data from our internal knowledge base vector DB, passed in with the prompt, is prioritized over the knowledge of Helios that the model was trained on.

Responses generated using RAG also include citations indicating the source of the retrieved data. Clicking one of these links opens a browser window that takes the user to the original knowledge source document from which the data was extracted. This is one of the more advanced features of using RAG data: the source information is stored as metadata within the RAG database, and when a result is returned, the associated metadata identifies the origin of the response. This significantly enhances the accuracy of the responses and bolsters user confidence in the authenticity of the information, helping to address concerns about LLM hallucination.
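One way this can work (a sketch, not our exact schema) is to store a source URL and title alongside each chunk when it is indexed, then carry that metadata through to the final answer. The URLs and chunk text below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_url: str  # metadata stored with the vector at indexing time
    title: str

def format_answer_with_citations(answer_text: str, chunks: list[Chunk]) -> str:
    """Append the origin of each retrieved chunk so users can verify the answer."""
    citations = "\n".join(
        f"[{i + 1}] {c.title} - {c.source_url}" for i, c in enumerate(chunks)
    )
    return f"{answer_text}\n\nSources:\n{citations}"

# Example: two chunks retrieved from an internal knowledge base (fictional URLs).
retrieved = [
    Chunk("Helios overview...", "https://kb.example.com/helios", "Helios overview"),
    Chunk("Helios runbook...", "https://kb.example.com/helios-runbook", "Helios runbook"),
]
print(format_answer_with_citations("Helios is an internal application.", retrieved))
```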

This model can be extended to integrate multiple data sources, combining results from several vector databases into a comprehensive prompt for the LLM to analyze. For further customization, specific data sources can be allocated to certain user groups, giving some users or groups access to a data source while restricting others. This level of capability requires integrating authentication and authorization into your architecture, which Clearwater has done. It ensures that the right users have access to the right data, enhancing both the security and the personalization of the information provided.
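A minimal sketch of that idea is shown below: the group names, source names, and `vector_dbs` mapping are hypothetical, but the pattern of querying only the sources a user is entitled to and merging the results into one context is the general shape of the approach.

```python
# Illustrative only: map user groups to the data sources they may query.
SOURCE_ACCESS = {
    "operations":  {"internal_kb", "runbooks"},
    "engineering": {"internal_kb", "design_docs"},
}

def allowed_sources(user_groups: list[str]) -> set[str]:
    """Union of data sources the user's groups are entitled to."""
    allowed: set[str] = set()
    for group in user_groups:
        allowed |= SOURCE_ACCESS.get(group, set())
    return allowed

def search_authorized(query_vector, user_groups, vector_dbs):
    """Query only the vector databases the user is authorized to see,
    then combine the results into one context for the LLM prompt."""
    results = []
    for name in allowed_sources(user_groups):
        results.extend(vector_dbs[name].search(query_vector, top_k=3))
    return results
```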

Taking RAG to the Next Level

Going further with RAG, you can incorporate entire documents into your prompts. This involves vectorizing the document on the fly and using those vectors alongside the prompt query to ask specific questions about the document, so the final prompt includes both the query and the relevant content retrieved from the document.
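The sketch below shows one way on-the-fly vectorization can work, under the assumption that an `embed` function is available from whatever embedding model you use: split the document into chunks, embed the chunks and the question in memory, and keep only the chunks most similar to the question for the prompt.

```python
from typing import Callable
import numpy as np

def relevant_chunks(document_text: str,
                    question: str,
                    embed: Callable[[str], np.ndarray],
                    chunk_size: int = 1000,
                    top_k: int = 4) -> list[str]:
    """Vectorize a document on the fly and return the chunks most
    relevant to the question, ready to be placed into the LLM prompt."""
    # Naive fixed-size chunking; production systems usually split on
    # sentence or section boundaries instead.
    chunks = [document_text[i:i + chunk_size]
              for i in range(0, len(document_text), chunk_size)]

    chunk_vectors = np.stack([embed(c) for c in chunks])
    question_vector = embed(question)

    # Cosine similarity between the question and every chunk.
    sims = chunk_vectors @ question_vector / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(question_vector)
    )
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]
```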

To simplify converting documents to text, you can integrate tools such as Azure's Document Intelligence or AWS's Textract. These tools also provide a simpler way to break up large documents that exceed the context window size so they can still be used in your solution. Coupled with well-engineered prompts, this technique opens the door to data extraction and summarization. You can extract specific data from documents and have the LLM return it as a structured table, and this can be further leveraged to create step-by-step plans or generate lists of action items. This advanced use of RAG not only enhances the depth of your queries but also adds a new dimension to the versatility and applicability of your LLM.
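For a sense of what a well-engineered extraction prompt might look like, here is a hypothetical example; the field names and domain are invented for illustration and are not taken from our production prompts.

```python
# Illustrative prompt for structured extraction; field names are hypothetical.
EXTRACTION_PROMPT = """You are a data-extraction assistant.
From the document excerpts below, extract every security mentioned and
return a JSON array of objects with the fields:
  "name", "identifier", "coupon", "maturity_date".
If a field is not present in the text, use null. Return JSON only.

Document excerpts:
{context}
"""

def build_extraction_prompt(chunks: list[str]) -> str:
    """Combine the retrieved document chunks into the extraction prompt."""
    return EXTRACTION_PROMPT.format(context="\n\n".join(chunks))
```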

The diagram below shows a variation of the LLM architecture for this advanced level of RAG usage:

Wrapping It Up

RAG stands out as a potent tool for adapting LLMs to operate on unfamiliar, custom data. It is not only powerful but also relatively straightforward to implement; our teams have been able to add new RAG sources in less than an hour. With well-crafted prompts, the full potential of RAG can be harnessed with minimal investment.

Furthermore, the cost-effectiveness of RAG cannot be overstated. By using RAG, you save on the time and resources that would otherwise be spent on training or fine-tuning a model, while still achieving comparable results, depending on your specific use case.

In essence, when aiming to personalize and customize LLMs, RAG should be your first stop. Its architecture offers the best time-to-value, making it an optimal choice for those seeking to maximize efficiency and effectiveness in their AI implementations.

About the Author

Darrel Cherry is a Distinguished Engineer with over 27 years of experience leading organizations to create solutions for complex business problems. With a passion for emerging technologies, he has architected large cloud and data processing solutions, including machine learning and deep learning AI applications. Darrel holds 19 U.S. patents and has contributed to various industry publications. Outside the professional sphere, he enjoys traveling, auto racing, and motorcycling, while also spending quality time with his family.
