Build a Multi-Query RAG pipeline in Langflow
Retrieval-Augmented Generation (RAG) is an AI app development technique that supplies external content to large language models (LLMs) so they can generate relevant responses about data that was not part of their training.
For example, you can use RAG to build a chatbot that can answer questions from real-time customer or transaction data, or from internal documents or data sources.
But basic RAG applications can "hallucinate", where the LLM generates convincing but incorrect or fabricated information. RAG apps with high hallucination rates are risky to put in front of real users!
To address this, advanced RAG techniques like FLARE, ReAct, and others are evolving to reduce hallucinations. These methods typically use the LLM to generate multiple variations of the user question, or multiple LLM responses, in order to produce more relevant answers.
In this example, we'll walk through a Multi-Query RAG pipeline in Langflow. This fairly straightforward advanced RAG technique improves relevance by generating multiple query variations in order to create a richer context in the LLM prompt.
You can get a fast overview of this Multi-Query RAG example in the video below, and then let's build it in about 20 minutes in Langflow!

Ready? Let's do it!
Simple RAG
In a simple RAG pipeline, text from external documents, transactions, or other sources is extracted and split into "chunks", text fragments of a predetermined length. These chunks are fed to an embedding model, which generates a vector for each chunk. These vectors (called "embeddings") are stored in a vector database along with the original text data.
When the user asks a question, their query is also embedded into a vector, which is used in a similarity search to find "nearest neighbor" vectors in the database, representing text that is semantically similar to the user's question.
These query results are converted back from vectors into text and then sent to an LLM as context in the prompt to generate the response.
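The simple pipeline above can be sketched in a few lines of plain Python. This is an illustration only: the `embed` function here is a toy word-count stand-in for a real embedding model (the Langflow flow uses Cohere), and the list `store` stands in for a real vector database like Chroma.

```python
# Minimal sketch of a simple RAG retrieval step (toy embeddings, no real model).
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Ingestion": embed each chunk and store it alongside its original text.
chunks = [
    "Bring earplugs to survive noisy dorms.",
    "Office hours are the fastest way to get help.",
    "Meal prep on Sundays saves money.",
]
store = [(embed(c), c) for c in chunks]

# "Query": embed the question, find the nearest chunk, use it as LLM context.
question = "How do I get help from professors?"
q_vec = embed(question)
best = max(store, key=lambda item: cosine(q_vec, item[0]))
print(best[1])  # the most similar chunk, sent to the LLM as prompt context
```

A production pipeline swaps the toy pieces for real ones, but the shape of the flow, embed, search, then prompt, stays the same.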
Multi-Query RAG
In Multi-Query RAG, we take the process a step further. The LLM generates multiple queries or multiple variations of the user's question.
These queries are then embedded and used to search the vector database. The results from the different queries are combined to create more context in the LLM prompt, likely giving a better response.
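The multi-query step can be sketched as follows. In the real flow an LLM writes the question variants; here they are hardcoded, and retrieval is a toy word-overlap ranking standing in for vector search. The point is the shape: each variant retrieves its own chunks, and the deduplicated union becomes the context.

```python
# Sketch of Multi-Query RAG: several phrasings of one question each retrieve
# their own chunks, and the merged results form a richer prompt context.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

chunks = [
    "Office hours are the fastest way to get help.",
    "Professors post contact info on the syllabus.",
    "Meal prep on Sundays saves money.",
]

def retrieve(query: str, k: int = 1) -> list:
    # Toy retrieval: rank chunks by word overlap (stand-in for vector search).
    return sorted(chunks, key=lambda c: len(tokens(c) & tokens(query)), reverse=True)[:k]

# Three variants of "How do I reach my professor?" (an LLM would write these).
variants = [
    "How can I contact a professor?",
    "Where do professors share their contact info?",
    "What is the best way to get help from an instructor?",
]

# Combine the per-variant results: order-preserving, deduplicated union.
seen, context = set(), []
for v in variants:
    for chunk in retrieve(v):
        if chunk not in seen:
            seen.add(chunk)
            context.append(chunk)

prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)
```

Notice that different phrasings surface different chunks, which is exactly why the combined context tends to beat a single query.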
Step 1: Install the Template
First, make sure you have Langflow installed and running. Once Langflow is running on your browser, go to the Langflow Store and download the RAG Multi-Query template.
💡 You'll need an API key from the Langflow Store to look up and install the template on your local Langflow.
Of course, you can also build the flow from an empty canvas by creating a new project and adding and connecting all the components, but starting with a template project will save you some time.
Step 2: Data Ingestion
First, add this PDF of College Survival Tips into the File Loader component as an external data source to vectorize.
Adjust the Split Text component parameters to a chunk size of 50 and an overlap of 20 characters. Important: enter a single space character in the Separator field so the text is split on spaces.
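To see what these two parameters do, here is a simplified character splitter using the tutorial's settings. This is a stand-in for illustration, not Langflow's actual implementation: with a chunk size of 50 and an overlap of 20, each chunk repeats the last 20 characters of the previous one.

```python
# Simplified character chunking with chunk_size=50, overlap=20.
# Each chunk starts (chunk_size - overlap) = 30 characters after the last.
def split_text(text: str, chunk_size: int = 50, overlap: int = 20) -> list:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "College survival tip: office hours are the fastest way to get help from professors."
for chunk in split_text(doc):
    print(repr(chunk))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of some duplicated storage.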
This flow uses Cohere to create the embeddings, so create a free Cohere account and add your API key.
Run the ingestion pipeline by hitting Play on the Chroma DB component. This builds and runs all the components to the right, creating the collection of vector embeddings in Chroma DB. Then click the Vector Store icon to see your vector embeddings (see below).
Step 3: Multi-Query Generation
Next, let's head up to the query flow in Langflow.

Modify the user question in the Chat Input component to something related to the PDF we ingested above.
Click the Template field of the Prompt component to see that the prompt asks the LLM to create three different versions of this question.
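A prompt of this kind typically looks something like the sketch below. This is a paraphrase for illustration, not the exact text in the Langflow template, and `{question}` is the variable the Chat Input fills in.

```python
# Illustrative multi-query prompt: ask the LLM for three rewrites of the question.
TEMPLATE = """You are an AI assistant. Generate three different versions
of the user question below to retrieve relevant documents from a vector
database. Provide the alternative questions separated by newlines.

Original question: {question}"""

prompt = TEMPLATE.format(question="What are good tips for surviving college?")
print(prompt)
```

Each line of the LLM's answer then becomes one of the three queries run against the vector store in the next step.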
Step 4: Run Each Query and Combine Responses into Context
Add your Cohere API key to each of the three Group components, which were created by grouping the vector search flow above (see GIF below). Each Group runs a vector search for one of the three questions.
Select the Template field in the Prompt Component to see the assembled context that will be used to generate the final response.
Step 5: Submit the combined context for the final LLM response
Run the complete pipeline from the OpenAI component at the end of the flow (this will run every component to the left) and select the Text field to see the LLM response. You can also run the Playground to try your workflow in the Chat Output interface as below.
Congrats!
You just built a Multi-Query RAG pipeline: an advanced RAG technique that uses the LLM to generate multiple variations of the original question, producing a richer context and a more accurate response.
We hope this inspires you to keep learning about more advanced RAG techniques; there's a lot more we will cover soon!
Thanks for supporting Langflow! Join the community on Discord and GitHub!