Build a Multi-Query RAG pipeline in Langflow
Retrieval-Augmented Generation (RAG) is an AI app development technique that supplies external content to large language models (LLMs) so they can generate relevant responses about data that was not part of their training.
For example, you can use RAG to build a chatbot that can answer questions from real-time customer or transaction data, or from internal documents or data sources.
But basic RAG applications can "hallucinate", where the LLM generates convincing but incorrect or fabricated information. RAG apps with high hallucination rates are risky to put in front of real users!
To address this, advanced RAG techniques like FLARE, ReAct, and others are evolving to reduce hallucinations. These methods typically use the LLM to generate multiple variations of the user question, or multiple LLM responses, in order to produce more relevant answers.
In this example, we'll walk through a Multi-Query RAG pipeline in Langflow. This fairly straightforward advanced RAG technique improves relevance by generating multiple query variations in order to create a richer context in the LLM prompt.
You can get a fast overview of this Multi-Query RAG example in the video below, and then let's build it in about 20 minutes in Langflow!

Ready? Let's do it!
Simple RAG
In a simple RAG pipeline, text from external documents, transactions, or other sources is extracted and split into "chunks", text fragments of a predetermined length. These chunks are fed to an embedding model, which generates a vector for each chunk. These vectors (called "embeddings") are stored in a vector database along with the original text data.
When the user asks a question, their query is also embedded into a vector, which is used in a similarity search to find "nearest neighbor" vectors in the database, representing text that is semantically similar to the user's question.
These query results are converted back from vectors into text and then sent to an LLM as context in the prompt to generate the response.
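The simple pipeline above can be sketched in a few lines of plain Python. This is an illustration only: the `embed` function here is a toy word-count stand-in for a real embedding model (the Langflow flow uses Cohere), and the list `store` stands in for a real vector database like Chroma.

```python
# Minimal sketch of a simple RAG retrieval step (toy embeddings, no real model).
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Ingestion": embed each chunk and store it alongside its original text.
chunks = [
    "Bring earplugs to survive noisy dorms.",
    "Office hours are the fastest way to get help.",
    "Meal prep on Sundays saves money.",
]
store = [(embed(c), c) for c in chunks]

# "Query": embed the question, find the nearest chunk, use it as LLM context.
question = "How do I get help from professors?"
q_vec = embed(question)
best = max(store, key=lambda item: cosine(q_vec, item[0]))
print(best[1])  # the most similar chunk, sent to the LLM as prompt context
```

A production pipeline swaps the toy pieces for real ones, but the shape of the flow, embed, search, then prompt, stays the same.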
Multi-Query RAG
In Multi-Query RAG, we take the process a step further. The LLM generates multiple queries or multiple variations of the user's question.
These queries are then embedded and used to search the vector database. The results from the different queries are combined to create more context in the LLM prompt, likely giving a better response.
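The multi-query step can be sketched as follows. In the real flow an LLM writes the question variants; here they are hardcoded, and retrieval is a toy word-overlap ranking standing in for vector search. The point is the shape: each variant retrieves its own chunks, and the deduplicated union becomes the context.

```python
# Sketch of Multi-Query RAG: several phrasings of one question each retrieve
# their own chunks, and the merged results form a richer prompt context.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

chunks = [
    "Office hours are the fastest way to get help.",
    "Professors post contact info on the syllabus.",
    "Meal prep on Sundays saves money.",
]

def retrieve(query: str, k: int = 1) -> list:
    # Toy retrieval: rank chunks by word overlap (stand-in for vector search).
    return sorted(chunks, key=lambda c: len(tokens(c) & tokens(query)), reverse=True)[:k]

# Three variants of "How do I reach my professor?" (an LLM would write these).
variants = [
    "How can I contact a professor?",
    "Where do professors share their contact info?",
    "What is the best way to get help from an instructor?",
]

# Combine the per-variant results: order-preserving, deduplicated union.
seen, context = set(), []
for v in variants:
    for chunk in retrieve(v):
        if chunk not in seen:
            seen.add(chunk)
            context.append(chunk)

prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)
```

Notice that different phrasings surface different chunks, which is exactly why the combined context tends to beat a single query.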
Step 1: Install the Template
First, make sure you have Langflow installed and running. Once Langflow is running on your browser, go to the Langflow Store and download the RAG Multi-Query template.
💡 You'll need an API key from the Langflow Store to look up and install the template on your local Langflow.
Of course, you can also build the flow from an empty canvas by creating a new project and adding and connecting all the components, but starting with a template project will save you some time.
Step 2: Data Ingestion
First, add this PDF of College Survival Tips into the File Loader component as an external data source to vectorize.
Adjust the Split Text component parameters to a chunk size of 50 and an overlap of 20 characters. Important: enter a single space character in the Separator field so the text is split on spaces.
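To see what these two parameters do, here is a simplified character splitter using the tutorial's settings. This is a stand-in for illustration, not Langflow's actual implementation: with a chunk size of 50 and an overlap of 20, each chunk repeats the last 20 characters of the previous one.

```python
# Simplified character chunking with chunk_size=50, overlap=20.
# Each chunk starts (chunk_size - overlap) = 30 characters after the last.
def split_text(text: str, chunk_size: int = 50, overlap: int = 20) -> list:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "College survival tip: office hours are the fastest way to get help from professors."
for chunk in split_text(doc):
    print(repr(chunk))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of some duplicated storage.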
This flow uses Cohere to create the embeddings, so create a free Cohere account and add your API key.
Run the ingestion pipeline by hitting Play on the Chroma DB component. This builds and runs all the components to the right, creating the collection of vector embeddings in Chroma DB. Then click the Vector Store icon to see your vector embeddings (see below).
Step 3: Multi-Query Generation
Next, let's head up to the query flow in Langflow.

Modify the user question in the Chat Input component to something related to the PDF we ingested above.
Click the Template field of the Prompt component to see that the prompt asks the LLM to create three different versions of this question.
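A prompt of this kind typically looks something like the sketch below. This is a paraphrase for illustration, not the exact text in the Langflow template, and `{question}` is the variable the Chat Input fills in.

```python
# Illustrative multi-query prompt: ask the LLM for three rewrites of the question.
TEMPLATE = """You are an AI assistant. Generate three different versions
of the user question below to retrieve relevant documents from a vector
database. Provide the alternative questions separated by newlines.

Original question: {question}"""

prompt = TEMPLATE.format(question="What are good tips for surviving college?")
print(prompt)
```

Each line of the LLM's answer then becomes one of the three queries run against the vector store in the next step.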
Step 4: Run Each Query and Combine Responses into Context
Add your Cohere API key to each of the three Group components, which were created by grouping the vector search flow above (see GIF below). Each Group runs a vector search for one of the three questions.
Select the Template field in the Prompt Component to see the assembled context that will be used to generate the final response.
Step 5: Submit the combined context for the final LLM response
Run the complete pipeline from the OpenAI component at the end of the flow (this will run every component to the left) and select the Text field to see the LLM response. You can also run the Playground to try your workflow in the Chat Output interface as below.
Congrats!
You just built a Multi-Query RAG pipeline: an advanced RAG technique that uses the LLM to generate multiple variations of the original question, producing a richer context and a more accurate response.
We hope this inspires you to keep learning about more advanced RAG techniques; there's a lot more we will cover soon!
Thanks for supporting Langflow! Join the community on Discord and GitHub!