
Advanced Retrieval Augmented Generation (RAG) Apps with LlamaIndex

Zilliz · 7 min read · Jun 4, 2024

## Introduction

In a recent Unstructured Data Meetup hosted by Zilliz in San Francisco, Laurie Voss, VP of Developer Relations at LlamaIndex, delivered a talk on “Building Advanced RAG Apps with LlamaIndex.” He shared how to make Retrieval Augmented Generation (RAG) frameworks simpler and production-ready with LlamaIndex.


Let’s dig into the concepts of RAG and see how LlamaIndex 🦙 helps with developing RAG apps. For a quick background: LlamaIndex (formerly known as GPTIndex) initially focused on robust closed-source technologies like OpenAI. As open-source models caught up, it began integrating open-source alternatives as well. Whether it’s LLMs, embedding models, or re-ranking technologies, LlamaIndex now offers a wide range of options for building a RAG application on your own data.

## What is Retrieval Augmented Generation (RAG)?

*Systematic RAG workflow*

RAG is a framework designed to overcome a key limitation of LLMs: their context windows are finite, so they can only handle a fraction of an organization’s data at once (typically no more than a million tokens at a time). RAG addresses this by selectively retrieving only the most relevant text and handing it to the LLM, which enables more accurate responses.

Key Benefits of RAG:

  • Accuracy: Retrieving relevant, concise data and sending it to the LLM grounds the model’s answer in real sources, producing more accurate responses.
  • Recency: Some may ask why companies don’t simply train LLMs on their own data. The challenge with fine-tuning is that it is costly and must be repeated every time the data changes. RAG, by contrast, allows easy, real-time data updates, which makes it a practical solution for dynamic data environments, such as those found in large private law firms or industrial companies.
  • Provenance: RAG can cite the specific sources behind an answer, which is crucial for applications requiring verifiable data.

To simplify, remember the reading comprehension tests you took as a kid? RAG works much the same way: the retrieved chunks of text act as the passage, and the LLM is the kid answering questions about it. It’s that simple ;-)

## How RAG Works

RAG is a solution to LLMs’ limited context windows. It retrieves the most relevant data, augments queries with context, and generates responses.

### Retrieval Techniques

Two techniques are widely used for information retrieval.

  1. Keyword Search: Traditional keyword search methods are still effective for retrieving data based on specific terms.
  2. Vector Search (Most Powerful): A vector search turns text into numerical lists called vectors (embeddings) that capture the meaning of that text. We can find semantically similar text by comparing these vectors numerically, most commonly with cosine similarity or the dot product (see the sketch below). Vector search lets us retrieve data far more effectively and reliably: once the search returns the most relevant text, the LLM uses it to answer the query.

When you think of vector search, think of your Oxford Dictionary; but instead of grouping words by their letter sequences, group them by their meaning. That is essentially what embedding models do, across hundreds or thousands of dimensions.
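
To make the comparison step concrete, here is a minimal sketch of vector similarity in plain NumPy. The three-dimensional “embeddings” and the word choices are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 for similar meanings, near 0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (hypothetical values for illustration).
king = np.array([0.90, 0.80, 0.10])
queen = np.array([0.88, 0.82, 0.15])
banana = np.array([0.10, 0.05, 0.90])

print(cosine_similarity(king, queen))   # high: related meanings
print(cosine_similarity(king, banana))  # low: unrelated meanings
```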

*Dimensional and semantic grouping of terms using vector search*

Also, open-source vector databases like Milvus can be used to set up vector stores for free! Check it out here.

## LlamaIndex: Simplifying RAG Implementation

LlamaIndex is an open-source framework, available in Python and TypeScript, that connects your data to LLMs. It simplifies the creation of RAG applications, allowing developers to build functional RAG systems with minimal code. Thanks to LlamaIndex, basic RAG functionality takes only five lines of Python:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load every file in the ./data directory.
documents = SimpleDirectoryReader("data").load_data()
# Embed the documents and build an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)
# Wrap the index in a query engine and ask a question.
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
```

## Advanced RAG Features for Data Ingestion and Querying by LlamaIndex

LlamaIndex also provides a wide range of advanced data ingestion and querying features for your RAG applications.

### Ingestion

  • Data Connectors: LlamaHub by LlamaIndex provides connectors for various data sources, including Google Drive, Notion, Slack, and databases like Postgres and MongoDB.
  • PDF Parsing: LlamaIndex offers advanced PDF parsing capabilities, converting complex PDFs into Markdown for better LLM understanding, since LLMs are trained on a large amount of Markdown data.
  • Embedding Models: LlamaIndex supports various open-source and closed-source embedding models, allowing customization based on the domain.
  • Vector Stores: LlamaIndex integrates with multiple vector databases, again both open and closed source, including Milvus and Zilliz Cloud, for efficient storage and retrieval (a minimal setup sketch follows this list).
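
Here is a minimal sketch of plugging a Milvus vector store into the starter example above. It assumes the `llama-index-vector-stores-milvus` integration package is installed, and `dim=1536` assumes OpenAI’s default embedding size; Milvus Lite stores everything in a local file, so no server is needed.

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Milvus Lite: a local, file-backed Milvus instance.
vector_store = MilvusVectorStore(
    uri="./milvus_demo.db",
    dim=1536,  # assumes OpenAI's default embedding dimension
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
response = index.as_query_engine().query("What did the author do growing up?")
```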

### Querying

  • Sub-Question Query Engine

For complex questions that require multiple simple queries, the sub-question query engine breaks down the main query into smaller parts, retrieves answers from different sources, and combines them into a single response.

*Figures: query breakdown for better retrieval, separate response collection, and the final compiled response*

See the Code here.
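
Here is a minimal sketch of the sub-question query engine, following the pattern in the LlamaIndex docs. It assumes `index` was built as in the starter example; the tool name and description are made up for illustration.

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Wrap the base query engine as a tool; the description tells the
# engine which sub-questions to route here.
tool = QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(
        name="essays",  # hypothetical tool name
        description="Essays written by the author",
    ),
)

# The engine decomposes a complex query into sub-questions, answers
# each one against the tools, and synthesizes a single final response.
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=[tool])
response = engine.query("Compare what the author did before and after college.")
```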

  • Small to Big Retrieval

This methodology embeds very small pieces of text (single sentences, which are most likely to match the user query precisely) into vectors, and then, at query time, retrieves the window of surrounding sentences, both above and below the matched sentence, to give the LLM fuller context.

*Figures: small-to-big retrieval visualization and a real implementation*

See the Code here.
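
Here is a minimal sketch of small-to-big retrieval using LlamaIndex’s sentence-window pattern. The window size of 3 is just an example value.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Split documents into single sentences, storing the 3 sentences on
# either side of each one in the node's metadata.
parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

documents = SimpleDirectoryReader("data").load_data()
nodes = parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)  # "small": single sentences are embedded

# "Big": after retrieval, replace each matched sentence with its
# surrounding window before the LLM sees it.
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
response = query_engine.query("What did the author do growing up?")
```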

  • Metadata Filtering

Pre-tagging documents with metadata (e.g., year, user, company, keywords), like the example below, allows more precise filtering and retrieval, improving the accuracy of the LLM’s responses. Metadata filters also pair well with keyword search. Think of metadata as custom properties attached to text.

*Figure: a T-shirt ad tagged with metadata*
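
Here is a minimal sketch of metadata filtering with LlamaIndex’s built-in filter classes. The documents, metadata keys, and values are made up for illustration.

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Hypothetical documents pre-tagged with metadata.
docs = [
    Document(text="Q4 revenue grew 12%.", metadata={"year": 2024, "company": "Acme"}),
    Document(text="Q4 revenue fell 3%.", metadata={"year": 2023, "company": "Acme"}),
]
index = VectorStoreIndex.from_documents(docs)

# Only nodes tagged year=2024 are considered during retrieval.
filters = MetadataFilters(filters=[ExactMatchFilter(key="year", value=2024)])
query_engine = index.as_query_engine(filters=filters)
response = query_engine.query("How did Acme's Q4 revenue change?")
```
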
  • Hybrid Search

Hybrid search combines keyword and vector search: it runs the query through both methods and merges the results based on confidence scores, ensuring the most relevant data is retrieved. A detailed example with code is here.
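
Here is a hedged sketch of hybrid search with the Milvus integration, loosely following the Milvus hybrid index demo in the LlamaIndex docs. The `enable_sparse` flag and the `hybrid` query mode are the key pieces; the file path and embedding dimension are assumptions.

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# enable_sparse adds a sparse (keyword-style) index alongside the
# dense vector index, so both searches can run over the same store.
vector_store = MilvusVectorStore(
    uri="./milvus_hybrid.db",
    dim=1536,  # assumes OpenAI's default embedding dimension
    enable_sparse=True,
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# "hybrid" mode runs the dense and sparse searches and fuses the results.
query_engine = index.as_query_engine(vector_store_query_mode="hybrid")
response = query_engine.query("What did the author do growing up?")
```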

  • Agents

LlamaIndex Agents are semi-autonomous pieces of software that use various tools to achieve a goal. They can combine multiple retrieval strategies and tools to answer complex queries effectively.

Think of it this way: a tool is a user-defined function with a description of what it does, and the Agent is an engineer who works with these tools. If you prompt an LLM directly to multiply two huge numbers, it will most likely produce an answer close to the ground truth but not exactly right. But if we give it a multiplication tool that it can call based on the tool’s description, it will consistently get the correct answer no matter how large the numbers are.
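
Here is a minimal sketch of that multiplication example using LlamaIndex’s `FunctionTool` and `ReActAgent`. The model choice is an assumption.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the exact result."""
    return a * b

# The function's name, signature, and docstring become the tool
# description the agent uses to decide when to call it.
multiply_tool = FunctionTool.from_defaults(fn=multiply)

agent = ReActAgent.from_tools(
    [multiply_tool],
    llm=OpenAI(model="gpt-4o-mini"),  # model choice is an assumption
    verbose=True,
)
response = agent.chat("What is 1234567 multiplied by 7654321?")
```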

*Simple agentic workflow*

Similarly, a user can build an agentic RAG workflow by giving the LLM tools and their descriptions and letting it decide which to use for a given RAG query.

## Summary

Laurie’s presentation showcases basic and advanced RAG application patterns that we can build with minimal code using LlamaIndex. LlamaIndex also provides LlamaParse, which helps parse complex documents so we can index our data into our favorite vector databases, like Milvus, locally or in the cloud. It’s ultimately up to the user to choose what best fits their use case. The presentation also covered various RAG strategies for optimizing both retrieval and generation.

## Citations

  1. Liu, Jerry. “LlamaParse.” LlamaIndex, 2023, https://cloud.llamaindex.ai/parse.
  2. Liu, Jerry. “Starter Tutorial (OpenAI).” LlamaIndex, 2023, https://docs.llamaindex.ai/en/stable/getting_started/starter_example/.
  3. “Llama Hub.” Llama Hub, 2023, https://llamahub.ai/.
  4. LlamaIndex. “Hybrid Search.” LlamaIndex, 2023, https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusHybridIndexDemo/.
  5. LlamaIndex. “LLM based Agents.” LlamaIndex, 2023, https://docs.llamaindex.ai/en/stable/use_cases/agents/.
  6. LlamaIndex. “Metadata Replacement + Node Sentence Window.” LlamaIndex, 2023, https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo/.
  7. LlamaIndex. “Sub Question Query Engine.” LlamaIndex, 2023, https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine/.
  8. Milvus. “Milvus Vector Database.” Milvus, 2023, https://milvus.io/.
  9. Milvus. “Quickstart.” Milvus Documentation, 5 May 2024, https://milvus.io/docs/quickstart.md.
