Build a RAG System with Rig in Under 100 Lines of Code

A Comprehensive Guide to Building an LLM Application with Rig

8 min read · Sep 8, 2024

TL;DR: Continuing our journey with Rig, this guide builds on the initial introduction and the 5 compelling reasons to use it for your next LLM project. Here, I’ll walk you through building a Retrieval-Augmented Generation (RAG) system in Rust using the Rig library. In under 100 lines of code, you’ll learn to extract text from PDF documents, generate embeddings with OpenAI’s API, and enable a large language model to answer questions based on the documents’ content.

Introduction

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by combining them with external knowledge retrieval. When a RAG system receives a query, it first retrieves relevant information from a knowledge base and passes it to the LLM along with the query. Grounding the model in retrieved text lets it generate responses that are contextually relevant and up to date, which mitigates limitations such as outdated training knowledge and reduces hallucinations.

Learn more about the fundamentals of RAG here.

Rig is an open-source Rust library designed to simplify the development of LLM-powered applications, including RAG systems. We’ll build a functional RAG system using Rig that answers questions based on the content of PDF documents, showcasing how RAG can be applied to real-world data sources. Let’s dive in and start building!

💡 Tip: New to Rust?
This guide assumes some familiarity with Rust and a working development environment. If you’re just starting out or need to set up your environment, check out these quick guides:
Introduction to Rust
Setting up Rust with VS Code
These resources will help you get up to speed quickly!

Setting Up the Project

Before we start coding, let’s set up our Rust project and install the necessary dependencies.

First, create a new Rust project in your coding environment:

Initialize a new Rust project and navigate into the project directory.

Now, let’s add Rig and other required dependencies to our Cargo.toml file:

Add the required dependencies for building the RAG system, including Rig, Tokio, and PDF extraction.

Here’s a brief explanation of each dependency:

rig-core: The main Rig library for building LLM applications
tokio: An asynchronous runtime for Rust
anyhow: For flexible error handling
pdf-extract: To extract text from PDF files

Before we begin coding, make sure you have an OpenAI API key. Set it as an environment variable:

Set the OpenAI API key as an environment variable.
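A missing key usually only shows up as a failed API call, so it can help to check for it at startup. Here is a minimal sketch using only the standard library. The helper name require_env_var is hypothetical, and OPENAI_API_KEY is assumed to be the variable name the OpenAI client reads.

```rust
use std::env;

/// Look up an environment variable and return a readable error if it is
/// missing, not valid Unicode, or empty.
fn require_env_var(name: &str) -> Result<String, String> {
    match env::var(name) {
        Ok(value) if !value.trim().is_empty() => Ok(value),
        Ok(_) => Err(format!("{name} is set but empty")),
        Err(_) => Err(format!("{name} is not set or is not valid Unicode")),
    }
}

fn main() {
    // OPENAI_API_KEY is assumed to be the variable the OpenAI client reads.
    match require_env_var("OPENAI_API_KEY") {
        Ok(_) => println!("API key found"),
        Err(message) => eprintln!("{message}"),
    }
}
```

Failing early with a clear message is easier to debug than an authentication error deep inside a request.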

With our project set up, let’s move on to building our RAG system step by step.

Building the RAG System

We’ll break down our RAG system into several key components. Let’s go through each one, explaining its purpose and implementation.

Step 1: Setting up the OpenAI client and PDF extraction

First, we’ll set up the OpenAI client using Rig and create a function to extract text from PDFs:

Set up the OpenAI client and define a function to extract text from a PDF file.

This code sets up our OpenAI client and defines a function to extract text from PDFs. The load_pdf_content function uses pdf_extract to read the content of a PDF file and returns it as a string, with anyhow providing context for any errors that occur.
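To show what attaching context to an error looks like, here is a dependency-free sketch of the same pattern. It is a stand-in, not the guide's function: std::fs::read_to_string replaces the pdf_extract call, a plain String replaces anyhow's error type, and the name load_text_content is hypothetical.

```rust
use std::fs;
use std::path::Path;

/// Read a document's text and attach the file path to any error.
/// Stand-in for the PDF case: `fs::read_to_string` replaces the PDF
/// extraction call so the sketch needs no external crates.
fn load_text_content<P: AsRef<Path>>(path: P) -> Result<String, String> {
    let path = path.as_ref();
    fs::read_to_string(path)
        .map_err(|e| format!("Failed to read document {}: {}", path.display(), e))
}

fn main() {
    // A path that is unlikely to exist, to show the contextual error message.
    match load_text_content("documents/missing.txt") {
        Ok(text) => println!("Loaded {} characters", text.len()),
        Err(err) => eprintln!("{err}"),
    }
}
```

The point of the added context is that the error names the file that failed, which matters once more than one document is loaded.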

Step 2: Creating the document store

Next, we’ll create an in-memory vector store to hold our documents:

Initialize an in-memory vector store to hold document embeddings.

The InMemoryVectorStore is a simple vector store provided by Rig that keeps all data in memory. This is suitable for small to medium-sized document collections. For larger collections, you might want to consider a persistent storage solution.
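To make the "everything lives in memory" trade-off concrete, here is a toy store written from scratch. It does not reproduce Rig's InMemoryVectorStore or its API. ToyVectorStore and its methods are hypothetical names for illustration only.

```rust
use std::collections::HashMap;

/// A stored document: its raw text plus its embedding vector.
struct StoredDocument {
    text: String,
    embedding: Vec<f32>,
}

/// Toy in-memory store keyed by document id. Everything lives in a
/// HashMap, so the data is lost when the process exits.
#[derive(Default)]
struct ToyVectorStore {
    documents: HashMap<String, StoredDocument>,
}

impl ToyVectorStore {
    /// Insert or overwrite a document; returns the previous entry's text, if any.
    fn add_document(&mut self, id: &str, text: &str, embedding: Vec<f32>) -> Option<String> {
        let doc = StoredDocument { text: text.to_string(), embedding };
        self.documents.insert(id.to_string(), doc).map(|old| old.text)
    }

    /// Borrow the embedding stored for `id`, if that document exists.
    fn embedding_of(&self, id: &str) -> Option<&[f32]> {
        self.documents.get(id).map(|d| d.embedding.as_slice())
    }

    /// Number of documents currently held.
    fn len(&self) -> usize {
        self.documents.len()
    }
}

fn main() {
    let mut store = ToyVectorStore::default();
    store.add_document("doc1", "first document", vec![0.1, 0.9]);
    store.add_document("doc2", "second document", vec![0.8, 0.2]);
    println!("{} documents held in memory", store.len());
    println!("doc1 embedding: {:?}", store.embedding_of("doc1"));
}
```

Because nothing is written to disk, every restart means re-embedding the documents, which is one reason larger collections call for persistent storage.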

Step 3: Implementing the embedding model

Now comes the crucial part where we set up our embedding model and add documents from PDFs to our store:

Create document embeddings using OpenAI’s text-embedding-ada-002 model and add them to the vector store.

This section is where the magic happens. We’re using OpenAI’s text-embedding-ada-002 model to create embeddings for our PDF documents. These embeddings are vector representations of the text that capture semantic meaning, allowing for efficient similarity searches later.

We load the content of two PDF files, create embeddings for them using the EmbeddingsBuilder, and then add these embeddings to our vector store. This process transforms our raw text data into a format that can be quickly and effectively queried.
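One common way to compare embeddings is cosine similarity, which measures the angle between two vectors. The sketch below uses toy two-dimensional vectors in place of real embeddings, and it makes no claim about which measure Rig uses internally.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns None when the lengths differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy vectors standing in for a query embedding and two document embeddings.
    let query = [0.9, 0.1];
    let doc_close = [0.8, 0.2];
    let doc_far = [0.1, 0.9];
    println!("close: {:?}", cosine_similarity(&query, &doc_close));
    println!("far:   {:?}", cosine_similarity(&query, &doc_far));
}
```

Vectors pointing in nearly the same direction score close to 1, and unrelated ones score close to 0, which is what lets a store rank documents against a query.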

Step 4: Building the RAG agent

With our document store populated, we can now create our RAG agent:

Build a RAG agent with a preamble and dynamic context selection using the vector store.

This code creates a RAG agent using OpenAI’s GPT-3.5-turbo model. The preamble sets the behavior of our agent, and dynamic_context(2, ...) tells the agent to use the top 2 most relevant documents from our vector store for each query. The vector_store.index(embedding_model) creates an index of our vector store using our embedding model, which allows for efficient similarity searches.
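Conceptually, selecting the top 2 documents means ranking them by similarity to the query and placing the best ones into the prompt. The sketch below illustrates that idea with precomputed scores. The functions top_k and build_prompt and the prompt layout are hypothetical, and Rig's agent may assemble its prompt differently.

```rust
/// Pick the `k` highest-scoring documents and return their texts, best first.
fn top_k<'a>(scored: &[(f32, &'a str)], k: usize) -> Vec<&'a str> {
    let mut ranked: Vec<(f32, &str)> = scored.to_vec();
    // Sort by score, descending; total_cmp gives a total order over f32.
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0));
    ranked.into_iter().take(k).map(|(_, text)| text).collect()
}

/// Combine the preamble, the retrieved context, and the user's question.
fn build_prompt(preamble: &str, context: &[&str], question: &str) -> String {
    let mut prompt = String::from(preamble);
    for (i, doc) in context.iter().enumerate() {
        prompt.push_str(&format!("\n\nContext document {}:\n{}", i + 1, doc));
    }
    prompt.push_str(&format!("\n\nQuestion: {}", question));
    prompt
}

fn main() {
    // Similarity scores are made up for illustration.
    let scored = [
        (0.31, "Notes about an unrelated topic."),
        (0.87, "Excerpt from the first PDF."),
        (0.74, "Excerpt from the second PDF."),
    ];
    let context = top_k(&scored, 2);
    let prompt = build_prompt("You are a helpful assistant.", &context, "What do the documents say?");
    println!("{prompt}");
}
```

A larger k gives the model more material to draw on, at the cost of a longer prompt and more tokens per request.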

Step 5: Using Rig’s Built-in CLI Interface

One of the great features of Rig is its built-in utilities that simplify common tasks. Instead of creating our own CLI interface from scratch, we can use Rig’s cli_chatbot function:

Launch a CLI chatbot interface for the RAG agent using Rig’s built-in utilities.

This simple change not only reduces our code but also provides a more robust CLI interface with built-in chat history functionality.
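To give a sense of what such a utility saves you from writing, here is a rough standard-library sketch of a chat loop that keeps history. It is not Rig's implementation: chat_loop is a hypothetical function, the responder closure stands in for the agent, and ending the session on "exit" is an assumption of this sketch.

```rust
use std::io::{self, BufRead, Write};

/// Read lines from `input` until "exit" or end of input, answer each with
/// `respond`, and keep every (user, assistant) exchange in a history list.
fn chat_loop<R, W, F>(input: R, mut output: W, mut respond: F) -> io::Result<Vec<(String, String)>>
where
    R: BufRead,
    W: Write,
    F: FnMut(&str, &[(String, String)]) -> String,
{
    let mut history: Vec<(String, String)> = Vec::new();
    for line in input.lines() {
        let line = line?;
        let message = line.trim();
        if message.is_empty() {
            continue; // skip blank lines
        }
        if message == "exit" {
            break;
        }
        // The responder sees the earlier exchanges, so it can use chat history.
        let reply = respond(message, history.as_slice());
        writeln!(output, "> {reply}")?;
        history.push((message.to_string(), reply));
    }
    Ok(history)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    // Placeholder responder: a real agent would call the LLM here.
    let history = chat_loop(stdin.lock(), io::stdout(), |msg, past| {
        format!("(turn {}) you said: {}", past.len() + 1, msg)
    })?;
    println!("Conversation ended after {} exchanges", history.len());
    Ok(())
}
```

Taking the input and output as generic parameters keeps the loop testable without a terminal.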

Putting It All Together

Now that we’ve gone through each component, let’s look at the complete code for our RAG system:

Full implementation of the RAG system, from PDF extraction to chatbot interaction.

That’s it! We’ve built a functional RAG system in fewer than 100 lines of Rust code. It handles PDF extraction, embedding creation, vector storage, and interaction with OpenAI’s GPT model, all while leveraging Rig’s built-in CLI interface for a smooth user experience.

Running and Testing the RAG System

To run the system, ensure you have the PDF files (“Moores_Law_for_Everything.pdf” and “The_Last_Question.pdf”) in a documents folder in your project root. Then start the program from the project root with Cargo’s run command (cargo run).

Once the system is up and running, you can start interacting with it. Let’s explore some example interactions to see how our RAG system performs with different types of queries.

Example Interactions with the RAG System

To illustrate the practical utility of our RAG system, let’s explore several interactions. These examples demonstrate how the system processes queries and generates knowledgeable responses based on embedded document content.

Let’s break down what we’re looking at:

Greeting and Open-ended Interaction: The agent can engage in general conversation, as shown by its response to “hi”.
Specific Document Summary: The agent provides a concise summary of Sam Altman’s ideas from “Moore’s Law for Everything” when asked.
Theme Analysis: When asked about Asimov’s “The Last Question”, the agent demonstrates its ability to analyze and explain the central themes of a literary work.
Cross-Document Analysis: The final question showcases the agent’s ability to draw connections between two different texts, comparing and contrasting their themes and ideas.

This is RAG in action, and it’s made possible by the combination of:

The vector store, which allows for efficient retrieval of relevant information
The embedding model, which captures the semantic meaning of the text
The large language model (GPT-3.5-turbo in this case), which generates coherent and contextually relevant responses
Rig’s built-in CLI interface, which provides a smooth user experience with chat history functionality

By leveraging these components through Rig’s intuitive API, we’ve created a system that can understand and respond to complex, multi-faceted questions across multiple documents, all while providing a user-friendly interface.

Potential Applications

Seeing these examples, you might start to imagine the potential applications of such a system:

Research Assistant: Help researchers quickly find and synthesize information across multiple papers or documents.
Educational Tool: Assist students in understanding complex topics by providing explanations and drawing connections between different concepts.
Content Analysis: Aid writers or journalists in analyzing themes and ideas across multiple texts.
Customer Support: Provide detailed, context-aware responses to customer inquiries based on product documentation.

The possibilities are vast, and with Rig, implementing these applications becomes much more accessible.

Deploying to Production

While our current implementation is suitable for local testing and small-scale use, there are several considerations to keep in mind when deploying such a system to production:

Scalability: For larger document collections, consider using a dedicated vector store instead of the in-memory store. Rig currently supports MongoDB, with other vector stores on the roadmap. If you want to implement a specific vector store or have a request for one, please visit the Rig repo and submit an issue.
Document Processing: Implement more robust PDF parsing, including error handling for corrupted files and support for other document formats. You might want to use a dedicated document processing pipeline.
Caching: Implement response caching to reduce API calls and improve response times for repeated queries. This can significantly reduce costs and latency.
Security: Ensure proper handling of API keys and sensitive information, especially if processing confidential documents. Use environment variables and secure vaults for storing secrets.
Monitoring and Logging: Implement comprehensive logging and monitoring to track system performance and user interactions. This will help in debugging and improving the system over time.
Rate Limiting: Implement rate limiting to manage API usage and prevent abuse. This is crucial both for cost management and to comply with API provider terms of service.
Asynchronous Processing: For handling large documents or many concurrent users, consider implementing asynchronous processing with a job queue.
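As one illustration from the list above, here is a minimal sketch of response caching. ResponseCache and its methods are hypothetical names, and the closure stands in for the actual model call. A production cache would also need expiry and a size limit, which this sketch omits.

```rust
use std::collections::HashMap;

/// Caches answers by normalized query so repeated questions skip the model call.
struct ResponseCache {
    entries: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl ResponseCache {
    fn new() -> Self {
        ResponseCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Lowercase and collapse whitespace so trivially different queries share a key.
    fn normalize(query: &str) -> String {
        query.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
    }

    /// Return the cached answer, or compute it with `generate` and store it.
    fn get_or_generate<F: FnOnce(&str) -> String>(&mut self, query: &str, generate: F) -> String {
        let key = Self::normalize(query);
        if let Some(answer) = self.entries.get(&key) {
            self.hits += 1;
            return answer.clone();
        }
        self.misses += 1;
        let answer = generate(query);
        self.entries.insert(key, answer.clone());
        answer
    }
}

fn main() {
    let mut cache = ResponseCache::new();
    // The closure is a placeholder for an expensive LLM request.
    let first = cache.get_or_generate("What is RAG?", |q| format!("(generated) answer to: {q}"));
    let second = cache.get_or_generate("what is  RAG?", |q| format!("(generated) answer to: {q}"));
    println!("{first}\n{second}");
    println!("hits: {}, misses: {}", cache.hits, cache.misses);
}
```

Keying on the exact normalized text only catches literal repeats. Matching paraphrased questions would need a different approach, such as comparing query embeddings.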

By addressing these considerations, you can turn this proof of concept into a robust, production-ready system. In a future guide, I’ll walk you through deploying a RAG system built with Rig to production. Subscribe to my blog or join our community to be notified.

Conclusion

In this tutorial, we’ve built a functional RAG system using Rig in under 100 lines of Rust code. This system can process PDF documents, create embeddings, and answer questions based on the content by leveraging a large language model.

The benefits of using Rig for RAG systems include:

Simplified API for working with LLM providers
Easy integration of vector stores and embedding models
High-level abstractions that reduce complexity
Flexibility to work with various data sources, including PDFs
Type-safe Rust code that ensures robustness and performance

We invite you to dive into building with Rig and explore its capabilities firsthand. Your contributions and feedback are invaluable to us as we continue to enhance Rig’s functionality and user experience. Join us in shaping the future of LLM-powered applications!

Further Resources

To deepen your understanding and continue building with Rig, check out these resources:

Your Feedback Matters! We’re offering a unique opportunity to shape the future of Rig:

Build an AI-powered application using Rig.
Share your experience and insights via this feedback form.
Get a chance to win $100 and have your project featured in our showcase!

Your insights will directly influence Rig’s growth. 🦀✨

Ad Astra,
Tachi
Co-Founder @ Playgrounds Analytics