Building a basic RAG (Retrieval Augmented Generation) system in a Rails app

Andrei Bondarev
3 min read · Nov 4, 2023


What is RAG?

RAG (Retrieval Augmented Generation) is a methodology that helps Large Language Models (LLMs) generate accurate and up-to-date information. A typical RAG workflow follows the three steps below (sketched in code right after the list):

  1. Relevant knowledge (or data) is retrieved from the knowledge base.
  2. A prompt containing the retrieved knowledge is constructed.
  3. The LLM receives the prompt and generates a text completion.
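Here is a minimal, illustrative Ruby sketch of that loop. SearchIndex and Llm are hypothetical stand-ins for a real vector store and LLM client, not an actual API:

# Illustrative end-to-end RAG flow (hypothetical names).
def rag_answer(question)
  # 1. Retrieve relevant knowledge from the knowledge base.
  docs = SearchIndex.similar_to(question)

  # 2. Construct a prompt containing the retrieved knowledge.
  prompt = "Context: #{docs.join("\n")}\n---\nQuestion: #{question}\n---\nAnswer:"

  # 3. The LLM receives the prompt and generates a text completion.
  Llm.complete(prompt)
end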

The most common use case for a RAG system is powering Q&A: users pose natural-language questions via the UI and receive answers in natural language. The following diagram illustrates a RAG-powered Q&A system:

[Diagram: a RAG-powered Q&A system]

A standard prompt that’s used in these systems looks like the following:

Context: {context}
---
Question: {question}
---
Answer:

The context that the LLM will use is combined with the user’s question in a single text prompt. “Answer:” signals to the LLM that it, in turn, should complete the rest. This is the exact prompt we use in our Langchain.rb library. As the Q&A system is optimized, slight modifications can be made to the original prompt to set constraints and provide additional instructions to the LLM. The snippet below shows how the placeholders get filled in.
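As a hedged sketch (the example values are made up, and Langchain.rb’s actual template handling may differ), filling the template in Ruby looks like this:

# Example values standing in for retrieved knowledge and the user's question.
context  = "Acme's refund window is 30 days from delivery."
question = "How long do I have to request a refund?"

# The template above, with %{...} placeholders filled via Kernel#format.
template = <<~PROMPT
  Context: %{context}
  ---
  Question: %{question}
  ---
  Answer:
PROMPT

prompt = format(template, context: context, question: question)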

Adding RAG to the Rails app

We built the langchainrb_rails library to empower developers to easily add (or sprinkle ✨) AI functionality into their existing Rails apps.

Let’s get started… ! 🚀

Installation

(You can also follow the installation instructions in the gem’s README.)

bundle add langchainrb_rails

The library supports numerous LLM providers (e.g. Google PaLM, Anthropic, Cohere, etc.) and vector search databases (e.g. Weaviate, Qdrant, Pinecone, etc.), but we recommend using OpenAI and Pgvector if you’re just getting started.

Let’s run a generator that configures the library for an ActiveRecord model of your choice. Point it at the model that stores the data to be used in the retrieval step of RAG.

rails generate langchainrb_rails:pgvector --model=Document --llm=openai

After running the above, a helpful message walks you through the required next steps:

$ rails generate langchainrb_rails:pgvector --model=Document --llm=openai
create db/migrate/20231101013436_enable_vector_extension.rb
create db/migrate/20231101013437_add_vector_column_to_documents.rb
create config/initializers/langchainrb_rails.rb
insert app/models/document.rb
gemfile neighbor
gemfile ruby-openai
Please do the following to start Q&A with your Document records:
1. Run `bundle install` to install the new gems.
2. Set `OPENAI_API_KEY` environment variable to your OpenAI API key.
3. Run `rails db:migrate` to apply the database migrations to enable pgvector and add the embedding column.
4. In Rails console, run `Document.embed!` to set the embeddings for all records.
5. Ask a question in the Rails console, ie: `Document.ask("")`
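Among the generated files, config/initializers/langchainrb_rails.rb wires the LLM to the Pgvector-backed vector store. It looks roughly like the snippet below; check the generated file for the exact contents:

# config/initializers/langchainrb_rails.rb (approximate contents)
LangchainrbRails.configure do |config|
  config.vectorsearch = Langchain::Vectorsearch::Pgvector.new(
    llm: Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
  )
end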

New gems were added to your app’s Gemfile, so let’s bundle again:

bundle install

Next, set your ENV["OPENAI_API_KEY"] and run the migrations:

rails db:migrate

You should see the following output: 1) a migration that enables the Pgvector PostgreSQL extension, and 2) a migration that adds the embedding column to your ActiveRecord model’s table.

== 20231101013436 EnableVectorExtension: migrating ============================
-- enable_extension("vector")
-> 0.0649s
== 20231101013436 EnableVectorExtension: migrated (0.0650s) ===================

== 20231101013437 AddVectorColumnToDocuments: migrating =========================
-- add_column(:documents, :embedding, :vector, {:limit=>1536})
-> 0.0020s
== 20231101013437 AddVectorColumnToDocuments: migrated (0.0021s) ================
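For reference, the two generated migrations are tiny. They look roughly like this (the Rails version in the migration superclass may differ in your app):

# Approximate contents of the generated migrations.
class EnableVectorExtension < ActiveRecord::Migration[7.0]
  def change
    enable_extension "vector"
  end
end

class AddVectorColumnToDocuments < ActiveRecord::Migration[7.0]
  def change
    # 1536 matches the output dimensionality of OpenAI's embedding model.
    add_column :documents, :embedding, :vector, limit: 1536
  end
end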

We’re almost done! We should generate vector embeddings for any records that existed before we added the library. Let’s do it by running the following command in the rails console:

Document.embed! # Finds every record, generates and saves an embedding to the new column
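Conceptually, embed! iterates over the records, asks the LLM for an embedding of each record’s text, and saves it to the new column. A rough approximation, not the gem’s actual implementation (the content column name is an assumption):

# Illustrative approximation of embed! (not the gem's real code).
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

Document.find_each do |doc|
  # Assumes the record's text lives in a `content` column.
  response = llm.embed(text: doc.content)
  doc.update!(embedding: response.embedding)
end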

We’re done!

We can now call the .ask() method on our model, which executes the full RAG workflow.

Document.ask("question")

Behind the scenes:

  1. Our question is converted to a vector embedding.
  2. Using Pgvector, the system retrieves the documents closest to that embedding (see the sketch after this list).
  3. These documents, serving as context, are combined with the question in a prompt and sent to the LLM.
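Step 2 boils down to a nearest-neighbor search. With the neighbor gem the generator added, it looks roughly like this (the cosine distance metric and the result count of 4 are assumptions, not the gem’s documented defaults):

# Embed the question, then find the closest Document records (sketch).
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
query_embedding = llm.embed(text: "question").embedding

Document.nearest_neighbors(:embedding, query_embedding, distance: "cosine").first(4)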

What’s next?!

The next step is improving the performance of your RAG system, and you can’t improve what you don’t measure. We’ve implemented the RAGAS evaluation metrics for exactly this reason. (Blog post coming soon.)
