Simple RAG Implementation Using Open-Source Tools

Pranav Jadhav
5 min read · Aug 8, 2024


What is RAG?

RAG Architecture (source: https://www.e2enetworks.com)

RAG (Retrieval Augmented Generation) is a technique that grounds a model's output in external knowledge: relevant documents are retrieved from a corpus and supplied to the model as context, so the generated answer is based on that retrieved information rather than on the model's parameters alone. A RAG architecture lays out the entire pipeline of a Generative AI application: data loading, pre-processing, storage in a (vector) database, retrieval, and output generation. It helps manage and improve the accuracy of large language models in tasks such as question answering, summarization, and dialogue systems. Frameworks like LangChain make RAG straightforward to implement.

What are Open Source Tools?

These are tools whose source code is made publicly available. Anyone can read the code, propose changes, and use the tools for free. Such projects are typically maintained by an organization or community, and anyone can contribute to their development. They are generally free to use and can be taken from development to production in personal or organizational projects.

There are various ways to implement a RAG system. Depending on your preference, you can use paid or open-source tools. Here I will show you how to implement one using open-source tools, completely free of cost. We are going to use the following:

  • Ollama (large language models, run locally)
  • FAISS (vector database)
  • LangChain (framework)

Implementation

We are going to use the LangChain framework. It includes everything from loading and pre-processing tools to model implementation.

Step 1: Generating LangChain API key

First, you need to create an account on LangChain and get an API key. You can sign up here: https://smith.langchain.com/. This is the LangSmith platform, where you can track all activity, including requests made through LangChain utilities. Copy the generated API key, as it is only visible at the time of creation.

LangChain API Portal

Step 2: Setup the Virtual Environment on the Local Machine (VS Code)

A virtual environment allows you to install the exact package versions a specific project needs, independent of what is installed system-wide. It prevents conflicts between the software versions on your PC and the project's requirements. Run the commands below in the terminal to create and activate the virtual environment (Windows shown).

python -m venv env
.\env\Scripts\activate
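
If you are on macOS or Linux, the activation command differs; the equivalent is:

python3 -m venv env
source env/bin/activate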

Step 3: Install the Dependencies

Create a requirements.txt file and list the dependencies below in it.

python-dotenv
langchain
langchain_core
langchain_community
langchain-ollama
langchain_text_splitters
faiss-cpu

After creating the file, run the command below to install the dependencies.

pip install -r requirements.txt

Step 4: Configure the generated API key

Create a .env file and paste your API key, using LANGCHAIN_API_KEY as the variable name.

LANGCHAIN_API_KEY=lsv2_pt_806efdca73f3480aabd

Then create an app.py file where we will write the project code. Configure the API key by adding the code below to app.py.

import os
from dotenv import load_dotenv

# Load variables from the .env file into the process environment
load_dotenv()

# Enable LangSmith tracing and expose the API key to LangChain
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")

Step 5: Load the Data

You can import data from various sources using document loaders. They let you load data from the web, PDF files, text files, JSON files, etc. Here we are loading data from a local txt file, so I am using TextLoader. You can choose the loader that matches your data source from the LangChain docs (Conceptual guide | 🦜️🔗 LangChain). Remember to pass the location of your document.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("C:/Users/prana/Desktop/sample.txt")
docs = loader.load()
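
If your source is a PDF instead of a text file, only the loader line changes; a minimal sketch, assuming pypdf is installed (pip install pypdf) and a hypothetical sample.pdf path:

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("C:/Users/prana/Desktop/sample.pdf")  # each page becomes its own Document
docs = loader.load()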

Step 6: Split the data

After loading the data, we split it into chunks so it can be processed effectively. Chunking keeps each piece within the embedding model's context window, makes retrieval more precise (the retriever returns small, focused passages instead of whole files), and helps with memory management, batch processing, and error handling. Overall, it improves how the data is handled downstream.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
# split_documents takes the Document list returned by the loader;
# create_documents would expect raw strings instead
texts = text_splitter.split_documents(docs)

Here chunk_size is the maximum size (in characters) of each generated chunk. A sentence can end up split across two chunks, so chunk_overlap repeats the last n characters of one chunk at the start of the next. This preserves continuity and context, which matters especially in tasks that involve sequences, such as text processing and time-series analysis.
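
A quick sanity check on the result, printing the number of chunks and a preview of the first one:

print(len(texts))                   # number of chunks produced
print(texts[0].page_content[:200])  # first 200 characters of the first chunk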

Step 7: Create Embeddings

Traditional machine learning models use TF-IDF, bag-of-words, CountVectorizer, etc., to convert text into vectors. However, these sparse representations capture word counts rather than meaning, which is limiting when working with language models. LangChain therefore provides embedding integrations backed by LLMs; here we use Ollama embeddings.

from langchain_ollama import OllamaEmbeddings

embed=OllamaEmbeddings(model="llama3.1")
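
Note that this assumes Ollama itself is installed on your machine (https://ollama.com) and that the model has already been pulled:

ollama pull llama3.1

You can confirm the embeddings work by embedding a sample string; embed_query returns a plain list of floats:

vec = embed.embed_query("hello world")
print(len(vec))  # dimensionality of the embedding vector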

Step 8: Store and retrieve Embeddings in the Vector Database

Vector databases are specialized databases designed to store, index, and query high-dimensional vectors efficiently. These vectors represent data points in a continuous vector space, commonly used in machine learning, natural language processing, image processing, and recommendation systems.

After loading data into a vector database, it can be exposed as a retriever. This enables efficient and effective data retrieval, particularly similarity or nearest-neighbor searches, and it acts as the retrieval component in RAG.

from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(texts, embed)
retriever = db.as_retriever()
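
Optionally, the index can be persisted to disk so it doesn't have to be rebuilt on every run; a minimal sketch using FAISS's save_local/load_local helpers (the "faiss_index" folder name is arbitrary):

db.save_local("faiss_index")
# Reload later with the same embedding function; the flag is required in
# recent LangChain versions because the index is stored with pickle
db = FAISS.load_local("faiss_index", embed, allow_dangerous_deserialization=True)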

Step 9: Make Prompt and Define Model

Create a prompt as per your needs. It acts as an instruction set for the LLM and directs its response based on the defined conditions and constraints. You can find various prompt templates on LangSmith (langchain.com). Below is an example prompt template for a QA bot.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """
    Based on the {context} provided, answer the query asked by the user in the best possible way.
    Example 1 - Question: "What skill is necessary to become a Data Scientist?"
    Answer: "SQL, Python, Machine Learning and concepts which help in predicting future values."
    Question: {input}
    Answer:
    """
)

After making the prompt, define an LLM. We are using Ollama because it's open source and lets you run various large language models locally.

from langchain_ollama import OllamaLLM

model=OllamaLLM(model='llama3.1')
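
Before wiring up the chain, a quick standalone check that the model responds (this invokes the LLM directly, without retrieval):

print(model.invoke("Say hello in one sentence."))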

Step 10: Create a chain

Creating a chain means linking a series of processing steps together to transform an input into the desired output: take a user query, retrieve relevant information, process it, and generate a coherent response. This approach keeps the system efficient, modular, and scalable. Here we first chain the prompt and the model into a document chain, and then combine that document chain with the retriever. For each query, the retriever performs a similarity search over the stored embeddings and passes the matching chunks to the Ollama model as context.

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

combine_docs_chain = create_stuff_documents_chain(model, prompt)
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)

Step 11: Test and Results

The RAG system is now implemented, and we are ready to use it by passing a query to the chain.

result = retrieval_chain.invoke({'input': "In which industry do I work?"})
print(result["answer"])
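
The dictionary returned by create_retrieval_chain also exposes the retrieved documents under the "context" key, which is useful for checking what the answer was grounded on:

for doc in result["context"]:
    print(doc.page_content[:100])  # preview of each retrieved chunk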

Conclusion

RAG (Retrieval Augmented Generation) is a powerful technique for enhancing Generative AI models: relevant information is retrieved from a corpus and used to ground the output, improving its accuracy. As this guide demonstrates, a RAG system can be built entirely with open-source tools, making it cost-effective as well as efficient. The modular approach ensures efficient, scalable, and accurate responses across a range of Generative AI tasks.


Pranav Jadhav

Passionate Data Scientist. Interested in Machine Learning, Deep Learning, and Generative AI-related concepts.