A Quick Experiment on Building Your Own GEN AI Application Utilising RAG and Google’s Gemma

Mar 2, 2024

Introduction

Retrieval-Augmented Generation (RAG) is a technique that improves the accuracy and reliability of generative AI (GEN AI) models. It retrieves relevant information from an external knowledge base and adds it to the model's prompt. This lets GEN AI models work directly with proprietary data without complex and expensive retraining.
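Here is a minimal, dependency-free Python sketch of the retrieve-then-augment idea. It is not the pipeline used later in this post. Word-count cosine similarity stands in for a neural embedding model, and a three-sentence list stands in for the knowledge base. `retrieve`, `augment_prompt` and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base standing in for chunks of proprietary documents
KNOWLEDGE_BASE = [
    "Dispatchable generators can increase or decrease output on instruction.",
    "Rooftop solar output depends on the weather and time of day.",
    "Battery storage can absorb surplus energy and release it later.",
]


def bag_of_words(text):
    """Lower-cased word counts: a crude stand-in for a neural embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=1):
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def augment_prompt(query, passages):
    """Insert the retrieved passages into the prompt sent to the generator."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}\nAnswer:"


question = "Which generators are dispatchable?"
print(augment_prompt(question, retrieve(question)))
```

The generator itself is never retrained. Only the prompt changes from one query to the next.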

The advent of open-source GEN AI models has democratised the development of RAG systems, granting developers complete autonomy over their data. This is particularly advantageous in situations where regulatory compliance challenges the use of cloud-based solutions hosted beyond national borders.

With Google's recent release of Gemma, an open model family built from the same research as its Gemini models, I set out to build a straightforward RAG system and show its potential.

(image source: Nvidia)

I started from Google's sample code and extended the demonstration with a process for loading PDF documents and creating embeddings. The process uses four components:

- the ‘all-MiniLM-L6-v2’ sentence-transformers model for fast sentence embeddings;
- LangChain's PyPDFLoader (built on the pypdf library) to load the documents;
- LangChain's RecursiveCharacterTextSplitter to split them into chunks;
- Meta's FAISS for embedding retrieval.

In a production setting, one would use a vector database with a vector search index for efficient data handling.
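Chunking matters because each chunk becomes one embedding and one retrievable unit. The sketch below is a simplified, hypothetical `chunk_text` helper that cuts text into fixed-size character windows with an optional overlap. LangChain's RecursiveCharacterTextSplitter, which this post actually uses, also tries to break on paragraph, line and word boundaries, so its chunks will differ.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=0):
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Hypothetical, simplified stand-in for RecursiveCharacterTextSplitter:
    it ignores paragraph, line and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=2))
# prints: ['abcdefghij', 'ijklmnopqr', 'qrstuvwxy']
```

With `chunk_overlap=0`, as in this post, the windows tile the text exactly. A small overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks.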

Experimentation

The experiment is structured around three code blocks:

Import Libraries: This code block imports the main libraries used by the process.

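import os

from transformers import AutoTokenizer, AutoModelForCausalLM

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Gemma is a gated model on Hugging Face, so an access token is required;
# here (an assumption) it is read from the HF_TOKEN environment variable
access_token = os.environ["HF_TOKEN"]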

Chatbot Class: This segment, derived directly from the sample code, is designed for prompt engineering and interaction with the Gemma model via Hugging Face.

class Assistant:
    """Gemma 2b based assistant that replies given the retrieved documents"""
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=access_token)
        self.Gemma = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", token=access_token)

    def create_prompt(self, query, retrieved_info):
        # Join the text of the retrieved LangChain Documents into one context string
        context = "\n\n".join(doc.page_content for doc in retrieved_info)
        # Instruction to reply to the query given the retrieved information
        prompt = f"""You need either to explain the concept or answer the question about Power Systems.
        Be detailed, use simple words and examples in your explanations. If required, utilize the relevant information.
        Instruction: {query}
        Relevant information: {context}
        Output:
        """
        return prompt

    def reply(self, query, retrieved_info):
        prompt = self.create_prompt(query, retrieved_info)
        # Tokenize the full prompt (not just the query) so the model sees the retrieved context
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.Gemma.device)
        # Sample the answer; temperature only takes effect when do_sample=True
        generated_ids = self.Gemma.generate(
            **inputs,
            max_new_tokens=500,  # cap the answer length independently of the prompt length
            do_sample=True,
            temperature=0.7,  # lower values give more focused, less varied answers
        )
        # Decode only the newly generated tokens, skipping the echoed prompt
        new_tokens = generated_ids[0][inputs["input_ids"].shape[-1]:]
        answer = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
        return answer
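The prompt carries the retrieved chunks, so it can get long: three chunks of up to 1,000 characters each, plus the instructions. One way to keep it bounded is to add chunks in ranked order until a budget is reached. The sketch below assumes the chunks are plain strings and uses an arbitrary 3,000-character budget. `pack_context` and `build_prompt` are hypothetical helpers, not part of Google's sample code. A production system would count tokens with the model's tokenizer rather than characters.

```python
def pack_context(chunks, max_chars=3000, separator="\n\n"):
    """Concatenate ranked chunks, stopping before the character budget is exceeded."""
    packed = []
    used = 0
    for chunk in chunks:
        extra = len(chunk) + (len(separator) if packed else 0)
        if used + extra > max_chars:
            break  # keep the higher-ranked chunks and drop the rest
        packed.append(chunk)
        used += extra
    return separator.join(packed)


def build_prompt(query, chunks, max_chars=3000):
    """Fill an instruction template that mirrors the wording of Assistant.create_prompt."""
    context = pack_context(chunks, max_chars)
    return (
        "You need either to explain the concept or answer the question about Power Systems.\n"
        "Be detailed, use simple words and examples in your explanations. "
        "If required, utilize the relevant information.\n"
        f"Instruction: {query}\n"
        f"Relevant information: {context}\n"
        "Output:\n"
    )


print(build_prompt("What is dispatchability?", ["chunk one", "chunk two"], max_chars=15))
```

Stopping at the first chunk that does not fit, rather than skipping it and trying smaller ones, keeps the context in the retriever's ranking order.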

Retriever Class: This component establishes a vector store for PDF files and defines a search method. When invoked, it utilizes Meta’s FAISS library to conduct similarity searches within the vector store.

import os
import fnmatch


class Retriever:
    """Sentence-embedding based retriever for Retrieval-Augmented Generation.
    Given a folder of PDF files, it finds the num_retrieved_docs most relevant chunks."""
    def __init__(self, num_retrieved_docs=3, pdf_folder_path='/content/'):
        # Load documents, building full paths so loading works from any working directory
        pdf_files = [os.path.join(pdf_folder_path, file)
                     for file in os.listdir(pdf_folder_path) if fnmatch.fnmatch(file, '*.pdf')]
        loaders = [PyPDFLoader(pdf_file) for pdf_file in pdf_files]
        # Split the data into smaller chunks of up to 1000 characters
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        all_documents = []
        for loader in loaders:
            data = loader.load()
            documents = text_splitter.split_documents(data)
            all_documents.extend(documents)
        # Create a FAISS vector store from the chunk embeddings
        embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")  # fast model with competitive performance
        self.db = FAISS.from_documents(all_documents, embeddings)
        self.retriever = self.db.as_retriever(search_kwargs={"k": num_retrieved_docs})

    def search(self, query):
        # Retrieve the top-k chunks most similar to the query
        docs = self.retriever.invoke(query)
        return docs
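Under the hood, `FAISS.from_documents` embeds every chunk. To my understanding, LangChain's wrapper defaults to an exact (flat) Euclidean-distance index, so each search compares the query embedding against every stored vector. The pure-Python sketch below mimics that brute-force search on toy 3-dimensional vectors. `exact_knn` and the vectors are hypothetical. FAISS performs the same comparison far faster, and production vector databases usually switch to approximate indexes.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query_vec, doc_vecs, k=3):
    """Brute-force k-nearest-neighbour search over every stored vector.

    Returns (distance, index) pairs, closest first.
    """
    return heapq.nsmallest(k, ((l2_distance(query_vec, v), i) for i, v in enumerate(doc_vecs)))


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print([index for _, index in exact_knn([1.0, 0.0, 0.0], docs, k=2)])
# prints: [0, 2]
```

The returned indices map back to the stored chunks. In the same way, the `k` in `search_kwargs` decides how many chunks reach the prompt.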

We will also define a function that chains the steps together:

# Instantiate the components used by generate_reply
retriever = Retriever()
chatbot = Assistant()

def generate_reply(query):
    related_docs = retriever.search(query)
    print('related docs', related_docs)
    reply = chatbot.reply(query, related_docs)
    return reply
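`generate_reply` relies on global `retriever` and `chatbot` objects. Passing them in as arguments makes the wiring easy to check without downloading Gemma or loading any PDFs. This is a sketch. `FakeRetriever`, `EchoAssistant` and `generate_reply_with` are hypothetical names that only share the `search` and `reply` methods of the real classes.

```python
class FakeRetriever:
    """Hypothetical stand-in for Retriever: returns canned passages."""
    def __init__(self, passages):
        self.passages = passages

    def search(self, query):
        return self.passages


class EchoAssistant:
    """Hypothetical stand-in for Assistant: reports its inputs instead of calling Gemma."""
    def reply(self, query, retrieved_info):
        return f"{query} | {len(retrieved_info)} passage(s)"


def generate_reply_with(retriever, chatbot, query):
    """Same steps as generate_reply (minus the debug print), with components passed in."""
    related_docs = retriever.search(query)
    return chatbot.reply(query, related_docs)


print(generate_reply_with(FakeRetriever(["p1", "p2"]), EchoAssistant(), "What is dispatchability?"))
# prints: What is dispatchability? | 2 passage(s)
```

Because only the method names matter, the real `Retriever()` and `Assistant()` objects can be passed in unchanged once the stubs behave as expected.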

Outcome

To test the system's capabilities, we gave the model a PDF document from AEMO to process and then asked a question about the dispatchability of power systems. As demonstrated below, the model draws on the PDF content to generate an articulate response.

This output was generated on my local machine. When I ran the same code on Colab, the result was somewhat different.

Answer:

Step 1/2
Dispatchability of the power system refers to the ability of the system to respond to changes in demand and supply in a timely and efficient manner. It is a measure of the system's ability to meet the changing needs of the consumers and producers of electricity. Dispatchability is achieved through the use of various technologies, such as demand response, distributed generation, and energy storage systems. Demand response refers to the ability of consumers to adjust their energy consumption in response to changes in the price or availability of electricity. Distributed generation refers to the use of small-scale power generation units, such as solar panels or wind turbines, to provide electricity to local areas. Energy storage systems, such as batteries, can store excess energy generated by renewable sources and release it when needed.

Step 2/2
Overall, dispatchability of the power system is critical for ensuring reliable and affordable electricity supply. It allows the system to respond quickly and efficiently to changes in demand and supply, and to meet the needs of consumers and producers.

Conclusion

The release of open GEN AI models like Google's Gemma paves the way for in-house GEN AI applications. These give businesses full control over their data and a way to comply with stringent regulatory requirements. Combined with the RAG framework, this approach becomes a powerful tool for extracting knowledge from internal documents and boosting productivity.

Written by George Wen