In this notebook I continue my journey with advanced RAG, i.e. different methods of RAG. This notebook covers some of the topics mentioned in the following training: https://www.deeplearning.ai/short-courses/advanced-retrieval-for-ai/

Here I use LlamaIndex as and when required, together with a Llama 2 model, so it is a fully local implementation. The input is my blog post content, and I will use 2–4 queries throughout to judge the quality of the answers.

Imports: importing a bunch of modules and packages that we will be needing.

import os
os.environ['CURL_CA_BUNDLE'] = ""
import logging
import sys
import pandas as pd
import numpy as np
import re
import glob
import fitz

from pathlib import Path

from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
from llama_index.embeddings import HuggingFaceEmbedding

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.text_splitter import SentenceSplitter

from llama_index import set_global_service_context
from accelerate import Accelerator

from llama_index import Document

from sentence_transformers import CrossEncoder
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

First we set up logging and debugging callbacks, load the local Llama 2 model with LlamaCPP, and configure the embeddings and service context.

logging.basicConfig(stream=sys.stdout, level=logging.INFO)  # Change INFO to DEBUG if you want more extensive logging
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

llm = LlamaCPP(
    model_path=r"llama-2-7b-chat.Q8_0.gguf",
    temperature=0,
    max_new_tokens=500,
    context_window=4500,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 60},
    # transform inputs into Llama 2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)
accelerator = Accelerator()
llm = accelerator.prepare(llm)
embeddings = HuggingFaceEmbedding()

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
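To build intuition for what `window_size=3` produces, here is a plain-Python sketch (no LlamaIndex involved) of the window of surrounding sentences that gets attached to each sentence node:

```python
# Plain-Python sketch of the sentence-window idea: each sentence becomes a node,
# and its "window" metadata holds up to `window_size` sentences on either side.
def sentence_window(sentences, i, window_size=3):
    lo = max(0, i - window_size)
    hi = min(len(sentences), i + window_size + 1)
    return " ".join(sentences[lo:hi])

sentences = ["S1.", "S2.", "S3.", "S4.", "S5.", "S6.", "S7.", "S8."]
print(sentence_window(sentences, 0))  # "S1. S2. S3. S4."
print(sentence_window(sentences, 4))  # "S2. S3. S4. S5. S6. S7. S8."
```

During synthesis, the retrieved sentence can be swapped for its window, giving the LLM more context than the single matched sentence.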

# base node parser is a sentence splitter
text_splitter = SentenceSplitter(chunk_size=350,
                                 chunk_overlap=150)

# chunk_size defines the size of the chunks (nodes) that documents are broken
# into when they are indexed by LlamaIndex
service_context = ServiceContext.from_defaults(llm=llm,
                                               embed_model=embeddings,
                                               callback_manager=callback_manager,
                                               text_splitter=text_splitter,
                                               context_window=4000)
set_global_service_context(service_context)

%%time
llm.complete('Tell me something about color Red').text

## OUTPUT -

'Of course! The color red is a vibrant and powerful hue that is often associated with passion, energy, and excitement.
It is often used to evoke feelings of warmth, love, and joy. Red is also a highly visible color, making it a popular choice for stop signs,
fire trucks, and other safety-related items. In art and design, red is often used to draw attention and create a sense of drama and intensity.
Is there anything else you would like to know about the color red?'

Now that we are sure the model is loaded and working, we will proceed with our document. We will load it, split it into base nodes, and also use the sentence window splitter.

documents = SimpleDirectoryReader(
    input_files=[r"all_post_in_one.pdf"]
).load_data()

for document in documents:
    print('----------------')
    print(document.text)


nodes = node_parser.get_nodes_from_documents(documents)
base_nodes = text_splitter.get_nodes_from_documents(documents)


print(len(nodes))
print(len(base_nodes))

## OUTPUT -
4310
479
sentence_index = VectorStoreIndex(nodes)
base_index = VectorStoreIndex(base_nodes)

Trying out a query to make sure all is well so far..

query = "How many cars are mentioned in the document?"

retriever = base_index.as_retriever(similarity_top_k=9)

results = retriever.retrieve(query)
retrieved_documents = []

for document in results:
    print(document.get_text())
    print('-'*10)
    retrieved_documents.append([document.get_text()])

## OUTPUT -
Since it was a holiday, my 
flatmate, Kartik, \xa0and I decided to go to a mall for some shopping . In
the mall, we went to Hamley just for fun. While roaming we landed in the
Die-cast cars section and were surprised to see the collection. Ferrari,
Ford, Merc, Lamborghini and what not. 1:18, 1:24 scaled models. We spent
nearly 40 mins scanning all the models. Today I was sure to pick a car,
but it couldn't be Aventador. It was a couple of days back that I made
arrangements to get the Aventador. \nI liked a couple of Ferrari's there
but still, I wasn't sure about which one to buy. There were dozens of
Ferrari and half a dozen of Lamborghinis. I liked Ferrari California -T
but came to know that it wasn't great for performance. Since I didn't
have any particular favorite model in Ferrari, I decided to buy a
Lamborghini. \xa0There was Huracan, Countach, Murciel ago, Reventon, etc. I
scanned all model and took help of Kartik to finally land on Reventon. It
is only after the purchase that I realised that only 20 of these models
have been made. I still have a long wait to go in terms of Supercar
knowledge.
----------
For the
auto enthusiasts, you already know that only 20 Reventon's have been
made, so what am I talking about? I just bought a scaled die -cast model
of Lamborghini Reventon. Also, this is the first supercar to go into my
collection. Ok, I bought a die -cast car, millions of kids do, what's the
big deal? I don't know about the big deal, it is just that I wanted to
make a permanent note of this day and the story and share it with my near
and dear ones. \nWhen I was a small ki d, I used to like Ferrari \xa0and
BMW's. It is only in recent years that I took notice of Lamborghini and
Aventador became my favorite. I don't have a specific reason to justify
that why Aventador is my fav, I just like the name. I read a bit more
----------
car with my friends and went to Pondicherry -\xa0road-trip-to-
pondicherry. Then I drove in Mumbai -\xa0inch-by-inch. Then came a big
break and I went to Euro and UK trip. There I hired and Drove Volkswagen
Golf - One of the costliest car that I drove till date and also the first
6 speed manual car. In a nutshell - I drove many cars owned by my
friends. I actually learnt on many of those cars - and my friends were
patient enough to teach me good driving. \nFirst Car - Alto 800 -
Chennai\nIn the past para I told that I used to go to office o n cycle and
at times I dream of having a car and go to office in that. Cars used to
go past by me and I used to be on cycle. I used to wonder when will I get
a car and when will I enter the office in that. Finally it happened
yesterday I bought my first ca r and took it to office too. Again there
was a big role played by my office colleagues in various stages of my car
purchase. A Dream Come True. \n- - END - -\nI'll end my post here. I
assure you that there many more dreams of mine that came true and there
was no particular reason to not include them. It is just that I these
items are on top of mind and have been very close to my heart.
----------
There were dozens of
Ferrari and half a dozen of Lamborghinis. I liked Ferrari California -T
but came to know that it wasn't great for performance. Since I didn't
have any particular favorite model in Ferrari, I decided to buy a
Lamborghini. \xa0There was Huracan, Countach, Murciel ago, Reventon, etc. I
scanned all model and took help of Kartik to finally land on Reventon. It
is only after the purchase that I realised that only 20 of these models
have been made. I still have a long wait to go in terms of Supercar
knowledge. \nCar purc hase was the main highlight of the day, thanks to
Kartik too for helping me out. The other good things that happened today
book purchase. Also, it was my Crush ka Birthday today ;) I learned and
played Jingle Bells Guitar Tabs for someone. I turned more po sitive today
as compared to yesterday. I don't see any big deal in this post, you may
ask. Frankly speaking, I don't see a big deal either. I am just trying to
write from a Kids point of view. how would she/he describe the first car
purchase event. I want you to think of something similar events in your
past and some events in 2017 too. Put down your story in the comment
section below. Bring out the Child in you and open up.
----------
Start tuned for net
weeks some really interesting posts coming up.. Keep Commenting.. \nGood
Night and Happy Sunday !!! \nRecommend to read the Next Post -\xa0Road
Trip to Pondicherry \n"), ('Car Drives - So Far', " \nBlog post has become
one of the key aspects of my Bombay visit. Generally, I get to spend an
hour at the Chennai airport after the security clearance and before the
boarding the plane. Earlier I used to read books to kill time and I still
do, but now I spend an equal amount of time blogging too. So what is this
post about? Since past 2 -3 months, I have been thinking about writing on
this topic. Basically, it is just a short summary and/or story associated
with the cars that I have driven so far. Before we start - All these cars
are owned by my friend and none by me yet! \nLet's start with the car on
which I started learning to drive. I was part of BAJA (Off -roading
vehicle competition) in my engineering college. Back then we used to
design an off -roading car to compete in a nationwide competition. The
entire team including the driver were pursuing their degree while working
on the project. So yeah, I felt that I too could have been one of the
drivers, just a small faint desire. Already there were established
drivers in our team who did good justice but it is here that my desire to
----------
There was two proto vehicles in office
and one of them had 96 in the number plate. It was a rare encounter with
that vehicle, generally you don't get to see proto vehicles. My friends
birthday cak e cost 960. I saw a video about a cyclist and he had a
cadence 96. From somewhere I landed on Athens 1 km record - 1:00:896. 96
is special for me, 896 is like blessing. I have encountered numerous 896s
too, and so this record by someone is special for me. \nThese are the
events I thought of capturing. There were other 96 incidences too, but
the above examples are the ones with least probability of occurring and
yet they occurred with 96. I am not sure about the exact year when I
started noticing 96, but I us ed to notice it in college. Next I will just
----------
\nEnjoy the Week and Party Harder \n"),
('96', " \nI walk into McD today with my friend, placed the order and sat
down, waiting for our order number to be called in. Guess what our order
number was, 296. This took me back to days when I used to have too many
encounters with number 96. Whats weird or funny, the other order numbers
were 198, 199, 202, 203 etc. Couple of months I decided to note down all
the 96 co -incidences and write a blog, and finally here it is. I have
changed the content a little. \nSurprisingly, the number of encounters
with 96 reduced in last two months, when I was intentionally looking out
for them. So the events - I realised that o ne of my good office friend
has 96 in his extension number. There was two proto vehicles in office
and one of them had 96 in the number plate. It was a rare encounter with
that vehicle, generally you don't get to see proto vehicles. My friends
birthday cak e cost 960. I saw a video about a cyclist and he had a
cadence 96. From somewhere I landed on Athens 1 km record - 1:00:896. 96
is special for me, 896 is like blessing. I have encountered numerous 896s
too, and so this record by someone is special for me.
----------
Kindly drop in your comments and
your first car drive experience. Thank You !!! \nHappy Driving Friends
!!!\n"), ('Bangalore T rips', " \nFlying from Chennai to Bangalore in
SpiceJet 3426 and this is the first time that I am flying in a twin
propeller aircraft. I had a plan to meet one of my friends in Bangalore
before going to Bombay. I booked my Bombay ticket long back. I was
planning to travel to Bangalore by bus but due to some reasons my plan
changed and I booked a flight. Today morning only I came to know that it
is propeller aircraft, 80 seaters. \nI am seated in row 8 and there are a
----------
Almost 40% lighter than old cycle, 8 Speed,
with Drop bars and so many things. I made many 80 km cycle rides using it
and it doesn't feel a big deal now to own such a road bike. \n- - INTERVAL
- -\nI wrote the above blocks on 3rd November from Airport but didn't
publish the post. It is 24th November and i am adding few blocks which I
feel is applicable here. \n- - CAR - -\nAgain - another book c an be
written on it. To know more about my Car adventures you can go through -
\xa0car-drives-so-far. Owning a car has always been a dream. Some of the
details are in the last post and I am skipping that. Post that I hired a

Now let's define a simple RAG function: stuff the retrieved chunks into a prompt and ask the LLM to answer the query.

def rag(query, retrieved_documents):

    information = "\n\n".join(retrieved_documents)

    prompt = f'''[INST]<<SYS>> You are given bunch of posts or part of authors blogpost and asked a query.
Answer the query based on posts context provided and do not bring outside knowledge.
Based on question you can answer between 200 to 500 words. Be conversational.<</SYS>>\n
Below is the context and query:\n
---------------------\n
{information}\n
---------------------\n
Query: {query}\n
Answer the query based on part of blog post provided.
Answer: \n
[/INST]'''

    content = llm.complete(prompt).text
    return content


query = "How many cars are mentioned in the document?"

results = retriever.retrieve(query)
retrieved_documents = []

for document in results:
    # print(document.get_text())
    # print('-'*10)
    retrieved_documents.append(document.get_text())

output = rag(query=query, retrieved_documents=retrieved_documents)

print(output)

## OUTPUT -

Based on the provided blog post, there are 8 cars mentioned:
1. Ferrari
2. Ford
3. Mercedes
4. Lamborghini
5. Huracan
6. Countach
7. Murcielago
8. Reventon

Now I will run a bunch of queries through basic RAG, i.e. simple retrieve-and-answer.

%%time
queries = [
    "WHat are the car names mentioned in the document?",
    "Name the cities the author has been too.",
    "Tell me about the fastest marathon time of the author.",
    "How many marathon has the author ran ?",
]

responses = []

for query in queries:

    results = retriever.retrieve(query)
    retrieved_documents = []

    for document in results:
        retrieved_documents.append(document.get_text())

    output = rag(query=query, retrieved_documents=retrieved_documents)

    responses.append([query, output])


res = pd.DataFrame(responses)
res.columns = ['Query', 'Response']

for i in range(len(res)):
    print('#######################')
    print('Query: '+res.Query[i])
    print('-----')
    print('Response: '+res.Response[i])
## OUTPUT -

#######################
Query: WHat are the car names mentioned in the document?
-----
Response: Based on the provided blog post, the following car names are mentioned:
1. Ferrari
2. Lamborghini
3. Huracan
4. Countach
5. Murcielago
6. Reventon
7. Aventador
8. California -T
9. Golf (mentioned as the most costly car the author drove)

Please note that the blog post does not provide detailed information about each car model, but rather mentions them in the context of the author's personal experiences and interests.
#######################
Query: Name the cities the author has been too.
-----
Response: Based on the blog post provided, the author has been to the following cities:

1. Mumbai
2. Chennai
3. Lonavala
4. Pune
5. Mumbai (again)
6. Lonavala (again)
7. LinavaLo

The author has also mentioned that he will be visiting another city soon, but the name of the city is not specified in the blog post.
#######################
Query: Tell me about the fastest marathon time of the author.
-----
Response: According to the blog post, the author's fastest marathon time is under 3 hours, which they aim to achieve in 2016. In 2015, they missed their target by just one hour.
#######################
Query: How many marathon has the author ran ?
-----
Response: Based on the blog post provided, the author has ran three marathons:

1. First marathon in 2016 January, which the author completed in 5:15 hours.
2. Second marathon in sub-5 hours, which the author did not finish.
3. Third marathon, which the author did not complete.

Therefore, the author has ran a total of three marathons.

The answer to ‘Tell me about the fastest marathon time of the author.’ is wrong.

The answer to ‘How many marathon has the author ran ?’ is half correct; I did finish my second marathon.

Feel free to play around with prompts and chunk sizes.
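To get a feel for how chunk size and overlap interact, here is a toy character-level chunker (plain Python, not LlamaIndex's sentence-aware `SentenceSplitter`, which will produce different counts) just to show how overlap increases the number of chunks:

```python
# Toy chunker: slide a window of `chunk_size` characters forward by
# (chunk_size - chunk_overlap) each step. Illustrative only.
def toy_chunks(text, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "x" * 1000
print(len(toy_chunks(text, 350, 150)))  # 5 chunks (step of 200)
print(len(toy_chunks(text, 350, 0)))    # 3 chunks (step of 350)
```

Larger overlap means more (and more redundant) chunks to index, but less risk of an answer being split across a chunk boundary.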

augment_query_generated

augment_query_generated — here we pass the query to the LLM and ask it to give a hypothetical answer to the question. We then combine the original query and the hypothetical answer, and use that joint text to retrieve new chunks of the document. Then we use the original query and these new chunks to get our answer.

Basically, we are using the hypothetical answer to improve the search/retrieval step before answering. It doesn't always work.

https://cdn-images-1.medium.com/proxy/1*OnyZwCZ35gWrxwRgsZRaEQ.png
augment_query_generated
def augment_query_generated(query):

    prompt = f'''[INST]You are a helpful expert assistant. Provide an example answer to the given question,
that might be found in a document like a collection of personal blogposts.
question: {query}
[/INST]'''

    content = llm.complete(prompt).text
    return content


query = "WHat are the car names mentioned in the document?"

hypothetical_answer = augment_query_generated(query)
joint_query = f"{query} {hypothetical_answer}"

results = retriever.retrieve(joint_query)
retrieved_documents = []

for document in results:
    retrieved_documents.append(document.get_text())

output = rag(query=query, retrieved_documents=retrieved_documents)

print(output)
## OUTPUT - 

Based on the provided blog post, the following car names are mentioned:
1. Lamborghini Reventon - mentioned as one of the 20 cars made, with the author expressing desire to own one.
2. Ferrari - mentioned multiple times throughout the post, with the author mentioning that they like Ferrari California-T but it's not great for performance, and they decided to buy a Lamborghini instead.
3. Lamborghini Huracan - mentioned as one of the cars the author scanned in the die-cast car section of a mall.
4. Lamborghini Countach - mentioned as one of the cars the author scanned in the die-cast car section of a mall.
5. Lamborghini Murcielago - mentioned as one of the cars the author scanned in the die-cast car section of a mall.
6. Ferrari California-T - mentioned as a car the author considered but decided to go with a Lamborghini instead.
7. Volkswagen Golf - mentioned as one of the most expensive cars the author drove, and the first 6-speed manual car they drove.

This seems decent, and now we can try it on multiple queries and see the responses.

%%time
queries = [
    "WHat are the car names mentioned in the document?",
    "Name the cities the author has been too.",
    "Tell me about the fastest marathon time of the author.",
    "How many marathon has the author ran ?",
]

responses = []
retriever = base_index.as_retriever(similarity_top_k=5)

for query in queries:

    hypothetical_answer = augment_query_generated(query)
    joint_query = f"{query} {hypothetical_answer}"

    results = retriever.retrieve(joint_query)
    retrieved_documents = []

    for document in results:
        retrieved_documents.append(document.get_text())

    output = rag(query=query, retrieved_documents=retrieved_documents)

    responses.append([query, output])

res = pd.DataFrame(responses)
res.columns = ['Query', 'Response']

for i in range(len(res)):
    print('#######################')
    print('Query: '+res.Query[i])
    print('-----')
    print('Response: '+res.Response[i])

## OUTPUT - 

#######################
Query: WHat are the car names mentioned in the document?
-----
Response: Based on the provided blog post, the following are the car names mentioned:
1. Lamborghini Reventon
2. Ferrari California-T
3. Huracan
4. Countach
5. Murcielago
6. Aventador
7. Volkswagen Golf (mentioned in the context of the author's UK trip)
#######################
Query: Name the cities the author has been too.
-----
Response: Based on the provided blog post, the author has been to the following cities:

1. Chennai
2. Pondicherry
3. Mahindra World City
4. Mumbai
5. Pondicherry (again)
6. ECR road (a remote region)
7. Tindivanum (a route to Pondicherry)
8. Chengalpattu (a route to Pondicherry)

The author has also mentioned that they have taken night road trips in a car and slept off in it, driven a mid-size car for a long distance, and visited various places such as Zuca (a chocolate shop), Promenade, Xtasy cafe, and the beach.
#######################
Query: Tell me about the fastest marathon time of the author.
-----
Response: Based on the blog post provided, the author's fastest marathon time is not explicitly mentioned. However, the author does mention that they ran their second marathon in just under 5 hours in 2017, and they were all set to set their personal best in their third run in 2018. Unfortunately, the author encountered an injury during their third marathon and had to stop running after 15 kilometers, resulting in them not achieving their personal best time.
#######################
Query: How many marathon has the author ran ?
-----
Response: Based on the blog post provided, the author has ran one marathon so far, which they completed on January 17th, 2020.

A few answers improved and a few didn't, so it is a bit difficult to make a choice.

Also, note that here we are making 2 LLM calls to get to our answer.

augment_multiple_query

augment_multiple_query — here we take the original query and ask the LLM to generate similar questions (5 here). For each query we retrieve 5 document chunks, so we get 5 for the original query and 5×5 = 25 for the newly generated queries. We then de-duplicate the 30 documents, use the cross-encoder score to pick the top ones, and use those to get our answer.

We make the first LLM call to generate the 5 new questions, then run retrieval on all 5+1 queries to get 30 documents, de-duplicate, rerank with the cross-encoder, and keep the top-scoring documents. Then we make the second LLM call to answer.

So, here again we make 2 calls to the LLM per query.

https://cdn-images-1.medium.com/proxy/1*GKndrml72bPCdL4vKKgWRQ.png
augment_multiple_query
def augment_multiple_query(query):

    prompt = f'''[INST] "Your users are asking questions about Sandeep's blogposts.
Suggest up to five additional related questions to help them find the information they need, for the provided question.
Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.
Make sure they are complete questions, and that they are related to the original question.
Directly give questions as output and output one question per line. Do not number or bullet the questions.
question: {query}"
[/INST]'''

    content = llm.complete(prompt).text
    content = content.split("\n")[-5:]
    return content


query = "WHat are the car names mentioned in the document?"
augmented_queries = augment_multiple_query(query)
queries = [query] + augmented_queries
retrieved_documents = []

for query_i in queries:

    results = retriever.retrieve(query_i)

    for document in results:
        retrieved_documents.append(document.get_text())

# Deduplicate the retrieved documents before reranking
unique_documents = list(set(retrieved_documents))

# Score each (query, document) pair with the cross-encoder and keep the best
pairs = [[query, doc] for doc in unique_documents]
scores = cross_encoder.predict(pairs)

retrieved_documents = [unique_documents[i] for i in np.argsort(scores)[::-1][:6].tolist()]
output = rag(query=query, retrieved_documents=retrieved_documents)

print(output)
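The reranking line relies on `np.argsort`, which sorts ascending; reversing it and slicing gives the top-k documents by score. A tiny standalone illustration:

```python
import numpy as np

# Made-up cross-encoder scores for four candidate documents
scores = np.array([0.2, 0.9, 0.5, 0.7])
docs = ["d0", "d1", "d2", "d3"]

# argsort sorts ascending, [::-1] flips to descending, [:2] keeps the top 2
top2 = [docs[i] for i in np.argsort(scores)[::-1][:2]]
print(top2)  # ['d1', 'd3']
```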
%%time
queries = [
    "WHat are the car names mentioned in the document?",
    "Name the cities the author has been too.",
    "Tell me about the fastest marathon time of the author.",
    "How many marathon has the author ran ?",
]

responses = []
retriever = base_index.as_retriever(similarity_top_k=5)

for query in queries:

    augmented_queries = augment_multiple_query(query)
    # use a separate name so the outer `queries` list is not shadowed
    expanded_queries = [query] + augmented_queries

    retrieved_documents = []

    for query_i in expanded_queries:

        results = retriever.retrieve(query_i)

        for document in results:
            retrieved_documents.append(document.get_text())

    # Deduplicate the retrieved documents before reranking
    unique_documents = list(set(retrieved_documents))

    # Rerank with the cross-encoder and keep the top-scoring documents
    pairs = [[query, doc] for doc in unique_documents]
    scores = cross_encoder.predict(pairs)

    retrieved_documents = [unique_documents[i] for i in np.argsort(scores)[::-1][:8].tolist()]
    output = rag(query=query, retrieved_documents=retrieved_documents)

    responses.append([query, output])


res = pd.DataFrame(responses)
res.columns = ['Query', 'Response']

for i in range(len(res)):
    print('#######################')
    print('Query: '+res.Query[i])
    print('-----')
    print('Response: '+res.Response[i])
## OUTPUT --

#######################
Query: WHat are the car names mentioned in the document?
-----
Response: Based on the provided blog post, the following car names are mentioned:
1. Ferrari
2. Lamborghini Reventon
3. Huracan
4. Countach
5. Murcielago
6. Aventador
7. California-T

These are the car names that the author of the blog post mentions or refers to in the provided text.
#######################
Query: Name the cities the author has been too.
-----
Response: Based on the blog post provided, the author has been to the following cities:

1. Pune: The author mentions that they visited Pune to meet some of their MTech classmates and their sister.
2. Mumbai: The author was born and raised in Mumbai and mentions that Mumbai is where they were nurtured and brought up. They also mention that they will be moving to Chennai for their new job as a researcher.
3. Lonavala: The author mentions that they went to Lonavala for a family function related to marriage and life partner selection.
4. Chennai: The author mentions that they will be moving to Chennai for their new job as a researcher.
5. Ibiza: The author mentions that they traveled to Ibiza for an interview round for their MTech program.
6. UK: The author mentions that they have traveled to the UK multiple times in the last three years for various purposes.
7. Malaysia: The author mentions that they have also traveled to Malaysia multiple times in the last three years.
#######################
Query: Tell me about the fastest marathon time of the author.
-----
Response: Based on the blog post provided, the author's fastest marathon time is not explicitly mentioned. However, the author does mention that they have completed a marathon in under 5 hours, which suggests that their fastest time is likely below 5 hours. Without more information, it's difficult to provide an exact time for the author's fastest marathon.
#######################
Query: How many marathon has the author ran ?
-----
Response: Based on the blog post provided, the author has ran three marathons:

1. First marathon in 5:15 hours
2. Next marathon in sub 5 hours
3. Third marathon did not finish

The author also mentions that they have become a marathon runner after completing their first marathon, and have since ran multiple marathons with varying times.

This one gives a much better answer, though maybe not always.

Hope you like it; I would love to hear from you. How has your experience with RAG been? What are your tips, tricks, and challenges, if any?
