
Enhancing RAG Efficiency through LlamaIndex Techniques

LLAMA INDEX AND RAG BASICS WITH DETAILED EXPLANATION

Anushka Sonawane
5 min read · Jan 31, 2024

๐˜ž๐˜ฆ๐˜ญ๐˜ค๐˜ฐ๐˜ฎ๐˜ฆ ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ธ๐˜ฐ๐˜ณ๐˜ญ๐˜ฅ ๐˜ฐ๐˜ง ๐˜“๐˜ญ๐˜ข๐˜ฎ๐˜ข๐˜๐˜ฏ๐˜ฅ๐˜ฆ๐˜น, ๐˜ข ๐˜ฑ๐˜ฐ๐˜ธ๐˜ฆ๐˜ณ๐˜ง๐˜ถ๐˜ญ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ง๐˜ณ๐˜ข๐˜ฎ๐˜ฆ๐˜ธ๐˜ฐ๐˜ณ๐˜ฌ ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ช๐˜ด ๐˜ณ๐˜ฆ๐˜ท๐˜ฐ๐˜ญ๐˜ถ๐˜ต๐˜ช๐˜ฐ๐˜ฏ๐˜ช๐˜ป๐˜ช๐˜ฏ๐˜จ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ธ๐˜ข๐˜บ ๐˜ธ๐˜ฆ ๐˜ช๐˜ฏ๐˜ต๐˜ฆ๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ธ๐˜ช๐˜ต๐˜ฉ ๐˜ฅ๐˜ข๐˜ต๐˜ข, ๐˜ธ๐˜ฉ๐˜ฆ๐˜ณ๐˜ฆ ๐˜ช๐˜ฏ๐˜ง๐˜ฐ๐˜ณ๐˜ฎ๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ ๐˜ช๐˜ด๐˜ฏโ€™๐˜ต ๐˜ซ๐˜ถ๐˜ด๐˜ต ๐˜ฑ๐˜ณ๐˜ฐ๐˜ค๐˜ฆ๐˜ด๐˜ด๐˜ฆ๐˜ฅ ๐˜ฃ๐˜ถ๐˜ต ๐˜ถ๐˜ฏ๐˜ฅ๐˜ฆ๐˜ณ๐˜ด๐˜ต๐˜ฐ๐˜ฐ๐˜ฅ.

๐˜‰๐˜ถ๐˜ต ๐˜ธ๐˜ฉ๐˜ข๐˜ต ๐˜ฅ๐˜ฐ๐˜ฆ๐˜ด ๐˜ช๐˜ต ๐˜ฅ๐˜ฐ, ๐˜บ๐˜ฐ๐˜ถ ๐˜ข๐˜ด๐˜ฌ? ๐˜ž๐˜ฆ๐˜ญ๐˜ญ, ๐˜“๐˜ญ๐˜ข๐˜ฎ๐˜ข๐˜๐˜ฏ๐˜ฅ๐˜ฆ๐˜น ๐˜ฉ๐˜ฆ๐˜ญ๐˜ฑ๐˜ด ๐˜บ๐˜ฐ๐˜ถ ๐˜ช๐˜ฏ๐˜จ๐˜ฆ๐˜ด๐˜ต, ๐˜ด๐˜ต๐˜ณ๐˜ถ๐˜ค๐˜ต๐˜ถ๐˜ณ๐˜ฆ, ๐˜ข๐˜ฏ๐˜ฅ ๐˜ข๐˜ค๐˜ค๐˜ฆ๐˜ด๐˜ด ๐˜บ๐˜ฐ๐˜ถ๐˜ณ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ช๐˜ฏ ๐˜ข ๐˜ธ๐˜ข๐˜บ ๐˜ต๐˜ฉ๐˜ข๐˜ตโ€™๐˜ด ๐˜ฆ๐˜ข๐˜ด๐˜บ ๐˜ข๐˜ฏ๐˜ฅ ๐˜ฆ๐˜ง๐˜ง๐˜ช๐˜ค๐˜ช๐˜ฆ๐˜ฏ๐˜ต. ๐˜๐˜ตโ€™๐˜ด ๐˜ญ๐˜ช๐˜ฌ๐˜ฆ ๐˜ข ๐˜ฃ๐˜ณ๐˜ช๐˜ฅ๐˜จ๐˜ฆ ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ค๐˜ฐ๐˜ฏ๐˜ฏ๐˜ฆ๐˜ค๐˜ต๐˜ด ๐˜บ๐˜ฐ๐˜ถ๐˜ณ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฑ๐˜ฐ๐˜ธ๐˜ฆ๐˜ณ๐˜ง๐˜ถ๐˜ญ ๐˜ธ๐˜ฐ๐˜ณ๐˜ญ๐˜ฅ ๐˜ฐ๐˜ง ๐˜“๐˜“๐˜”๐˜ด.

๐˜›๐˜ฉ๐˜ฆ ๐˜ฎ๐˜ข๐˜ช๐˜ฏ ๐˜ฑ๐˜ถ๐˜ณ๐˜ฑ๐˜ฐ๐˜ด๐˜ฆ ๐˜ฐ๐˜ง ๐˜“๐˜ญ๐˜ข๐˜ฎ๐˜ข๐˜๐˜ฏ๐˜ฅ๐˜ฆ๐˜น ๐˜ช๐˜ด ๐˜ต๐˜ฐ ๐˜ฃ๐˜ณ๐˜ช๐˜ฅ๐˜จ๐˜ฆ ๐˜ต๐˜ฉ๐˜ฆ ๐˜จ๐˜ข๐˜ฑ ๐˜ฃ๐˜ฆ๐˜ต๐˜ธ๐˜ฆ๐˜ฆ๐˜ฏ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ท๐˜ข๐˜ด๐˜ต ๐˜ข๐˜ฎ๐˜ฐ๐˜ถ๐˜ฏ๐˜ต ๐˜ฐ๐˜ง ๐˜ฑ๐˜ถ๐˜ฃ๐˜ญ๐˜ช๐˜ค๐˜ญ๐˜บ ๐˜ข๐˜ท๐˜ข๐˜ช๐˜ญ๐˜ข๐˜ฃ๐˜ญ๐˜ฆ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜“๐˜“๐˜”๐˜ด ๐˜ข๐˜ณ๐˜ฆ ๐˜ต๐˜ณ๐˜ข๐˜ช๐˜ฏ๐˜ฆ๐˜ฅ ๐˜ฐ๐˜ฏ ๐˜ข๐˜ฏ๐˜ฅ ๐˜บ๐˜ฐ๐˜ถ๐˜ณ ๐˜ด๐˜ฑ๐˜ฆ๐˜ค๐˜ช๐˜ง๐˜ช๐˜ค ๐˜ฅ๐˜ข๐˜ต๐˜ข, ๐˜ธ๐˜ฉ๐˜ช๐˜ค๐˜ฉ ๐˜ค๐˜ฐ๐˜ถ๐˜ญ๐˜ฅ ๐˜ฃ๐˜ฆ ๐˜ฑ๐˜ณ๐˜ช๐˜ท๐˜ข๐˜ต๐˜ฆ ๐˜ฐ๐˜ณ ๐˜ด๐˜ฑ๐˜ฆ๐˜ค๐˜ช๐˜ง๐˜ช๐˜ค ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฑ๐˜ณ๐˜ฐ๐˜ฃ๐˜ญ๐˜ฆ๐˜ฎ ๐˜บ๐˜ฐ๐˜ถโ€™๐˜ณ๐˜ฆ ๐˜ต๐˜ณ๐˜บ๐˜ช๐˜ฏ๐˜จ ๐˜ต๐˜ฐ ๐˜ด๐˜ฐ๐˜ญ๐˜ท๐˜ฆ. ๐˜›๐˜ฉ๐˜ช๐˜ด ๐˜ช๐˜ด ๐˜ข๐˜ค๐˜ฉ๐˜ช๐˜ฆ๐˜ท๐˜ฆ๐˜ฅ ๐˜ต๐˜ฉ๐˜ณ๐˜ฐ๐˜ถ๐˜จ๐˜ฉ ๐˜ข ๐˜ฑ๐˜ณ๐˜ฐ๐˜ค๐˜ฆ๐˜ด๐˜ด ๐˜ค๐˜ข๐˜ญ๐˜ญ๐˜ฆ๐˜ฅ ๐˜™๐˜ฆ๐˜ต๐˜ณ๐˜ช๐˜ฆ๐˜ท๐˜ข๐˜ญ-๐˜ˆ๐˜ถ๐˜จ๐˜ฎ๐˜ฆ๐˜ฏ๐˜ต๐˜ฆ๐˜ฅ ๐˜Ž๐˜ฆ๐˜ฏ๐˜ฆ๐˜ณ๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ (๐˜™๐˜ˆ๐˜Ž).

So, why should you use LlamaIndex?

Well, imagine you have a wealth of data specific to your business or project. This could be anything from customer feedback to scientific research. While LLMs are trained on publicly available material like Wikipedia or textbooks, they are unaware of your particular data. Your data could be locked away in APIs, SQL databases, or even trapped in PDFs and slide decks.

This is where LlamaIndex comes in. It connects to your data sources, whether they're APIs, SQL databases, or even PDFs, and adds your data to the mix. This process, known as Retrieval-Augmented Generation (RAG), enables you to use LLMs to query your data, transform it, and generate new insights. You can ask questions about your data, create chatbots, build semi-autonomous agents, and more.

But that's not all. LlamaIndex provides a number of resources to assist: data connectors to ingest your data, data indexes to structure it, and engines to provide natural language access to it.


๐‡๐ž๐ซ๐žโ€™๐ฌ ๐ก๐จ๐ฐ ๐ข๐ญ ๐ฐ๐จ๐ซ๐ค๐ฌ:

โžค ๐ƒ๐จ๐œ๐ฎ๐ฆ๐ž๐ง๐ญ ๐“๐ซ๐š๐ง๐ฌ๐Ÿ๐จ๐ซ๐ฆ๐š๐ญ๐ข๐จ๐ง: You feed LlamaIndex with a bunch of documents. It then transforms each one into a unique vector.
โžค ๐“๐ก๐ž ๐ˆ๐ง๐๐ž๐ฑ: All these vectors are stored in a special place called an index. Think of it as a digital filing cabinet, but super organized.
โžค ๐“๐ก๐ž ๐’๐ž๐š๐ซ๐œ๐ก:
When you need to find something, you just ask LlamaIndex. It takes your question, turns it into a vector, and then starts its search.
โžค ๐•๐ž๐œ๐ญ๐จ๐ซ ๐’๐ข๐ฆ๐ข๐ฅ๐š๐ซ๐ข๐ญ๐ฒ: LlamaIndex looks for similar vectors in the index. The more similar the vectors, the closer they are to each other. This is how it finds the most relevant documents for your query.

After introducing LlamaIndex, you might be wondering,

For whom, exactly, is this tool intended?

The great thing about LlamaIndex is that it caters to everyone, from complete beginners to advanced users.

Now, let's see how you can use LlamaIndex in your code.

First things first, you need to install the library. You can do this by running the following command in your terminal:

pip install llama-index

If you're a beginner, don't worry! LlamaIndex provides a high-level API that allows you to ingest and query your data in just 5 lines of code. Yes, you read that right, just 5 lines! Here's a simple example:

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Who owns the shop?")
print(response)

In this example, we're using the SimpleDirectoryReader to load data from a directory named "data". We then create an index from these documents using VectorStoreIndex. This index is used to quickly find answers to questions. The query_engine.query function is then used to ask a question, in this case "Who owns the shop?", and the response is printed out. It's as simple as that!

But what if you're an advanced user looking for more customization? LlamaIndex offers lower-level APIs that allow you to customize and extend any module to fit your specific needs. Here are some of the most common ways you might want to customize it:

Parsing Documents into Smaller Chunks: In the context of LlamaIndex, a "chunk" is a smaller piece of a larger document. Think of it like a slice of a pie. When you have a large document, it's often easier to work with if you break it down into smaller, more manageable pieces; these pieces are what we call "chunks", and the process of breaking documents down into chunks is known as "parsing".

from llama_index import ServiceContext

# Break documents into chunks of at most 1,000 tokens each
service_context = ServiceContext.from_defaults(chunk_size=1000)

In this example, we're setting the chunk size to 1000. This means that each document will be divided into chunks of at most 1,000 tokens. A token can be as short as one character or as long as one word.

The ServiceContext is a bundle of services and configurations used across a LlamaIndex pipeline. You can then use this service_context when creating your index:

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
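
If you'd like to see the chunks themselves, you can also run a node parser directly instead of going through ServiceContext. A minimal sketch, assuming the same legacy llama_index API used throughout this article:

from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser

documents = SimpleDirectoryReader("data").load_data()
parser = SimpleNodeParser.from_defaults(chunk_size=1000)

# Each "node" is one chunk of a source document
nodes = parser.get_nodes_from_documents(documents)
print(len(nodes))                    # how many chunks were produced
print(nodes[0].get_content()[:200])  # peek at the first chunk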

Next, let's say you want to use a different vector store. You can do this by creating a StorageContext with a specified vector_store:

import chromadb
from llama_index.vector_stores import ChromaVectorStore
from llama_index import StorageContext

# Create a local, persistent Chroma database and a collection for the vectors
chroma_client = chromadb.PersistentClient()
chroma_collection = chroma_client.create_collection("quickstart")

# Wrap the collection so LlamaIndex can use it as its vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

The StorageContext defines the storage backend for where the documents, embeddings, and indexes are stored. You can then use this storage_context when creating your index:

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
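
A related, everyday need is persisting the index to disk so you don't re-embed your documents on every run. Here's a quick sketch, assuming the legacy llama_index API; the ./storage directory is just an example path:

from llama_index import StorageContext, load_index_from_storage

# Save the index (documents, embeddings, and metadata) to disk
index.storage_context.persist(persist_dir="./storage")

# Later, load it back without re-indexing
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)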

If you want to retrieve more context when you query, you can do this by passing in similarity_top_k when creating your query engine:

query_engine = index.as_query_engine(similarity_top_k=5)

This configures the retriever to return the top 5 most similar documents (instead of the default of 2).
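
Under the hood, the query engine wraps a retriever. If you want to inspect exactly what gets fetched before any answer is generated, you can use the retriever directly; a small sketch, again assuming the legacy API:

# Fetch the top 5 most similar chunks, without generating an answer
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("Who owns the shop?")

for result in results:
    # Each result carries a similarity score and the matched chunk
    print(result.score, result.node.get_content()[:100])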

LlamaIndex allows you to use different LLMs. For example, you can use the PaLM LLM.

from llama_index import ServiceContext
from llama_index.llms import PaLM

# Swap in Google's PaLM as the LLM; note that this requires a PaLM API key
# (for example via PaLM(api_key="...")), which is omitted here
service_context = ServiceContext.from_defaults(llm=PaLM())

After applying all your customizations, you can then use this service_context when creating your query engine:

query_engine = index.as_query_engine(service_context=service_context)

You can also use different response modes in LlamaIndex. For example, the tree_summarize response mode builds a tree of summaries over the retrieved chunks and answers from the top:

query_engine = index.as_query_engine(response_mode="tree_summarize")
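
tree_summarize is one of several built-in modes. For comparison, here's a quick sketch with two of the others, refine and compact (mode names per the legacy llama_index API; compact is the default):

# "refine" sends chunks to the LLM one at a time, refining the answer as it goes
refine_engine = index.as_query_engine(response_mode="refine")

# "compact" packs as many chunks as fit into each LLM call (the default mode)
compact_engine = index.as_query_engine(response_mode="compact")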

And finally, you can ask your query engine questions.

# Query the engine
response = query_engine.query("What were their favorite childhood activities?")
print(response)

You can also use LlamaIndex as a chatbot instead of a Q&A system.

# as_chat_engine returns a chat engine, which keeps conversation history
chat_engine = index.as_chat_engine()
response = chat_engine.chat("What were their favorite childhood activities?")
print(response)

response = chat_engine.chat("Oh interesting, tell me more.")
print(response)
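
Unlike a query engine, a chat engine keeps the conversation history, which is why the follow-up above ("Oh interesting, tell me more.") works. You can also control how that history is used via chat_mode; a quick sketch, assuming the legacy API's condense_question mode:

# "condense_question" rewrites each follow-up into a standalone question
# using the chat history, then queries the index with it
chat_engine = index.as_chat_engine(chat_mode="condense_question")
response = chat_engine.chat("What did they do after school?")
print(response)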

So, overall, LlamaIndex is a powerful and flexible tool for indexing and querying text data, whether you're building a search engine, a chatbot, or any other application that needs to work with large amounts of text.

Reference: LlamaIndex documentation

Had a good time exploring LlamaIndex? Check out my other blogs as well.

Until next time,
Anushka!
