
Enhancing RAG Efficiency through LlamaIndex Techniques

LLAMA INDEX AND RAG BASICS WITH DETAILED EXPLANATION

Anushka Sonawane
5 min read · Jan 31, 2024

๐˜ž๐˜ฆ๐˜ญ๐˜ค๐˜ฐ๐˜ฎ๐˜ฆ ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ธ๐˜ฐ๐˜ณ๐˜ญ๐˜ฅ ๐˜ฐ๐˜ง ๐˜“๐˜ญ๐˜ข๐˜ฎ๐˜ข๐˜๐˜ฏ๐˜ฅ๐˜ฆ๐˜น, ๐˜ข ๐˜ฑ๐˜ฐ๐˜ธ๐˜ฆ๐˜ณ๐˜ง๐˜ถ๐˜ญ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ง๐˜ณ๐˜ข๐˜ฎ๐˜ฆ๐˜ธ๐˜ฐ๐˜ณ๐˜ฌ ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ช๐˜ด ๐˜ณ๐˜ฆ๐˜ท๐˜ฐ๐˜ญ๐˜ถ๐˜ต๐˜ช๐˜ฐ๐˜ฏ๐˜ช๐˜ป๐˜ช๐˜ฏ๐˜จ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ธ๐˜ข๐˜บ ๐˜ธ๐˜ฆ ๐˜ช๐˜ฏ๐˜ต๐˜ฆ๐˜ณ๐˜ข๐˜ค๐˜ต ๐˜ธ๐˜ช๐˜ต๐˜ฉ ๐˜ฅ๐˜ข๐˜ต๐˜ข, ๐˜ธ๐˜ฉ๐˜ฆ๐˜ณ๐˜ฆ ๐˜ช๐˜ฏ๐˜ง๐˜ฐ๐˜ณ๐˜ฎ๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ ๐˜ช๐˜ด๐˜ฏโ€™๐˜ต ๐˜ซ๐˜ถ๐˜ด๐˜ต ๐˜ฑ๐˜ณ๐˜ฐ๐˜ค๐˜ฆ๐˜ด๐˜ด๐˜ฆ๐˜ฅ ๐˜ฃ๐˜ถ๐˜ต ๐˜ถ๐˜ฏ๐˜ฅ๐˜ฆ๐˜ณ๐˜ด๐˜ต๐˜ฐ๐˜ฐ๐˜ฅ.

๐˜‰๐˜ถ๐˜ต ๐˜ธ๐˜ฉ๐˜ข๐˜ต ๐˜ฅ๐˜ฐ๐˜ฆ๐˜ด ๐˜ช๐˜ต ๐˜ฅ๐˜ฐ, ๐˜บ๐˜ฐ๐˜ถ ๐˜ข๐˜ด๐˜ฌ? ๐˜ž๐˜ฆ๐˜ญ๐˜ญ, ๐˜“๐˜ญ๐˜ข๐˜ฎ๐˜ข๐˜๐˜ฏ๐˜ฅ๐˜ฆ๐˜น ๐˜ฉ๐˜ฆ๐˜ญ๐˜ฑ๐˜ด ๐˜บ๐˜ฐ๐˜ถ ๐˜ช๐˜ฏ๐˜จ๐˜ฆ๐˜ด๐˜ต, ๐˜ด๐˜ต๐˜ณ๐˜ถ๐˜ค๐˜ต๐˜ถ๐˜ณ๐˜ฆ, ๐˜ข๐˜ฏ๐˜ฅ ๐˜ข๐˜ค๐˜ค๐˜ฆ๐˜ด๐˜ด ๐˜บ๐˜ฐ๐˜ถ๐˜ณ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ช๐˜ฏ ๐˜ข ๐˜ธ๐˜ข๐˜บ ๐˜ต๐˜ฉ๐˜ข๐˜ตโ€™๐˜ด ๐˜ฆ๐˜ข๐˜ด๐˜บ ๐˜ข๐˜ฏ๐˜ฅ ๐˜ฆ๐˜ง๐˜ง๐˜ช๐˜ค๐˜ช๐˜ฆ๐˜ฏ๐˜ต. ๐˜๐˜ตโ€™๐˜ด ๐˜ญ๐˜ช๐˜ฌ๐˜ฆ ๐˜ข ๐˜ฃ๐˜ณ๐˜ช๐˜ฅ๐˜จ๐˜ฆ ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜ค๐˜ฐ๐˜ฏ๐˜ฏ๐˜ฆ๐˜ค๐˜ต๐˜ด ๐˜บ๐˜ฐ๐˜ถ๐˜ณ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฑ๐˜ฐ๐˜ธ๐˜ฆ๐˜ณ๐˜ง๐˜ถ๐˜ญ ๐˜ธ๐˜ฐ๐˜ณ๐˜ญ๐˜ฅ ๐˜ฐ๐˜ง ๐˜“๐˜“๐˜”๐˜ด.

๐˜›๐˜ฉ๐˜ฆ ๐˜ฎ๐˜ข๐˜ช๐˜ฏ ๐˜ฑ๐˜ถ๐˜ณ๐˜ฑ๐˜ฐ๐˜ด๐˜ฆ ๐˜ฐ๐˜ง ๐˜“๐˜ญ๐˜ข๐˜ฎ๐˜ข๐˜๐˜ฏ๐˜ฅ๐˜ฆ๐˜น ๐˜ช๐˜ด ๐˜ต๐˜ฐ ๐˜ฃ๐˜ณ๐˜ช๐˜ฅ๐˜จ๐˜ฆ ๐˜ต๐˜ฉ๐˜ฆ ๐˜จ๐˜ข๐˜ฑ ๐˜ฃ๐˜ฆ๐˜ต๐˜ธ๐˜ฆ๐˜ฆ๐˜ฏ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ท๐˜ข๐˜ด๐˜ต ๐˜ข๐˜ฎ๐˜ฐ๐˜ถ๐˜ฏ๐˜ต ๐˜ฐ๐˜ง ๐˜ฑ๐˜ถ๐˜ฃ๐˜ญ๐˜ช๐˜ค๐˜ญ๐˜บ ๐˜ข๐˜ท๐˜ข๐˜ช๐˜ญ๐˜ข๐˜ฃ๐˜ญ๐˜ฆ ๐˜ฅ๐˜ข๐˜ต๐˜ข ๐˜ต๐˜ฉ๐˜ข๐˜ต ๐˜“๐˜“๐˜”๐˜ด ๐˜ข๐˜ณ๐˜ฆ ๐˜ต๐˜ณ๐˜ข๐˜ช๐˜ฏ๐˜ฆ๐˜ฅ ๐˜ฐ๐˜ฏ ๐˜ข๐˜ฏ๐˜ฅ ๐˜บ๐˜ฐ๐˜ถ๐˜ณ ๐˜ด๐˜ฑ๐˜ฆ๐˜ค๐˜ช๐˜ง๐˜ช๐˜ค ๐˜ฅ๐˜ข๐˜ต๐˜ข, ๐˜ธ๐˜ฉ๐˜ช๐˜ค๐˜ฉ ๐˜ค๐˜ฐ๐˜ถ๐˜ญ๐˜ฅ ๐˜ฃ๐˜ฆ ๐˜ฑ๐˜ณ๐˜ช๐˜ท๐˜ข๐˜ต๐˜ฆ ๐˜ฐ๐˜ณ ๐˜ด๐˜ฑ๐˜ฆ๐˜ค๐˜ช๐˜ง๐˜ช๐˜ค ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฑ๐˜ณ๐˜ฐ๐˜ฃ๐˜ญ๐˜ฆ๐˜ฎ ๐˜บ๐˜ฐ๐˜ถโ€™๐˜ณ๐˜ฆ ๐˜ต๐˜ณ๐˜บ๐˜ช๐˜ฏ๐˜จ ๐˜ต๐˜ฐ ๐˜ด๐˜ฐ๐˜ญ๐˜ท๐˜ฆ. ๐˜›๐˜ฉ๐˜ช๐˜ด ๐˜ช๐˜ด ๐˜ข๐˜ค๐˜ฉ๐˜ช๐˜ฆ๐˜ท๐˜ฆ๐˜ฅ ๐˜ต๐˜ฉ๐˜ณ๐˜ฐ๐˜ถ๐˜จ๐˜ฉ ๐˜ข ๐˜ฑ๐˜ณ๐˜ฐ๐˜ค๐˜ฆ๐˜ด๐˜ด ๐˜ค๐˜ข๐˜ญ๐˜ญ๐˜ฆ๐˜ฅ ๐˜™๐˜ฆ๐˜ต๐˜ณ๐˜ช๐˜ฆ๐˜ท๐˜ข๐˜ญ-๐˜ˆ๐˜ถ๐˜จ๐˜ฎ๐˜ฆ๐˜ฏ๐˜ต๐˜ฆ๐˜ฅ ๐˜Ž๐˜ฆ๐˜ฏ๐˜ฆ๐˜ณ๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ (๐˜™๐˜ˆ๐˜Ž).

So, why should you use LlamaIndex?

Well, imagine you have a wealth of data specific to your business or project. This could be anything from customer feedback to scientific research. While LLMs are trained on publicly available material like Wikipedia or textbooks, they are unaware of your particular data. Your data could be locked away in APIs, SQL databases, or even trapped in PDFs and slide decks.

This is where LlamaIndex comes in. It connects to your data sources, whether they're APIs, SQL databases, or even PDFs, and adds your data to the mix. This process, known as Retrieval-Augmented Generation (RAG), enables you to use LLMs to query your data, transform it, and generate new insights. You can ask questions about your data, create chatbots, build semi-autonomous agents, and more.

But that's not all. LlamaIndex provides a number of resources to assist: data connectors to ingest your data, data indexes to structure it, and engines to provide natural language access to it.


๐‡๐ž๐ซ๐žโ€™๐ฌ ๐ก๐จ๐ฐ ๐ข๐ญ ๐ฐ๐จ๐ซ๐ค๐ฌ:

โžค ๐ƒ๐จ๐œ๐ฎ๐ฆ๐ž๐ง๐ญ ๐“๐ซ๐š๐ง๐ฌ๐Ÿ๐จ๐ซ๐ฆ๐š๐ญ๐ข๐จ๐ง: You feed LlamaIndex with a bunch of documents. It then transforms each one into a unique vector.
โžค ๐“๐ก๐ž ๐ˆ๐ง๐๐ž๐ฑ: All these vectors are stored in a special place called an index. Think of it as a digital filing cabinet, but super organized.
โžค ๐“๐ก๐ž ๐’๐ž๐š๐ซ๐œ๐ก:
When you need to find something, you just ask LlamaIndex. It takes your question, turns it into a vector, and then starts its search.
โžค ๐•๐ž๐œ๐ญ๐จ๐ซ ๐’๐ข๐ฆ๐ข๐ฅ๐š๐ซ๐ข๐ญ๐ฒ: LlamaIndex looks for similar vectors in the index. The more similar the vectors, the closer they are to each other. This is how it finds the most relevant documents for your query.

After introducing LlamaIndex, you might be wondering,

For whom, exactly, is this tool intended?

The great thing about LlamaIndex is that it caters to everyone, from complete beginners to advanced users.

Now, let's see how you can use LlamaIndex in your code.

First things first, you need to install the library. You can do this by running the following command in your terminal:

pip install llama-index

If you're a beginner, don't worry! LlamaIndex provides a high-level API that allows you to ingest and query your data in just 5 lines of code. Yes, you read that right, just 5 lines! Here's a simple example:

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Who owns the shop?")
print(response)

In this example, we're using the SimpleDirectoryReader to load data from a directory named "data". We then create an index from these documents using VectorStoreIndex. This index is used to quickly find answers to questions. The query_engine.query function is then used to ask a question, in this case "Who owns the shop?", and the response is printed out. It's as simple as that!

But what if you're an advanced user looking for more customization? LlamaIndex offers lower-level APIs that allow you to customize and extend any module to fit your specific needs. Here are some of the most common ways you might want to customize it:

Parsing Documents into Smaller Chunks: In the context of LlamaIndex, a "chunk" is a smaller piece of a larger document. Think of it like a slice of a pie. When you have a large document, it's often easier to work with if you break it down into smaller, more manageable pieces; these pieces are what we call "chunks", and the process of breaking documents down into chunks is known as "parsing".

from llama_index import ServiceContext

# Break documents into chunks of at most 1,000 tokens each
service_context = ServiceContext.from_defaults(chunk_size=1000)

In this example, we're setting the chunk size to 1000. This means that each document will be divided into chunks of at most 1,000 tokens. A token can be as short as one character or as long as one word.

The ServiceContext is a bundle of services and configurations used across a LlamaIndex pipeline. You can then use this service_context when creating your index:

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
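
If you'd like to see the chunks themselves, you can also run a node parser directly instead of going through ServiceContext. A minimal sketch, assuming the same legacy llama_index API used throughout this article:

from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser

documents = SimpleDirectoryReader("data").load_data()
parser = SimpleNodeParser.from_defaults(chunk_size=1000)

# Each "node" is one chunk of a source document
nodes = parser.get_nodes_from_documents(documents)
print(len(nodes))                    # how many chunks were produced
print(nodes[0].get_content()[:200])  # peek at the first chunk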

Next, let's say you want to use a different vector store. You can do this by creating a StorageContext with a specified vector_store:

import chromadb
from llama_index.vector_stores import ChromaVectorStore
from llama_index import StorageContext

# Create a local, persistent Chroma database and a collection for the vectors
chroma_client = chromadb.PersistentClient()
chroma_collection = chroma_client.create_collection("quickstart")

# Wrap the collection so LlamaIndex can use it as its vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

The StorageContext defines the storage backend for where the documents, embeddings, and indexes are stored. You can then use this storage_context when creating your index:

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
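
A related, everyday need is persisting the index to disk so you don't re-embed your documents on every run. Here's a quick sketch, assuming the legacy llama_index API; the ./storage directory is just an example path:

from llama_index import StorageContext, load_index_from_storage

# Save the index (documents, embeddings, and metadata) to disk
index.storage_context.persist(persist_dir="./storage")

# Later, load it back without re-indexing
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)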

If you want to retrieve more context when you query, you can do this by passing in similarity_top_k when creating your query engine:

query_engine = index.as_query_engine(similarity_top_k=5)

This configures the retriever to return the top 5 most similar documents (instead of the default of 2).
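
Under the hood, the query engine wraps a retriever. If you want to inspect exactly what gets fetched before any answer is generated, you can use the retriever directly; a small sketch, again assuming the legacy API:

# Fetch the top 5 most similar chunks, without generating an answer
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("Who owns the shop?")

for result in results:
    # Each result carries a similarity score and the matched chunk
    print(result.score, result.node.get_content()[:100])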

LlamaIndex allows you to use different LLMs. For example, you can use the PaLM LLM.

from llama_index import ServiceContext
from llama_index.llms import PaLM

# Swap in Google's PaLM as the LLM; note that this requires a PaLM API key
# (for example via PaLM(api_key="...")), which is omitted here
service_context = ServiceContext.from_defaults(llm=PaLM())

After applying all your customizations, you can then use this service_context when creating your query engine:

query_engine = index.as_query_engine(service_context=service_context)

You can also use different response modes in LlamaIndex. For example, the tree_summarize response mode builds a tree of summaries over the retrieved chunks and answers from the top:

query_engine = index.as_query_engine(response_mode="tree_summarize")
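
tree_summarize is one of several built-in modes. For comparison, here's a quick sketch with two of the others, refine and compact (mode names per the legacy llama_index API; compact is the default):

# "refine" sends chunks to the LLM one at a time, refining the answer as it goes
refine_engine = index.as_query_engine(response_mode="refine")

# "compact" packs as many chunks as fit into each LLM call (the default mode)
compact_engine = index.as_query_engine(response_mode="compact")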

And finally, you can ask your query engine questions.

# Query the engine
response = query_engine.query("What were their favorite childhood activities?")
print(response)

You can also use LlamaIndex as a chatbot instead of a Q&A system.

# as_chat_engine returns a chat engine, which keeps conversation history
chat_engine = index.as_chat_engine()
response = chat_engine.chat("What were their favorite childhood activities?")
print(response)

response = chat_engine.chat("Oh interesting, tell me more.")
print(response)
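
Unlike a query engine, a chat engine keeps the conversation history, which is why the follow-up above ("Oh interesting, tell me more.") works. You can also control how that history is used via chat_mode; a quick sketch, assuming the legacy API's condense_question mode:

# "condense_question" rewrites each follow-up into a standalone question
# using the chat history, then queries the index with it
chat_engine = index.as_chat_engine(chat_mode="condense_question")
response = chat_engine.chat("What did they do after school?")
print(response)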

So, overall, LlamaIndex is a powerful and flexible tool for indexing and querying text data, whether you're building a search engine, a chatbot, or any other application that needs to work with large amounts of text.

Reference: LlamaIndex documentation

Had a good time exploring LlamaIndex? Check out my other blogs as well.

Until next time,
Anushka!
