LLAMA INDEX | LLMS | RAG | GPT | PALM
Enhancing RAG Efficiency through LlamaIndex Techniques
LLAMA INDEX AND RAG BASICS WITH DETAILED EXPLANATION
Welcome to the world of LlamaIndex, a powerful data framework that is revolutionizing the way we interact with data, where information isn't just processed but understood.
But what does it do, you ask? Well, LlamaIndex helps you ingest, structure, and access your data in a way that's easy and efficient. It's like a bridge that connects your data to the powerful world of LLMs.
The main purpose of LlamaIndex is to bridge the gap between the vast amount of publicly available data that LLMs are trained on and your specific data, which could be private or specific to the problem you're trying to solve. This is achieved through a process called Retrieval-Augmented Generation (RAG).
So, why should you use LlamaIndex?
Well, imagine you have a wealth of data specific to your business or project. This could be anything from customer feedback to scientific research. While LLMs are trained on publicly available material like Wikipedia or textbooks, they are unaware of your particular data. Your data could be locked away in APIs, SQL databases, or even trapped in PDFs and slide decks.
This is where LlamaIndex comes in. It connects to your data sources, whether they're APIs, SQL databases, or even PDFs, and adds your data to the mix. This process, known as Retrieval-Augmented Generation (RAG), enables you to use LLMs to query your data, transform it, and generate new insights. You can ask questions about your data, create chatbots, build semi-autonomous agents, and more.
But that's not all. LlamaIndex provides a number of tools to help: data connectors to ingest your data, data indexes to structure it, and engines to give you natural language access to it.
Here's how it works:
➤ Document Transformation: You feed LlamaIndex a collection of documents, and it transforms each one into a numerical vector (an embedding).
➤ The Index: All these vectors are stored in a structure called an index. Think of it as a digital filing cabinet, but super organized.
➤ The Search: When you need to find something, you just ask LlamaIndex. It takes your question, turns it into a vector of its own, and then starts its search.
➤ Vector Similarity: LlamaIndex looks for the vectors in the index that are most similar to your question's vector; vectors that sit close together represent text with similar meaning. This is how it finds the most relevant documents for your query (see the sketch just after this list).
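To make vector similarity concrete, here is a minimal, self-contained sketch of the idea using cosine similarity, one common way to measure how alike two vectors are. The document names, vectors, and their tiny dimensionality are all toy values for illustration; real embeddings have hundreds of dimensions.
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = vectors point the same way (very similar text),
    # 0.0 = vectors are orthogonal (unrelated text)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.0])  # toy "embedding" of the question
doc_vecs = {
    "doc_about_llamas": np.array([0.8, 0.2, 0.1]),
    "doc_about_taxes": np.array([0.1, 0.2, 0.9]),
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(doc_vecs.items(), key=lambda kv: cosine_similarity(query_vec, kv[1]), reverse=True)
for name, vec in ranked:
    print(name, round(cosine_similarity(query_vec, vec), 3))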
After introducing LlamaIndex, you might be wondering,
For whom, exactly, is this tool intended?
The great thing about LlamaIndex is that it caters to a wide range of users, from complete beginners to advanced developers.
Now, let's see how you can use LlamaIndex in your code.
First things first, you need to install the library. You can do this by running the following command in your terminal:
pip install llama-index
If you're a beginner, don't worry! LlamaIndex provides a high-level API that allows you to ingest and query your data in just 5 lines of code. Yes, you read that right, just 5 lines! Here's a simple example:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Who owns the shop?")
print(response)
In this example, we're using the SimpleDirectoryReader to load data from a directory named "data". We then create an index from these documents using VectorStoreIndex. This index is used to quickly find answers to questions. The query_engine.query function is then used to ask a question, in this case "Who owns the shop?", and the response is printed out. It's as simple as that!
But what if you're an advanced user looking for more customization? LlamaIndex offers lower-level APIs that allow you to customize and extend any module to fit your specific needs. Here are some of the most common ways you might want to customize it:
Parsing Documents into Smaller Chunks: In the context of LlamaIndex, a "chunk" is a smaller piece of a larger document. Think of it like a slice of a pie. When you have a large document, it's often easier to work with if you break it down into smaller, more manageable pieces. These pieces are what we call "chunks", and the process of breaking documents down into chunks is known as "parsing".
from llama_index import ServiceContext
service_context = ServiceContext.from_defaults(chunk_size=1000)
In this example, we're setting the chunk size to 1000. This means that each document will be divided into chunks of about 1000 tokens each. A token can be as short as a single character or as long as a whole word.
The ServiceContext is a bundle of services and configurations used across a LlamaIndex pipeline. You can then use this service_context when creating your index:
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
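If you would like to see the resulting chunks for yourself, you can run the parsing step directly instead of leaving it to the index. Here is a small sketch, assuming the same llama_index version as the examples above and its SimpleNodeParser; the chunk_overlap value and the "data" directory are illustrative.
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser

documents = SimpleDirectoryReader("data").load_data()

# Split each document into ~1000-token chunks ("nodes" in LlamaIndex terms)
parser = SimpleNodeParser.from_defaults(chunk_size=1000, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

print(len(documents), "documents became", len(nodes), "chunks")
print(nodes[0].get_content()[:200])  # peek at the first chunk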
Next, let's say you want to use a different vector store. You can do this by creating a StorageContext with a specified vector_store:
import chromadb
from llama_index.vector_stores import ChromaVectorStore
from llama_index import StorageContext

# Start a local, persistent Chroma database and create a collection in it
chroma_client = chromadb.PersistentClient()
chroma_collection = chroma_client.create_collection("quickstart")

# Wrap the collection so LlamaIndex can use it as its vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
The StorageContext defines the storage backend for where the documents, embeddings, and indexes are stored. You can then use this storage_context when creating your index:
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
If you want to retrieve more context when you query, you can do this by passing in similarity_top_k when creating your query engine:
query_engine = index.as_query_engine(similarity_top_k=5)
This configures the retriever to return the top 5 most similar documents (instead of the default of 2).
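If you are curious about what those top-k results actually look like, you can call the retriever directly instead of going through the full query engine. A small sketch, reusing the index built earlier; the question is just a placeholder.
# Fetch the top 5 most similar chunks without generating an answer
retriever = index.as_retriever(similarity_top_k=5)
results = retriever.retrieve("Who owns the shop?")

# Each result carries the matched chunk and its similarity score
for result in results:
    print(result.score, result.node.get_content()[:80])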
LlamaIndex allows you to use different LLMs. For example, you can use the PaLM LLM.
from llama_index import ServiceContext
from llama_index.llms import PaLM
service_context = ServiceContext.from_defaults(llm=PaLM())
After applying your customizations, you can then pass this service_context when creating your query engine:
query_engine = index.as_query_engine(service_context=service_context)
You can also use different response modes in LlamaIndex. For example, the tree_summarize response mode builds a tree over the retrieved chunks and recursively summarizes them into a single answer, which is useful for summarization-style questions.
query_engine = index.as_query_engine(response_mode="tree_summarize")
And finally, you can ask your query engine questions.
# Query the engine
response = query_engine.query("What were their favorite childhood activities?")
print(response)
You can also use LlamaIndex as a chatbot instead of a Q&A system. Unlike a query engine, a chat engine keeps track of the conversation history, so follow-up questions work naturally.
chat_engine = index.as_chat_engine()
response = chat_engine.chat("What were their favorite childhood activities?")
print(response)
response = chat_engine.chat("Oh interesting, tell me more.")
print(response)
So overall, LlamaIndex is a powerful and flexible tool for indexing and querying text data, whether you're building a search engine, a chatbot, or any other application that needs to work with large amounts of text.
Reference: LLAMA INDEX DOCUMENTATION
Had a good time exploring LlamaIndex? Check out my other blogs as well,
Until next time,
Anushka!