LlamaIndex : Create, Save & Load Indexes, Customize LLMs, Prompts & Embeddings

Yashwanth Reddy
6 min read · Jul 17, 2023


LlamaIndex (pronounced "Llama Index") is used to connect your own private, proprietary data to an LLM.
Current version: 0.6.37

First, how do we create a LlamaIndex index and how do we query it? For example, how do we take a PDF document, build an index over it, and ask questions with respect to that document? We will also cover how to save an index and load it somewhere else: say you build the index on your laptop but want to use it on a server (an EC2 instance or a VM), so you need to know how to save your index and load it again.

Next, how do we customize the LLM? By default LlamaIndex uses OpenAI's GPT-3 text-davinci-003 model, but what if you want to use GPT-3.5 or GPT-4? We will see how to swap in a different model. We will also see how to customize the prompt: the basic use case is getting answers from your document, but what if you want to give the model different instructions?

Finally, how do we customize the embedding? What if you want to use a Hugging Face (open-source) embedding model instead of OpenAI's? These are things you run into again and again when building real projects, so we will go through each of these points.

What document are we using here?
Here I'm using a snippet of a book: The Almanack of Naval Ravikant.

At its core, LlamaIndex contains a toolkit designed to easily connect LLM’s with your external data.

  1. Creating and Querying an Index
  2. Saving and Loading Index
  3. Customize LLM
  4. Customize Prompt
  5. Customize Embedding

Prerequisites

Before we begin, ensure that you have installed the necessary Python packages. We use LlamaIndex, pypdf to handle PDF files, and Sentence Transformers to create embeddings. You can install these packages by running the following command:

!pip install llama-index pypdf sentence_transformers -q

By default, we use the OpenAI GPT-3 text-davinci-003 model.

https://gpt-index.readthedocs.io/en/v0.6.37/getting_started/installation.html

Next, you will need an OpenAI API key to access the GPT-3 models. Be sure to replace the empty string with your own key:

import os
import openai
openai.api_key = "" # Replace with your OpenAI API key
os.environ["OPENAI_API_KEY"] = "" # Replace with your OpenAI API key
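Hard-coded keys are easy to leak when sharing notebooks. As an alternative, here is a small convenience helper (my own sketch, not part of LlamaIndex or the openai package) that reads the key from the environment and prompts for it only when it is missing:

```python
import os
from getpass import getpass

def ensure_openai_key() -> str:
    """Read the OpenAI key from the environment, prompting only if missing."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        key = getpass("OpenAI API key: ")  # hidden interactive prompt
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Both the openai client and LlamaIndex pick the key up from the OPENAI_API_KEY environment variable, so setting it once is enough.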

1) Creating and Querying an Index

https://gpt-index.readthedocs.io/en/latest/guides/primer/index_guide.html

With the prerequisites out of the way, let’s dive into LlamaIndex. First, we’ll create an index using a document set and then query it. In this example, we assume that we have a directory called ‘book’ containing our documents.

The VectorStoreIndex.from_documents() function takes our loaded documents and creates an index. We then create a query engine from this index using the as_query_engine() function. The query engine allows us to ask questions about our indexed documents and get responses based on the content of the documents.

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader('book').load_data()

# Create an index from the documents
index = VectorStoreIndex.from_documents(documents)

# Create a query engine from the index
query_engine = index.as_query_engine()

# Query the engine
response = query_engine.query("What is this text about?")
print(response)

o/p: This text is about the dangers of lusting after money and how it can occupy one’s mind and lead to paranoia and fear. It also discusses how Naval Ravikant combined his vocation and avocation to make money in a way that felt like play.

response = query_engine.query("who is this text about?")
print(response)

o/p: This text is about Naval Ravikant.


response = query_engine.query("when was this book published")
print(response)

o/p: This book was published in 2020.


response = query_engine.query("list 5 important points from this book")
print(response)

o/p:

  1. Understand How Wealth Is Created
  2. Find and Build Specific Knowledge
  3. Play Long-Term Games with Long-Term People
  4. Take on Accountability
  5. Build or Buy Equity in a Business

response = query_engine.query("what naval says about wealth creation")
print(response)

o/p: Naval Ravikant says that wealth creation is possible through ethical means. He suggests seeking wealth instead of money or status, and suggests that one should own equity in a business in order to gain financial freedom. He also suggests that one should give society what it wants but does not yet know how to get, and that this should be done at scale. He further states that money is how we transfer wealth, and that wealth is assets that earn while you sleep.

2) Saving and Loading an Index

LlamaIndex allows you to save an index for later use. This is particularly helpful when dealing with large document sets where creating an index can take considerable time. Let’s see how to save and load an index:

# Persist index to disk
index.storage_context.persist("naval_index")

from llama_index import StorageContext, load_index_from_storage

# Rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="naval_index")

# Load index from the storage context
new_index = load_index_from_storage(storage_context)

new_query_engine = new_index.as_query_engine()
response = new_query_engine.query("who is this text about?")
print(response)

o/P: This text is about Naval Ravikant

Here, we’ve saved our index to a directory called “naval_index”. Later, we can rebuild our storage context and load the index from it.
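As a quick sanity check, you can list what persist() actually wrote. The exact file names depend on the LlamaIndex version, but in the 0.6.x line you should see JSON stores such as docstore.json, index_store.json, and vector_store.json. The helper below is just a convenience sketch:

```python
import os

def list_persisted_files(persist_dir: str) -> list:
    """Return the files written by index.storage_context.persist()."""
    if not os.path.isdir(persist_dir):
        return []
    return sorted(os.listdir(persist_dir))

print(list_persisted_files("naval_index"))
```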

3) Customizing the LLM

One of the powerful features of LlamaIndex is the ability to customize the underlying LLM. In this example, we'll use LangChain's ChatOpenAI model to swap in gpt-3.5-turbo.

https://gpt-index.readthedocs.io/en/latest/how_to/customization/service_context.html

from llama_index import LLMPredictor, ServiceContext
from langchain.chat_models import ChatOpenAI

# Use gpt-3.5-turbo instead of the default text-davinci-003
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))

# Bundle the predictor into a service context
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Build the index using the custom LLM
custom_llm_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

custom_llm_query_engine = custom_llm_index.as_query_engine()
response = custom_llm_query_engine.query("who is this text about?")
print(response)

o/p: This text is about Naval Ravikant.

The LLMPredictor allows us to utilize different language models and change their parameters.

4) Custom Prompt

By creating a custom prompt, we can provide more structured questions and responses. This allows us to guide the language model to give more specific answers.

from llama_index import Prompt

# Define a custom prompt
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question and each answer "
    "should start with code word AI Demos: {query_str}\n"
)
qa_template = Prompt(template)

# Use the custom prompt when querying
query_engine = custom_llm_index.as_query_engine(text_qa_template=qa_template)
response = query_engine.query("who is this text about?")
print(response)

o/p: AI Demos: This text is about Naval Ravikant.

This provides a more structured conversation with the LLM, which can be helpful in certain use cases.
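To see what the engine actually sends to the model, you can fill the two placeholders yourself with plain Python string formatting. The context value below is made up for illustration; at query time LlamaIndex substitutes the retrieved document chunks for {context_str} and your question for {query_str}:

```python
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question and each answer "
    "should start with code word AI Demos: {query_str}\n"
)

# Stand-in values for what LlamaIndex fills in at query time
filled = template.format(
    context_str="Naval Ravikant is an entrepreneur and investor.",
    query_str="who is this text about?",
)
print(filled)
```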

5) Custom Embedding

LlamaIndex also allows us to customize the embeddings used in our index. This can be helpful if you want to use a specific embedding model or if the default embeddings do not provide satisfactory results.

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# Load in a specific embedding model
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
)

# Create a service context with the custom embedding model
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Create an index using the service context
new_index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
)

query_engine = new_index.as_query_engine()
response = query_engine.query("list 5 important points from this book")
print(response)

o/p: 1. Building wealth and being happy are skills that can be learned.
2. The Almanack of Naval Ravikant is a collection of Naval’s wisdom and experience from the last ten years.
3. This book provides insight into Naval’s principles for building wealth and creating long-term happiness.
4. This book is available for free download in pdf and e-reader versions on Navalmanack.com.
5. Eric Jorgenson is a product strategist and writer who joined the founding team of Zaarly.


query_engine = new_index.as_query_engine()
response = query_engine.query("what naval says about wealth creation")
print(response)

o/p: Naval Ravikant says that wealth creation is not a one-time thing, but a skill that needs to be learned. He suggests asking yourself if what you are doing is authentic to you and if you are productizing, scaling, and using labor, capital, code, or media to do so. He also states that money is a way to transfer wealth, and that wealth is assets that can earn while you sleep, such as businesses, factories, robots, computer programs, and even houses that can be rented out.

We’ve used the sentence-transformers/all-MiniLM-L6-v2 embedding model, but you could use any model that suits your requirements.
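Under the hood, each chunk of text becomes a vector (384 dimensions in the case of all-MiniLM-L6-v2), and your query is matched against chunks by vector similarity. A toy sketch of the cosine-similarity scoring, using made-up 3-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Score in [-1, 1]; higher means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real 384-dimensional embeddings
query_vec = [0.1, 0.9, 0.2]
chunk_vec = [0.15, 0.85, 0.1]
print(round(cosine_similarity(query_vec, chunk_vec), 3))
```

Chunks whose vectors score highest against the query vector are the ones passed to the LLM as context.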

And that’s a wrap! We’ve explored various features of the LlamaIndex toolkit, and I hope it helps you build and customize your own document search and Q&A applications.


👉 Check out my daily newsletter to learn something new about Python and Data Science every day.