Llama-Index: A Comprehensive Guide for Building and Querying Document Indexes

BavalpreetSinghh
7 min read · Mar 2, 2024


In today’s digital age, the ability to efficiently search and retrieve information from large volumes of text data is crucial for applications such as information retrieval, natural language processing, and knowledge management. Document indexing is a technique for organizing and structuring textual data so that it can be queried quickly and accurately.
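To make the idea concrete, here is a minimal, dependency-free sketch of one classic indexing structure, an inverted index, which maps each term to the documents containing it (the toy document strings below are purely illustrative):

```python
from collections import defaultdict

# Toy corpus: document id -> text (illustrative data, not the tutorial's PDF)
documents = {
    0: "humber college offers diplomas and degrees",
    1: "humber provides on campus housing",
}

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

index = build_inverted_index(documents)
# index["humber"] -> {0, 1}; index["housing"] -> {1}
```

Vector store indexes like the ones LlamaIndex builds replace exact token matching with embedding similarity, but the goal is the same: avoid scanning every document at query time.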

Photo by Elisa Calvet B. on Unsplash

In this comprehensive guide, we will explore the process of building and querying document indexes using LlamaIndex, a powerful data framework that provides tools for creating, managing, and querying vector store indexes, which are commonly used for document indexing and retrieval tasks. We will walk through each step of the process, from setting up the environment to executing queries and analyzing the results.

Installation and Setup:

  • Begin by installing the Llama-Index library using the !pip install llama-index command.
  • Set the OpenAI API key for authentication purposes.
  • Import the necessary libraries and modules, including nest_asyncio, to manage asyncio event loops.
!pip install llama-index
import os
import nest_asyncio

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

Why do we need nest_asyncio?

Our codebase uses asynchronous operations extensively for tasks such as loading, querying, and processing data. Environments like Jupyter and Colab already run an event loop, and asyncio does not normally allow starting a new loop inside a running one. nest_asyncio patches asyncio to support nested loops, so asyncio-based code runs smoothly in these environments. The asynchronous operations here include fetching data, building indexes, executing queries, and handling sub-questions; running them concurrently prevents blocking delays and improves efficiency.

# openai key setup
os.environ["OPENAI_API_KEY"] = "sk-......"
nest_asyncio.apply()

While the preferred way to manage keys is to store them in a .env file, the example above sets the key inline for simplicity while learning. If you do create a .env file, the following snippet reads it.

import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
openai.api_key = os.getenv('OPENAI_API_KEY')
# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager

Data:

  • Create a directory to store the data if it doesn’t already exist. If you are using Colab and have a zip file, unzip it with !unzip <path of the file>.
  • Load the documents from the specified directory using the SimpleDirectoryReader class.
  • Store the loaded data in a variable for further processing.
  • You can download the data PDF from this link. We are utilising the Humber College viewbook 2024–25 for this tutorial.
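The directory-creation and unzip steps above can be scripted instead of typed ad hoc. A small helper using only the standard library (the paths are placeholders; adjust them to your setup):

```python
import zipfile
from pathlib import Path
from typing import Optional

def prepare_data_dir(data_dir: str = "./data", zip_path: Optional[str] = None) -> Path:
    """Create the data directory if missing and optionally extract a zip into it."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    if zip_path is not None:
        # Equivalent of `!unzip <path of the file>` in a Colab cell
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
    return target

data_dir = prepare_data_dir("./data")
```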
# load data
humberviewbook = SimpleDirectoryReader(input_dir="./data/").load_data()

Build Index and Query Engine:

  • Use the loaded documents to build a vector store index using the VectorStoreIndex.from_documents() method.
  • Enable asynchronous processing to improve performance by allowing concurrent execution of tasks.
  • Convert the vector store index into a query engine for querying purposes.
# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
    humberviewbook,
    use_async=True,
).as_query_engine()
**********
Trace: index_construction
|_node_parsing -> 0.109464 seconds
|_chunking -> 0.00013 seconds
|_chunking -> 0.000871 seconds
|_chunking -> 0.00051 seconds
|_chunking -> 0.000629 seconds
|_chunking -> 0.000826 seconds
|_chunking -> 0.000693 seconds
|_chunking -> 0.000613 seconds
|_chunking -> 0.001431 seconds
|_chunking -> 0.000997 seconds
|_chunking -> 0.001134 seconds
|_chunking -> 0.000337 seconds
|_chunking -> 0.000996 seconds
|_chunking -> 0.000617 seconds
|_chunking -> 0.001148 seconds
|_chunking -> 0.000457 seconds
|_chunking -> 0.000759 seconds
|_chunking -> 0.000759 seconds
|_chunking -> 0.00092 seconds
|_chunking -> 0.00049 seconds
|_chunking -> 0.00044 seconds
|_chunking -> 0.000583 seconds
|_chunking -> 0.004309 seconds
|_chunking -> 0.004924 seconds
|_chunking -> 0.006228 seconds
|_chunking -> 0.004861 seconds
|_chunking -> 0.004755 seconds
|_chunking -> 0.004357 seconds
|_chunking -> 0.001598 seconds
|_chunking -> 0.004429 seconds
|_chunking -> 0.004406 seconds
|_chunking -> 0.004948 seconds
|_chunking -> 0.005026 seconds
|_chunking -> 0.004496 seconds
|_chunking -> 0.001785 seconds
|_chunking -> 0.001639 seconds
|_chunking -> 0.001845 seconds
|_chunking -> 0.001568 seconds
|_chunking -> 0.00773 seconds
|_chunking -> 0.0005 seconds
|_chunking -> 0.000729 seconds
|_chunking -> 0.000359 seconds
|_chunking -> 0.00061 seconds
|_embedding -> 0.682087 seconds
**********

The use_async=True parameter makes the indexing process run asynchronously: the per-chunk embedding calls are issued concurrently rather than one after another, which speeds up index construction while other tasks proceed.
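As a toy illustration of what that concurrency buys (not LlamaIndex’s actual internals), asyncio.gather issues many awaitable calls at once, so total latency approaches the slowest single call rather than the sum of all of them:

```python
import asyncio

async def embed_chunk(chunk: str) -> str:
    # Stand-in for a network call to an embedding API
    await asyncio.sleep(0.01)
    return f"vec({chunk})"

async def embed_all(chunks):
    # Fire all per-chunk calls concurrently and collect results in order
    return await asyncio.gather(*(embed_chunk(c) for c in chunks))

vectors = asyncio.run(embed_all(["a", "b", "c"]))
# vectors == ["vec(a)", "vec(b)", "vec(c)"]
```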

The .as_query_engine() method converts the VectorStoreIndex object into a QueryEngine object, which allows for efficient querying and retrieval of documents based on user-defined queries.
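Under the hood, the query engine embeds the query and retrieves the chunks whose stored embeddings are most similar, typically by cosine similarity. A dependency-free sketch of that retrieval step, with tiny hand-made vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "vector store": chunk text -> embedding (real embeddings have far more dims)
store = {
    "Humber offers diplomas and degrees": [0.9, 0.1, 0.0],
    "Residence housing is available on campus": [0.1, 0.8, 0.2],
}

def retrieve(query_embedding, store, top_k=1):
    """Return the top_k chunk texts ranked by cosine similarity to the query."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine_similarity(query_embedding, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

retrieve([0.0, 0.9, 0.1], store)  # the housing chunk ranks first
```

The query engine then feeds the retrieved chunks, together with the question, to the LLM for answer synthesis.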

Setup Sub Question Query Engine:

# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="humber bot",
            description="Humber viewbook 2024-25 query engine",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

This code block initializes a query engine tool named “humber bot” with the provided vector_query_engine as its underlying query engine. The QueryEngineTool class is used to create this tool, specifying its name and description through the metadata parameter.

Then, a SubQuestionQueryEngine is instantiated using the from_defaults method, which sets up the query engine to handle sub-questions. The query_engine_tools parameter is set to the list containing the previously created query engine tool. Additionally, use_async=True indicates that the query engine should operate asynchronously, allowing for concurrent processing of queries and sub-questions.
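The control flow can be sketched without any LLM at all. In the toy version below, the decompose step (which the real engine delegates to an LLM) is replaced by a naive split on " and ", purely to show the decompose → answer → synthesize pipeline:

```python
def decompose(query: str) -> list:
    # Stand-in for the LLM decomposition step: naive split on " and "
    return [part.strip().rstrip("?") + "?" for part in query.split(" and ")]

def answer_sub_question(sub_q: str) -> str:
    # Stand-in for vector_query_engine.query(sub_q)
    return f"Answer to: {sub_q}"

def sub_question_query(query: str) -> str:
    sub_questions = decompose(query)                           # 1. split the query
    answers = [answer_sub_question(q) for q in sub_questions]  # 2. answer each part
    return " ".join(answers)                                   # 3. synthesize a reply

sub_question_query("What courses are offered and what housing is provided?")
```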

Run Queries:

  • Execute sample queries using the prepared query engine.
  • Print the responses to the queries for analysis and inspection.
response = query_engine.query(
    "What are different courses offered by humber and what housing options humber college provide?"
)
#output
Generated 2 sub questions.
[humber bot] Q: What are the different courses offered by Humber College?
[humber bot] Q: What housing options does Humber College provide?
[humber bot] A: Humber College provides student residence (on-campus) housing for 1,450 students between their two campuses. Additionally, students have the option of living off-campus, where they would need to negotiate directly with the housing provider for prices and conditions.
[humber bot] A: Humber College offers a variety of courses in areas such as Applied Technology & Engineering, Business, Accounting & Management, Children & Youth, Community & Social Services, Creative Arts & Design, Emergency Services, Fashion & Beauty, Foundations & Language Training, Health & Wellness, Hospitality & Tourism, Information, Computer & Digital Technology, International Development, Justice & Legal Studies, Marketing & Advertising, Media & Public Relations, Performing Arts & Music, and Skilled Trades & Apprenticeships. These courses lead to different credentials including Foundation Certificates, Certificates, Apprenticeships, Diplomas and Advanced Diplomas, Bachelor’s Degrees, and Ontario Graduate Certificates.
**********
Trace: query
|_query -> 8.341954 seconds
|_llm -> 1.704874 seconds
|_sub_question -> 3.478777 seconds
|_query -> 3.475354 seconds
|_retrieve -> 0.16224 seconds
|_embedding -> 0.154836 seconds
|_synthesize -> 3.312606 seconds
|_templating -> 6.3e-05 seconds
|_llm -> 3.303099 seconds
|_sub_question -> 1.676506 seconds
|_query -> 1.675639 seconds
|_retrieve -> 0.21419 seconds
|_embedding -> 0.207365 seconds
|_synthesize -> 1.461151 seconds
|_templating -> 6.4e-05 seconds
|_llm -> 1.455075 seconds
|_synthesize -> 3.154823 seconds
|_templating -> 6.8e-05 seconds
|_llm -> 3.147733 seconds
**********
print(response)
#output
Humber College offers a variety of courses in areas such as Applied Technology
& Engineering, Business, Accounting & Management, Children & Youth, Community
& Social Services, Creative Arts & Design, Emergency Services, Fashion & Beauty
, Foundations & Language Training, Health & Wellness, Hospitality & Tourism,
Information, Computer & Digital Technology, International Development, Justice
& Legal Studies, Marketing & Advertising, Media & Public Relations, Performing
Arts & Music, and Skilled Trades & Apprenticeships. These courses lead to
different credentials including Foundation Certificates, Certificates,
Apprenticeships, Diplomas and Advanced Diplomas, Bachelor’s Degrees, and
Ontario Graduate Certificates. Humber College provides student residence
(on-campus) housing for 1,450 students between their two campuses.
Additionally, students have the option of living off-campus, where they would
need to negotiate directly with the housing provider for prices and conditions.

Iterate Through Sub-Question Items:

  • Retrieve and print sub-questions captured during the query execution using the llama_debug handler.
  • Gain insights into the sub-questions generated during the query process, along with their corresponding answers.
# iterate through sub_question items captured in SUB_QUESTION event
from llama_index.core.callbacks import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print("Sub Question " + str(i) + ": " + qa_pair.sub_q.sub_question.strip())
    print("Answer: " + qa_pair.answer.strip())
    print("====================================")
#output
Sub Question 0: What are the different courses offered by Humber?
Answer: Humber offers a wide range of courses in areas such as Applied Technology & Engineering, Business, Accounting & Management, Children & Youth, Community & Social Services, Creative Arts & Design, Emergency Services, Fashion & Beauty, Foundations & Language Training, Health & Wellness, Hospitality & Tourism, Information, Computer & Digital Technology, International Development, Justice & Legal Studies, Marketing & Advertising, Media & Public Relations, Performing Arts & Music, and Skilled Trades & Apprenticeships. These courses lead to various credentials including Foundation Certificates, Certificates, Apprenticeships, Diplomas and Advanced Diplomas, Bachelor’s Degrees, and Ontario Graduate Certificates.
====================================
Sub Question 1: What housing options does Humber College provide?
Answer: Humber College provides student residence (on-campus) housing for 1,450 students between their two campuses. Additionally, students have the option of living off-campus, where they would need to negotiate directly with the housing provider for prices and conditions.
====================================

For each pair of events, the llama_debug handler exposes the sub-question and its associated answer through the end_event payload under the EventPayload.SUB_QUESTION key.
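What get_event_pairs does internally amounts to matching start and end records that share an event id. A small sketch of that bookkeeping (the tuple format here is invented for illustration, not LlamaIndex’s real event schema):

```python
def pair_events(events):
    """Match (kind, event_id, timestamp) records into per-event durations."""
    starts, durations = {}, {}
    for kind, event_id, timestamp in events:
        if kind == "start":
            starts[event_id] = timestamp
        else:  # "end"
            durations[event_id] = timestamp - starts[event_id]
    return durations

trace = [
    ("start", "sub_q_0", 0.0),
    ("start", "sub_q_1", 1.0),
    ("end", "sub_q_0", 3.5),
    ("end", "sub_q_1", 3.0),
]
pair_events(trace)  # {"sub_q_0": 3.5, "sub_q_1": 2.0}
```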

Codebase — https://colab.research.google.com/drive/15vue2322fJP47YGPMqOGpPs1HM3yBXD0?usp=sharing

In the upcoming tutorials, we’ll delve into various large language models (LLMs) and diverse datasets to tackle the challenges of building retrieval-augmented generation (RAG) systems: maintaining data quality, optimizing chunk size, and aligning user queries with indexed data. Our solutions will complement vector similarity search with methods like full-text search and structured queries, explore different embedding models to capture semantic nuances, and implement techniques such as context-aware and hierarchical retrieval. We’ll also discuss how reranking, scoring, and information compression can improve generation quality and readability, and how smaller models, parallel processing, selective generation, and caching can improve response time and user experience. Together, these strategies aim to enhance accuracy, efficiency, and user satisfaction in RAG pipelines.

Stay Tuned!


BavalpreetSinghh

Data Scientist at Tatras Data | Mentor @ Humber College | Consultant @ SL2