Building Open Source LLM based Chatbots using Llama Index

Iago Modesto Brandão
Published in Poatek · Dec 19, 2023 · 8 min read

Ready to dive into the world of Large Language Models (LLMs) and unlock their transformative potential? In this post, we'll equip you with the essential knowledge of LlamaIndex, an LLM framework that makes it easy to design LLM-based solutions.

What are LLMs?

Large Language Model (LLM)

Firstly, Language Models (LMs) are computational models that have the capability to understand and generate human language.

In turn, Large Language Models (LLMs) are advanced language models with massive parameter counts and exceptional learning capabilities [1].

Examples include GPT-3.5 & GPT-4 (ChatGPT), Gemini & PaLM 2 (Bard), Llama 2, Claude 2, and so on.

Chatbots and chatbot providers powered by LLMs

Key Concepts

Conversational Task

Sample conversation without memory

Conversational systems fall into two broad categories. Task-oriented dialogue systems are designed to complete a specific task on the user's behalf, such as booking a hotel, making a restaurant reservation, or finding products.

The second category, open-domain chatbots, focuses on conversing with the user about open-ended topics [2].

Memory

Memory stores information about past interactions, enabling the conversation to refer back to information introduced earlier [3].
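
As a toy sketch of the idea (LlamaIndex chat engines manage this for you, as Example #2 below shows), memory is simply the running transcript that gets sent back to the model on every turn. The prompt format here is improvised for illustration:

from llama_index.llms import Ollama

llm = Ollama(model="mistral")

# "Memory" is just the accumulated transcript sent back on each turn
history = ""
for user_message in ["Hi, my name is Mirna", "What is my name?"]:
    history += f"User: {user_message}\nAssistant:"
    reply = llm.complete(history).text
    history += f" {reply}\n"
    print(reply)
# The second question is answered correctly because the transcript
# already contains the first exchange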

Retrieval Augmented Generation (RAG)

Sample of Conversational RAG

RAG adds your own data to the data LLMs were already trained on [4]. At query time, the pieces of your documents most similar to the user query are retrieved and passed to the LLM as part of the context, so the answer it returns is augmented with that information.
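
Conceptually, the flow is "retrieve, then generate". Below is a toy, library-agnostic sketch: the keyword-overlap retrieve function merely stands in for the embedding search that LlamaIndex performs for you (see Example #3 for the real thing), and the documents are made up:

from llama_index.llms import Ollama

llm = Ollama(model="mistral")

def retrieve(query, documents, top_k=2):
    # Toy retrieval: rank documents by keyword overlap with the query.
    # In practice, this step uses embeddings and a vector store.
    def overlap(d):
        return len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:top_k]

def rag_answer(query, documents):
    # Augment the prompt with the retrieved context before generating
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nUsing the context above, answer: {query}"
    return llm.complete(prompt).text

docs = [
    "Rome was traditionally founded in 753 BC.",
    "The Second Punic War was fought between Rome and Carthage.",
    "Olive oil was a staple of the Roman diet.",
]
print(rag_answer("Who fought the Second Punic War?", docs))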

Embeddings

Embeddings represent your documents as numerical vectors. Embedding models take text as input and return a long list of numbers that capture the semantics of the text [5].
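
As a quick illustration (a sketch reusing the same local models configured later in this post), you can ask the embedding model held by a ServiceContext to embed a sentence and inspect the resulting vector:

from llama_index import ServiceContext
from llama_index.llms import Ollama

service_context = ServiceContext.from_defaults(
    llm=Ollama(model="mistral"),
    embed_model="local:BAAI/bge-small-en-v1.5",
)

embedding = service_context.embed_model.get_text_embedding(
    "Rome grew from a small town into an empire."
)
print(len(embedding))  # 384 dimensions for bge-small-en-v1.5
print(embedding[:5])   # the first few numbers of the representation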

Vector Store

A vector database is a type of database that indexes and stores vector embeddings for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling [6]. Queries are answered by finding the stored embeddings most similar to the query embedding.

Some open-source and proprietary vector store options
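
Conceptually, answering a query boils down to comparing the query embedding against every stored embedding, for example with cosine similarity. A minimal, library-agnostic sketch (the vectors below are made up for illustration; real ones come from an embedding model):

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "stored" embeddings keyed by the passage they represent
stored = {
    "passage about Rome": np.array([0.9, 0.1, 0.0]),
    "passage about cooking": np.array([0.0, 0.2, 0.9]),
}
query_embedding = np.array([0.8, 0.2, 0.1])

# The vector store returns the passages most similar to the query
best = max(stored, key=lambda k: cosine_similarity(stored[k], query_embedding))
print(best)  # passage about Rome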

Agent

An agent is an automated decision-maker powered by an LLM that interacts with the world via a set of tools. Agents can take an arbitrary number of steps to complete a given task, dynamically deciding on the best course of action rather than following predetermined steps, which gives them additional flexibility to tackle more complex tasks [7].

Setup

1. To run an LLM locally, run the following commands. We assume you have Docker installed; if you don't, check this page.
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

2. Also, install these Python packages:

pip3 install llama-index==0.8.59
pip3 install openai==0.28.1
pip3 install pypdf==3.17.2

3. To run examples #3 and #5, which use RAG, download a sample PDF. We used the book A History of Rome from Project Gutenberg, which you can download here.
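
Note: the examples below assume the mistral model is already available inside the Ollama container. If it has not been downloaded yet, pulling it first avoids an error on the first request (exact model tags may vary):

docker exec -it ollama ollama pull mistral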

Code time

Example #1 — Simple completion

Here, we do plain text completion without any memory. If you ask follow-up questions without feeding the previous answer back in, the LLM will not know about earlier messages.

from llama_index.llms import Ollama
llm = Ollama(model="mistral")

resp = llm.complete("How did Rome grow? Be concise.")
print(resp)

#Response:
#Rome grew in size, power, and influence during its history,
#becoming one of the greatest empires in world history.
#It was known for its impressive architecture, culture, law, and
#military might.

Example #2 — Simple Conversation

In this snippet, we have a conversational bot with memory. You can ask follow-up questions, and it will be aware of messages from that session.

from llama_index.llms import Ollama
from llama_index import ServiceContext
from llama_index.chat_engine import SimpleChatEngine

llm = Ollama(model="mistral")

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
)

chat_engine = SimpleChatEngine.from_defaults(service_context=service_context)
print(chat_engine.chat("Hi, my name is Mirna"))
#assistant:
#Hi Mirna! It's nice to meet you. What can I assist you with today?

Above, we introduced a fictional name; below, the engine still remembers it.

print(chat_engine.chat("What is my name?"))
#assistant:
#Your name is Mirna. How may I be of further assistance to you?

Example #3 — Simple RAG

In this example, we provide an LLM and an embedding model, set them as the service context (for general usage), and follow the RAG process described in the Key Concepts section. The relevant PDF passages are retrieved and the LLM responds accordingly.

from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    set_global_service_context,
)
from llama_index.llms import Ollama

llm = Ollama(model="mistral")

# Read PDFs from the "./" path
documents = SimpleDirectoryReader(
    input_dir="./",
    required_exts=[".pdf"],
).load_data()

# ServiceContext is a bundle of commonly used
# resources used during the indexing and
# querying stages
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    chunk_size=300,
)
set_global_service_context(service_context)

# A Node represents a "chunk" of a source Document
nodes = service_context.node_parser.get_nodes_from_documents(documents)

# StorageContext offers core abstractions around
# storage of Nodes, indices, and vectors
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

# Create the vector store index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
query_engine = index.as_query_engine()

# Query the index
query="""What was the role of Quintus Fabius Pictor
at take the Second Punic War?"""
response = query_engine.query(query)
print(response)

#Response:
#Quintus Fabius Pictor played a significant role during the Second Punic War
#as he was one of the Roman historians who wrote about Rome's history.
#He composed a history of Rome from its foundation to his own times, which
#was written in Greek. His work helped to establish the idea that Rome had
#been founded by Aeneas and the exiles from Troy, a narrative that was
#widely accepted in Rome by the end of the third century.

Example #4 — Simple Agent

An LLM agent can be powerful, interacting directly with your Python code, external APIs, the operating system, and more. In this example, we provide a sample function that adds two given numbers plus a constant 3. If we provide two numbers, say 1 and 1, the answer will be 5, since it is 1+1+3.

Note: Always provide a docstring and type hints so the ReActAgent class is aware of the data types required for the inputs and outputs of your function.


from llama_index.llms import Ollama
from llama_index.agent import ReActAgent
from llama_index.tools import FunctionTool

def add_numbers_three_fn(a: int, b: int) -> int:
    """Adds two numbers and a constant 3 and returns the result."""
    return a + b + 3

tools = [
    FunctionTool.from_defaults(fn=add_numbers_three_fn),
]

llm = Ollama(model="mistral")
agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
response = agent.chat("Add the numbers 3 and 2")
print(str(response))

In this example, we gave 3 and 2 as input, so the math was 3+2+3=8, and the answer is correct. It is a toy function, but it could be anything you need.

Example #5 — Conversational RAG

Here, you will do the same as in Example #3, but with a different method when instantiating the chat_engine object: index.as_chat_engine instead of index.as_query_engine.

Now, you can ask follow-up questions, and the chat engine will be able to handle memory for you.

from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    set_global_service_context,
)
from llama_index.llms import Ollama

llm = Ollama(model="mistral")

# Read PDFs from the "./" path
documents = SimpleDirectoryReader(
    input_dir="./",
    required_exts=[".pdf"],
).load_data()

# ServiceContext is a bundle of commonly used
# resources used during the indexing and
# querying stages
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    chunk_size=300,
)
set_global_service_context(service_context)

# A Node represents a "chunk" of a source Document
nodes = service_context.node_parser.get_nodes_from_documents(documents)

# StorageContext offers core abstractions around
# storage of Nodes, indices, and vectors
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

# Create the vector store index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)

# The "context" chat mode retrieves from the index on each turn [8]
chat_engine = index.as_chat_engine(chat_mode="context", verbose=True)
response = chat_engine.chat("Hi, my name is Mirna")
print(response)

#Assistant: Hello Mirna! It's great to meet you. What can I help with today?

response = chat_engine.chat("What is my name?")
print(response)

#Assistant: I apologize for the confusion earlier.
#After checking your information, I have determined that
#your name is Mirna. Is there anything else I can assist you with?

# Query the index
query="""What was the role of Quintus Fabius Pictor
at take the Second Punic War? Be concise"""
response = chat_engine.chat(query)
print(response)

#Assistant: Quintus Fabius Pictor played a significant role
#in the Second Punic War, leading Roman forces during the
#Battle of Italy and negotiating the terms of peace with Hannibal.
#He also helped establish the Roman province of Africa.

print(chat_engine.chat("What is my name?"))
#assistant:
#Your name is Mirna. How may I be of further assistance to you?

Final thoughts

LLMs are reshaping AI, but mastering their potential takes good, working examples. LlamaIndex is a powerful tool for building your conversational LLM bot. Explore topics like RAG, agents, fine-tuning, and prompt engineering to get the most out of your LLM solutions. Let's unlock the future of AI together!

References

[1] Chang, Yupeng, et al. “A survey on evaluation of large language models.” arXiv preprint arXiv:2307.03109 (2023).
[2] Zaib, Munazza, Quan Z. Sheng, and Wei Emma Zhang. “A short survey of pre-trained language models for conversational ai-a new age in nlp.” Proceedings of the Australasian computer science week multiconference. 2020.
[3] Memory. Langchain. https://python.langchain.com/docs/modules/memory/
[4] RAG. Llama-Index. https://gpt-index.readthedocs.io/en/stable/getting_started/concepts.html#retrieval-augmented-generation-rag
[5] Embeddings. Llama-Index. https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#embeddings
[6] What is a Vector Database. Pinecone. https://www.pinecone.io/learn/vector-database/
[7] Agents. Llama-Index. https://docs.llamaindex.ai/en/stable/use_cases/agents.html
[8] Chat Modes. Llama-Index. https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/usage_pattern.html#available-chat-modes
[9] Yao, Shunyu, et al. “React: Synergizing reasoning and acting in language models.” arXiv preprint arXiv:2210.03629 (2022).
