Introducing the 99P Labs Blog Chatbot: Your Interactive Guide to Our Content

Martin Arroyo
Published in 99P Labs
13 min read · Feb 26, 2024

Project Description

The 99P Labs blog has been a valuable resource for both our team and our community, covering a wide array of subjects and providing insightful content. It also serves as a historical record of our engagements, capturing the knowledge we have gained over time. As the number of blog posts continues to grow, however, it becomes increasingly difficult for readers to search for and find the information they need. Recognizing the potential of Large Language Models (LLMs) to enhance user experiences and address this issue, we decided to create a chatbot capable of answering questions about our blog content. It gives our readers a more engaging and interactive way to explore our insights, making it easier to access the knowledge contained within our blog archive.

We are excited to introduce this feature to our blog — an embedded chatbot. This chatbot, which we have been calling the 99GPT project, is designed to provide answers to questions related to our blog content. We have set the bot to include information from all of our blog posts. Please note that we will be collecting the queries and responses for research purposes. Additionally, since working with large language models is not always predictable, the chatbot may occasionally provide incorrect or nonsensical responses.

Our chatbot is your resource for finding detailed information and exploring our blog content. Whether you have a targeted question or want a broader understanding of a topic, it provides quick access to relevant insights. Your interactions also help us fine-tune our knowledge base for an even better reader experience in the future.

Building 99GPT was not only an opportunity to create a valuable application for our community but also a chance to explore the newest tools and frameworks in the world of LLMs. However, the journey of developing a Retrieval Augmented Generation (RAG) application was far from simple. Most tutorials cover the LLM version of “Hello World,” but there is little documentation on building a chatbot over a large collection of documents. Our blog now has well over 100 entries, so getting this initial version to work was harder than we anticipated. Despite some tutorials suggesting otherwise, we quickly discovered that the process was complex, often hindered by stale documentation, application-breaking version changes, and the rapidly evolving landscape of model selection and testing.

In this blog post, we will guide you through our journey of building 99GPT: the technical considerations we encountered, the challenges we faced, and the lessons we learned. We'll cover the complexities of the development process, from ingesting our blog data and selecting suitable models to designing a querying strategy and deploying the application. By sharing our journey, we hope to provide valuable insights for those looking to embark on similar projects and to contribute to the growing knowledge base surrounding RAG applications.

Considerations for the application

Creating a chatbot that can engage with our private data, such as our blog posts, requires more than simply putting questions to a service like ChatGPT. We need a process that ingests our own data into the chatbot's knowledge base. Since we host our blog on Medium, the first step is a one-time bulk ingestion that processes the existing catalog of blog posts.

However, we also need a continuous ingestion process that adds new posts to the knowledge base as they are published. This process runs on a schedule, ensuring the chatbot stays up to date with the latest content.
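The scheduled step reduces to a simple idea: keep a record of which posts have already been processed, and on each run ingest only the new ones. A minimal sketch in Python, where `fetch`-style details are abstracted away and the `ingest_post` callback is a hypothetical placeholder for the chunk-embed-store pipeline:

```python
def sync_new_posts(all_posts, ingested_ids, ingest_post):
    """Ingest only the posts that are not yet in the knowledge base.

    all_posts    -- iterable of (post_id, content) pairs from the blog feed
    ingested_ids -- set of post IDs already in the vector store
    ingest_post  -- callback that chunks, embeds, and stores one post
    """
    newly_ingested = []
    for post_id, content in all_posts:
        if post_id in ingested_ids:
            continue  # already in the knowledge base, skip it
        ingest_post(post_id, content)
        ingested_ids.add(post_id)
        newly_ingested.append(post_id)
    return newly_ingested
```

A scheduler (cron, for example) can invoke this periodically; the one-time bulk ingestion is just the same routine run against an empty `ingested_ids` set.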

In addition to ingesting our blog posts, we need to develop the chat application itself. This application takes in a user's prompt and returns a response from the LLM. The application can be divided into two distinct but linked phases: ingestion and querying.

The ingestion phase, as previously mentioned, involves processing and adding our blog posts to the chatbot’s knowledge base. The querying phase is where the user interacts with the chatbot by asking questions or making requests. The chatbot then uses its knowledge base, which includes our ingested blog posts, to generate a response.

The choice of language models used in the chatbot links these two phases together. The model determines how the data is processed during ingestion and how it is used to generate responses during the querying phase.

Inspiration for this diagram is from Weights & Biases’ “Building LLM-Powered Applications”

Why LlamaIndex?

After recognizing the importance of frameworks like Langchain and LlamaIndex in bridging the gap between raw data and AI-driven interactive applications, our focus shifted to which framework best aligned with the unique demands of the 99GPT project. This decision boiled down to evaluating how each framework could handle our primary challenge: integrating our blog’s archive into a dynamic, responsive chatbot. Among the frameworks considered, LlamaIndex stood out for several compelling reasons that directly addressed our project’s needs.

First, LlamaIndex’s robust data integration capabilities immediately caught our attention. Given the continuous growth of our blog and the need to keep our chatbot’s knowledge base both current and comprehensive, LlamaIndex offered unparalleled ease in ingesting new content and updating existing information. This framework not only allowed us to automate the bulk ingestion of our archive but also facilitated a smooth process for incorporating new posts as they were published. This capability ensured that the chatbot could provide users with the most up-to-date responses, a critical factor for maintaining engagement and relevance.

Additionally, LlamaIndex’s architecture is inherently designed to support the development of applications that converse with specific, text based data sets. This was a perfect match for our use case to create a chatbot that could navigate the nuances of our blog’s content. The framework’s focus on enabling meaningful interactions with custom data sets meant that we could more effectively tailor the chatbot’s responses to reflect the depth and breadth of our blog’s insights.

Transitioning from a broad consideration of available LLM application frameworks to a targeted evaluation of our project’s specific requirements led us to choose LlamaIndex. This decision was grounded in the framework’s ability to provide seamless data integration, coupled with its specialized support for creating conversational applications tailored to unique data sets.

In the next sections, we will delve deeper into the technical considerations and development journey with LlamaIndex, highlighting the challenges we faced and the solutions we implemented. By sharing these insights, we aim to shed light on the practical aspects of building an interactive chatbot that serves as a gateway to our blog’s wealth of knowledge.

Introducing Retrieval Augmented Generation (RAG)

As we transition from the strategic selection of LlamaIndex, it’s time to focus on the Retrieval Augmented Generation (RAG) method, a key technical pillar underpinning the 99GPT project. This section will explore how RAG, much like a sophisticated combination of a librarian and an advanced search system in a vast library, significantly elevates our chatbot’s performance.

RAG intertwines two models: a chat completion model and an embedding model. The initial step in the RAG process is akin to approaching a librarian with a question. Here, the user’s prompt is handed to the embedding model, which functions like the librarian entering your query into a highly advanced search system. This model transforms the prompt into a vector representation, effectively turning your question into a detailed search query.

This query is then used to scour the library’s extensive collection — the equivalent of conducting a similarity search across the blog’s archived posts represented as vector embeddings. Imagine the librarian rapidly filtering through thousands of books to pinpoint the ones most relevant to your question. This step ensures that the search zeroes in on the most pertinent content, mirroring how RAG identifies and retrieves the context most relevant to the user’s inquiry.

With the relevant “books” in hand, the librarian, or in our case, the chat completion model, synthesizes the information. This model, leveraging its vast training, crafts a response that not only answers the user’s question but also enriches it with specific insights drawn from the selected content. The chatbot, powered by RAG, thus delivers responses that are accurate, detailed, and contextually rich, effectively bridging the gap between the user’s curiosity and the vast knowledge embedded within the blog’s archive.
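The retrieval half of this process can be shown in miniature. In the toy sketch below, bag-of-words vectors stand in for a real embedding model and cosine similarity plays the role of the vector store's similarity search; a production system would use a learned embedding model such as text-embedding-ada-002 instead:

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "Our chatbot answers questions about the 99P Labs blog.",
    "Weaviate stores vector embeddings for fast similarity search.",
    "Streamlit made building the chat UI straightforward.",
]
context = retrieve("how are vector embeddings stored?", docs, top_k=1)
```

The retrieved `context` is then prepended to the user's question and handed to the chat completion model, which writes the final answer.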

Through this RAG process, our chatbot transforms from a simple query-answer tool into a dynamic conduit for knowledge discovery, offering users a tailored and insightful exploration of our blog’s content. The implementation of RAG within the 99GPT project not only challenges the conventional boundaries of chatbot capabilities but also marks a significant step forward in our journey to make information access more intuitive and meaningful.

Evaluation

Building on our introduction to the RAG methodology and its pivotal role in enhancing the 99GPT project, we now shift our focus to a critical component of developing AI-driven applications: evaluation. We assessed our chatbot's responses against two metrics: faithfulness and relevancy.

Faithfulness checks if the chatbot’s answers accurately match the source data. It’s important for preventing the chatbot from creating “hallucinations,” or making up facts not in the original content. This keeps the chatbot’s answers reliable and true to the data.

Relevancy ensures the chatbot’s answers directly relate to the user’s questions. This metric confirms that the responses are not only correct but also useful to the user.

LlamaIndex has simplified the process of testing these metrics, allowing us to focus on refining the chatbot based on the outcomes. With LlamaIndex’s help, we can make sure our chatbot delivers precise and applicable answers.
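At the time of writing, LlamaIndex ships LLM-as-judge evaluators for exactly these checks (`FaithfulnessEvaluator` and `RelevancyEvaluator`). The underlying idea behind faithfulness can be illustrated with a much cruder, purely lexical proxy: treat a response as supported only if its content words actually appear in the retrieved context. This toy check is our own illustration of the concept, not LlamaIndex's method:

```python
STOPWORDS = frozenset({"the", "a", "an", "is", "of", "to", "and", "in"})

def support_score(response, context):
    """Fraction of the response's content words found in the context.

    A crude lexical proxy for faithfulness: 1.0 means every content word
    in the response is grounded somewhere in the retrieved context.
    """
    context_words = set(context.lower().split())
    words = [w for w in response.lower().split() if w not in STOPWORDS]
    if not words:
        return 1.0  # nothing to check
    supported = sum(1 for w in words if w in context_words)
    return supported / len(words)
```

An LLM-as-judge evaluator replaces this word overlap with a model's semantic judgment, which is what makes the real evaluators far more robust in practice.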

Our continuous effort in the 99GPT project is to enhance and offer a chatbot that consistently meets, if not exceeds, user expectations by providing accurate and insightful assistance, while keeping a strict check on avoiding hallucinated content.

Model Selection

Transitioning from our focus on evaluating the chatbot’s performance through metrics of faithfulness and relevancy, we delve into the critical step of model selection. This stage is fundamental because the choice of models directly influences the chatbot’s ability to process, understand, and respond to user queries effectively.

Model selection is a pivotal moment in developing our application, primarily because it determines how we handle the ingestion and organization of our data. The overall performance hinges not just on selecting an appropriate embedding model but also on ensuring the completion model operates effectively. It’s crucial to choose an embedding model that not only segments and embeds our data accurately but also matches the model we use for querying. Utilizing different models for data ingestion and querying could lead to inconsistencies and degrade the system’s performance.

For our completion model, we opted for GPT-3.5 Turbo, deployed on Azure, a decision driven by its proven reliability and performance. When it came to selecting an embedding model for processing our blog content, we initially experimented with the BAAI/bge-base-en embeddings available through the HuggingFace API. However, after rigorous testing focused on faithfulness and relevancy scores, we ultimately chose text-embedding-ada-002. This model demonstrated superior performance against our criteria, ensuring that our chatbot could efficiently and accurately retrieve and process content from our extensive blog archive.
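In LlamaIndex terms, this pairing amounts to configuring both models up front so that ingestion and querying share the same embedding space. A sketch of that configuration, roughly as it looks in recent LlamaIndex releases; module paths change between versions, and the deployment name, endpoint, and API version here are placeholders for your own Azure setup:

```python
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# Chat completion model: GPT-3.5 Turbo deployed on Azure.
Settings.llm = AzureOpenAI(
    engine="gpt-35-turbo",  # your Azure deployment name
    api_key="<your-api-key>",
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_version="2023-07-01-preview",
)

# Embedding model: used for BOTH ingestion and querying,
# so vectors stored at ingestion time match query-time vectors.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
```

Setting both on `Settings` is what prevents the mismatch described above, where data embedded with one model is queried with another.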

This careful consideration in model selection underscores our commitment to creating a chatbot that not only understands and interacts with users effectively but also reflects the depth and nuance of our blog’s content. By aligning our choices of embedding and completion models, we set a strong foundation for the chatbot’s ability to deliver precise and relevant responses, enhancing the user experience.

Selecting a vector store

Following our exploration of model selection, we transition to another critical component of our system’s architecture: choosing a vector store. Vector stores are essential for managing high-dimensional vector spaces efficiently, enabling rapid search and retrieval of data. This choice is crucial for the overall performance of our chatbot, particularly in how quickly and accurately it can access the relevant information within our blog’s archive.

In our quest for the ideal vector store, we prioritized three key factors:

  1. Open-source software, ensuring transparency and community-driven improvements.
  2. Strong community support, which often translates to robust documentation and active development.
  3. High performance in retrieval times, to ensure user queries are processed swiftly.

We evaluated several options, narrowing our focus to three main contenders: Weaviate, ChromaDB, and PGVector. ChromaDB caught our interest as a promising in-memory vector database. Its innovative approach and potential were noteworthy, yet the current state of its deployment options suggested it wasn’t ready for our needs. PGVector offered an intriguing blend of relational database capabilities with vector storage, thanks to its integration with Postgres. However, the complexity of its setup, coupled with our lack of need for a relational database component, led us to look elsewhere.

Our search ultimately led us to choose Weaviate. Among the options we considered, Weaviate stood out for its maturity, ease of setup, and the vibrant community supporting it. The availability of diverse deployment options and its satisfactory performance in retrieval times further solidified our decision. Weaviate’s strengths align well with our requirements, making it a fitting backbone for our chatbot’s data retrieval needs, and ensuring our users receive prompt, accurate responses to their queries.
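Wiring Weaviate into LlamaIndex comes down to pointing a `WeaviateVectorStore` at a running instance and handing it to the index through a storage context. A sketch under the same version caveat as before; the URL and index name are placeholders:

```python
import weaviate
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Connect to a running Weaviate instance (a local Docker container, in this sketch).
client = weaviate.Client("http://localhost:8080")

# Back the LlamaIndex vector index with Weaviate.
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="BlogPosts")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# During ingestion: build the index from documents into Weaviate.
# index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# During querying: reconnect to the existing data without re-ingesting.
index = VectorStoreIndex.from_vector_store(vector_store)
```

Because Weaviate persists the embeddings, the querying side of the application can start from `from_vector_store` and skip ingestion entirely.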

Ingestion Phase

Following our decision to use Weaviate for efficient data handling and retrieval, we delve into the Ingestion Phase, a foundational step for our Retrieval Augmented Generation (RAG) application. This phase is all about preparing our data to work seamlessly with the rest of the system during user queries.

The ingestion process begins with organizing our blog archive into a more manageable format. We compressed the archive into a zip file, ensuring each blog entry was prefaced with its metadata. Initially, we overlooked the importance of this metadata, resulting in subpar query responses. However, incorporating metadata into the vector embeddings greatly enhanced the relevance and accuracy of the results. This metadata includes crucial information such as the blog post’s title, publication date, and tags, which significantly aids in the retrieval process by providing additional context to each piece of content.
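The effect of the metadata fix can be seen in miniature below: each chunk of post text gets a small metadata header prepended before embedding, so the title, date, and tags contribute to the vector and travel with any retrieved context. (LlamaIndex's `Document` objects carry a `metadata` dict that serves the same purpose; the helper here is only an illustration of the idea.)

```python
def chunk_with_metadata(title, published, tags, text, chunk_size=200):
    """Split post text into word-count chunks, each prefaced with its metadata."""
    header = f"Title: {title}\nPublished: {published}\nTags: {', '.join(tags)}\n\n"
    words = text.split()
    chunks = []
    for start in range(0, len(words), chunk_size):
        body = " ".join(words[start:start + chunk_size])
        chunks.append(header + body)  # metadata is embedded with every chunk
    return chunks

chunks = chunk_with_metadata(
    "Introducing the 99P Labs Blog Chatbot",
    "2024-02-26",
    ["LLM", "RAG"],
    "We built a chatbot over our blog archive using RAG.",
)
```

With the header in place, a query mentioning a topic tag or a post title can match a chunk even when the body text alone would not.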

Storing both the text chunks and their corresponding vector embeddings in the vector index enables the application to draw upon our bespoke data during the querying phase. It’s worth noting that the specifics of this process might differ based on the chosen vector store. Therefore, it’s essential to consult and follow the guidelines provided for the particular vector store in use.

By refining our approach to include metadata in the ingestion process, we’ve seen a marked improvement in the chatbot’s performance. This step underscores the value of detailed, structured data preparation in enhancing the overall user experience with our chatbot, ensuring users receive precise, contextually rich responses to their inquiries.

Developing a Querying Strategy

With the ingestion phase completed and our data ready for use, our focus shifts towards constructing the chatbot and refining our querying strategy. This step is crucial for enhancing the interaction quality between the chatbot and users, ensuring that responses are not only accurate but also contextually enriched.

Starting with the system prompt provided by LlamaIndex, we initially adhered to the default configuration, which is typically effective for general chatbot applications. However, to tailor our chatbot more closely to our users’ needs, we decided to modify the system prompt. Our goal was to enrich the chatbot’s responses with specifics about the blog content and suggest additional readings. By incorporating elements into the system prompt that encouraged the chatbot to reference the blog’s details and recommend further content, we observed a significant improvement in the relevance and utility of the responses with minimal adjustments.
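With LlamaIndex, swapping in a custom system prompt is a one-argument change when building the chat engine. A sketch, where `index` is the vector index built during ingestion and the prompt wording is illustrative rather than our production prompt:

```python
SYSTEM_PROMPT = (
    "You are the 99P Labs blog assistant. Answer questions using the "
    "retrieved blog excerpts, mention the post titles you drew from, and "
    "suggest related posts the reader may want to explore next."
)

# `index` is the vector index built during the ingestion phase.
chat_engine = index.as_chat_engine(
    chat_mode="context",        # retrieve blog context for every user message
    system_prompt=SYSTEM_PROMPT,
)

response = chat_engine.chat("What is the 99GPT project?")
```

Because the system prompt is injected into every exchange, this small change steers all of the chatbot's answers toward citing and recommending blog content.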

Looking ahead, there are several advanced strategies we aim to explore to further enhance our chatbot’s querying capabilities. These include experimenting with Weaviate’s Hybrid search feature, which combines keyword and vector search for more nuanced results, exploring query composition techniques for more complex inquiries, and investigating various query routing strategies to optimize response accuracy and relevance.
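Of these, hybrid search is the most immediately accessible: LlamaIndex exposes Weaviate's hybrid mode through query arguments, with `alpha` blending keyword (BM25) and vector scores. A sketch of what we expect that experiment to look like, again with `index` standing for the ingested vector index:

```python
# alpha=0.0 is pure keyword search, alpha=1.0 is pure vector search.
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    alpha=0.5,
    similarity_top_k=4,
)
response = query_engine.query("Which vector store does 99GPT use?")
```

The appeal is that exact terms (project names, acronyms) get keyword-level precision while paraphrased questions still benefit from semantic matching.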

This ongoing development of our querying strategy represents an important part of our commitment to providing users with a rich, informative interaction experience, leveraging our comprehensive blog content to meet their informational needs effectively.

Creating the UI

Transitioning from optimizing our querying strategy, the next step in our journey involves crafting the user interface (UI) for our chat application. The interface is the bridge between our chatbot and its users, making its design and functionality key to providing an engaging user experience.

For this task, we turned to Streamlit, a powerful tool that allowed us to develop our chat application’s UI efficiently. Streamlit’s simplicity and the effectiveness of its codebase meant we could set up a functional and visually appealing interface by following just a few steps outlined in their guide on creating a chatbot interface. This tutorial offered us a straightforward path to implementing a customizable UI, requiring minimal coding effort to achieve a professional and user-friendly design.

The tutorial from Streamlit served as our blueprint: Streamlit Chatbot Tutorial.
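The core of that tutorial pattern is small enough to show here: Streamlit keeps the conversation in `st.session_state`, renders it with `st.chat_message`, and reads new input from `st.chat_input`. A condensed sketch, in which `chat_engine` is a placeholder for our LlamaIndex chat engine:

```python
import streamlit as st

st.title("99GPT: Chat with the 99P Labs Blog")

# Conversation history must live in session state, because Streamlit
# reruns the whole script on every user interaction.
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask about our blog posts"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    answer = str(chat_engine.chat(prompt))  # placeholder: our RAG engine
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

Everything else in the UI, from the title to the message styling, is configuration on top of this loop.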

Streamlit’s tutorial was a clear guide for us. It made customizing the UI straightforward, saving us from the complexities of web development.

Thanks to Streamlit, we built a UI that’s both user-friendly and easy to navigate. Our chatbot is now more accessible, allowing users to dive into our blog’s content effortlessly.

This improvement in the chatbot’s UI boosts not just its functionality but also enriches the user experience. It encourages users to engage more deeply and explore our blog’s insights.

Deploying the application

Following the development of our chatbot’s user interface, the next phase in our project involved deploying the application. Deployment is a critical step, as it makes the chatbot accessible to our audience.

We chose the Streamlit Community Cloud for deployment, attracted by its cost-free and secure platform for showcasing our app. The process was mostly straightforward, aligning with standard deployment procedures. However, we encountered a unique challenge related to LlamaIndex, our chosen LLM framework.

LlamaIndex requires the download of specific dictionaries for text embedding when it initializes. We ran into permission issues when it tried to perform this step on the Streamlit cloud, as the platform blocked the automatic download of these dictionaries. To overcome this obstacle, we manually downloaded the necessary dictionaries and included them with our application’s files.
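For reference, the dictionaries in question are, to the best of our knowledge, NLTK resources that LlamaIndex's default sentence splitter depends on. The workaround amounts to downloading them once into a folder that ships with the app, then pointing NLTK at that folder before LlamaIndex initializes; the paths below are illustrative:

```python
import os

# Point NLTK at data bundled alongside the app *before* importing llama_index,
# so it never tries to download into a read-only location on the host.
os.environ["NLTK_DATA"] = os.path.abspath("nltk_data")

# One-time step, run locally and committed with the application's files:
#   python -c "import nltk; nltk.download('punkt', download_dir='nltk_data')"
#   python -c "import nltk; nltk.download('stopwords', download_dir='nltk_data')"
```

With the data committed to the repository, the hosted app finds everything it needs at startup and never attempts the blocked download.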

This workaround resolved our deployment issue, allowing us to proceed with launching our chatbot on the Streamlit cloud. Aside from this minor issue, the deployment process on Streamlit cloud proved to be smooth and efficient, enabling us to bring our chatbot to our users without significant delays.

Conclusion

Thank you for joining us in creating and launching our 99GPT chatbot. We value your feedback, as it helps improve the chatbot and make it more useful for you.

To stay connected and explore collaboration opportunities, you can:

  • Subscribe to the 99P Labs blog on Medium for new content
  • Connect with us on LinkedIn for updates
  • Email us at research@99plabs.com with any questions or ideas

We appreciate your interest and look forward to growing together as we explore new possibilities in technology and mobility.

Martin Arroyo
99P Labs

Applied AI Research Engineer @ 99P Labs | Data Analytics Instructor @ COOP Careers