Building LLM-based Application Using Langchain and OpenAI

Rodion Salnik
Brocoders Team
13 min read · Jul 19, 2023

Learn about the dev team’s first experience building a context-aware AI chatbot based on an LLM. Here I describe how we prepared a knowledge database from various documents, compared index types, switched from LlamaIndex to LangChain, and experimented with prompts, inputs, and LLMs.

The Situation

The client asked my team to conduct an R&D study of an AI chatbot capable of addressing inquiries related to property management and community association management services.

This advanced chatbot would leverage the association’s documents, videos, and frequently asked questions (FAQs) to provide accurate responses. The desired outcome was to have the chatbot deliver precise and relevant answers to questions posed by employees. This would enable the company’s team members to access information efficiently, enhance customer service, and expedite their workflow.

However, there were certain challenges that we needed to overcome:

  • Dealing with various types of documents, including PDFs, docs, videos, and FAQs.
  • Ensuring the quality of documents and extracting the relevant information.
  • Building a chatbot capable of understanding custom questions and maintaining conversation context.

After gathering all the requirements from the client, we initiated our R&D process.

Initial solution concept

The initial concept revolved around leveraging the Documents Indexing Processor (GPT Index) to establish a connection with the GPT/LLM model.

Process:

  1. Building the Knowledge Database:
    • Configure the Documents Indexing Processor to establish a connection with the client’s data sources and ingest the documents.
    • Implement the GPT model to collaborate with the Documents Indexing Processor.
  2. Developing a Web Interface for the Chatbot:
    • Create a user-friendly web interface for the chatbot that captures user details and comprehends user inquiries.
  3. Providing Automated Updates:
    • Offer a streamlined and automated method to update the AI model based on the client’s documents and instructions.

Building the knowledge database

We embarked on the task of creating a knowledge database using the client’s documents. This database plays a pivotal role in enabling the functionality of the chatbot.

Acquiring Documents

To gather all the documents in one place, we had to either upload them to our own database using a JSON file with all the links (which we were provided with) or use the client’s server. We uploaded all documents to our own server to have more control over the data.

Converting all information to text and analyzing its quality

To convert all the information into text, we employed OCR technology.

Our chosen solution was the Tesseract OCR engine, a widely used open-source tool. By utilizing this technology, we could extract machine-readable text from the documents’ images or scanned pages. Additionally, Tesseract played a crucial role in evaluating the quality of each document. This assessment helped us identify any potential issues or errors introduced during the OCR process, such as incorrect character recognition or formatting discrepancies. Assessing the document quality was essential to ensure the reliability and accuracy of the knowledge database. Moreover, this allowed us to pinpoint areas for improvement in the overall process, focusing specifically on enhancing OCR accuracy and observing its impact on the final outcome.
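
As an illustration of this step, here is a minimal sketch assuming pytesseract and pdf2image as the Python bindings for Tesseract (the file name, DPI, and quality threshold are illustrative, not our production settings); the per-word confidence scores are what make the quality check described above possible:

```python
# pip install pytesseract pdf2image  (plus the Tesseract binary and Poppler)
from pdf2image import convert_from_path
import pytesseract

def pdf_to_text(pdf_path):
    """Run OCR over a scanned PDF and report the mean word confidence."""
    pages = convert_from_path(pdf_path, dpi=300)  # render each page as an image
    words, confidences = [], []
    for page in pages:
        # image_to_data returns per-word text plus a confidence score per word
        data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
        for word, conf in zip(data["text"], data["conf"]):
            score = float(conf)  # confidence is -1 for non-word boxes
            if word.strip() and score >= 0:
                words.append(word)
                confidences.append(score)
    mean_conf = sum(confidences) / len(confidences) if confidences else 0.0
    return " ".join(words), mean_conf

text, quality = pdf_to_text("bylaws.pdf")
if quality < 60:  # illustrative threshold for flagging documents for review
    print(f"Low OCR confidence ({quality:.0f}), check the source scan")
```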

This database, containing text-based information, serves as the foundation for the subsequent stages of our research.

Adding embeddings

In the field of natural language processing (NLP), embeddings serve as a technique to transform textual data into a numerical format that can be understood and processed by machine learning algorithms. For our knowledge database, we utilized the OpenAI API to generate embeddings (a minimal sketch follows the list below).

OpenAI’s text embeddings are designed to measure the relationship between different text strings. These embeddings find applications in various areas, including:

  • Search: Text strings can be ranked based on their relevance to a query, enabling effective search functionality.
  • Clustering: Text strings can be grouped together based on their similarity, facilitating clustering analysis.
  • Recommendations: By considering the relatedness of text strings, items with similar content can be recommended to users.
  • Anomaly Detection: Outliers that exhibit minimal relatedness to other text strings can be identified, aiding in anomaly detection tasks.
  • Diversity Measurement: Similarity distributions of text strings can be analyzed to assess the diversity within a dataset.
  • Classification: Text strings can be classified based on their similarity to predefined labels, enabling effective classification tasks.
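
Here is that sketch, using the pre-1.0 openai Python SDK that was current at the time of writing (the model choice and sample inputs are illustrative):

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

def embed_texts(texts):
    """Turn a batch of text chunks into embedding vectors."""
    resp = openai.Embedding.create(
        model="text-embedding-ada-002",  # general-purpose embedding model
        input=texts,
    )
    # One 1536-dimensional vector per input string, in the same order
    return [item["embedding"] for item in resp["data"]]

vectors = embed_texts([
    "When are association assessments due?",
    "Retention policy for confidential records",
])
```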

Developing a Context-Aware Chatbot: Exploring Potential and Challenges with LlamaIndex and LangChain

The custom data source was ready, and now we had to connect it to the large language model. To do this, we chose LlamaIndex.

LlamaIndex is a flexible and straightforward data framework that provides the key tools to augment your LLM applications with data, such as data ingestion, data indexing, and a query interface.

Initially, our approach involved utilizing the conversation context encapsulation capabilities offered by LlamaIndex. However, we encountered some challenges using LlamaIndex because it was undergoing frequent updates. Additionally, the tools provided by LlamaIndex for chat context handling essentially acted as a wrapper around another powerful technology known as LangChain. Therefore, to optimize our approach, we decided to delve deeper into the core technology, LangChain.

LangChain is an open-source development framework specifically designed for applications utilizing large language models (LLMs). It provides various components that serve as abstractions, enabling more efficient and programmatic utilization of LLMs.

These components include:

  • Models: Such as ChatGPT or other large language models.
  • Prompts: These encompass prompt templates and output parsers.
  • Indexes: Ingesting external data, including document loaders and vector stores.
  • Chains: Combining different components to create end-to-end use cases. For instance, a simple chain could involve Prompt + LLM + Output Parser (see the sketch after this list).
  • Agents: Facilitating the utilization of external tools by LLMs.
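
As an example of the Prompt + LLM chain mentioned in the list, here is a minimal sketch using the LangChain API of that period (the prompt wording and question are invented for illustration):

```python
from langchain import LLMChain, PromptTemplate
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer this community-management question in one paragraph:\n{question}",
)

# Prompt + LLM combined into a single reusable component
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="Who approves architectural changes?"))
```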

By exploring the potential of LangChain, we aimed to enhance the abilities of LLMs and develop chatbot solutions that excel at maintaining conversation context and delivering more advanced functionality.

Indexes

Indexes in LangChain are used to structure documents for optimal interaction with large language models (LLMs). The indexing module provides utility functions and examples for working with different types of indexes.

The indexing components in LangChain include document loaders, text splitters, vector stores, and retrievers. Document loaders retrieve documents from various sources, text splitters break text into smaller chunks, vector stores are the main index type relying on embeddings, and retrievers fetch relevant documents for use with language models.

The most common use of indexes in LangChain is for retrieval, where relevant documents are fetched based on a user’s query. LangChain primarily supports vector databases as the main index type.
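
A sketch of how these four components fit together in code (the directory path, chunk sizes, and k value are illustrative assumptions, not our production settings):

```python
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# 1. Document loader: read the OCR'd text files from disk
docs = DirectoryLoader("./knowledge_base", glob="**/*.txt", loader_cls=TextLoader).load()

# 2. Text splitter: break long documents into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 3. Vector store: embed each chunk and index it in Chroma
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 4. Retriever: fetch the chunks most similar to a query
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
relevant = retriever.get_relevant_documents("When are assessments due?")
```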

During testing, we experimented with different index types, including tree, vector, and graph indexes. However, the vector index showed the most promising results, leading us to focus on its utilization.

By leveraging LangChain’s indexing capabilities, we optimized our chatbot’s performance, ensuring efficient retrieval and use of relevant information.

Below is some information regarding various index types and our conclusions after testing them.

1. Vector Index

The vector store index stores each Node and a corresponding embedding in a Vector Store.

Our conclusion

The Vector Index is primarily used for retrieving specific context. It is particularly effective when the query contains a piece of context that can be found in the document text. In our view, the Vector Index is the best index available: it outperforms the other indices at retrieving information and works well with all types of questions except summarization. Its most significant advantage is how quickly it retrieves the required information.

We experimented with different vector databases, specifically the one LlamaIndex uses by default and another one called Chroma. We aimed to identify which database offered the most efficient storage and retrieval of data.

2. Tree Index

The tree index builds a hierarchical tree from a set of Nodes (which become leaf nodes in this tree).

Our conclusion

The Tree Index is best suited for summarizing documents or sets of documents. It can also be used for simple retrieval, but the Vector Index is more efficient for this purpose. The main drawback of the Tree Index is its relatively long response time, making it more suitable for summarization tasks.

3. List Index

The list index simply stores Nodes as a sequential chain.

Our conclusion

Similar to the Tree Index, the List Index is also best suited for summarizing documents or sets of documents, but it differs in its underlying implementation. In our experience, the Tree Index outperforms the List Index in both speed and accuracy in finding the right information; the List Index was slower and less efficient.

4. Graph Indices

Graph Indices are a type of index where we have a root index, such as a List Index, that contains other indices. This type of index is useful when we have a multitude of different indices and we want to combine them into one. All sub-indices should have a good description so that the appropriate index can be chosen for use. There are three types of Graph Indices:

a. List Index with Vector Indices
In theory, splitting one large index into smaller indices should reduce response time, but this was not observed with this type of graph.

b. Tree Index with Vector Indices
This is similar to the List Index with Vector Indices, but it operates slightly faster.

c. Simple Keyword Index with Vector Indices
This type of index works best for split indices. It significantly speeds up our response time compared to one large index. However, all sub-indices should have a very good description with a lot of keywords, as this type of graph will select the sub-index that has the most keywords in its description.

Vector stores

We used vector stores to store and search information via the embeddings we generated in the previous step. A VectorStore serves as a storage facility for these embeddings, allowing efficient search based on semantic similarity.

Chains

After confirming the effectiveness of our vector-based search in retrieving relevant documents, our next step was to input them into the LLM to generate more detailed and human-like responses.

To achieve this, we utilized the chain component of LangChain, which enabled us to create sequences of modular components tailored to specific use cases. LangChain’s index-related chains were particularly useful in interacting with indexes and combining our data with LLMs. An important use case we focused on was question answering using our own documents.

The objective was to integrate our indexed data with the LLM. LangChain provided support for four commonly used methods or chains:

Stuffing

The simplest approach involves including all relevant data as context in the prompt passed to the LLM. This is implemented as the StuffDocumentsChain in LangChain.

  • Pros: Only requires a single call to the LLM, allowing access to all the data at once.
  • Cons: Limited by the context length of the LLM, making it unsuitable for large documents or multiple documents exceeding the context length.

Map Reduce

This method entails running an initial prompt on each chunk of data and generating individual outputs. Subsequently, a separate prompt is used to combine these initial outputs.

  • Pros: Can handle larger and more numerous documents than the Stuffing approach. Calls to the LLM for individual documents are independent and can be parallelized.
  • Cons: Requires multiple calls to the LLM, and some information may be lost during the final combination.

Refine

The refine method involves running an initial prompt on the first chunk of data and generating output. This output, along with the subsequent document, is then passed to the LLM to refine the response based on the new information.

  • Pros: Allows for pulling in more relevant context and may retain more information than the Map Reduce method.
  • Cons: Requires multiple calls to the LLM, and the calls are not independent, preventing parallelization. The order of the documents may also impact the results.

Map-Rerank

This approach entails running an initial prompt on each chunk of data, considering not only task completion but also assigning a score indicating the certainty of the answer. The responses are then ranked based on these scores, and the highest-scoring answer is returned.

  • Pros: Similar advantages to Map Reduce, but with fewer calls to the LLM.
  • Cons: Cannot combine information between documents, making it most suitable for scenarios with a single, straightforward answer within a single document.

We successfully utilized the Stuffing method to process and integrate our indexed data with LLMs, resulting in more comprehensive and contextually appropriate responses.

Additionally, we constructed our chain based on the RetrievalQA and Conversational Retrieval QA components offered by LangChain.

https://python.langchain.com/docs/modules/chains/popular/vector_db_qa
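
A minimal sketch of both chains, reusing the retriever from the indexing sketch above (chain_type="stuff" can be swapped for "map_reduce", "refine", or "map_rerank"):

```python
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Single-turn QA: retrieved chunks are stuffed directly into the prompt
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
print(qa.run("When are assessments due?"))

# Conversational variant: memory lets follow-ups reference earlier turns
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chat_qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)
result = chat_qa({"question": "And what happens if a payment is late?"})
```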

Agents

In certain applications, the required chain of calls to LLMs or other tools may not be predetermined but rather dependent on the user’s input. In such cases, an “agent” is employed, which has access to a range of tools. Based on the user’s input, the agent determines which tools, if any, should be called.

During our exploration, we examined various agent implementations and the tools provided by LangChain. Unfortunately, none proved to be a satisfactory fit for our specific requirements.

Nevertheless, we discovered that LangChain offers the flexibility to create our own agent and tool implementations. With this in mind, we embarked on devising an implementation tailored to our needs. However, we encountered an error within LangChain: our tool, built using an index generated by LlamaIndex, was not recognized as a valid tool. The underlying cause of this error remains elusive at this time.

We have shifted our strategy towards constructing a tool exclusively using LangChain’s tools, without relying on LlamaIndex. We speculate that this approach may yield positive results, as LangChain’s index exhibits slight differences compared to that of LlamaIndex. This divergence might potentially address the error we encountered in our previous attempts.
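
A sketch of that LangChain-only direction: wrap the retrieval QA chain from the previous sketch as a Tool and hand it to a standard agent (the tool name and description are our own illustrative choices):

```python
from langchain.agents import AgentType, Tool, initialize_agent

def search_docs(query: str) -> str:
    """Expose the RetrievalQA chain from the earlier sketch as a plain function."""
    return qa.run(query)

tools = [
    Tool(
        name="community_docs",
        func=search_docs,
        description="Answers questions about property management and "
                    "community association documents.",
    )
]

# The agent reads the tool descriptions and decides which tool, if any, to call
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("For how long should confidential data be kept?")
```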

Prompt templates

A PromptValue represents the final value passed to the model. Typically, this value is not hardcoded but dynamically generated based on a combination of user input, non-static information from multiple sources, and a fixed template string. The component responsible for creating the PromptValue is called a PromptTemplate. It exposes a method that takes input variables and returns the corresponding PromptValue.
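
For instance (the template wording here is illustrative, not our production prompt):

```python
from langchain import PromptTemplate

template = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use the context below to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        "Context: {context}\n\nQuestion: {question}\nAnswer:"
    ),
)

# format() fills in the variables and returns the final prompt string
prompt_value = template.format(
    context="...retrieved document chunks...",
    question="When are assessments due?",
)
```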

Prompt Experimentation

We conducted experiments with various prompts to optimize the chatbot’s responses. Our goal was to engineer prompts that would elicit the most relevant and useful responses from the chatbot.

During this testing phase, we reached the following conclusions:

LLM Testing

We evaluated different versions of LLMs, including text-davinci-003, gpt-3.5-turbo, and gpt-4. The text-davinci-003 model performed modestly, while the differences between gpt-3.5-turbo and gpt-4 results were not significant.

Input Parameter Testing

We explored various input parameters for the LLM, such as the number of input tokens and output tokens, to assess their impact on the chatbot’s performance.

Key Parameters (illustrated in the sketch after this list):

  • Prompt: This refers to the input text that you want the AI to respond to. It can be a question, statement, or any other text you want the model to process.
  • Max tokens: This parameter sets the maximum length of the generated response, specifying the maximum number of tokens (text chunks) the model should produce.
  • Temperature: This parameter controls the level of randomness in the model’s output. Higher values (close to 1.0) make the output more diverse and creative, while lower values (close to 0.0) make it more deterministic and focused.
  • Top p or top_k: These parameters are used for techniques like nucleus sampling or top-k sampling, which probabilistically determine the next word in the sequence to add diversity to the model’s output.
  • Frequency Penalty: This parameter discourages the use of common phrases or responses. Higher values promote more original output, while lower values allow more common phrases.
  • Presence Penalty: This parameter encourages the model to introduce new topics in its output. Higher values lead to more diverse topics, while lower values keep the output focused on the input topic.
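
Here is how those parameters map onto a raw API call with the pre-1.0 openai SDK (the values shown are illustrative starting points, not the settings we ultimately chose):

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "When are assessments due?"}],  # prompt
    max_tokens=256,         # cap the length of the generated answer
    temperature=0.2,        # low randomness: focused, deterministic output
    top_p=1.0,              # nucleus sampling threshold
    frequency_penalty=0.0,  # no penalty on repeating common phrases
    presence_penalty=0.0,   # no extra push toward new topics
)
print(response["choices"][0]["message"]["content"])
```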

Compare/Contrast Queries Technique

This technique decomposes complex questions into sub-questions so that the system can collect all the necessary data from our documents. Sometimes it decomposes a question well and logically, asking the right sub-questions, but sometimes it does not. The main issue with this technique is the response time, which is excessively long even for small sets of data.

Enhancing AI Performance through Question Decomposition and Context Retrieval Optimization

Our ongoing investigations led us to hypothesize that some inaccurate responses from the OpenAI model could stem from sub-optimal context retrieval by LangChain. In light of this, we embarked on implementing and testing custom retrieval processing to verify and potentially rectify this issue.

In our quest for enhanced accuracy, we have initiated an innovative research project focused on generating synonymous questions to more accurately capture the appropriate context from the index. Intriguingly, our preliminary testing revealed a subjective accuracy score of 0.7. We observed that slight variations in question phrasing can lead to significant differences in the accuracy of the response.

For instance:

  • Question: “What is the due date for payments?” — produced no answer.
  • Question: “What are the due rules for payments?” — produced the required answer.

This pattern was also observed with a different set of questions:

  • Question: “For how long should confidential data be kept?” — produced the required answer.
  • Question: “For how long should confidential information be kept?” — produced no answer.

In light of these findings, we explored further by rephrasing questions that initially did not yield an answer. For example, we inputted the question, “For how long should confidential information be kept?” and obtained a list of synonymous questions that yielded appropriate responses. Some of these synonymous questions include:

  • “What is the required duration for retaining sensitive data?”
  • “Is there a specified time frame for maintaining private information?”
  • “What is the time period for the storage of classified information?”
  • “Does the document specify how long confidential details should be preserved?”
  • “For what duration is secret information intended to be kept?”
  • “How long does the document suggest to hold proprietary data?”
  • “Is there a recommended period for keeping confidential information secure?”

This research indicates that we may be able to engineer prompts to indicate whether a response is successful (true/false) and possibly even the source of the response (no answer/from context/from LLM knowledge). This functionality could allow us to filter responses and feed several questions until the required answer is achieved, potentially enhancing the performance and utility of our AI systems.
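
A sketch of what that retry loop could look like, reusing llm and qa from the earlier sketches; the success check here is a naive string heuristic standing in for the true/false prompt engineering described above:

```python
REPHRASE_PROMPT = (
    "Generate {n} differently worded questions with the same meaning as:\n{question}"
)

def answer_with_rephrasing(question, n=5):
    """Try the original question, then paraphrases, until the QA chain answers."""
    raw = llm.predict(REPHRASE_PROMPT.format(n=n, question=question))
    candidates = [question] + [
        line.strip("-•0123456789. ") for line in raw.splitlines() if line.strip()
    ]
    for candidate in candidates:
        answer = qa.run(candidate)
        # Naive success check: treat "don't know"-style replies as misses
        if "don't know" not in answer.lower() and "no answer" not in answer.lower():
            return answer
    return "No answer found in the documents."

print(answer_with_rephrasing("For how long should confidential information be kept?"))
```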

Developing a Web Interface for the Chatbot

Finally, we moved on to the main part: how to present this to the customer.

Rodion Salnik
Brocoders Team

Tech geek with 10+y exp | Startup Wise Guys alumni | Co-founder of CASERS and Brocoders