LangChain: A Practical Guide to Implementing a Chat App with Your Documentation

Vishnu Subramanian · Jarvislabs.ai · Jun 20, 2023

The last few months have been crazy in the world of AI.

Applications like OpenAI ChatGPT, Google Bard, and GitHub Copilot are becoming integral parts of our day-to-day lives. These tools are powerful at answering many questions, but they fail to offer the right answers on topics they were not trained on. Let's look at a few examples:

  1. They cannot answer questions based on recent events or knowledge.
  2. They cannot answer questions based on private data like company documentation and code.
  3. They cannot tell you about a company's latest financials.

In this blog, let's build a simple chat application that takes a given set of documentation and answers questions based on it. We will use tools like LangChain, OpenAI, and ChromaDB to do this.

You can find the notebook here.

Fine-tuning vs Prompting

There are 2 ways in which we can use custom data with LLMs.

The traditional approach is fine-tuning the LLM, which can be complex and expensive depending on the use case.

Prompting has become popular recently; it lets us steer the LLM to do tasks like classification or inject new knowledge into it.

Let's say we want to classify a movie review using an LLM that was not explicitly trained for classification.
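
A zero-shot classification prompt can look like the sketch below (a minimal sketch using LangChain's ChatOpenAI wrapper; the review text and prompt wording are illustrative, not from the original post):

# A minimal zero-shot classification sketch
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.0)

review = "The movie dragged on forever and the plot made no sense."
prompt = (
    "Classify the sentiment of the following movie review as positive or negative.\n"
    f"Review: {review}\n"
    "Sentiment:"
)

# The chat model returns an AIMessage; its content holds the label
print(llm([HumanMessage(content=prompt)]).content)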

Let's also check what happens when we ask questions about recent events, for example, who is the protagonist in the movie Extraction 2.

As expected, the LLM is not able to answer it. What if we give it some context that it can use? So I instructed it to answer based on the knowledge I shared: a description of the movie from the IMDB website.
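
The prompt pattern looks roughly like this (again a minimal sketch; the plot summary below is a placeholder standing in for the IMDB description, not the actual text used in the post):

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.0)

# Placeholder standing in for the IMDB description of Extraction 2
context = "Extraction 2 follows Tyler Rake, a black-ops mercenary, on a new rescue mission..."

prompt = (
    "Answer the question using only the context below.\n"
    f"Context: {context}\n"
    "Question: Who is the protagonist in the movie Extraction 2?"
)

print(llm([HumanMessage(content=prompt)]).content)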

The LLM can answer the question based on this new knowledge without ever being trained or fine-tuned on the updated data.

Limitations of the above approach

In the above example, we passed the additional knowledge in manually. Let's look at how we can automate the process of

  1. Searching for relevant information, like getting the description from IMDB.
  2. Adding the relevant information to the prompt, like the description of the movie.

To do this, let's build a simple app that allows us to chat with the latest FastAPI docs.

FastAPI is a modern, fast (high-performance), web framework for building APIs

The high-level steps for building our app look like this:

  1. Data preparation
  2. Creating a knowledge base
  3. Using a popular LLM to answer your questions based on the knowledge base

LangChain provides a lot of helper classes to handle the above, so we will leverage them to build our small AI app.

Data Preparation

I took a simple approach of extracting all the useful links from FastAPI's sitemap.xml, then used UnstructuredURLLoader from LangChain to download and extract the text content from these URLs.

# Download the sitemap.xml file from a website and extract all the links
import requests
from bs4 import BeautifulSoup
from langchain.document_loaders import UnstructuredURLLoader

def get_links(url):
    url = f'{url}/sitemap.xml'
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')
        links = [link.text for link in soup.find_all('loc')]
        return links
    else:
        print(f'Error: {response.status_code}')
        return None

# Download the contents of webpages given their URLs
def download_pages(urls: list[str]):
    data = UnstructuredURLLoader(urls=urls).load()
    return data
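
Putting the two helpers together might look like this (a small sketch; it assumes the FastAPI docs publish their sitemap at https://fastapi.tiangolo.com/sitemap.xml, and the number of links will vary):

# Fetch all doc pages from the FastAPI docs site
urls = get_links('https://fastapi.tiangolo.com')
docs = download_pages(urls)
print(f'Downloaded {len(docs)} pages from {len(urls)} links')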

Creating a Knowledge base using Chroma DB

Once we have the data, we need to store it in some form of database so that we can retrieve the relevant text when required. Since we want to find text by semantic similarity rather than exact keyword matches, traditional SQL databases are not the right choice for storage.

Vector DBs like Chroma DB and Pinecone are built exclusively for this. Let's understand at a high level what a vector DB does.

  1. Split all the text into smaller portions called chunks.
  2. Convert these chunks into numerical representations called embeddings.
  3. Given a search query, return the N (here, 3) most relevant pieces of text.

# Index and store data in Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
split_docs = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(collection_name='webdata',
                           documents=split_docs,
                           embedding=embeddings,
                           persist_directory='index_data')
db.persist()

# Load the stored embeddings
db = Chroma(collection_name='webdata',
            embedding_function=embeddings,
            persist_directory='index_data')
query = "What is async code?"

# Fetch similar docs
similar_docs = db.similarity_search(query, k=3)
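
To sanity-check what the search returns, you can print the retrieved chunks (a small sketch; the source key comes from the URL loader's metadata, and the output depends on what you indexed):

# Inspect the retrieved chunks and where they came from
for doc in similar_docs:
    print(doc.metadata.get('source'))
    print(doc.page_content[:200])
    print('---')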

Answering questions using the knowledge base

We have prepared our knowledge base based on the FastAPI docs. We can now pass any question to an LLM, query the Chroma DB index to add relevant docs to the prompt, and answer it using the simple code below.

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.0)
qa = RetrievalQA.from_chain_type(llm=llm,
                                 retriever=db.as_retriever(),
                                 chain_type='stuff')
qa.run('How is async used in FastAPI?')
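
If you also want to see which chunks the chain used, RetrievalQA can return the source documents along with the answer (a small variation on the same chain):

# Return the answer together with the chunks it was based on
qa = RetrievalQA.from_chain_type(llm=llm,
                                 retriever=db.as_retriever(),
                                 chain_type='stuff',
                                 return_source_documents=True)
result = qa({'query': 'How is async used in FastAPI?'})
print(result['result'])
for doc in result['source_documents']:
    print(doc.metadata.get('source'))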

You can replace FastAPI docs with your own docs.

Things we can try:

  1. Use tools like Gradio or Streamlit to build a simple chat interface and connect it to our app (see the sketch after this list).
  2. Explore how to use open-source LLMs on your own GPUs. This helps when handling confidential data that you do not want to share with OpenAI or another third-party provider.
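
For the first idea, a minimal Gradio wrapper around the chain from the previous section could look like this (a sketch, not a full chat UI; it assumes the qa chain is already built):

import gradio as gr

def answer(question):
    # Reuse the RetrievalQA chain built earlier
    return qa.run(question)

gr.Interface(fn=answer, inputs='text', outputs='text',
             title='Chat with the FastAPI docs').launch()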

Useful resources:

LangChain is still new, and the tool and its documentation are evolving fast, so the best place to check is their docs.

I also found this book very useful to get an introduction to LangChain and several of its key concepts.

Hope you find the blog useful.
