Speak to your Data using LangChain and LLMs

Hamza El Fergougui
10 min read · Jul 29, 2023


image by Hamza El Fergougui

Introduction

Large language models (LLMs) are a type of artificial intelligence (AI) that are trained on massive datasets of text and code. They can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

LangChain is an open-source framework that makes it easy to build applications that use LLMs. It provides a suite of tools and components that simplify the development of LLM-centric applications.

Sidenote: The LangChain library is available in both Python and JavaScript, but we’ll be using the Python library for this tutorial.

Prerequisites

This article assumes that you have some basic knowledge of the following concepts in LangChain:

  • Models: Components that wrap an LLM (or chat model) so it can be used inside LangChain.
  • Prompts: The text used as input to an LLM; prompts supply the context and instructions the model needs to complete a task.
  • Chains: Sequences of calls (to LLMs and other components) that work together to perform a task.
  • Embeddings: Vector representations of text that capture the meaning of words and phrases.
  • Vector Stores: Databases that store embeddings and can quickly retrieve the entries most similar to a given word or phrase.
  • Agents: Components that let an LLM decide which actions to take and interact with its environment.

What’s Next?

In this article, we will discuss how to use LangChain to talk to your data. We will start by loading data from different sources, then discuss how to use LangChain for question answering over that data and how to evaluate the results. Finally, we’ll leave you with a sneak peek of LangSmith, a new platform that can help you build more reliable and maintainable LLM applications.

I hope you enjoy the article!

Setup

Using LangChain will usually require integrations with one or more model providers, data stores, APIs, etc. For this example, we’ll use OpenAI’s model APIs.

First we’ll need to install the requirements:

python-dotenv==1.0.0
langchain==0.0.137
openai==0.27.8

Accessing the API requires an API key, which you can get by creating an OpenAI account and generating a key in your account settings. Once we have a key we’ll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you’d prefer not to export it in your shell, you can keep the key in a .env file and load it at runtime:

import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key = os.environ['OPENAI_API_KEY']

How to Talk to Your Data with LangChain

image from https://python.langchain.com

To use LangChain, you first need to choose an LLM provider. There are a number of different LLM providers available, including OpenAI, Cohere, and Hugging Face.

from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature = 0.3)

Sidenote: In addition to the LLM providers directly supported by LangChain, you can use open-source models from the Open LLM Leaderboard by Hugging Face: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. This is possible through the Hugging Face Hub integration with LangChain: https://python.langchain.com/docs/modules/model_io/models/llms/integrations/huggingface_hub.

Some examples of open-source models you can use through this integration include Llama 2 from Meta, Falcon from TII, and Flan-T5 from Google.
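A minimal sketch of that Hugging Face Hub integration, assuming you have a Hugging Face API token exported as HUGGINGFACEHUB_API_TOKEN (the repo_id here is only an example; any text-generation model on the Hub can be swapped in):

from langchain.llms import HuggingFaceHub

# Assumes HUGGINGFACEHUB_API_TOKEN is set in the environment
llm = HuggingFaceHub(
    repo_id="google/flan-t5-xxl",  # example model; use any Hub model you like
    model_kwargs={"temperature": 0.5, "max_length": 64},
)
print(llm("What is LangChain?"))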

We can also use other models, such as Llama 2 running locally through llama.cpp:

You don’t need an API token for this one, since the model runs on your own machine!

# pip install llama-cpp-python

from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama/llama-2-7b-ggml/llama-2-7b-chat.ggmlv3.q4_0.bin",
    input={"temperature": 0.75, "max_length": 2000, "top_p": 1},
    callback_manager=callback_manager,
    verbose=True,
)

After choosing an LLM provider, you can build a LangChain application by creating a chain: a sequence of calls to LLMs (and other components) that work together to perform a task. For example, you could create a chain that first uses an LLM to generate a rough draft of a text, and then uses another LLM call to edit the draft and improve its quality.
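To make that idea concrete, here is a minimal sketch of such a two-step chain using SimpleSequentialChain; the prompts are just illustrative examples:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)

# Step 1: generate a rough draft on a topic
draft_prompt = PromptTemplate.from_template("Write a short rough draft about {topic}.")
draft_chain = LLMChain(llm=llm, prompt=draft_prompt)

# Step 2: edit the draft to improve its quality
edit_prompt = PromptTemplate.from_template(
    "Edit the following draft to improve clarity and style:\n\n{draft}"
)
edit_chain = LLMChain(llm=llm, prompt=edit_prompt)

# The output of the draft chain is fed into the edit chain
overall_chain = SimpleSequentialChain(chains=[draft_chain, edit_chain], verbose=True)
final_text = overall_chain.run("question answering over documents")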

Loaders are LangChain components that deal with the specifics of accessing and converting data from a particular source. They return a list of Document objects.

We will start with loaders for the most common sources: PDF, YouTube, URLs, and Notion. Many other sources can be loaded as well.

PDF

Let’s load a PDF example from your device!

from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/your_pdf_name.pdf")
pages = loader.load()

Each page is a Document.

A Document contains text (page_content) and metadata.

len(pages)                         # number of pages loaded
page = pages[0]
print(page.page_content[0:500])    # first 500 characters of the page text
page.metadata                      # e.g. source file and page number

YouTube

from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

url = "https://www.youtube.com/watch?v=jGwO_UgTS7I"
save_dir = "docs/youtube/"
loader = GenericLoader(
    YoutubeAudioLoader([url], save_dir),
    OpenAIWhisperParser()
)
docs = loader.load()
docs[0].page_content[0:500]

URLs

from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://github.com/basecamp/handbook/blob/master/37signals-is-you.md")

docs = loader.load()
print(docs[0].page_content[:500])

Notion

Follow these steps for an example Notion site:

  • Duplicate the page into your own Notion space and export as Markdown / CSV.
  • Unzip it and save it as a folder that contains the markdown file for the Notion page.
image by Hamza El Fergougui
from langchain.document_loaders import NotionDirectoryLoader
loader = NotionDirectoryLoader("docs/Notion_DB")
docs = loader.load()

print(docs[0].page_content[0:200])
docs[0].metadata

Question Answering over Documents

In this section, we will learn how to use LangChain to build a QA system that can answer questions about a set of documents. Let’s start by importing the necessary components.

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.indexes import VectorstoreIndexCreator

DocArrayInMemorySearch is convenient for a quick application since it’s an in-memory vector store and doesn’t require connecting to an external database.

We load some example CSV data:

file = 'your_csv_data.csv'
loader = CSVLoader(file_path=file)

In this article we won’t be using chunks since we don’t have long documents, but feel free to use them for large documents.
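If you do need to split large documents first, a minimal sketch with LangChain’s text splitter could look like this (the chunk_size and chunk_overlap values are only examples, and docs stands for a list of loaded Documents):

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)  # a list of smaller Document objects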

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

query ="Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

response = index.query(query)
display(Markdown(response))

Notice that we never created an LLM explicitly here: under the hood, index.query() embeds the query, retrieves the most similar documents from the vector store, and passes them, together with the question, to an LLM (by default an OpenAI model) to produce the answer.

If you want more control over each step, you can replace the VectorstoreIndexCreator shortcut with the RetrievalQA class, which lets you choose the LLM, the retriever, and the chain type explicitly. We will dive deeper into that below.

Let’s now walk through the same process step by step:

image from deeplearning.ai
loader = CSVLoader(file_path=file)
docs = loader.load()
docs[0]

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embed = embeddings.embed_query("Hi my name is Harrison")  # embed a sample query to see what an embedding looks like

# Build an in-memory vector store from the documents and their embeddings
db = DocArrayInMemorySearch.from_documents(
    docs,
    embeddings
)

# Retrieve the documents most similar to the query
query = "Please suggest a shirt with sunblocking"
docs = db.similarity_search(query)

retriever = db.as_retriever()
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature = 0.3)
qdocs = "".join([docs[i].page_content for i in range(len(docs))])
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.")
display(Markdown(response))

The code loads a CSV file and turns it into a list of documents. It then creates an in-memory vector store from the documents and their embeddings, searches that store for the documents most similar to the query, joins the retrieved documents into a single context string, and finally passes that context to an LLM to generate the response.

All of these steps can be encapsulated in a single LangChain chain. Here we create a RetrievalQA chain, which performs retrieval and then question answering over the retrieved documents.

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."
response = qa_stuff.run(query)
display(Markdown(response))

These two approaches produce the same result, and that flexibility is one of the interesting features of LangChain: we can do everything in one line, or break it down into the five more detailed steps above. The step-by-step method lets us control each stage.

Additionally, we can return the source documents used to answer the question by specifying an optional parameter “return_source_documents” when constructing the chain.

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
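Calling the chain then returns both the answer and the documents it was grounded in. A minimal usage sketch, reusing the query from above:

result = qa({"query": query})
print(result["result"])                  # the generated answer
print(result["source_documents"][0])     # the first document used to answer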

Besides the default stuff chain used above, LangChain has three main chain types for processing documents: map reduce, refine, and map rerank.

  • Map Reduce: Applies the LLM to each document chunk independently (the calls can run in parallel), then combines the individual answers into a final one.
  • Refine: Processes the chunks sequentially, refining the answer as each new chunk is seen.
  • Map Rerank: Applies the LLM to each chunk independently, asks it to score its own answer, and returns the highest-scoring one.

Map reduce and map rerank can parallelize their per-chunk calls, which makes them fast, but they treat the chunks independently. Refine is inherently sequential and therefore slower, but it can produce higher-quality answers when the relevant information is spread across chunks.

image from https://github.com/ksm26/LangChain-Chat-with-Your-Data
# vectordb is a vector store built earlier (for example, db above) and question is your query
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]

qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_mr({"query": question})
result["result"]

RetrievalQA limitations

RetrievalQA is a powerful technique for answering questions, but it has some limitations. One limitation is that it does not preserve conversational history: if you ask a follow-up question about something discussed previously, the chain will not have access to that earlier exchange.

This can be addressed with LangChain’s ConversationBufferMemory class, which stores the conversational history in a buffer so it can be passed back to the model when answering follow-up questions.
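A minimal sketch of how this could look, reusing the llm and retriever defined earlier together with LangChain’s ConversationalRetrievalChain:

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Keep the chat history in a buffer and return it as messages
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)

result = qa({"question": "Do you have shirts with sun protection?"})
followup = qa({"question": "Which of those is the cheapest?"})  # can see the previous turn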

Evaluating Your LLM Application

After creating your LangChain application, you need to evaluate it to ensure that it is working as expected and meets your requirements.

There are a number of ways to evaluate an LLM application. One common approach is automatic metrics such as the BLEU score, which measure the similarity between a generated text and a reference text. Another is human evaluation, where a person assesses the quality of the generated text. A third is LLM-assisted evaluation, where another LLM grades the outputs; it is cheaper than human review while capturing more nuance than pure string-similarity metrics.

In this article, we will discuss two of these methods: human (manual) evaluation and LLM-assisted evaluation. We will use a couple of hand-written query/answer pairs as our evaluation examples:

examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

Manual Evaluation

Manual evaluation of the LLM can be done by checking, step by step, how the response is constructed.

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

file = 'your_csv_file.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

llm = ChatOpenAI(temperature=0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        "document_separator": "<<<<>>>>>"
    }
)

import langchain
langchain.debug = True
qa.run(examples[0]["query"])

# ==> Manual evaluation of the LLM by checking step by step the construction of the response

Setting langchain.debug = True lets you inspect, step by step, how the response is constructed. For example, you can check the following:

  • The relevance of the retrieved documents: Are the documents that are retrieved relevant to the query?
  • The quality of the LLM’s summary: Does the LLM’s summary accurately summarize the information in the retrieved documents?
  • The overall coherence of the response: Is the response easy to understand and follow?

LLM assisted evaluation

LLM assisted evaluation is a technique for evaluating the performance of an LLM-based application by using an LLM to generate a human-readable summary of the output of the application. The summary is then evaluated by a human evaluator, who can provide feedback on the accuracy, completeness, and clarity of the summary.

The LLM assisted evaluation method can provide valuable feedback on the performance of LLM-based applications, but it can be time-consuming and subjective.

# Turn off the debug mode
langchain.debug = False

predictions = qa.apply(examples)

from langchain.evaluation.qa import QAEvalChain
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(examples, predictions)

for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()

# Truncated output
"""
Example 0:
Question: Do the Cozy Comfort Pullover Set have side pockets?
Real Answer: Yes
Predicted Answer: The Cozy Comfort Pullover Set, Stripe does have side pockets.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT
"""

Taking LLMs to Production with LangSmith

LangChain was created to reduce the barrier to entry for building LLM prototypes. It has largely achieved this goal, but the next challenge is to get these prototypes into production.

LangSmith is a unified platform for debugging, testing, evaluating, and monitoring your LLM applications. It provides a number of features that can help you improve the quality of your applications, including:

  • A debugger that allows you to step through your applications and see how they are working
  • A test suite that can help you to test the functionality of your applications
  • An evaluation tool that can help you to measure the quality of your applications
  • A monitoring tool that can help you to track the performance of your applications

Langchain = prototyping

LangSmith = production
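Getting started with LangSmith is mostly a matter of configuration: once you have an account and an API key, you enable tracing through environment variables and your existing LangChain code is logged automatically. A minimal sketch (the project name is just an example):

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="..."
export LANGCHAIN_PROJECT="talk-to-your-data"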

Final Thoughts

LangChain is a powerful framework that can help you build applications that talk to your data. It is easy to use and provides a number of features that can help you improve the quality of your applications.

LangSmith is a new platform that can help you take your LLM applications to the next level. It provides a number of features that can help you debug, test, evaluate, and monitor your applications.

I hope this article has helped you learn more about LangChain, LangSmith and LLMs. If you are interested in learning more, I recommend visiting the LangChain website and the LangSmith website.

Stay updated with the latest news and updates in the creative AI space — Follow me on Medium and LinkedIn

On Medium, I write about GenAI, MultiModal, NLP and Marketing Optimisation. You can also find me on LinkedIn, where I connect with other creative AI enthusiasts and professionals.

Have a nice day!
