Leveraging LangChain, Pinecone, and LLMs for Document Question Answering: An Integrated Approach

Mayur Ghadge
6 min read · Jun 16, 2023


Document Question Answering (DQA) is a crucial task in Natural Language Processing (NLP), aiming to develop automated systems capable of understanding and extracting relevant information from textual documents to answer user queries. With recent advancements in Large Language Models (LLMs) like ChatGPT and innovative tools such as LangChain and Pinecone, a new integrated approach to DQA has emerged.

This integrated approach combines the power of LLMs for language understanding and generation, LangChain for document processing and indexing, and Pinecone for efficient vector storage and retrieval.

In this article, we explore the integration of these cutting-edge technologies and discuss how they collectively enhance the performance and scalability of Document Question Answering systems. We delve into the working principles of LangChain, Pinecone, and ChatGPT, and demonstrate their collaborative potential in solving DQA challenges. Furthermore, we highlight the benefits and implications of this integrated approach, paving the way for improved information retrieval and interactive document exploration.

Install required libraries:

# install required libraries
!pip install --upgrade langchain openai -q
!pip install unstructured -q
!pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2" -q
!pip install poppler-utils -q
!pip install pinecone-client -q
!pip install tiktoken -q

Importing libraries to work with:

# importing libraries
import os
import openai
import pinecone
import langchain
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

Setting up credentials for OpenAI and Pinecone:

In order to access OpenAI’s language models, you will need an API key. If you don’t have one already, you can obtain it from the OpenAI website. To make use of Pinecone’s vector database, you must create an account on the Pinecone website at https://www.pinecone.io/. Once you have created an account, you will need to create a new index to get started. During the index creation process, make sure to specify the index name and dimensions.

It’s important to note that the dimensions of the index must match the embeddings generated by OpenAIEmbeddings (see the “Generating Embeddings with OpenAI’s Language Model (LLM) for Textual Data” section below).

Additionally, you will need to set up the Pinecone API key, environment name, and index name in your code. Here’s an example of how to do it:

[Screenshot: creating a new index in the Pinecone console, and the page showing the Pinecone API key]
# OpenAI credentials
os.environ['OPENAI_API_KEY'] = "openai_api_key"

# Pinecone credentials
api_key = "pinecone_api_key"
environment = "environment_name"
index_name = "index_name"

# initialize the Pinecone client before using the index
pinecone.init(api_key=api_key, environment=environment)

Loading documents from directory using directory loader:

LangChain provides a convenient way to load documents from a directory, which is particularly useful when working with large datasets or collections of text documents.

Here, we use the DirectoryLoader class from the LangChain library. This class allows us to load every document in a directory in an organized and efficient manner.

directory_path = 'dataset/'

def load_docs(directory_path):
    # load every document found in the directory
    loader = DirectoryLoader(directory_path)
    documents = loader.load()
    return documents

documents = load_docs(directory_path)
print("Total number of documents:", len(documents))

Document Splitting for Efficient Processing with LangChain:

Document splitting is performed using the RecursiveCharacterTextSplitter class from the LangChain library. By specifying the desired chunk_size and chunk_overlap, the text splitter intelligently divides the documents into smaller segments. This keeps each chunk within the embedding model’s input limits and makes retrieval more precise, since queries are matched against focused passages rather than whole documents.

def split_docs(documents, chunk_size=500, chunk_overlap=20):
    # split each document into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                                   chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs

docs = split_docs(documents)
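To build intuition for what the splitter does, here is a simplified, pure-Python sketch of fixed-size chunking with overlap. This is an illustration only, not LangChain’s actual algorithm, which also tries to break on natural separators like paragraphs and sentences:

```python
def naive_chunk(text, chunk_size=500, chunk_overlap=20):
    """Split text into fixed-size chunks; each chunk repeats the
    last `chunk_overlap` characters of the previous one."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

sample = "x" * 1200
chunks = naive_chunk(sample)
print(len(chunks))  # 3 chunks covering positions 0-500, 480-980, 960-1200
```

The overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.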

Generating Embeddings with OpenAI’s Language Model (LLM) for Textual Data:

The code snippet below demonstrates the process of generating embeddings using OpenAI’s Language Model (LLM) for textual data. By leveraging the power of the “ada” model, we can derive meaningful representations of text that capture semantic information and enable various downstream natural language processing (NLP) tasks.

embeddings = OpenAIEmbeddings(model_name="ada")

# Example
query_result = embeddings.embed_query("large language model")

Remember: the value of len(query_result) is the dimensionality of the embeddings, and it must match the dimensions you specified when creating the Pinecone index. In this example it is 1024.

Creating a Pinecone Index for efficient semantic search:

Next, we create a Pinecone index to enable efficient similarity search on the text data. Pinecone is a powerful vector database that allows us to store and query high-dimensional embeddings with remarkable speed and accuracy.


index_name = "index_name"

index = Pinecone.from_documents(docs, embeddings, index_name=index_name)

Once the index is created, we can leverage Pinecone’s powerful search capabilities to find similar documents based on their embeddings. This enables tasks such as recommendation systems, content similarity analysis, and more.

Finding Similar Documents using Pinecone Index:

def get_similiar_docs(query, k=2, score=False):
    # return the k most similar chunks, optionally with similarity scores
    if score:
        similar_docs = index.similarity_search_with_score(query, k=k)
    else:
        similar_docs = index.similarity_search(query, k=k)
    return similar_docs

The above code allows us to retrieve similar documents based on a given query using the Pinecone index created earlier.

The function takes three parameters: query, k, and score.

The query parameter represents the input query for which we want to find similar documents. The k parameter specifies the number of similar documents to retrieve. By default, it is set to 2, but you can modify it according to your requirements. The score parameter is optional and defaults to False. If set to True, the function returns the similarity scores along with the documents.
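Conceptually, the index is doing something like the following toy in-memory search; Pinecone performs the same ranking at scale using approximate nearest-neighbour indexes. The document names and vectors here are made up for illustration:

```python
def top_k(query_vec, docs, k=2):
    """Rank documents by dot-product similarity to the query (toy version)."""
    scored = sorted(
        docs.items(),
        key=lambda item: sum(q * d for q, d in zip(query_vec, item[1])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Made-up 3-d vectors standing in for real embeddings
docs = {
    "doc_llm":     [0.9, 0.1, 0.0],
    "doc_vectors": [0.6, 0.4, 0.2],
    "doc_cooking": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]
print(top_k(query, docs, k=2))  # ['doc_llm', 'doc_vectors']
```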

Document-based Question Answering with GPT-3.5 Turbo and LangChain:

Below, we define a function called “get_answer” that enables document-based question answering using OpenAI’s GPT-3.5 Turbo model and LangChain.

The function starts by initializing an instance of the OpenAI model using the OpenAI class. The model_name parameter is set to “gpt-3.5-turbo”, indicating the specific variant of the GPT model to be used.

Next, the LangChain is loaded with the initialized GPT-3.5 Turbo model. This chain, referred to as chain, is designed for question answering tasks using a document-based approach.
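The “stuff” chain type gets its name from what it does: it stuffs all the retrieved documents into a single prompt for the LLM. A rough sketch of the kind of prompt it builds (the real template lives inside LangChain and differs in wording):

```python
def build_stuff_prompt(docs, question):
    """Concatenate all context chunks into one prompt (simplified sketch)."""
    context = "\n\n".join(docs)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_stuff_prompt(
    ["LLMs are neural networks trained on large text corpora.",
     "GPT-3.5 Turbo is an LLM offered by OpenAI."],
    "What is a large language model?",
)
print(prompt)
```

Because everything is concatenated, the “stuff” strategy only works when the retrieved chunks fit in the model’s context window, which is exactly why we split the documents into small chunks earlier.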

model_name = "gpt-3.5-turbo"
llm = OpenAI(model_name=model_name)

chain = load_qa_chain(llm, chain_type="stuff")

def get_answer(query):
    # retrieve the most relevant chunks, then answer from that context
    similar_docs = get_similiar_docs(query)
    answer = chain.run(input_documents=similar_docs, question=query)
    return answer

Use case example for above code:

query = "What is a large language model?"
answer = get_answer(query)
print(answer)

query = "What are its applications/use cases?"
answer = get_answer(query)
print(answer)

In this article, we explored how to create a question answering system using large language models like GPT-3.5 Turbo. We used libraries like LangChain, Pinecone, and OpenAI to build the system.

First, we set up the necessary credentials for accessing the language model and indexing service. Then, we loaded and split documents into smaller parts to make them easier to process.

Next, we generated embeddings for the document parts, which capture the meaning of the text. We indexed these parts using Pinecone, which allows us to quickly find similar documents based on user queries.

To answer questions, we used the GPT-3.5 Turbo model and created a question-answering chain. This chain takes in the similar documents and a user’s question, and generates a relevant answer based on the provided context.

Finally, we demonstrated how the system works by asking it sample questions. The system leverages the context from the documents to provide accurate answers.

By combining these techniques and libraries, developers can create powerful question answering systems that can understand and answer questions based on a collection of documents.

I would like to acknowledge the YouTube video “Understanding Question Answering with Large Language Models” by Pradip Nichite, available at https://youtu.be/cVA1RPsGQcw. This video has been instrumental in enhancing my understanding of question answering systems and contributed to the development of the code and concepts discussed in this blog post. I extend my gratitude to the creator for their insightful content.
