How to Use GPT4All with Langchain to Chat with Your Documents

Vikas Tiwari
3 min read · Jun 7, 2023


Excited to share my latest article on leveraging the power of GPT4All and Langchain to enhance document-based conversations! In this post, I walk you through the steps to set up the environment and demonstrate how you can seamlessly chat with your own documents using advanced language models. Get ready to unlock new possibilities and streamline your document interactions. Let’s dive in!

Packages you need to install

# Install langchain
pip install langchain

# Install vectorStore
pip install faiss-cpu

# Install gpt4all
pip install gpt4all

# Install huggingfaceHub
pip install huggingface-hub

# Install PyPdf for working with PDFs
pip install pypdf

After downloading the GPT4All model, you're ready to start coding.

Note: to download the LLM, follow this link:
Alpaca-native-7b

Import the necessary classes into your Python file.

from langchain.document_loaders import PyPDFLoader
from langchain import PromptTemplate, LLMChain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.callbacks.base import BaseCallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

Next, provide the path to your PDF file and split it into smaller chunks. Embed the chunks and save the resulting FAISS index for further use.

# Load the PDF and split it into pages
documents = PyPDFLoader('path to your pdf').load_and_split()
# Split the pages into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
texts = text_splitter.split_documents(documents)
# Embed the chunks and build a FAISS index from them
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
faiss_index = FAISS.from_documents(texts, embeddings)
faiss_index.save_local("path to folder where you want to store index")

After saving the index, you can comment out the lines above, except the embeddings line below; otherwise the script will rebuild the index every time you run it.

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
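
If you'd rather not comment code in and out, here is a minimal alternative sketch (the index_path variable is my own, not from the article) that builds the index only when it doesn't already exist:

import os

index_path = "path to folder where you want to store index"  # hypothetical variable name

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
if os.path.exists(index_path):
    # Reuse the saved index instead of re-embedding the PDF on every run
    faiss_index = FAISS.load_local(index_path, embeddings)
else:
    documents = PyPDFLoader('path to your pdf').load_and_split()
    texts = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64).split_documents(documents)
    faiss_index = FAISS.from_documents(texts, embeddings)
    faiss_index.save_local(index_path)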

Define the path to your GPT4All model and load the saved index.

# load vector store
print("loading indexes")
faiss_index = FAISS.load_local("path to your index folder", embeddings)
print("index loaded")
gpt4all_path = 'path to your llm bin file'

Perform a similarity search for your question against the index to retrieve the most relevant content. The second parameter of the similarity_search method controls how many matching chunks are returned; adjust it depending on how much context you want to pull from the index.

# Set your query here manually
question = "your query"
matched_docs = faiss_index.similarity_search(question, 4)
# Concatenate the matched chunks into a single context string
context = ""
for doc in matched_docs:
    context = context + doc.page_content + " \n\n "

After this, create a prompt template and inject the context gathered above into it.

template = """
Please use the following context to answer questions.
Context: {context}
- -
Question: {question}
Answer: Let's think step by step."""

Define the LLM and the prompt, then create an LLMChain.

callback_manager = BaseCallbackManager([StreamingStdOutCallbackHandler()])
llm = GPT4All(model=gpt4all_path, n_ctx=1000, callback_manager=callback_manager, verbose=True, repeat_last_n=0)
prompt = PromptTemplate(template=template, input_variables=["context", "question"]).partial(context=context)
llm_chain = LLMChain(prompt=prompt, llm=llm)

The callback_manager parameter is optional. If you wish to monitor and track the different stages of your LLM's (Large Language Model's) execution, you can pass a callback handler, or a list of handlers, through this parameter. In this article, StreamingStdOutCallbackHandler is used to stream the response to the console as it is generated, but you can also plug in other handlers, such as logging or monitoring handlers, to log or monitor the LLM's processes. Think of callbacks like hooks in frameworks such as Angular or React, where hooks like OnInit or OnChanges let you run code at specific stages of a component's lifecycle; callbacks let you do the same at various stages of the LLM's execution. If you're interested in learning more, see the LangChain documentation on callbacks.
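
For example, here is a minimal sketch of a custom handler (the TokenCountHandler class is hypothetical, not part of the article) that counts streamed tokens instead of printing them:

from langchain.callbacks.base import BaseCallbackHandler

class TokenCountHandler(BaseCallbackHandler):
    # Hypothetical handler: counts the tokens the LLM streams back
    def __init__(self):
        self.token_count = 0

    def on_llm_new_token(self, token, **kwargs):
        # Called once for every new token generated by the LLM
        self.token_count += 1

    def on_llm_end(self, response, **kwargs):
        print(f"\nGenerated {self.token_count} tokens")

# Register it alongside (or instead of) StreamingStdOutCallbackHandler
callback_manager = BaseCallbackManager([StreamingStdOutCallbackHandler(), TokenCountHandler()])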

To prevent multiple responses from being printed, you can utilize the repeat_last_n parameter. By setting it to 0, only a single response will be displayed in your console. This ensures a cleaner and more concise output.

The n_ctx (Token context window) in GPT4All refers to the maximum number of tokens that the model considers as context when generating text. It determines the size of the context window that the model uses to understand and generate coherent responses.
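
Since the full prompt (context plus question) has to fit inside this window, you can roughly sanity-check its size before running the chain. A crude sketch, assuming the common rule of thumb of roughly four characters per token:

# Rough heuristic: ~4 characters per token for English text (an approximation, not exact)
full_prompt = prompt.format(question=question)
approx_tokens = len(full_prompt) // 4
if approx_tokens > 1000:  # the n_ctx value set above
    print(f"Warning: prompt is ~{approx_tokens} tokens and may exceed the context window")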

Now, run the chain using the .run() method.

print(llm_chain.run(question))
