Streamlit + Local LLM + PDFs
Building off the earlier outline, this TL;DR covers loading PDFs into a (Python) Streamlit + local LLM (Ollama) setup. Another GitHub-Gist-like post with limited commentary.
Playing forward this Google result and its code from searching "local llm pdfs". My use case is to load all my Apple iCloud iBooks into an "oracle"-GPT for private discussions. A sub-curiosity is to have two GPTs responding as their authors would (potentially across their multiple respective books). The first building block, covered here, is loading PDFs into a local LLM and confirming its PDF-trained results are more desirable (i.e. spot-checked accurate) than the generic model's.
Results
Personal test caveats
- I'll only load a single, random PDF from my iBooks storage: *Reinventing Your Life* by Jeffrey E. Young & Janet S. Klosko. On Apple Macs, these iCloud PDFs store under `~/Library/Mobile Documents/iCloud~com~apple~iBooks/Documents`. My test runs from `~/Downloads`, and while I could easily reference the PDF from the iBooks folder instead of my test folder, that's step two.
- I know `llama3` came out last week, but so far it hasn't shown sufficient improvement for me to move off `llama2-uncensored` and accept the response censoring.
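As a head start on that "step two", enumerating the PDFs under the iCloud iBooks folder needs only the stdlib. A minimal sketch (the folder path is the one above; `list_pdfs` is a hypothetical helper name):

```python
from pathlib import Path

def list_pdfs(folder: str) -> list[str]:
    """Return sorted paths of all PDFs under `folder`, searched recursively."""
    return sorted(str(p) for p in Path(folder).expanduser().rglob("*.pdf"))

# e.g. the iCloud iBooks folder:
# list_pdfs("~/Library/Mobile Documents/iCloud~com~apple~iBooks/Documents")
```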
Comparing the generic LLM (🦙) to the PDF-trained LLM (📓) across various questions: for example, when asked about the book's authors, the generic LLM hallucinated while the PDF-trained LLM identified them correctly. 👏
Code
The following comes with no expectations or warranties, but it "works on my machine" (though, as a proof of concept, the code is admittedly ugly).
```python
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyMuPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
import streamlit as st

llm = Ollama(model="llama2-uncensored")

@st.cache_resource
class PdfGpt():
    def __init__(self, file_path):
        # Split the PDF into overlapping chunks for embedding.
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
        chunks = text_splitter.split_documents(documents=PyMuPDFLoader(file_path=file_path).load())
        # Embed chunks on CPU with a small sentence-transformers model.
        embedding_model = HuggingFaceEmbeddings(
            model_name="all-MiniLM-L6-v2",
            model_kwargs={'device': 'cpu'},
            encode_kwargs={'normalize_embeddings': True}
        )
        vectorstore = FAISS.from_documents(chunks, embedding_model)
        vectorstore.save_local("vectorstore")
        template = """
### System:
You are a respectful and honest assistant. You have to answer the user's questions using only the context \
provided to you. If you don't know the answer, just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""
        # "stuff" chain type: retrieved chunks are stuffed into {context}.
        self.hey = RetrievalQA.from_chain_type(
            llm=llm,
            retriever=vectorstore.as_retriever(),
            chain_type="stuff",
            return_source_documents=True,
            chain_type_kwargs={'prompt': PromptTemplate.from_template(template)}
        )

oracle = PdfGpt("reinventing_your_life.pdf")  # PDF file name

ask = st.text_input("What's up?", key="ask", label_visibility='hidden')
A, B = st.columns([.05, .95])
C, D = st.columns([.05, .95])
with A:
    st.caption("🦙")
with C:
    st.caption("📓")
if ask not in [None, "", []]:
    with B:
        # Generic model, no PDF context.
        st.markdown(llm.predict(ask))
    with D:
        # Retrieval-augmented answer grounded in the PDF.
        response = oracle.hey({'query': ask})
        st.markdown(response['result'])
```
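For intuition on what `chunk_size=1000, chunk_overlap=20` mean, here's a dependency-free sketch of the sliding-window idea. (LangChain's `RecursiveCharacterTextSplitter` is smarter, preferring paragraph and sentence boundaries, but the window-plus-overlap shape is the same; `naive_chunks` is my hypothetical stand-in.)

```python
def naive_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 20) -> list[str]:
    """Cut `text` into windows of `chunk_size` chars, each window starting
    `chunk_size - chunk_overlap` chars after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Consecutive chunks share `chunk_overlap` characters, so a sentence
# straddling a boundary still appears whole in at least one chunk.
```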
Say you call this file `test.py`: after updating the PDF file name reference `reinventing_your_life.pdf` to your own test PDF, you'd run it (from a folder where you're okay with test data caching, since the vectorstore saves locally) via `streamlit run test.py`.
👋