Streamlit + Local LLM + PDFs

Stef Nestor
Apr 22, 2024 · 3 min read


Building off an earlier outline, this TL;DR covers loading PDFs into your (Python) Streamlit + local LLM (Ollama) setup. It's another GitHub-Gist-like post with limited commentary.

This plays forward this Google result and its code, found when searching “local llm pdfs”. My use case is to load all of my Apple iCloud iBooks into an “oracle” GPT for private discussions. A sub-curiosity is to have two GPTs respond as their authors would (potentially across their multiple respective books). The first building block, covered here, is loading PDFs into a local LLM and confirming that its PDF-trained (more precisely, retrieval-augmented) results are more desirable (aka spot-checked accurate) than the generic model’s.

Results

Personal test caveats

  • I’ll only load a single, random PDF from my iBooks storage: Reinventing Your Life by Jeffrey E. Young & Janet S. Klosko. On Apple Macs, these iCloud PDFs are stored under ~/Library/Mobile Documents/iCloud~com~apple~iBooks/Documents . My test runs from ~/Downloads, and while I could easily reference the PDF from the iBooks folder instead of my test folder (see the sketch after this list), that’s step two.
  • I know llama3 came out last week, but so far it hasn’t shown sufficient improvement for me to move off llama2-uncensored and accept the response censoring.
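
For reference, a minimal sketch of listing those iCloud iBooks PDFs from Python (the glob pattern and use of pathlib here are my own assumptions, not something the single-PDF test above depends on):

from pathlib import Path

# iCloud iBooks PDF location on macOS, per the caveat above.
IBOOKS_DIR = Path("~/Library/Mobile Documents/iCloud~com~apple~iBooks/Documents").expanduser()

# List every synced PDF; "step two" would hand these paths to PdfGpt below.
for pdf_path in sorted(IBOOKS_DIR.glob("*.pdf")):
    print(pdf_path.name)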

Comparing the generic LLM (🦙) to the PDF-trained LLM (📓), I checked their responses to various questions, e.g.

This image shows the generic LLM hallucinating but the PDF-trained LLM correctly identifying the book’s authors. 👏

Code

The following comes with no expectations/warranties, but it “works on my machine” (though, as a proof of concept, its code is ugly, I agree).

from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyMuPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
import streamlit as st
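
# NOTE: assumes the Ollama daemon is running locally and the model below has
# already been pulled (e.g. ollama pull llama2-uncensored).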

llm = Ollama(model="llama2-uncensored")

@st.cache_resource
class PdfGpt():
    def __init__(self, file_path):
        # Load the PDF and split it into overlapping chunks.
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
        chunks = text_splitter.split_documents(documents=PyMuPDFLoader(file_path=file_path).load())

        # Embed the chunks on CPU and persist the FAISS index locally.
        embedding_model = HuggingFaceEmbeddings(
            model_name="all-MiniLM-L6-v2",
            model_kwargs={'device': 'cpu'},
            encode_kwargs={'normalize_embeddings': True}
        )
        vectorstore = FAISS.from_documents(chunks, embedding_model)
        vectorstore.save_local("vectorstore")

        template = """
### System:
You are a respectful and honest assistant. You have to answer the user's questions using only the context \
provided to you. If you don't know the answer, just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""

        # Retrieval-augmented QA chain over the FAISS-backed retriever.
        self.hey = RetrievalQA.from_chain_type(
            llm=llm,
            retriever=vectorstore.as_retriever(),
            chain_type="stuff",
            return_source_documents=True,
            chain_type_kwargs={'prompt': PromptTemplate.from_template(template)}
        )

oracle = PdfGpt("reinventing_your_life.pdf") # PDF file name
ask = st.text_input("What's up?", key="ask", label_visibility='hidden')

A, B = st.columns([.05, .95])
C, D = st.columns([.05, .95])
with A:
    st.caption("🦙")
with C:
    st.caption("📓")

if ask not in [None, "", []]:
    with B:
        # Generic model, no PDF context.
        st.markdown(llm.predict(ask))
    with D:
        # PDF-backed RetrievalQA chain.
        response = oracle.hey({'query': ask})
        st.markdown(response['result'])

Say you call this file test.py: you’d update the PDF file name reference reinventing_your_life.pdf to your own test PDF, then (in a test where you’re okay with test data caching) start up Streamlit via streamlit run test.py .
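
Since PdfGpt persists its FAISS index via vectorstore.save_local("vectorstore"), a follow-up run could skip re-chunking and re-embedding by reloading that index. A minimal sketch under that assumption (note: newer LangChain releases also require an allow_dangerous_deserialization flag on FAISS.load_local):

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Rebuild the same embedding model that wrote the index.
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Reload the persisted index instead of re-processing the PDF.
vectorstore = FAISS.load_local("vectorstore", embedding_model)
retriever = vectorstore.as_retriever()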

👋
