Streamlit + Local LLM + PDFs

Stef Nestor
Apr 22, 2024 · 3 min read


Building off an earlier outline, this TL;DR covers loading PDFs into your (Python) Streamlit + local LLM (Ollama) setup. It's another GitHub-Gist-like post with limited commentary.

This plays forward this Google result and its code, found when searching “local llm pdfs”. My use case is to load all of my Apple iCloud iBooks into an “oracle” GPT for private discussions. A sub-curiosity is to have two GPTs respond as their authors would (potentially across their multiple respective books). The first building block, covered here, is loading PDFs into a local LLM and confirming that its PDF-trained (more precisely, retrieval-augmented) results are more desirable (aka spot-checked accurate) than the generic model’s.

Results

Personal test caveats

  • I’ll only load a single, random PDF from my iBooks storage: Reinventing Your Life by Jeffrey E. Young & Janet S. Klosko. On Apple Macs, these iCloud PDFs are stored under ~/Library/Mobile Documents/iCloud~com~apple~iBooks/Documents . My test runs from ~/Downloads, and while I could easily reference the PDF from the iBooks folder instead of my test folder (see the sketch after this list), that’s step two.
  • I know llama3 came out last week, but so far it hasn’t shown sufficient improvement for me to move off llama2-uncensored and accept the response censoring.
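
For reference, a minimal sketch of listing those iCloud iBooks PDFs from Python (the glob pattern and use of pathlib here are my own assumptions, not something the single-PDF test above depends on):

from pathlib import Path

# iCloud iBooks PDF location on macOS, per the caveat above.
IBOOKS_DIR = Path("~/Library/Mobile Documents/iCloud~com~apple~iBooks/Documents").expanduser()

# List every synced PDF; "step two" would hand these paths to PdfGpt below.
for pdf_path in sorted(IBOOKS_DIR.glob("*.pdf")):
    print(pdf_path.name)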

Comparing the generic LLM (🦙) to the PDF-trained LLM (📓), I checked their responses to various questions, e.g.

This image shows the generic LLM hallucinating but the PDF-trained LLM correctly identifying the book’s authors. 👏

Code

The following comes with no expectations/warranties, but it “works on my machine” (though, as a proof of concept, its code is ugly, I agree).

from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyMuPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
import streamlit as st
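
# NOTE: assumes the Ollama daemon is running locally and the model below has
# already been pulled (e.g. ollama pull llama2-uncensored).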

llm = Ollama(model="llama2-uncensored")

@st.cache_resource
class PdfGpt():
    def __init__(self, file_path):
        # Load the PDF and split it into overlapping chunks.
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
        chunks = text_splitter.split_documents(documents=PyMuPDFLoader(file_path=file_path).load())

        # Embed the chunks on CPU and persist the FAISS index locally.
        embedding_model = HuggingFaceEmbeddings(
            model_name="all-MiniLM-L6-v2",
            model_kwargs={'device': 'cpu'},
            encode_kwargs={'normalize_embeddings': True}
        )
        vectorstore = FAISS.from_documents(chunks, embedding_model)
        vectorstore.save_local("vectorstore")

        template = """
### System:
You are a respectful and honest assistant. You have to answer the user's questions using only the context \
provided to you. If you don't know the answer, just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""

        # Retrieval-augmented QA chain over the FAISS-backed retriever.
        self.hey = RetrievalQA.from_chain_type(
            llm=llm,
            retriever=vectorstore.as_retriever(),
            chain_type="stuff",
            return_source_documents=True,
            chain_type_kwargs={'prompt': PromptTemplate.from_template(template)}
        )

oracle = PdfGpt("reinventing_your_life.pdf") # PDF file name
ask = st.text_input("What's up?", key="ask", label_visibility='hidden')

A, B = st.columns([.05, .95])
C, D = st.columns([.05, .95])
with A:
    st.caption("🦙")
with C:
    st.caption("📓")

if ask not in [None, "", []]:
    with B:
        # Generic model, no PDF context.
        st.markdown(llm.predict(ask))
    with D:
        # PDF-backed RetrievalQA chain.
        response = oracle.hey({'query': ask})
        st.markdown(response['result'])

Say you call this file test.py: you’d update the PDF file name reference reinventing_your_life.pdf to your own test PDF, then (in a test where you’re okay with test data caching) start up Streamlit via streamlit run test.py .
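
Since PdfGpt persists its FAISS index via vectorstore.save_local("vectorstore"), a follow-up run could skip re-chunking and re-embedding by reloading that index. A minimal sketch under that assumption (note: newer LangChain releases also require an allow_dangerous_deserialization flag on FAISS.load_local):

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Rebuild the same embedding model that wrote the index.
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Reload the persisted index instead of re-processing the PDF.
vectorstore = FAISS.load_local("vectorstore", embedding_model)
retriever = vectorstore.as_retriever()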

👋
