(Python) Streamlit + Local LLM

Stef Nestor
4 min read · Nov 30, 2023


Yet-Another-Code-Example for ChatGPT-like localhost LLM

👋 Howdy, y’all. I’m skipping most context/commentary and treating Medium like a GitHub Gist for this post.

Goal: My friends have been excited by ChatGPT and want to run offline, uncensored models, but have been hitting start-up friction. I want to outline the 5min (including download time) way to get running, and the 5min after that to get a UI on top. AFAICT this write-up is unique on the internet so far, but only someone’s better Google-fu will tell.

1) Install Ollama

For the last 9 months the internet has been figuring out the preferred way to run LLMs locally: Reddit, top 5 blog, LangChain. Dealer’s choice, but we’re just going to go with Ollama to get llama2-uncensored (meaning it won’t say “I shouldn’t tell you that”, lol, and it will also emit the swear words nobody should say). So: Mac download link, and then in Terminal initialize the models

$ ollama run llama2 # default
$ ollama run llama2-uncensored # 👈 stef default
$ ollama list
NAME                       ID             SIZE     MODIFIED
llama2:latest              a808fc133004   3.8 GB   3 months ago
llama2-uncensored:latest   5823fb1154c5   3.8 GB   3 months ago

That’s it, that’s your command to run ChatGPT-like LLMs locally. (LLMs have various training data and therefore you’ll notice OpenAI’s is still currently shinier than what you can run locally, but let’s run both to vote for open source and open internet.)
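If you’d rather script against the model than chat in the Terminal, Ollama also serves a local REST API (on localhost:11434 by default). A minimal sketch, assuming default settings:

$ curl http://localhost:11434/api/generate -d '{
    "model": "llama2-uncensored",
    "prompt": "why is the sky blue?"
  }'

The response streams back as one JSON object per token; add "stream": false to the payload if you’d rather get a single JSON blob.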

2) Streamlit UI

Using LangChain, there are two kinds of AI interfaces you could set up (doc; related: Streamlit Chatbot tutorial) on top of your running Ollama. First, install the Python libraries:

$ pip install langchain duckduckgo-search streamlit
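Before adding a UI, a quick smoke test that LangChain can reach Ollama doesn’t hurt; a minimal sketch, assuming Ollama is running and the step-1 model is pulled:

from langchain.llms import Ollama

# talks to the local Ollama server (localhost:11434 by default)
llm = Ollama(model="llama2-uncensored:latest")
print(llm.predict("Say hi in five words."))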

2A) Ask Local Only

For company-private data, you can set up a UI which only uses the local LLM …

import streamlit as st
from langchain.llms import Ollama

llm = Ollama(model="llama2-uncensored:latest") # 👈 stef default

colA, colB = st.columns([.90, .10])
with colA:
    prompt = st.text_input("prompt", value="", key="prompt")
response = ""
with colB:
    st.markdown("") # blank lines to nudge the button down level with the input
    st.markdown("")
    if st.button("🙋‍♀️", key="button"):
        response = llm.predict(prompt)
st.markdown(response)
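If you’d like this local-only version to feel more ChatGPT-like, Streamlit’s chat elements plus st.session_state can keep prior turns on screen across reruns; a minimal sketch (display-only history, no memory is passed to the model):

import streamlit as st
from langchain.llms import Ollama

llm = Ollama(model="llama2-uncensored:latest")

# replay prior turns; Streamlit reruns the script on every interaction
if "messages" not in st.session_state:
    st.session_state.messages = []
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    response = llm.predict(prompt)
    st.chat_message("assistant").write(response)
    st.session_state.messages += [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]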

2B) Search the Internet and Answer

… But if you’re allowed to use your data/question’s context to search the internet, you can have your LLM Google/DuckDuckGo it (example below with DDG) …

import streamlit as st
from langchain.llms import Ollama
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks import StreamlitCallbackHandler
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = Ollama(
    model="llama2-uncensored:latest",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
tools = load_tools(["ddg-search"])
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True, handle_parsing_errors=True,
)

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        # stream the agent's intermediate steps into the Streamlit container
        st_callback = StreamlitCallbackHandler(st.container())
        response = agent.run(prompt, callbacks=[st_callback])
        # BUG 2023Nov05 can spiral Q&A: https://github.com/langchain-ai/langchain/issues/12892
        # to get out, refresh the browser page
        st.write(response)
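If the agent spirals or errors, it helps to sanity-check the DuckDuckGo tool on its own, outside both Streamlit and the agent loop; a minimal sketch using the same ddg-search tool:

from langchain.agents import load_tools

# run the search tool directly; returns a text blob of top results
search = load_tools(["ddg-search"])[0]
print(search.run("what is Ollama?"))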

2A+B) Combined

… And putting those together into just one UI (not pretty but done) …

import streamlit as st
from langchain.llms import Ollama
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks import StreamlitCallbackHandler
from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler

search_internet = st.checkbox("check internet?", value=False, key="internet")
prompt = st.text_input("prompt", value="", key="prompt")

if prompt != "":
    response = ""
    if not search_internet:
        llm = Ollama(model="llama2-uncensored:latest") # 👈 stef default
        response = llm.predict(prompt)
    else:
        llm = Ollama(
            model="llama2-uncensored:latest",
            callback_manager=CallbackManager([FinalStreamingStdOutCallbackHandler()]),
        )
        agent = initialize_agent(
            load_tools(["ddg-search"]),
            llm,
            agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
            verbose=True,
            handle_parsing_errors=True,
        )
        response = agent.run(prompt, callbacks=[StreamlitCallbackHandler(st.container())])
        # BUG 2023Nov05 can spiral Q&A: https://github.com/langchain-ai/langchain/issues/12892
        # to get out, refresh the browser page

    st.markdown(response)

Examples

To run these code snippets saved as home.py, in that folder’s Terminal run …

$ streamlit run home.py

… which will auto-open the browser UI for you. Now you’re ready to start googling Prompt Engineering to get answers formatted how you’d like …
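As a first taste of prompt engineering, you can wrap every question in a formatting instruction; a minimal sketch with a made-up template (tune the wording to taste):

from langchain.llms import Ollama
from langchain.prompts import PromptTemplate

llm = Ollama(model="llama2-uncensored:latest")

# hypothetical template: force terse, bulleted answers
template = PromptTemplate.from_template(
    "Answer in at most three bullet points, no preamble.\n\nQuestion: {question}"
)
print(llm.predict(template.format(question="What is Streamlit?")))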

Lastly, I’m unwilling to say “better” (that’s probably personality in play), but the above can easily be ported back into the Streamlit Chatbot type of fancy UI. I personally want customer data/email summarization, which doesn’t need that level of UI, but the Chatbot tutorial linked above shows the shiny version.
