Private GPT: Fully Offline with Ollama
In this article, we will walk through a local, privacy-preserving gateway to next-generation AI: running a GPT-style language model on your own machine, fully offline. We will use a tool called Ollama.
What is Ollama?
Ollama is a platform for running language models on your local machine.
Key Features:
- Ollama Library — access to a variety of pre-trained LLMs
- Offline access and privacy
- Easy installation — get started in a few steps
- Easy to use — provides both CLI and API support
Advantages:
- Free and open source, unlike paid cloud subscription services
- Custom Context — Chat with your own data (RAG)
- Data Privacy
- Offline Access
- Easy Customisation
RAG — Retrieval Augmented Generation
A key capability of generative AI is adapting to our custom context, and Ollama supports this. We can pass custom context easily: an embedding model converts our documents into embeddings, which are stored in a vector DB. Later, based on the user's input, similar documents are retrieved from the vector DB and sent to the LLM as context.
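The flow just described can be illustrated end to end with a toy example. Everything here is a stand-in: the embed function below fakes an embedding model with letter frequencies, and a plain Python list plays the role of the vector DB. A real setup would use an embedding model such as "nomic-embed-text" and a store such as Chroma, as shown later in this article.

```python
import math

def embed(text):
    """Toy stand-in for a real embedding model:
    counts letter frequencies to produce a fixed-size vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Index" step: store (embedding, document) pairs — our pretend vector DB.
documents = [
    "Ollama runs language models locally.",
    "Chroma stores embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
index = [(embed(d), d) for d in documents]

def retrieve(query, k=2):
    """Retrieve the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

# The "generate" step would pass these documents to the LLM as context.
context = retrieve("Which tool runs models locally?")
print(context)
```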
Using GPT before — with a paid subscription service (OpenAI)
The data is sent to the service, and the response comes back. So we cannot use such a service with private or confidential code, financial data, or other sensitive information.
Using GPT now — with Ollama
We run the Ollama server locally and download a model called "llama3". Now, using either the CLI or the API, we can invoke the model and get the result.
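As a sketch of the API route: Ollama serves a REST API on localhost (port 11434 by default), and the /api/generate endpoint accepts a JSON body with the model name and prompt. The snippet below only builds and prints the request body; the commented-out lines show one way to actually send it once the server is running. The prompt text is just an example.

```python
import json

# Request body for Ollama's /api/generate endpoint.
payload = {
    "model": "llama3",       # any model pulled via `ollama pull`
    "prompt": "Why is the sky blue?",
    "stream": False,         # return a single JSON object instead of a stream
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the Ollama server to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```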
Using LLM via Ollama CLI
Getting started
Install
Visit https://ollama.com/download and download the installer for your operating system.
Language Model: Download
- Visit https://ollama.com/library for the list of available language models.
- Download any model using the "ollama pull" command. Let's download the "llama3" language model:
ollama pull llama3
This download will take a few minutes, as the model is around 4 GB in size.
Embedding Model Download
- Visit https://ollama.com/library and download your favourite embedding model. In this tutorial, let's use "nomic-embed-text".
- Download it using the same command as for the LLM:
ollama pull nomic-embed-text
List the downloaded models:
ollama list
Run
ollama run llama3
Chat with PDF file data:
Step — 1: Load PDF file data
Load the PDF file you want to chat with. Ex: a rulebook, CodeNames, an article.
from langchain.document_loaders import PyPDFLoader

loaders = [
    PyPDFLoader('/Users/files/pdf/machinelearning-lecture01.pdf')  # or any PDF document
]

docs = []
for loader in loaders:
    docs.extend(loader.load())
Step — 2: Document splitting and embeddings
Split the loaded PDF content into small chunks and compute embeddings for them using your chosen model. An embedding is a numerical representation of text as a high-dimensional vector. We will store these embeddings in the vector DB.
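To make "small chunks" concrete, here is a simplified fixed-window splitter. This is only an illustration: the real RecursiveCharacterTextSplitter used below is smarter and prefers to break on paragraph and sentence boundaries.

```python
def split_text(text, chunk_size=20, chunk_overlap=5):
    """Naive fixed-window splitter: each chunk starts
    (chunk_size - chunk_overlap) characters after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("Machine learning gives computers the ability to learn.")
print(chunks)
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence cut in half by one chunk boundary is still seen whole in a neighbouring chunk.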
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
print(len(splits))
OllamaEmb = OllamaEmbeddings(model="nomic-embed-text")
db = Chroma.from_documents(splits, OllamaEmb)
Now our data is in the vector DB, ready to be queried whenever the user needs information.
Step — 3: Retrieve relevant documents from the vector DB + get a response from the LLM
Now, let's assume the user is looking for some information from the PDF document and sends us a prompt.
We embed the user prompt using the same method as above and search the vector DB for the relevant data. This is a "similarity" search on the vector DB, which is the default search type.
Once the relevant documents are retrieved, we pass them to the language model, which returns the response we need.
from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
retriever = db.as_retriever(search_kwargs={'k': 2})
#search_type='similarity' (default)
#search_kwargs->k=Amount of docs to return (Default: 4)
llm = Ollama(
model="llama3", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
)
# llm.invoke("The first man on the moon was ...")
qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever, return_source_documents=True)
prompt = "What is machine learning ?"
result = qa_chain({'question': prompt, 'chat_history': []})
print(result)
Result
Machine learning is defined as "the ability that gives computers to learn — [inaudible] that gives computers the ability to learn without being explicitly programmed." It allows computers to learn from data and improve their performance on a task over time, without needing to be specifically programmed for each new situation.

The full result dictionary also contains the question, the chat history, and the retrieved source documents (page content truncated here for readability):

{'question': 'What is machine learning ?', 'chat_history': [], 'answer': 'Machine learning is defined as "the ability that gives computers to learn — [inaudible] that gives computers the ability to learn without being explicitly programmed." ...', 'source_documents': [Document(page_content='So start by talking about what machine learning is. What is machine learning? ... Way back in about 1959, Arthur Samuel defined machine learning informally as the [inaudible] that gives computers the ability to learn without being explicitly programmed. ...', metadata={'page': 10, 'source': '/Users/files/pdf/machinelearning-lecture01.pdf'}), Document(page_content="material that I'm teaching in the main lectures. So machine learning is a huge field ...", metadata={'page': 8, 'source': '/Users/files/pdf/machinelearning-lecture01.pdf'})]}
Interesting Solutions using Private GPT:
Once we know how to set up a private GPT, we can build great tools with it:
- Customised plugins for various applications. Ex: a VSCode plugin
- Your own private GPT application with RAG
- No internet needed to use an LLM. Download your favourite LLM and use it as your GPT
Thank You :)