Private GPT: Fully Offline with Ollama

lalith kumar
6 min read · Jul 14, 2024


In this article, we will walk through a local gateway to next-generation AI that preserves privacy: running a GPT-style language model on your own machine, fully offline. We will use a tool called Ollama.

What is Ollama?

Ollama is a platform for running large language models on your local machine.

Key Features:

  • Ollama Library: access to a variety of pre-trained LLMs
  • Offline access and privacy
  • Easy installation: get started in a few steps
  • Easy to use: provides both CLI and API support

Advantages:

  • Free and open-source platform, unlike paid cloud subscription services
  • Custom context: chat with your own data (RAG)
  • Data Privacy
  • Offline Access
  • Easy Customisation

RAG: Retrieval-Augmented Generation

A key capability of generative AI is adapting its output to our own custom context, and Ollama supports this: we can pass custom context easily. We use an embedding model to convert our documents into embeddings and store them in a vector DB. Later, based on the user's input, we retrieve the most similar documents from the vector DB and send them to the LLM as context.

Using GPT before: with a paid subscription service (OpenAI)

With a hosted service, the data is sent to the provider and a response is returned. So we cannot use such a service with private or confidential code, financial data, or other sensitive information.
Example:
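As a concrete illustration, a hosted API call looks roughly like the sketch below (this uses the openai Python package; the model name and prompt are illustrative, not from the original setup). Note that the full prompt leaves your machine and is processed on the provider's servers.

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt below is sent to OpenAI's servers for processing
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Summarise this confidential contract for me."}],
)
print(response.choices[0].message.content)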

Using GPT now: with Ollama

We run the Ollama server locally and download a model called “llama3”. Then, using either the CLI or the API, we can invoke the model and get the result.
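For example, once the local Ollama server is running (it listens on http://localhost:11434 by default), we can call it from Python. This is a minimal sketch using the requests package; the prompt is just a placeholder.

import requests  # pip install requests

# Ask the locally running llama3 model a question; nothing leaves the machine
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])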

Using an LLM via the Ollama CLI

Getting started

Install

Visit the https://ollama.com/download page and download the installer for your operating system.

Language Model: Download

  • Visit https://ollama.com/library for the list of available language models.
  • Download any model using the “ollama pull” command. Let’s download the “llama3” language model:
ollama pull llama3

This download will take a few minutes, as the model is around 4 GB in size.

Embedding Model Download

  • Visit https://ollama.com/library and download your favourite text embedding model. In this tutorial, let’s use “nomic-embed-text”.
  • Download it using the same “ollama pull” command you used for the LLM, as shown below.
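Using the same pull syntax, that command is:

ollama pull nomic-embed-text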

List the downloaded models:

ollama list

Run

ollama run llama3

Chat with PDF file data:

Step 1: Load PDF file data

Load the PDF file you want to chat with, for example a rulebook (such as for Codenames) or an article.

from langchain.document_loaders import PyPDFLoader  # requires the pypdf package

# Load the PDF(s) you want to chat with
loaders = [
    PyPDFLoader('/Users/files/pdf/machinelearning-lecture01.pdf')  # or any PDF document
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

Step 2: Document splitting and embeddings

Split the loaded PDF content into small chunks and embed each chunk using the model of your choice. Embeddings are numerical representations of text as high-dimensional arrays. We will store these embeddings in the vector DB.

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

# Split the documents into chunks of at most 1500 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
print(len(splits))

# Embed each chunk with the local nomic-embed-text model and store it in Chroma
OllamaEmb = OllamaEmbeddings(model="nomic-embed-text")
db = Chroma.from_documents(splits, OllamaEmb)

Now our data is in the vector DB, ready to be queried whenever the user needs information.
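As a quick sanity check (a minimal sketch; the query string is just an example), we can run a similarity search directly against the store:

# Retrieve the two chunks most similar to a sample query
matches = db.similarity_search("What is machine learning?", k=2)
for doc in matches:
    print(doc.metadata, doc.page_content[:100])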

Step 3: Retrieve relevant documents from the vector DB and get a response from the LLM

Now, suppose the user is looking for some information from the PDF document and sends us a prompt.

We embed the user prompt with the same embedding model and search the vector DB for the most relevant chunks. This is a “similarity” search on the vector DB, which is the default search type.

Once the relevant documents are retrieved, we pass them to the language model, which returns the response we need.

from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain

# Turn the vector store into a retriever
# search_type='similarity' (default)
# search_kwargs -> k = number of docs to return (default: 4)
retriever = db.as_retriever(search_kwargs={'k': 2})

# Local llama3 model, streaming tokens to stdout as they are generated
llm = Ollama(
    model="llama3",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
# llm.invoke("The first man on the moon was ...")

# Retrieve relevant chunks and ask the LLM to answer from them
qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever, return_source_documents=True)
prompt = "What is machine learning ?"
result = qa_chain({'question': prompt, 'chat_history': []})
print(result)

Result

Machine learning is defined as "the ability that gives computers to learn — [inaudible] that gives computers the ability to learn without being explicitly programmed." It allows computers to learn from data and improve their performance on a task over time, without needing to be specifically programmed for each new situation.{'question': 'What is machine learning ?', 'chat_history': [], 'answer': 'Machine learning is defined as "the ability that gives computers to learn — [inaudible] that gives computers the ability to learn without being explicitly programmed." It allows computers to learn from data and improve their performance on a task over time, without needing to be specifically programmed for each new situation.', 'source_documents': [Document(page_content='So start by talking about what machine learni ng is. What is machine learning? Actually, \ncan you read the text out there? Raise your hand if the text on the small screens is legible. \nOh, okay, cool, mostly legible. Okay. So I\'ll just read it out.  \nSo what is machine learning? Way back in  about 1959, Arthur Samuel defined machine \nlearning informally as the [inaudible] that gives computers to learn — [inaudible] that \ngives computers the ability to learn without  being explicitly programmed. So Arthur \nSamuel, so way back in the history of m achine learning, actually did something very \ncool, which was he wrote a checkers progr am, which would play games of checkers \nagainst itself.  \nAnd so because a computer can play thousands  of games against itself relatively quickly, \nArthur Samuel had his program play thousands  of games against itself, and over time it \nwould start to learn to rec ognize patterns which led to wi ns and patterns which led to \nlosses. So over time it learned things like that , "Gee, if I get a lot of pieces taken by the \nopponent, then I\'m more likely to lose than win," or, "Gee, if I get my pieces into a \ncertain position, then I\'m especially li kely to win rather than lose."  \nAnd so over time, Arthur Samuel had a check ers program that woul d actually learn to \nplay checkers by learning what are the sort of  board positions that tend to be associated \nwith wins and what are the boa rd positions that tend to be associated with losses. And', metadata={'page': 10, 'source': '/Users/files/pdf/machinelearning-lecture01.pdf'}), Document(page_content="material that I'm teaching in the main lectur es. So machine learning is a huge field, and \nthere are a few extensions that we really want  to teach but didn't have time in the main \nlectures for.", metadata={'page': 8, 'source': '/Users/files/pdf/machinelearning-lecture01.pdf'})]}
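The returned dictionary contains the generated answer plus the source chunks it was based on, so in practice you would usually print just those fields rather than the whole result:

# Print only the answer and where it came from
print(result['answer'])
for doc in result['source_documents']:
    print(doc.metadata['source'], 'page', doc.metadata['page'])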

Interesting Solutions using Private GPT:

Once we know how to set up a private GPT, we can build great tools with it:

  • Customised plugins for various applications, e.g. a VS Code plugin
  • Your own private GPT application with RAG
  • No internet needed to use an LLM: download your favourite model and use it as your own GPT

Thank You :)
