Running LLMs Locally: A Guide to Setting Up Ollama with Docker

Rawan Alkurd
Jul 1, 2024 · 3 min read


In this blog, we will delve into setting up and running a language model using Ollama locally with Docker. Ollama provides a robust platform for deploying and interacting with large language models (LLMs), making it an excellent tool for developers and researchers.

What is Ollama?

Ollama is a versatile tool designed for deploying and serving LLMs. It simplifies the process of setting up and managing models, allowing users to focus on leveraging the power of LLMs without the overhead of complex infrastructure management.

Why Use Ollama?

  • Ease of Setup: Ollama’s integration with Docker allows for quick and straightforward deployment.
  • Flexibility: Supports various LLMs, including popular models like Llama2 and Llama3.
  • Scalability: Can be configured to run on both CPU and GPU, catering to different performance needs.

Setting Up an LLM and Serving It Locally Using Ollama

Step 1: Download the Official Docker Image of Ollama

To get started, pull and run the official Ollama Docker image (docker run will download the image automatically if it is not already present locally).

For a CPU-only setup, use the following Bash command:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

For a GPU setup, use this Bash command:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
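
Before moving on, it can be useful to confirm that the Ollama server inside the container is reachable on port 11434. Here is a quick check in Python (a sketch that assumes the requests package is installed and the default port mapping from the commands above):

import requests

# The root endpoint returns a short "Ollama is running" message when the server is up.
resp = requests.get("http://localhost:11434")
print(resp.status_code, resp.text)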

Step 2: Pick the Model

Ollama supports a wide variety of LLMs. For this example, we’ll use Llama3; other models, such as Llama2, are also available.

Run the model inside the running Ollama container with the following Bash command:

docker exec -it ollama ollama run llama3

Now you are ready to use and prompt the model locally!
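
Besides prompting interactively inside the container, you can query the same model over Ollama’s HTTP API. The snippet below is a minimal, non-streaming request to the /api/generate endpoint using Python’s requests package (it assumes the llama3 model pulled above and the default port mapping):

import requests

payload = {
    "model": "llama3",
    "prompt": "What is the capital of France?",
    "stream": False,  # return one JSON object instead of a token stream
}

# POST to the generate endpoint exposed by the Ollama container.
resp = requests.post("http://localhost:11434/api/generate", json=payload)
print(resp.json()["response"])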

Step 3: Set Up an Ollama Class to Interact with the Model

To interact with the model locally, we’ll set up an Ollama wrapper class in Python. The full source code is available in the GitHub repository linked at the end of this post.
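
A minimal sketch of such a class, assuming Ollama’s standard /api/generate streaming endpoint and the requests library (names like OLLAMA, predict, and base_url are illustrative), might look like this:

import json
import requests

class OLLAMA:
    """Thin wrapper around Ollama's /api/generate endpoint."""

    def __init__(self, model_name, base_url="http://localhost:11434", temperature=0.7, **options):
        # Model name, API endpoint, and default generation parameters.
        self.model_name = model_name
        self.api_url = f"{base_url}/api/generate"
        self.options = {"temperature": temperature, **options}

    def predict(self, prompt):
        # Send a request to the model API and accumulate the streaming
        # (newline-delimited JSON) response into a single string.
        payload = {"model": self.model_name, "prompt": prompt, "options": self.options}
        response = requests.post(self.api_url, json=payload, stream=True)
        response.raise_for_status()

        answer = ""
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            answer += chunk.get("response", "")
            if chunk.get("done"):
                break
        return answer

    def __call__(self, question):
        # Allow the instance to be called directly with a question.
        return self.predict(question)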

Here is a description of the Class Components:

  • Initialization (__init__): Sets up the model name, API endpoint, and default parameters for the model.
  • Prediction (predict): Sends a request to the model API and processes the streaming response.
  • Callable (__call__): Allows the instance to be called directly with a question, making it more user-friendly.

How to Prompt and Use the Ollama Model in the Docker Container?

You can directly prompt the model as follows:

llama3 = OLLAMA('llama3', temperature=0)
response = llama3('What is the capital of France?')
print("llama2 Response: ", response)

How to Use the Local Ollama Model with DSPy?

DSPy is a framework designed to optimize language model prompts and weights algorithmically, particularly useful when LMs are utilized multiple times within a pipeline. DSPy separates the flow of your program from the parameters of each step, introducing optimizers that can fine-tune prompts and weights based on desired metrics.

Let’s have a look at how to use DSPy to interact with the Ollama model running in the Docker container:

from dspy import Predict, context

pred_qa = Predict('question -> answer')

with context(lm=llama3):
    resp = pred_qa(question='What is the capital of France?')

print("llama3 Response: ", resp.answer)

By following these steps, you’ll be able to set up and run an Ollama model locally using Docker. This approach offers a flexible and scalable solution, empowering you to harness powerful language models in your applications.

Stay tuned for more blog posts where we’ll explore DSPy in greater depth and uncover its features. For more details on DSPy, visit their official documentation: https://dspy-docs.vercel.app/

You can access the full source code on my GitHub here: https://github.com/rawanalkurd
