Running LLMs locally with Ollama

Omar Alva
5 min read · Feb 10, 2024


Cyber-llamas (AI generated by author)

Introduction

AI

Artificial Intelligence (AI) is a branch of computer science dedicated to creating systems capable of performing tasks that typically require human intelligence. These tasks include learning, reasoning, problem-solving, perception, and understanding language. The evolution of AI has been a journey of ambition, creativity, and innovation, deeply rooted in human history and philosophy. The concept of AI dates back to ancient civilizations, where philosophers like Aristotle and Plato explored the nature of human cognition and reasoning, laying the groundwork for symbolic AI, which represents human thought and reasoning using symbols.

LLMs

Large Language Models (LLMs) are a subset of AI focusing on understanding and generating human language. They are built on deep learning algorithms, particularly transformer models, and are trained on vast datasets, including texts from the web, books, and Wikipedia. This training allows LLMs to learn statistical relationships between words and phrases, enabling them to perform a wide range of language-related tasks such as translation, summarization, question-answering, and content creation.

Overview of Ollama

Ollama is a user-friendly interface for running large language models (LLMs) locally, specifically on macOS and Linux, with Windows support on the horizon. It is a valuable tool for researchers, developers, and anyone who wants to experiment with language models. Ollama supports a wide range of models, including Llama 2, Llama 2 Uncensored, and the newly released Mistral 7B, among others.

Features

  • Ease of Use: Ollama is easy to install and use, even for users with no prior experience with language models. It provides a simple API for creating, running, and managing models (see the example after this list).
  • Versatility and Model Installation: Ollama supports a wide range of models, making it versatile for various applications. It also provides a straightforward installation process, making it appealing to individuals and small teams.
  • GPU Acceleration: Ollama leverages GPU acceleration, which can speed up model inference by up to 2x compared to CPU-only setups. This is particularly beneficial for computation-heavy tasks, works out of the box, and requires no manual configuration.
  • Integration Capabilities: Ollama is compatible with several platforms like Langchain, llama-index, and more.
  • Privacy and Cost: Running LLMs locally with Ollama ensures data privacy as your data is not sent to a third party. It also eliminates inference fees, which is important for token-intensive applications.
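
Ollama also serves a local REST API (on port 11434 by default), which is what makes the integrations above possible. A minimal example of the generate endpoint, assuming the server is running and the mistral model has already been pulled:

# One-shot completion against the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With "stream": false the server returns a single JSON object containing the full response, instead of streaming tokens line by line.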

Installing Ollama

Mac

To install Ollama on a Mac, you need to have macOS 11 Big Sur or later. The installation process can be done in a few steps:

  • Download Ollama: You can download Ollama for macOS from the official website.
  • Install with Homebrew: If you prefer using Homebrew, you can install Ollama with the following command:
brew install ollama
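
Either way, you can verify the installation and make sure the server is running before pulling any models. As a quick check (if you installed the desktop app, it typically runs the server for you):

# Confirm the CLI is installed
ollama --version

# Start the server in the foreground
ollama serve

# With Homebrew, you can instead run it as a background service
brew services start ollama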

Linux

For Linux users, the installation process is straightforward:

  • Manual Installation: You can manually download the Ollama binary from the GitHub repository and follow the instructions provided there, which may include adding Ollama as a startup service and installing CUDA drivers if you’re using Nvidia GPUs.
  • Post-Installation Configuration: After installation, you may need to configure Ollama to start automatically. This involves creating a service configuration file and setting the appropriate environment variables (a sketch of such a unit file follows below).
  • Install with a Single Command: Alternatively, you can install Ollama using a single command by running in your terminal:
curl https://ollama.ai/install.sh | sh

Before installing, it’s important to check whether there’s a specific package for the Linux distribution you’re running, such as Ubuntu, Fedora, or Arch. A distribution package helps ensure compatibility, smooth operation, and automatic updates.
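
For the post-installation configuration mentioned above, auto-starting Ollama on a systemd-based distribution comes down to a unit file that runs ollama serve at boot. The sketch below is illustrative: the official install script generates a similar file, and the exact binary path, user, and environment variables may differ on your system.

# /etc/systemd/system/ollama.service (illustrative sketch)
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
# Environment variables go here, e.g.:
# Environment="OLLAMA_HOST=127.0.0.1:11434"

[Install]
WantedBy=default.target

Then reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable --now ollama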

Basic Operations with Ollama

Find a model

To find a model in Ollama, you can visit the Ollama library page. This page lists all the available models that you can pull and run locally using Ollama. The models are listed by their capabilities, and each model’s page provides detailed information about the model, including its size, capabilities, and in most cases, a description of how to use it.

Mistral library’s page (https://ollama.com/library/mistral)

Pull a Model

To pull a model using Ollama, you can use the pull command followed by the model name. For instance, to pull the latest version of the Mistral model, you would use the following command:

ollama pull mistral:latest

This command will download the specified model (in this case, the latest version of Mistral) to your local machine.
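
Models are also published under tags, so you can pull a specific variant rather than :latest. For example, the Mistral library page shown earlier lists a 7B tag (check the library page for the tags that currently exist):

# Pull a specific tagged variant of the model
ollama pull mistral:7b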

List Models

To view the models you have pulled to your local machine, you can use the list command:

ollama list

This command will display a list of all models that you have downloaded locally. It’s important to note that the ollama list command only shows models that you have already pulled to your machine, not all available models on the Ollama platform.
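
You can also inspect a model you have pulled. As a sketch, assuming a recent Ollama version (flag names may vary between releases), the show command prints the Modelfile that defines the model:

# Display the Modelfile (base weights, template, parameters) for a local model
ollama show mistral:latest --modelfile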

Run a Model

Once you’ve pulled a model, you can run it using the run command. For example, to run the Mistral model you just pulled, you would use:

ollama run mistral:latest

This command will start the model, and you can then interact with it through the Ollama CLI.

Example:

ollama run mistral:latest "What is the distance from Los Angeles to San Francisco by car?"

The driving distance between Los Angeles and San Francisco is
approximately 383 miles (616 kilometers) if you take the most
direct route via Interstate 5 North. The actual travel time
depends on traffic conditions, but it usually takes around
6 hours without stops. Please keep in mind that these estimates
are approximate, and for accurate information, consider checking
a reliable GPS or mapping tool.
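
Omitting the quoted prompt starts an interactive session instead: you type prompts at the >>> prompt and exit with the /bye command.

# Start an interactive session with the model
ollama run mistral:latest
>>> /bye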

Remove a Model

If you want to remove a model from your local machine, you can use the rm command followed by the model name. For instance, to delete the Mistral model, you would use:

ollama rm mistral:latest

This command will remove the specified model (in this case, the latest version of Mistral) from your local machine.

Conclusion

Ollama is an open-source platform that significantly simplifies running Large Language Models (LLMs) locally, particularly on Linux and macOS systems. It stands out as a tool that democratizes access to advanced AI technologies, enabling researchers, developers, and enthusiasts to leverage the power of LLMs without relying on cloud services.

Ollama’s contribution to the AI field is significant, as it empowers users to maintain control over their AI models and data. It is one of the best Open Source AI tools for running LLMs locally, offering a blend of privacy, cost savings, and user-friendliness. As AI continues to evolve, tools like Ollama will play a crucial role in shaping how individuals and organizations interact with and benefit from AI technologies.
