In this tutorial, we’ll walk through setting up and using Ollama for private model inference on a GPU-equipped VM, either on your local machine or on a rented instance from Vast.ai or Runpod.io. Ollama lets you run models privately, keeping your data on hardware you control, and a GPU-powered VM significantly speeds up inference compared to running on CPU alone.
Outline
- Set up a VM with GPU on Vast.ai
- Start Jupyter Terminal
- Install Ollama
- Run Ollama Serve
- Test Ollama with a model
- (Optional) Use your own model
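The steps above boil down to a handful of shell commands. Here is a minimal sketch, wrapped in functions so nothing runs until you call them on the VM; the model name `llama3` is just an example choice, substitute any model from the Ollama library:

```shell
install_ollama() {
  # Official one-line installer from ollama.com
  curl -fsSL https://ollama.com/install.sh | sh
}

start_server() {
  # Start the Ollama API server in the background
  # (it listens on localhost:11434 by default)
  ollama serve &
}

test_model() {
  # The first run pulls the model weights, then answers the prompt
  ollama run llama3 "Say hello in one sentence."
}
```

On the VM you would call `install_ollama`, then `start_server`, then `test_model`; the rest of the tutorial walks through each step in detail.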
Setting Up a VM with GPU on Vast.ai
1. Create a VM with GPU:
   - Visit Vast.ai to create your VM.
   - Choose a VM with at least 30 GB of storage to accommodate the models. This ensures you have enough space for installation and model storage.
   - Select a VM that costs less than $0.30 per hour to keep the setup cost-effective.
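You can also filter instances from the command line. A hedged sketch, assuming the Vast.ai CLI (`vastai`, installable with `pip install vastai`); the query fields below (`num_gpus`, `disk_space`, `dph_total`) follow its offer-search syntax, but double-check them against the current CLI documentation:

```shell
# Filter offers to match the criteria above: 1 GPU, >= 30 GB disk,
# and a total price (dph = dollars per hour) under $0.30.
QUERY='num_gpus=1 disk_space>=30 dph_total<=0.30'
CMD="vastai search offers '$QUERY'"

# Print the command instead of running it here, since executing it
# requires a configured Vast.ai API key:
echo "$CMD"
```

Running the printed command on a machine with the CLI configured lists matching offers, which you can then rent from the same tool or from the web UI.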