How to Set Up and Run Ollama on a GPU-Powered VM (vast.ai)

AI Rabbit
4 min read · Jul 10, 2024

In this tutorial, we'll walk you through setting up and using Ollama for private model inference on a VM with a GPU, whether that's your local machine or a VM rented from Vast.ai or Runpod.io. Ollama lets you run models privately, keeping your data under your control, and a GPU-powered VM significantly speeds up inference compared to running on a CPU alone.

Outline

  1. Set up a VM with GPU on Vast.ai
  2. Start Jupyter Terminal
  3. Install Ollama
  4. Run Ollama Serve
  5. Test Ollama with a model (see the shell sketch after this list)
  6. (Optional) Use your own model
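
Before renting anything, it helps to see where this is going: steps 3 through 5 boil down to three shell commands. The install script URL below is Ollama's official one; the `llama3` model name is just an example, and any model from the Ollama library works in its place.

```bash
# Step 3: install Ollama via the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Step 4: start the Ollama server (leave this running, e.g. in its own terminal)
ollama serve

# Step 5: in a second terminal, pull and chat with a model
# (llama3 is an example; substitute any model from https://ollama.com/library)
ollama run llama3
```

Each of these steps is covered in detail below.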

🐰 AI Rabbit: Tutorials, News, and Insights. More guides and AI developments at https://airabbit.blog/

Setting Up a VM with GPU on Vast.ai

1. Create a VM with GPU:
   - Visit Vast.ai to create your VM.
   - Choose a VM with at least 30 GB of storage so there is enough room for the Ollama installation and the model weights.
   - Select a VM that costs less than $0.30 per hour to keep the setup cost-effective.
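
If you'd rather script the rental than click through the web UI, vast.ai also ships a command-line client. The sketch below assumes the `vastai` CLI from PyPI; the search filters and `OFFER_ID` are illustrative placeholders, so verify the exact field names with `vastai search offers --help` before relying on them.

```bash
# Install the vast.ai CLI and authenticate (API key from your vast.ai account page)
pip install vastai
vastai set api-key YOUR_API_KEY   # placeholder: use your own key

# Find single-GPU offers with at least 30 GB of disk under $0.30/hr
# (field names are illustrative; check `vastai search offers --help`)
vastai search offers 'num_gpus=1 disk_space>=30 dph<=0.30'

# Rent an offer by ID with 30 GB of disk and a CUDA-ready base image
# (OFFER_ID is a placeholder taken from the search results)
vastai create instance OFFER_ID --image nvidia/cuda:12.1.1-devel-ubuntu22.04 --disk 30
```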
