Run AI Models Locally: Docker Desktop’s New AI Model Runner
I Got Early Access to Docker Desktop’s New AI Model Runner — Here’s What You Need to Know
Docker has just taken a major step forward with its newest addition to Docker Desktop: the AI Model Runner. As a Docker Captain, I had early access to this feature and have spent the last few days testing it in real-world scenarios.
And let me tell you — this is a game-changer. 🚀
What Is the AI Model Runner?
The Docker Model Runner is a new experimental feature in Docker Desktop 4.40+ that gives you a Docker-native experience for running large language models (LLMs) locally. It’s available now on macOS with Apple Silicon (M1/M2/M3), with Windows support on NVIDIA GPUs expected by the end of April 2025.
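If you want to try it yourself, a minimal setup check looks something like this. The `docker desktop enable model-runner` subcommand is part of Docker Desktop’s CLI; if your build doesn’t expose it, the same toggle lives in Docker Desktop’s settings under the experimental/beta features section:

```bash
# Check your Docker Desktop version (Model Runner needs 4.40 or later)
docker version

# Enable Model Runner from the CLI; the equivalent toggle is in
# Docker Desktop's settings under experimental/beta features
docker desktop enable model-runner

# Confirm the runner is up
docker model status
```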
This is not just another containerized runtime. Docker runs the inference engine (such as llama.cpp) directly on your host with GPU access, so you get:
- Direct GPU acceleration
- No network latency
- Full control and privacy
Models are pulled as OCI artifacts from Docker Hub and dynamically loaded into memory on demand.
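To give you a feel for the workflow, here is a rough sketch of the CLI; `ai/smollm2` is just one example of a model published under Docker Hub’s `ai/` namespace, so swap in whichever model you want to test:

```bash
# Pull a model from Docker Hub; it's stored as an OCI artifact, not an image
docker model pull ai/smollm2

# See which models are available locally
docker model list

# Run a one-shot prompt against the model
docker model run ai/smollm2 "Explain Docker volumes in one sentence."

# Omit the prompt to drop into an interactive chat session
docker model run ai/smollm2
```

Because a model is only loaded into memory when a request comes in, the first response can take a moment; after that, inference runs directly against your local hardware.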