Docker for Mac performance: DIY D4M

By MrManafon. Published in Homullus. 10 min read. Jul 30, 2021.

Let's spin up a Linux VM and use it as a hypervisor to run our code and Docker containers in. Then, let's automate it for a seamless development experience and some awesome performance gains.

Important: This is Part 2/3. Today, we move our Docker into a VM in order to get an order of magnitude performance boost.

Part#1 Mitigation strategies for common performance issues on D4M.
Part#2 Replace xhyve and build our own hypervisor with better performance.
Part#3 Go container-first, switch to development in a VPS.

Wow boy, those posts exploded and became №1 search results on Google.

In the previous post, about a year ago, I invested a stupendous amount of time trying out very impractical solutions to beat Docker's bad performance on macOS.

I was wrong all along, it was a silly idea to try to beat D4M. Instead, I should have just rolled my own from the start.

This is a long and fun post about the discovery process. If you are impatient, jump straight into the GitHub Repository!

Take a look at the Table of Contents and pick your fancy:

Disclaimer: Image shamelessly copied from Docker Forums without any permission.

Quick recap of how we got here

I was leading a fairly large team on a fairly large project in a fairly large company. We needed Docker. We all had Macs. Any solution had to be one-click and dummy-proof.

  • Docker for Mac (D4M) has horrible performance on some pretty common workloads (think: Ruby, NodeJS, PHP…).
  • The performance bottleneck is due to its shared filesystem. The more you “spin the disk”, the more the CPU load rises, which causes a runaway effect.
  • It is common to see 400% CPU load.
  • In the previous post we tried various coping strategies.
  • At the time, Mutagen was simply the best overall choice.
  • At about the same time, the D4M team incorporated Mutagen, but the implementation left a lot to be desired.
  • Along with that, we explored various caching strategies that you can use for your app to make the pain tolerable. K8s or databases were still a major pain, and code editors were lacking in advanced intellisense, but overall it was better than pure Docker for Mac.

Here’s some further reading on what exactly happens (not why) by Lee Hambley. As he explains, these particular “problematic” workloads seem to be on the heavy side with their filesystem operation counts.

Trivia: Building caches in Symfony can cause hundreds of thousands of filesystem operations as it checks for circular references.

“Ruby apps also make a mind bogglingly high number of calls to stat and lstat, I'm quite sure that's because of bootsnap (tries to cache byte code to start faster), those syscalls in particular are pretty slow on osxfuse, compared to being 70 microseconds or less on native OS.”
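To get a feel for the per-call cost mentioned in the quote, here is a minimal Python sketch that calls `lstat` on every entry of a directory tree and reports the average time per call. The helper names (`make_sample_tree`, `time_lstat_calls`) and the throwaway tree are made up for illustration, and the timing also includes the directory walk itself, so treat the number as a rough indicator rather than a precise syscall benchmark.

```python
import os
import tempfile
import time


def make_sample_tree(root, dirs=5, files_per_dir=20):
    """Create a small throwaway tree and return the number of entries made."""
    created = 0
    for d in range(dirs):
        dirpath = os.path.join(root, f"pkg_{d}")
        os.makedirs(dirpath)
        created += 1
        for f in range(files_per_dir):
            with open(os.path.join(dirpath, f"file_{f}.rb"), "w") as handle:
                handle.write("# placeholder\n")
            created += 1
    return created


def time_lstat_calls(root):
    """lstat every entry under root; return (number of calls, seconds spent).

    The elapsed time includes the os.walk traversal, not only the lstat calls.
    """
    calls = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            calls += 1
    return calls, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_sample_tree(tmp)
        calls, seconds = time_lstat_calls(tmp)
        print(f"{calls} lstat calls, {seconds / calls * 1e6:.1f} microseconds per call")
```

Pointing `time_lstat_calls` at the same project once on a native disk and once inside a shared folder in a container should make the gap visible, assuming the project is large enough to drown out noise.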

A bit on shared filesystems

Sharing filesystems is tricky, as you saw in the previous post. If you don’t trust me, just take a look at the numerous 600+ comment issues on GitHub. Much smarter minds than mine work at Docker, and they have attempted at least 4 solutions that I am aware of, two of which included writing closed-source filesystems. No luck. Their best effort so far is the new gRPC-FUSE filesystem. Unfortunately, the benchmarks are not so fun.

Various filesystems are optimized for different workloads.

You might be wondering how Mutagen worked so well compared to osxfs. It essentially skips the whole filesystem sharing shebang and uses rsync to externally sync all the files. It also includes fancy stuff like ignore syntax and fsevents.
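The "sync instead of share" idea can be sketched in a few lines of Python. This is not Mutagen's actual algorithm, just a naive one-way copy under stated assumptions: the function name `sync_one_way` and the glob-style ignore patterns are hypothetical, changes are detected by size and modification time only, and deletions are not propagated.

```python
import fnmatch
import os
import shutil


def _ignored(name, patterns):
    """True when a file or directory name matches any ignore pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def _needs_copy(source, target):
    """Copy when the target is missing, differs in size, or is older."""
    if not os.path.exists(target):
        return True
    s, t = os.stat(source), os.stat(target)
    return s.st_size != t.st_size or s.st_mtime > t.st_mtime


def sync_one_way(src, dst, ignore=()):
    """Copy new or changed files from src to dst; return the copied paths."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(src):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not _ignored(d, ignore)]
        rel_dir = os.path.relpath(dirpath, src)
        target_dir = dst if rel_dir == "." else os.path.join(dst, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            if _ignored(name, ignore):
                continue
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            if _needs_copy(source, target):
                shutil.copy2(source, target)  # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Because the container then reads from its own copy on a native Linux filesystem, every `stat` stays local; the price is a short delay between saving a file and the container seeing it.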

Chart by Lee Hambley

A bit on how Docker for Mac works

D4M implements a Docker context (more on that later) which talks to a Linux VM (xhyve hypervisor) that runs on your system. Exactly as it sounds, a real VM.

The VM is “sharing” your whole user folder. When you issue a command locally, it is actually executed inside the VM, but since all the folders have exactly the same paths, you never notice.

A bit on docker contexts and docker-machine

You can create a context. A context is just a pointer towards a local or remote docker installation.

Docker-machine is a utility that does a pretty similar thing, based on available environment variables.
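As a rough illustration of both approaches, the Python sketch below only builds the commands and environment rather than running anything. It assumes a VM reachable over SSH as `ssh://vagrant@workbox` (the host used later in this post) and a context named `workbox`; the helper function names are made up for this example. `docker context create`, `docker context use` and the `DOCKER_HOST` variable are regular Docker CLI features.

```python
import os
import shlex


def context_create_cmd(name, host):
    """Command that registers a context pointing at a Docker engine."""
    return ["docker", "context", "create", name, "--docker", f"host={host}"]


def context_use_cmd(name):
    """Command that makes the given context the default one."""
    return ["docker", "context", "use", name]


def env_for_host(host, base_env=None):
    """Environment-variable alternative: point the CLI at a host via DOCKER_HOST."""
    env = dict(os.environ if base_env is None else base_env)
    env["DOCKER_HOST"] = host
    return env


if __name__ == "__main__":
    host = "ssh://vagrant@workbox"  # assumed VM address, taken from later in the post
    print(shlex.join(context_create_cmd("workbox", host)))
    print(shlex.join(context_use_cmd("workbox")))
    print("DOCKER_HOST=" + env_for_host(host, base_env={})["DOCKER_HOST"])
```

The printed commands can be pasted into a shell, or passed to `subprocess.run` once the VM is up. A context is stored by the Docker CLI and persists across shells, while the environment variable only affects the process that carries it.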

The research bit: Looking into the past

I love it when we work so hard, with tunnel vision, on finding a solution to a problem, just to discover that there was a guy at MIT in the 70s who wrote his PhD on the topic. In software engineering that happens too often.

Parallels sharing filesystem

When you actually sit down and read through the aforementioned GitHub issues for docker-for-mac (I did, all 600), from time to time you find comments briefly mentioning Parallels. They usually don’t go into any depth other than “it works much better” and are quickly disregarded by the Docker team.

At this point I’m feeling somewhere in between “Pepe Silvia whiteboard meme” and Nick Cave sitting in his old bedroom in Berlin. Too many open tabs and no “answers from above”.

So I sat down, set up a simple Ubuntu Server VM, installed Docker, shared my projects folder, and spun up the fattest project I have. Oh boy, was it better! I performed a `composer install` in a project that drains 6GB of RAM on D4M, and my CPU didn’t go above 50%. In a fully shared folder, mind you, with no caches and tricks.

I am not the first guy to notice this, as this very popular project shows. But the project is based on a very bad and unmaintained distro that I have burned myself on before, so we’ll skip that for today. Still, it is always good to see how others solved such an issue.

Then I got interested in WHY Parallels is so much better. To be frank, there are no real explanations, and it’s no wonder. This is a niche problem, with a niche product, and the solution is a closed-source commercial application.

The set of features they sacrificed (and the speed they got) points towards NFS, but still they manage to simulate permissions and even fsevents unlike NFS. Some proprietary wrapper around it perhaps.

Gonzo narrative: Essentially someone in Parallels said, “the native stuff performs horribly, we’ve got money, let’s roll our own and make more money”.

The reason the Docker team decided not to take the same route is (in their own words) that they did not wish to sacrifice, for speed, certain POSIX guarantees that Parallels didn’t have to cater for, as they firmly believe that the current state of D4M is “good enough for most users”.

The Docker team seems to hold onto “the right way” of building a hypervisor VM. Parallels actually has all the know-how on how to optimize common workflows && the OS itself, and is not afraid to play dirty.

I hoped to learn more, so I tried my luck on a couple of platforms, and a user on Elixir Slack reached out. He didn’t really have an answer, but he did have a better explanation than most about what seems to be happening:

Proprietary driver is my take.

Linux-side a fs driver that probably communicates through shared memory or a similar very fast channel to the VM hypervisor which translates it into MacOS fs calls and back.
Skipping all the emulation layers and having a very fast channel between the fs driver and MacOS mediated by the hypervisor is pretty much a precondition for the speed that Parallels manages to achieve.

Pro Tip: lsmod to see what drivers are loaded, for starters.

~ Cees de Groot
Principal Software Engineer
@ Canary Monitoring Inc.

An additional boon is that the folks at Parallels do this as their job and seem pretty good at managing VMs, so the VM behaves much better with power management and throttles/floors the CPU when the VM is idle. My Mac can breathe again; long gone are the days of sweaty palms whenever I touch the keyboard.

Vagrant environments

There is this little company named HashiCorp that has worked with this cloud stuff for quite a while now. Terraform and Consul came out of there, and so did Vagrant.

It is simply a standardized wrapper around local virtual machines that allows us to treat them in an “Infrastructure as Code” native way. Write down a little file, and provision a VM with a single command, in a repeatable way.

Putting it all together

Open up the repository and read the readme. I did my best to keep it short.

What we do is use Vagrant to create a cute little VM. We then use contexts and hosts to replicate the “local docker” behaviour in a similar (albeit more primitive) way to Docker for Mac.

Each time you want to use Docker, you no longer click the Docker icon. Instead, you run `vagrant up` like the repo says, and when it is up, you go about your business like you normally would.

You will need Parallels for this. This does not work with VMware or VirtualBox; they suffer from the same issues as D4M.

The part I may be the most proud of is that we can now safely remove almost all of our caching mechanisms, container volumes, ignores and similar dirty code from the infrastructure, and get back on track to having dev/prod parity.

Always use the right tool for the job.

Pretty soon after going in, you will want to start removing your local interpreters: various asdf or brew installations of PHP, Elixir, Python.

When you do, your IDE/editor will cry about it as it no longer has access to the language interpreter.

How to use remote interpreters: Visual Studio Code

VSCode comes with a capability to connect to remote servers via SSH. This also works for containers (remote or not) as long as your docker context points to a Docker engine.

You select a container you’d like, and a new window pops up. Quite literally from inside the container! It will be almost completely isolated from the outside world and will offer you a unique look into what exactly your app would see around it. Best of all, everything happens in the container, so you are using the exact same interpreter as the app and have access to the same caches, deps, meta or build files.

P.S. There may or may not be a bug with VSCode wherein it sometimes cannot connect. It’s weird, and can be resolved by enforcing the following setting in VSCode: `"docker.host": "ssh://vagrant@workbox"`

How to use remote interpreters: JetBrains IDEs

JetBrains has a slightly different way of doing this (unless we count in the Projector). You run your app on your computer, but choose to use the “remote interpreter” instead of a local one.

AFAIK all the rich JetBrains editors have this functionality. I have only used IntelliJ Ultimate, PyCharm and PhpStorm so far.

You can also spawn a Projector container within the VM, and simply use it that way. Whatever floats your boat!

Future prospects: Container-first development

This concludes our second in a series of three articles. The next one will be along shortly and it will teach you how to take a step further, into container-first development.

I’ve been at it for the past couple of months on over 30 projects. My battery lasts for 8 hours a day and my CPU never goes above 30%. I even used an iPad to develop once. I plan on swapping my BTO 16-inch MacBook Pro for an M1 machine soon.

Announcement: This is Part#2 of a three-part series. In Part#3 we take it a step further and learn how to work in a container-first environment, wherein we get full native Linux performance, as well as a 17-hour battery and zero fan spins.

Part#1 — Overcome performance issues with D4M.
Part#2 — Replace xhyve and build our own hypervisor with better performance.
Part#3 — Employ a container-first development environment.

Vi ses.

Update 16.11.2021: Docker on M1 Max CPU — UTM as a viable option.

I’ve completely switched to working in a VPS (read Part#3 here) and won’t be going back.

However, some use cases still require a Mac. M1 Macs seem to have double the D4M performance. The performance is definitely better than Intel Mac or WSL2, and still significantly worse than Linux.

Additionally, people on M CPUs have been using a UTM VM for development. They say that it is very well optimized for M processors. Looking at the numbers below, my best guess is that the person is doing C compiling.

NOT MY BENCHMARKS. Copied from MacRumors forums:

macOS Intel (D4M): 460s
macOS M1 Max (D4M): 220s
Dell XPS 17 (Pop!_OS / WSL2): 70s
macOS M1 Max (UTM Debian): 9s
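To put those timings side by side, here is a small Python sketch that turns them into speedups relative to the Intel D4M run. The numbers are the forum figures quoted above, not my own measurements, and the `speedup` helper is just illustrative arithmetic.

```python
# Timings in seconds, copied from the list above (lower is better).
timings = {
    "macOS Intel (D4M)": 460,
    "macOS M1 Max (D4M)": 220,
    "Dell XPS 17 (Pop!_OS / WSL2)": 70,
    "macOS M1 Max (UTM Debian)": 9,
}


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    baseline = timings["macOS Intel (D4M)"]
    for name, seconds in timings.items():
        print(f"{name}: {speedup(baseline, seconds):.1f}x")
```

By this measure the M1 Max on D4M is roughly 2x the Intel run, while the UTM Debian VM comes out around 51x, which is the gap that makes UTM look like a viable option.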
