Beating some performance into Docker for Mac

MrManafon
Homullus
Published in
15 min readJul 4, 2020

--

Using Docker on a Mac in 2020 is comparable to having a 2009 Peugeot 1007 in 2020. I would know for I have both. Over the past year, this story blew up as a first result on Google and I can completely understand why.

Different workloads suffer from different bottlenecks when it comes to Docker for Mac. Most common one by far being the filesystem operations speed. Luckily for us, there are some mitigation strategies that might help ease the pain.

Important: This is Part 1/3. Today, we go over the most common mitigation strategies and a bit of history around what Docker team tried to fix it.

Part#1 Mitigation strategies for common performance issues on D4M.
Part#2 Replace xhyve and build our own hypervisor with better performance.
Part#3 Go container-first, switch to development in a VPS for native Linux performance.

Wow boy, those posts exploded and became №1 search results on Google.

Seriously, I love my Mac, I’ve used OS X as my main OS for like 16 years at this point. But one thing I learned from my clients (Movie Studios) back when I had an ACMT shop is that you always should use the right tool of the job (Linux in our case), and not lose valuable time with improvisations. Unfortunately for me, the company is large and we aren’t in a position to change the setup, and unfortunately of all of us, Docker on Mac is a story of improvisations.

TL;DR: D4M is hella slow because of shared volumes. If we completely remove use of shared volumes and use rsync instead, it works much faster.

A word of warning: Needlessly long article ahead. I shared the process of discovery and reasoning. If you are here for a quick-fix, just go to the provided Github Repo and check out the bottom of this article for a solution (Step 5).

A bit of self promo: If you are working with PHP, and getting into Docker and containers, be sure to first read my other post about that specific topic. (incoming next week).

Peugeot 1007

Peugeot 1007 — Its 12 years old, works (generally speaking), its clunky and has suspension problems like once a month also the super fancy doors sometimes close on you.

I’ve had to change the steering rack, but once I did that, the steering wheel angle sensor went wild, so I had to go to another mechanic to regret the trap, after which the steering wheel was sideways so I had to go to another guy, to recalibrate the sensor. It worked now and I was so happy.

Bear in mind that the car is a bit rare, so most mechanics see it for the first time and get confused or just tell me that I should change my car because “this one has weird doors”.

Two days after that, my car stopped in the middle of the roundabout, and wouldn’t change gears anymore. The gearbox seems to have overheated. It happened two more times before I went to the mechanic, who told me something along the lines of “Lol guy, gearboxes can’t overheat”. After that it worked for at least 500 miles, before it randomly happened again. I went to another mechanic, who told me that the last guy was a nincompoop, and that my specific gearbox was a 2-Tronic model with a ECU unit which may overheat under certain conditions. So I had to replace that to. It worked now and I was so happy. A week after that, the front passenger window motor self-immolated. Im not even kidding.

You get the point.

(BTW, if you came here looking for 1007 advice, its a feisty little car, go to this FB page instead)

Step 1: Docker for Mac

My story with Docker for Mac is pretty similar. Theres a lot of history, thus a bunch of outdated and wrong advice. It works, kind of, but works poorly and with many workarounds. Historically speaking, it’s better than ever, but compared to the Linux version, it is still alpha in 2019.

In photo, Nick Currie aka Momus covering his eye with an eyepatch — Most of your code is inherited from the build phase, but you just cover some parts of it via a volume.

How Volume sharing should work in a perfect world

In theory, you can create a volume, bind a local directory from the host machine to it, and mount it to any number of your containers. The data will be shared instantly. Simple concept. Now, in production you will seldom do this. Usually you will copy your code from host or git into the container, during build phase and run the program at native IO speeds.

On the other hand, what you want to do on dev is to use all the same Dockerfiles for building, copying etc. but in the end, on boot, replace the copied code with your own, shared volumes/folders. This will allow you to use the exact same config as prod, but iterate way faster as any of your changes would be instantly propagated to the container and ran from there.

You can mount a volume like this — Left side represents the xHyve path from the chosen container context (usually the same as your host) and right side represents the container location.

barman:
volumes:
- /Users/markomitranic/Sites/barman/src:/app/src

You can also define a reusable, named volume like this:

nginx:
volumes:
- myapp:/app/src
barman-fpm:
volumes:
- myapp:/app/src
volumes:
myapp:
driver: local
driver_opts:
type: none
device: /Users/markomitranic/Sites/barman/src
o: bind

That Speed tho

Well, on Linux machines this performs awesomely. The changes propagation is near instantaneous, the read/write speeds are almost native, permissions are inherited by a weird but functioning mechanism and all your favourite build/watch tools work just fine. The volumes are properly reused on restart.

On Docker for Mac, everything works the same, but the performance is awful. look here and here. Having and sharing a small Python codebase or a static site is great — but boy does using Symfony3 suck. Storing a MySQL or Mongo database (in my case both at the same time) or something more comprehensive like Symfony or Wordpress will slow things down immeasurably. CPU will start spiking into high 100%s and mid 200%s (even while your app idles), cache warmup will last for 5–10 minutes and page loads will last for minutes at a time depending on your application.

Keep in mind that if you are developing a fully 12-factor-compliant microservice, be it Symfony or anything else, your codebase should be performant on D4M as it is already properly chunked and easy to containerise. When talking about old-school monoliths, distributed monoliths and bigBallsOfMud or Wordpress, all of which are hard to split up, you will experience difficulties. This is what this post is about.

Caching methods

Docker for Mac has been out of beta for quite a while now, and it had this problem since alpha. Essentially it boils down to inconsistencies in the kernel virtualisation support as well as us using APFS and the docker virtual machine using Ext4, virtualising the propagation of inotify events or ownership on hundreds of thousands of files must be tough. Well, the reason my answer is so vague is that I haven’t really been able to find a technical explanation. It seems to me that nobody really knows the exact source of the issue on Apple’s part, apart from guessing like I did above. Essentially what I saw was that Docker for Mac was using an osxfs volume share between the host and xHyve (it’s virtual machine) which delegates volumes further using Docker’s native capabilities.

You know those answers which don’t point you to the root of the problem but tell you to just trust them and paste something in your Terminal.

Anyways, since the issue has persisted for so long, the Docker for Mac team implemented three sets of flags to docker-compose’s volume mount. These flags supposedly control the share caching mechanism, and make IO speeds 3x better if used properly:

  1. consistent — This is the default behaviour.
  2. cached — Updates may have a lag before showing up on the host side.
  3. delegated — Updates may have a lag before showing up on the container side.

One thing I would like to note here is that no one actually ever explains what happens the other way around. If I select cached, is my container read speed now faster? What about sync, does it mean that files are immediately sync’d from host to the container? What about write speeds on the container? How am I to make an informed decision if there is no information?

Well, that is all great, and it made the whole thing perform better, but we are yet far far from the solution. I considered ratcheting the problem. I have verified that the read/write speeds are 3x faster than original in a synthetic test with writing and reading 10k files, regardless of the combination of delegated/cached I set. Synthetic tests are a bad idea since they sacrifice the hen using a real-world application, load times for Wordpress are around 15s++ and for our Symfony app we could expect no less than 45s. Composer install lasted for 6 minutes. All in all, this is unusable in development where you want to be able to iterate in seconds. Especially for our front-end developers.

Docker for Mac is a terrible solution with a horrendeous DevEx.

Excluding stuff

The next logical step was to make the amount of files we are handling with sync smaller. Do we really need those node_modules, Composer vendors or our own built minified assets? If Symfony cache clear lasts for 5 minutes, imagine what node_modules does to our IO.

In a perfect world, we could just ignore certain directories via an ignore file, they wouldn’t be synced anymore and that’s that.

Since we are not living in a perfect world, this looks dirty, but at least its simple and cross-platform:

- my-code:/app/myapp:delegated # this is sync, yay
- /app/myapp/var/cache # Ignore sync on cache dir

Verdict: Docker for Mac is a terrible solution with a horrendeous DevEx because of its poor performance.

Step 2: NFS

Huh, so that failed. Ok, at this point I figured that instead of taking a step further, I should take a step back and look at the (in theory) simpler methods like network shares.

NFS sounds like a very good idea from the start tbh, it is fast, battle-tested, pretty reliable, and based on my tests doesn’t cause a CPU bottleneck.

I spun up the xHyve shell (command below) and verified that the NFS share was mounted. The IO speeds were better than the normal D4M speeds, but on-par with flagged volumes (:delegated). That sucks.

screen
~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty

Another issue was the setup itself. Instead of writing about it, there are already numerous blogposts talking about the setup.

In my opinion it was overly complex due to it being heavily tied to the host’s state (which we want to overcome using docker, remember the 12-f’s…) and requiring additional shell scripts to spin up. Remember, this is a thing that tens of people will use, and some of them have private projects, other company services or learning projects working on their machines. Messing with OS NFS volumes on a Mac does not play nice.

Last straw — Since it is a NFS share, you cant change owner on files served via NFS. It’s always root, which means our container’s processes can’t read from it if they are not root as well, writing hurts as well. There are workarounds, but all are really sloppy.

Verdict: Promises much, underdelivers. Burdensome to set up or automate, ownership issues.

Step 3: Unisons, Docker Sync, Docker Magic

In order to solve this issue with IO speeds, and the spaghetti around subdirectory exclusion, a couple of specialised tools have shown up during the years. At this point I believe I have tested all of them (even a custom implementation), but please share if you know any others.

Docker Sync and Docker Magic Sync are the frontrunners. Both are very similar as explained by their authors as they both employ a proxy/dummy container with an unison-managed volume. Which is then mounted to all other containers at pretty good IO speeds. DMS has concentrated on simplicity of use and large file counts support while DS is going for stability and feature-completeness. They both have a file/dir ignore mechanism.

In my experience DS works great. The ignore mechanism is a god-given and cleans up our docker-compose really nice, while the IO speeds are a miracle compared to osxfs. We are now talking about 10–40sec pageload times, 2–3min cache clears etc. There might be a slight lag (<10sec) in change-sync between host and container but who cares.

In fact I was so happy with the solution at first that I incorporated it into one of our major beta versions that was given to the team. This was a mistake, and in fact it shows really well how synthetic tests and “works on my machine” are disconnected from the workplace reality. As soon as we had 5 people trying to work their day-to-day tasks on the system I started noticing serious issues.

Conflicts: DS seems to have an insane CPU bottleneck, so when CPU is thinking, Unison starts to lose track of changes. In addition to this there is a known issue (of course) on D4M concerning a time-drift under certain conditions, which makes any Unison flags about temporal conflict resolution, useless. Also if left un-run for some time 3–4 days in my case) the next build takes forever because of forced re-indexing.

Large file counts: This one seems obvious, since CPU is the bottleneck, if you make a large change (“large”, for us, varied from 110 files to 10k files depending on the CPU) like checkout on another git branch, have a FrontEnd watcher like Gulp or Webpack running, or simply paste 100 smallish jpegs into a folder, Unison would randomly crash, or even worse, just silently stop sync. It seems that when it can’t keep up, it stops working and tries to reindex everything in the background.

Often times the answers to performance issues are “Well, don’t change git branches duh.” or something like “Your project is simply too large, we don’t care about support for that amount of files.”.

If you’ve got answers like this or this, no wonder you have questions like this ^ .

Multiple volumes: This also means that having multiple volumes is really tough on the host, as CPU usage skyrockets and we get the same speeds as the D4M native solution. I would argue that we can change our project to use less volumes, but I see that as weakness — us conforming to the constraints. This is ok for a Wordpress site, but on a grand project with multiple moving parts, you may need to have more than 10 volumes.

General optimisation tip: Unrelated to DS, you should always cut down your watch polling limit. When your assetic bundle or Gulp or Webpack perform a watch operation, they are often listening to inotify, but as DMS explains this component is costly and unreliable. Look into your watcher’s docs about how to prolong the polling interval, the often default 5ms is a silly amount of time anyways. Even if you are Stella, 500 or 1000ms work just as well… Unless you are Skynet I guess.

Verdict: Kind of complicates things, but would live with it. Conflicts and CPU usage on large file counts are insane. Community is a dictatorship.

Step 4: VirtualBox and Docker-Machine

One thing you have to understand about Docker for Mac, is that it is just one version of the virtual machine. I figured, ok, since this thing works quite normally on Linux in prod, why don’t I just spin up a Debian server or Vagrant in VMware or whatever? This led me to find out that people already did this for ages, before D4M came out, with boot2docker

Essentially, you are able to employ docker-machine utility to make a bespoke VM that will run Docker, and whaddya know, it supports a virtualbox driver. Now have in mind that they wrote all over the place about how this method is deprecated, and D4M should be used. But the golden rule is that you can’t replace an existing thing with a worse model. If this works for our particular case, why not.

docker-machine create $DOCKER_MACHINE_NAME \
--driver virtualbox \
--virtualbox-disk-size "65536" \
--virtualbox-cpu-count "2" \
--virtualbox-memory "8192" \
--virtualbox-ui-type "headless" \
|| true
eval $(docker-machine env $DOCKER_MACHINE_NAME)

Once you create the machine, all docker commands get proxied into it instead of your localhost docker daemon.

One thing I would like to point out right off the bat, I noticed immediately is the CPU drop. Literally, my CPU never went over 100% even during build phase.

Trivia, Poor man’s cluster: I lied, the first time I stumbled upon this concept was a while ago, when trying to simulate a DNS or a service mesh. VBox creates it’s own network and the host always has a static IP, so you can be dirty and easily use extra_hosts in docker-compose with 10.0.2.2 if you want the services to hit the host.

Vboxfs shares

First I tested the default way of using it. Share a folder to the machine via VirtualBox, and since vboxfs process does the sync now, it does a better job than osxfs did with D4M. CPU was down, and speed was like 50% better. Very good and kind of easy to use. (we used this approach for months btw)

That feeling when your yarn build finishes in under 60s.

NFS for Docker Machine

Then, out of curiosity I tried creating a NFS share, similarly to what we did for D4M, btw someone had made an utility for NFS mounts this way.

Boy did it work! I thought something was broken, it was that fast.

As before, it suffers from exact same problems mentioned in NFS section, but it was a good experiment. Also, fun fact, I stumbled upon this little scientific exchange.

Little did i realise that it wasn’t fast because of NFS per se. It was because I had (almost) removed the file sharing bottleneck, and CPU was now free to do its job.

Step 5-alpha: Rsync

After looking into it, the above experiment made me realise that:

It was in fact not the sheer RW speed of shared volumes that was slowing down the app. It was the CPU load that we take on while doing the sync.

feels like

To test this theory I just used no sharing. Rsync the whole app into the VM, and use volumes from there (between the VM and the container) with host not participating at all.

Lets take it a step further. Making a two way rsync play nice is a really tough job. And that doesn’t even cover optimisations that cover different OS-es, or watch events limits, or the fact that we are dealing with a local Docker. All of which use up quite a bit of CPU. So we are back at square one. But at least this time we know exactly what the issue is.

Step 5: Mutagen.io (we use it)

Luckily, I was not the first with this problem, and found out about mutagen.

Mutagen is a very simple, stabile, standardised wrapper around rsync and ssh. It handles the edge cases, allows project infrastructure specification in yaml-s, does not care if we use Docker or Cloud-Native development, it just works.

Wow. It was speedier than a Bill Gates in a Porshe.

It provides real-time file synchronization and flexible network forwarding, and works with local folders, network machines, containers or cloud services. In fact, under the hub, it even contains specific optimisations for each platform.

Tests are going great now. All tested apps work with speeds on-par with production and sync works without a hitch. We can easily write ignore matching patterns. Cache clear takes 1min, pageload is <6s and in Wordpress or Invision Community its less than 2s!

Now at this point, I was pleased with how the current setup (docker-machine, Virtualbox, vboxfs) worked. Heres a little ‘pro et contra’ at this stage:

Pros: Easily configurable. Dev/Prod parity. Works at almost native speed. Easily uses D4M. Completely decoupled from the host OS. Does not mess up other projects or D4M, can be used in parallel if needed. CPU is under control at all times.

Cons: Mutagen yaml files do not allow usage of environment variables which makes it slightly harder to make shared configurations, although support is on its way. Also, mutagen attaches files only when the container has already started. This means that you must use mutagen’s lifecycle events to start the entrypoint. Not terrible, not great.

sync:
defaults:
mode: "two-way-resolved"
flushOnCreate: true
ignore:
vcs: true
my-volume-name:
alpha: "php/src"
beta: "docker://php_example/app/src"
ignore:
paths:
- ".idea"
- "node_modules"
- "php/src/vendor"

That’s it folks

Mutagen+D4M won. In short, go visit the Github repo for some of the code samples in action.

Although this is an ever-evolving topic, and I am sure that more steps will be added (or ommited in future as D4M gets better), my team has been using this new setup for a while now, and we haven’t had any hiccups yet.

Also, if you read through all of that, and are still here. Wow, just wow.

Announcement: Over the past year, this story blew up as a first result on Google. This is Part#1 of a three part series. In Part#3 we take it a step further, and learn how to work in a container-first environment. Wherein we get full native linux performance, as well as 17 hour battery and zero fan spins.

Part#1 — Overcome performance issues with D4M.
Part#2 — Replace xhyve and build our own hypervisor with better performance.
Part#3 — Employ a container-first development environment.

--

--