Improving performance for Docker on Mac computers when using named volumes
--
As a senior software developer at Netresearch, Benjamin has been supporting Magento, Akeneo and Docker projects since 2016. Beyond project work, he conducts research in these areas to advance future projects with the best possible quality.
At Netresearch, Docker is a long-established tool to provision and deploy our applications both to production systems and to the local systems of our developers. We’ve been using it for about four years now. Since we are expanding our services with the Blugento ShopFactory, Docker is a crucial component of our automated processes, like testing environments or setting up the same system multiple times to scale the application. This eliminates many of the problems which can occur if the environment on the production system differs even slightly from the development system.
Right now, we are focussing our efforts on making Docker easy to use on a daily basis for backend, frontend, and fullstack developers as well as non-developers. We want to provide a fast and user-friendly environment for every operating system. During our work, we noticed that the Docker performance on Mac computers — using the same setup — is considerably worse than on Windows and Linux computers.
Background of this research
As mentioned in the introduction section, we noticed a significantly increased loading time for all processes within Docker setups when the source files for the application were mounted to the host system. This is mostly done by the developers who need to connect their local XDebug with the application to find errors or test the behavior of specific parts within the application. This is also shown in the table in “Docker Performance Tests”.
What do we use and why?
Our project’s images contain precompiled and prebuilt sources to speed up the start of the applications, and to ensure that the source files are identical on every system. If something went wrong or if somebody wanted to install the project locally, we needed a way to copy those files to the host system with the ability to make live changes to the system. For this reason, and since we don’t know the OS of the specific user, we looked for a way which works independently of the OS. At first, we built the sources locally on our own computers and mounted them into the containers. This approach worked, and with the “:delegated” flag the performance was good as well. The downsides were that you needed a second, Mac-only configuration file to add the “:delegated” flag to the volumes, and that you needed to copy the sources, ideally via an automated process, to the outside of the container in order not to build them twice.
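For illustration, such a Mac-only override file could have looked roughly like this (a sketch; the file name, service name, and paths are assumptions, not our actual configuration):

# docker-compose.mac.yml (sketch) -- hypothetical Mac-only override which
# adds the ":delegated" flag to the locally built sources
version: '3.5'
services:
  devbox:
    volumes:
      - ./src:/var/www/html:delegated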
The next approach was using SSHFS. This allowed us to connect the sources from a “developer container” equipped with an SSH server to the host and thus make them accessible. However, we had some problems getting it to run under Windows, because on Windows, the virtual machine and the port required for SSHFS could not be connected at the same time. The second downside was the usability: if you restarted your container or your computer without closing the connection first, you had to kill the (dead) SSHFS connection and remount the SSHFS directory, since it was in an invalid state after the restart. This turned out to be more and more annoying in daily use.
Our current approach is using things which are included in docker-compose and Docker itself, the latter providing a more shell-like syntax. We share the sources within our services via “Named Volumes” and use a local driver to decide where the volume content should be bound to. This solution and our general structure can be seen here. It generally works for all three systems, aside from the performance issue which I want to solve in this research.
Docker environments
When starting with Mac and Docker, you have to decide how you want to use Docker in general. At the time of writing, there are two options which I would now like to discuss in more detail.
Docker Toolbox
Docker Toolbox is the classic approach to use Docker on a Mac system. It uses VirtualBox as a virtual machine in which you run your applications. With this approach, you might have difficulties with permissions and more complex applications, but it is older and more stable.
Docker Desktop for Mac (formerly known as Docker for Mac)
Docker Desktop for Mac uses HyperKit, which is ultimately also a virtual machine, but more lightweight and “Mac-native”. This approach is faster, and you can act independently of a “real” virtual machine where you would have to link your Docker executables across your system. But for more complex applications you might suffer from port restrictions. Quote from the official docs:
“Also note that Docker Desktop for Mac can’t route traffic to containers, so you can’t directly access an exposed port on a running container from the hosting machine.”
“If you do need multiple VMs, such as when testing multi-node swarms, you can continue to use Docker Machine, which operates outside the scope of Docker Desktop for Mac. See Docker Toolbox and Docker Desktop for Mac coexistence”.
I am not sure about the technical background, but I was able to run and expose an nginx service via ports 80 and 8080, and both worked. If some of you are interested in this topic and want to check this, or if you already know the “real” restriction, please tell me.
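For reference, a sketch of how such a check could look, publishing the same nginx container on both host ports at once:

# Sketch: expose one nginx service on host ports 80 and 8080 at the same time
version: '3.5'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
      - "8080:80"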
For this research, I will try to run an application via Docker Desktop for Mac to avoid issues between my system and VirtualBox.
Docker performance tests
As a test application, I chose Akeneo, because it is more lightweight than Magento regarding the amount of files and dependencies of the system.
Since I use Linux and thus apparently have the least problems and best performance of all three systems, I will use it as the baseline for this comparison.
Explanation
These are the different scenarios:
- Time for starting the Docker setup (docker-compose up -d) without sample data
- Time for the first page load. Here I will use app_dev.php, since Symfony delivers some debug tools along with the page
- Average time to navigate through the pages (average of 7 calls)
- Call of the cache:clear and cache:warmup commands
- Restarting the setup (docker-compose down && docker-compose up -d)
- Time for the first page load
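To illustrate how such timings can be taken, here is a sketch; the service name “devbox” and the console path are assumptions:

# Sketch: timing the individual scenarios from the list above
time docker-compose up -d                                 # start without sample data
time curl -so /dev/null -w '%{time_total}\n' http://akeneo.local/app_dev.php # first page load
time docker-compose exec devbox bin/console cache:clear   # cache commands
time docker-compose exec devbox bin/console cache:warmup
time sh -c 'docker-compose down && docker-compose up -d'  # restart the setup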
These are the environments where the tests were run:
- Ubuntu system (baseline) with all development services, exposing port 80 directly without any kind of proxy, with sharing volumes for the app, appData and database via named volume mounts
- Mac with all development services, exposing port 80 directly without any kind of proxy, but without any shared volumes
- Mac with all development services, exposing port 80 directly without any kind of proxy, with sharing volumes for the app, appData and databases via named volume mounts
- Mac with all development services, exposing port 80 directly without any kind of proxy, with sharing volumes for the app, appData and databases via named volume mounts and the delegated flag
- Mac with all development services, exposing port 80 directly without any kind of proxy, with sharing volumes for the app, appData and databases via docker-sync
- Mac with all development services, exposing port 80 directly without any kind of proxy, with sharing volumes for the app, appData and databases via mutagen.io
Docker setup
Having covered the bare facts, I would like to describe the setup for the individual environments in more detail. If one of these scenarios matches what you want to achieve, you can replicate it. I used our internal Akeneo setup for the docker-compose files.
Default setup preparations
- Prepare the Docker system with Akeneo, a database, and a webserver with PHP (see the sketch after this list)
- Add the local address (akeneo.local) to your hosts file
- HINT: If you have multiple docker-compose files, you can add this ENV variable to avoid having to pass all files as parameters to the “up” command. Otherwise, you would have to explicitly specify all YAML files to boot. For example: COMPOSE_FILE=docker-compose.yml:docker-compose.override.yml
- Pull the images: docker-compose pull
- Start Akeneo: docker-compose up -d
- Wait until your system is ready
- See the result in your browser at http://akeneo.local
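For orientation, here is a heavily reduced sketch of what such a compose file could look like; the images and service names are placeholders, not our actual internal setup:

# docker-compose.yml (sketch) -- hypothetical images and service names
version: '3.5'
services:
  devbox: # webserver with PHP and the Akeneo sources
    image: example/akeneo-devbox:latest # placeholder image
    ports:
      - "80:80"
    depends_on:
      - mysql
      - elasticsearch
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ROOT_PASSWORD=root
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.6
    environment:
      - discovery.type=single-node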
Mac setup without shared volumes
This is the easiest setup. Just follow the instructions and you’re good to go.
Advantages
This setup is as fast as the Ubuntu solution.
Disadvantages
There is no easy way to access and change the files as a developer.
Mac setup with shared volumes via named volume mounts
For this environment, you can use the previous one, but you must do the following BEFORE you start the setup:
Un-comment the mount options in the upper part of the docker-compose.override.yml and add the Elasticsearch database. The upper part should look like this:
volumes:
  # $PWD is only available on Unix-based systems (Linux, macOS); Windows might need absolute paths
  akeneo:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/mnt/src # absolute path ONLY
      o: bind
  data:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/mnt/assets # absolute path ONLY
      o: bind
  database:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/mnt/database # absolute path ONLY
      o: bind
  esDatabase:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/mnt/esDatabase # absolute path ONLY
      o: bind
Create the directories you want to use (like mnt/*). Then add the following environment variable to the end of the .env file, since the copy process takes a lot longer than the default 60 seconds: COMPOSE_HTTP_TIMEOUT=500
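The relevant part of the .env file could then look like this (a sketch):

# .env (sketch)
COMPOSE_FILE=docker-compose.yml:docker-compose.override.yml
COMPOSE_HTTP_TIMEOUT=500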
Advantages
Permissions aren’t an issue. If you add ACLs for your user ID so that you can access files with the user ID expected within your containers, HyperKit seems to handle that rather well. I was able to edit files as my local user in my IDE, and inside the developer container they still belonged to www-data. Nice handling.
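As an illustration, such an ACL can be added on macOS like this; this is a sketch, where “benjamin” is a placeholder user name and mnt/src a placeholder path:

# Grant the local user read/write access to the mounted sources, inherited
# by newly created files and directories
chmod -R +a "benjamin allow read,write,delete,add_file,add_subdirectory,file_inherit,directory_inherit" mnt/src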
Disadvantages
It is about ten times slower than the Ubuntu or the non-shared setup.
Mac setup with shared volumes via named volume mounts using the delegated flag
This approach failed, because if you use the delegated flag together with the volume mounts, Docker for Mac searches for the files in another directory and cannot find them:
ERROR: for base-package_setup_1 Cannot start service setup:
OCI runtime create failed:
container_linux.go:345:
starting container process caused "process_linux.go:430:
container init caused "rootfs_linux.go:58:
mounting "/var/lib/docker/volumes/base-package_akeneo/_data"
to rootfs "/var/lib/docker/overlay2/053b184a8cc131e322e5bf6d4bbf58aff0030bd33984eb79e80e280379935ee4/merged"
at "/var/lib/docker/overlay2/053b184a8cc131e322e5bf6d4bbf58aff0030bd33984eb79e80e280379935ee4/merged/var/www/html"
caused "no such file or directory""":
unknown
Not sure if there is a workaround, but after reading the articles listed in the sources, I do not expect miracles, so I skipped this in favour of the next two approaches.
Mac setup with shared volumes via docker-sync
Before we can configure our setup, we must set up docker-sync. Install it via https://docker-sync.readthedocs.io/en/latest/getting-started/installation.html#installation-osx.
I use Unison as the sync tool, just out of gut feeling and because of the impression I had reading through the articles. It seems to have better sync methods for deleted files and more watch capabilities than rsync. Based on the documentation, I will use the native OSX sync strategy:
“Native-OSX is a combination of two concepts, OSXFS only and Unison together.”
Now you have different ways of implementing docker-sync into your project. To keep the setup simple for my colleagues, I tried the docker-compose approach provided by docker-sync itself. I added a new docker-compose file into the project: docker-compose.docker-sync.yml.
I then added the new file at the end of my COMPOSE_FILE variable in my .env file, so it looked like this:
COMPOSE_FILE=docker-compose.yml:docker-compose.override.yml:docker-compose.network.yml:docker-compose.docker-sync.yml
In my compose file, I synced the /var/www/html directory and my /var/akeneo directory to work with the system and check the logs and history of processes:
####################################################################
# Addition to build an Akeneo image with docker-sync to share the core
# application with the host machine dependencies within the docker
# compose file
#
# Possible usage:
# docker-compose -f docker-compose.yml -f docker-compose.network.yml -f docker-compose.docker-sync.yml up -d
####################################################################
version: '3.5'

services:
  # Services to actually synchronize the named volumes with the given paths
  akeneo-syncer: # sync the akeneo named volume
    image: eugenmayer/unison:2.51.2.1
    command: /entrypoint.sh supervisord
    volumes:
      - ./mnt/src:/host_sync
      - akeneo:/app_sync
    environment:
      # These variables control which directories are synced by Unison.
      - HOST_VOLUME=/host_sync
      - APP_VOLUME=/app_sync
      - UNISON_SRC=/host_sync
      - UNISON_DEST=/app_sync
      - UNISON_DIR=/data
      # IMPORTANT: Use the ID of the user which uses the shared directory inside of the container
      - OWNER_UID=1001
      # NEAT: Add directories to ignore here. "node_modules" contains many files without a debugging or development purpose
      - UNISON_ARGS=-ignore='Name node_modules' -prefer /host_sync -numericids -auto -batch
      - UNISON_WATCH_ARGS=-repeat watch
      - TZ=Europe/Berlin
      - LANG=C.UTF-8
      - HOME=/root

  data-syncer: # sync the data named volume
    image: eugenmayer/unison:2.51.2.1
    command: /entrypoint.sh supervisord
    volumes:
      - ./mnt/assets:/host_sync
      - data:/app_sync
    environment:
      # These variables control which directories are synced by Unison.
      - HOST_VOLUME=/host_sync
      - APP_VOLUME=/app_sync
      - UNISON_SRC=/host_sync
      - UNISON_DEST=/app_sync
      - UNISON_DIR=/data
      # IMPORTANT: Use the ID of the user which uses the shared directory inside of the container
      - OWNER_UID=1001
      # NEAT: Add directories to ignore here. "cache" contains many files without a debugging or development purpose
      - UNISON_ARGS=-ignore='Name cache' -prefer /host_sync -numericids -auto -batch
      - UNISON_WATCH_ARGS=-repeat watch
      - TZ=Europe/Berlin
      - LANG=C.UTF-8
      - HOME=/root
NOTE:
Despite the hint from the documentation, we do NOT use the :nocopy suffix, because it would break our current volume/container logic. See this image of the nocopy workflow for a better understanding of the reason: https://cloud.githubusercontent.com/assets/1525937/25767468/574d9570-31ae-11e7-8886-d7923fbc68fb.png
Advantages
This approach is easy to implement. The boilerplates give many examples of how docker-sync can be configured: https://github.com/EugenMayer/docker-sync-boilerplate. Even with plain default settings, I achieved almost the same performance as on the Ubuntu system. With more experimenting and tweaking of the sync strategies, I think you can get an even faster synchronization.
Excluding directories from the volume is a neat feature. You can speed up your system if you exclude (for your host system) useless directories like the node_modules, which saves index space for docker-sync and your IDE.
Mapping the user between the container and your system works out of the box and simplifies the process.
Everything can be configured via a YAML file or as ENV variables.
One note about the time in brackets in the table: one request, in particular one of the first, took way longer than the rest, so it may have been a spike. Ignoring it, I reached the stated average of 300 ms per content load.
Disadvantages
Starting the environment takes about twice as long as the baseline reference. The CSS generation in particular takes time, but not much more than when calling it directly in a running system. Maybe this is related to the sync process indexing the files.
One service can only sync one volume, so you need one service per volume. With an increasing number of services, the performance will get worse, depending on the amount of files you are synchronizing. You can still go a hybrid way, though. For example, you can mount the database volumes as named volumes, like in the previous section, and only use docker-sync for the main volume for development and debugging.
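Such a hybrid setup could look roughly like this (a sketch combining the volume definitions from the named-volume section with one syncer service):

# Sketch of the hybrid approach: the database stays a plain bind-mounted
# named volume, only the application sources go through docker-sync
volumes:
  akeneo: # plain named volume, mirrored to ./mnt/src by akeneo-syncer
  database:
    driver: local
    driver_opts:
      type: none
      device: ${PWD}/mnt/database # absolute path ONLY
      o: bind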
Mac setup with shared volumes via Mutagen IO
Before we can use this approach, we must install Mutagen: https://mutagen.io/documentation/introduction/installation/.
Then register the daemon for your user and start it, as described in the CLI help: mutagen daemon register and mutagen daemon start.
Start Akeneo without named volumes, like in the first Mac environment. Once the system runs, start your listener to sync the files between your system and the container. For Akeneo without the node_modules directory, it takes about 42 s until the watcher is ready.
mutagen sync create --name=akeneoSrc -m=two-way-resolved --symlink-mode ignore -i node_modules \
  docker://www-data@base-package_devbox_1/var/www/html/ mnt/src \
  && mutagen sync monitor
The logic behind this call is the following:
The first path “docker://…” is “alpha” and “mnt/src” is “beta”. Depending on your sync mode, both paths will be equally synced or not, which can lead to conflicts. With the parameter “-m=two-way-resolved” alpha always wins. That is why the Docker container containing the files should be alpha here.
You can run as many syncs as you want and keep track of them via the name: “--name=akeneoSrc”. In the default mode, Mutagen apparently does not handle symlinks very well, so for now I set it to ignore them: “--symlink-mode ignore”. Otherwise, it won’t find the redirections and will not work.
The “alpha” path contains three elements: “docker://www-data@” is the way to use Docker remotes for the sync. The “www-data” part tells Mutagen to copy and handle all files as this specific user; otherwise, it will use the default user of the container. “base-package_devbox_1” is the name of the container. With some bash/grep logic, you could also automate resolving the container name, e.g. for Makefiles. The last part, “/var/www/html/”, is the path within the container which should be shared.
As with docker-sync, we can exclude directories which should not be synced: “-i node_modules”. Since one line that long might be inefficient and error-prone, you can define a mutagen.yml in the same directory and use that instead of putting everything in this single-line-command.
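Such a mutagen.yml could look roughly like this; this is a sketch based on Mutagen’s project configuration format, so please verify the keys against the Mutagen documentation:

# mutagen.yml (sketch) -- mirrors the one-line command from above
sync:
  akeneoSrc:
    alpha: "docker://www-data@base-package_devbox_1/var/www/html/"
    beta: "./mnt/src"
    mode: "two-way-resolved"
    symlink:
      mode: "ignore"
    ignore:
      paths:
        - "node_modules"
    permissions:
      defaultFileMode: 0644 # see the NOTE below regarding file modes

If I read the docs correctly, such a project file is started via “mutagen project start” instead of a “mutagen sync create” one-liner.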
NOTE:
The default mode for synchronized files is 700 instead of the project-specific 644+xS. You can define the file mode via a parameter or the config file; otherwise, after changing something in your IDE, the file mode changes to 700.
Advantages
Mutagen is a tool with the capability to synchronize multiple systems in essentially real-time. It is independent of your setup and subsequent processes, so the performance should not be affected, as shown in the table at the top of this page. You can also define many small configurations like directory excludes and so on. You can sync the local and remote system at the same time and therefore orchestrate multiple systems at once or publish bug fixes to any system you have access to. You also have forwarding capabilities, which in theory could replace the proxy, but I haven’t tested that yet.
Disadvantages
It is more difficult to get started, because there aren’t many examples for a docker-compose setup like ours. Also, despite the popularity Mutagen seems to have (judging from conversations with agencies at conferences), I can’t find much on the internet about Mutagen and Docker.
Conclusion
With docker-sync or Mutagen, it should be possible to provide the same structure and logic for a debuggable and production-ready Docker environment, independent of the OS. While docker-sync is the easiest solution, Mutagen delivers the best performance and, at first glance, the widest range of uses, giving not only Mac users better performance than named volumes.
Sources
These are the sources I used for finding the differences between the tools mentioned:
- https://docs.docker.com/docker-for-mac/docker-toolbox/
- https://blog.rocketinsights.com/speeding-up-docker-development-on-the-mac/
- https://stories.amazee.io/docker-on-mac-performance-docker-machine-vs-docker-for-mac-4c64c0afdf99
Learn more about Netresearch here.