Supporting Docker-based development environments in a multi-architecture world

Luc De Brouwer
VoucherCodes Tech Blog
Dec 12, 2022
Photo by Ian Taylor on Unsplash

It has been around 18 months since Apple first launched its Apple Silicon chips, and whilst this has been a particularly exciting development from a performance point of view, it did introduce some new challenges for developers around the world. At VoucherCodes, our engineering team was amongst the first to transition to the new architecture, simply because staying on the Intel bus would have been a futile exercise for us, and we'd rather get ahead of the curve.

In this article we’ll discuss our approach to supporting a development team that uses both the amd64 and arm64 architectures in their Docker development environments, as well as some of the hurdles we encountered over the last year.

Why relying on emulation is not always an option

You might be wondering: “Can’t Docker effortlessly run amd64 containers on an M1 system?” The answer is “yes”; however, in these situations it’s always good to remove as many layers of abstraction as possible, especially when that layer is emulation. On top of that, some technologies and languages must be purpose-built for a target architecture when being baked into a Docker image.

An example of the above is the Lua VM, which crashes when an amd64 image is run on an ARM-based system, with the following, rather unceremonious, error:

nginx: [alert] failed to initialize Lua VM
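If you’re ever unsure whether an image actually matches your host, Docker can tell you which platform it was built for, and you can be explicit when pulling. A quick sketch, with a placeholder image name:

# Print the platform an image was built for, e.g. "linux/amd64".
docker image inspect --format '{{.Os}}/{{.Architecture}}' <repository>/<image>:<tag>

# Be explicit about the platform instead of relying on the default.
docker pull --platform linux/arm64 <repository>/<image>:<tag>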

A few notes beforehand

We are a GitLab shop at VoucherCodes, and in our existing CI/CD workflow, we trigger development image builds in a number of scenarios — most importantly when we update the desired image tag in our codebases. This ensures that we only build images when we explicitly mean to, saving on resources. These images are shared and used by all our engineers, and it’s only in rare circumstances that we require colleagues to build their own images on their local machines.

Your own CI/CD workflow may use different providers and solutions, but the overall theme remains the same: how to smoothly transition to a workflow that supports Docker-based development environments in a multi-architecture world.

Adjusting our build pipeline for multiple architectures

As previously mentioned, we already had steps in our GitLab pipeline to build images for development use when necessary. This means that our .gitlab-ci.yml configuration has a block in it that looks somewhat like this:

update-dev-docker-images:
  stage: update_dev_docker_images
  interruptible: true
  tags:
    - workload-large
  rules:
    - if: $JOB_NAME == "update-dev-docker-images"
      when: always
    - if: $TEST_NAMESPACE || $JOB_NAME || $CI_COMMIT_TAG || $CI_PIPELINE_SOURCE == "schedule"
      when: never
    - changes:
        - .env-docker-dev-tag
      when: always
    - when: never
  script:
    - make build-docker-dev-images | ts -s
    - make push-docker-dev-images | ts -s

The configuration above shows that we use make commands to build and push the development images to a repository.
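As a rough illustration, and not our literal Makefile, the amd64 targets boil down to something like the following, assuming the .env-docker-dev-tag file contains just the desired tag, and using placeholder image names:

# Read the desired tag, then build and push the amd64-suffixed image.
TAG="$(cat .env-docker-dev-tag)"
docker build --platform linux/amd64 -t "<repository>/<image>:${TAG}-amd64" .
docker push "<repository>/<image>:${TAG}-amd64"

From here it’s pretty straightforward to add an additional step to build the images for the arm64 architecture.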

update-dev-docker-images-arm:
  stage: update_dev_docker_images_arm
  interruptible: true
  tags:
    - arm-workload-large
  rules:
    - if: $JOB_NAME == "update-dev-docker-images"
      when: always
    - if: $TEST_NAMESPACE || $JOB_NAME || $CI_COMMIT_TAG || $CI_PIPELINE_SOURCE == "schedule"
      when: never
    - changes:
        - .env-docker-dev-tag
      when: always
    - when: never
  script:
    - make build-docker-dev-images-arm | ts -s
    - make push-docker-dev-images-arm | ts -s
  needs:
    - job: update-dev-docker-images

This new step isn’t all that different from the previous one, as indicated by the diff below.

-update-dev-docker-images:
-  stage: update_dev_docker_images
+update-dev-docker-images-arm:
+  stage: update_dev_docker_images_arm
   interruptible: true
   tags:
-    - workload-large
+    - arm-workload-large
   rules:
     - if: $JOB_NAME == "update-dev-docker-images"
       when: always
@@ -13,5 +13,7 @@ update-dev-docker-images:
       when: always
     - when: never
   script:
-    - make build-docker-dev-images | ts -s
-    - make push-docker-dev-images | ts -s
+    - make build-docker-dev-images-arm | ts -s
+    - make push-docker-dev-images-arm | ts -s
+  needs:
+    - job: update-dev-docker-images

As you can see, we use ARM-specific make commands, require the job to run after the update-dev-docker-images job has finished, and use a different tag.

Effectively the tag is where the “magic” happens. But as we all know, there’s no such thing as “magic” in engineering, and it’s simply the result of trying to work smarter, not harder. We use tags in our GitLab setup to determine which jobs a GitLab runner can run. And by tagging the update-dev-docker-images-arm job with the arm-workload-large tag, we’re only allowing ARM-based runners to perform the task at hand.
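For illustration, registering such a runner looks roughly like the following; the URL, token, and executor are placeholders, and your own setup will differ:

# Register an ARM-based runner that only picks up jobs tagged arm-workload-large.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "<token>" \
  --executor "docker" \
  --docker-image "docker:stable" \
  --tag-list "arm-workload-large"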

We could have configured BuildKit to build for multiple architectures on the same runner, but emulation under QEMU comes at a performance cost, and we want to free up our runners as quickly as possible. It is much more efficient to build for different target architectures on “native” runners.
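For completeness, the single-runner approach we decided against looks roughly like this; buildx falls back to QEMU emulation for whichever platform isn’t native to the runner:

# Create a BuildKit builder, build both architectures in one go,
# and push the resulting multi-architecture image straight to the registry.
docker buildx create --name multiarch --use
docker buildx build --platform linux/amd64,linux/arm64 -t <repository>/<image>:<tag> --push .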

Lastly, we need to create and push the manifest for the images we’ve just built, and we use a configuration similar to the one below for this.

create-dev-docker-manifest:
  stage: create_docker_dev_manifest
  interruptible: true
  rules:
    - if: $JOB_NAME == "update-dev-docker-images"
      when: always
    - if: $TEST_NAMESPACE || $JOB_NAME || $CI_COMMIT_TAG || $CI_PIPELINE_SOURCE == "schedule"
      when: never
    - changes:
        - .env-docker-dev-tag
      when: always
    - when: never
  script:
    - make create-docker-dev-manifest | ts -s
    - make push-dev-docker-manifest | ts -s
  needs:
    - job: update-dev-docker-images-arm

For anybody who is new to multi-architecture environments, Docker manifests might be a new concept, but I can assure you they’re pretty straightforward. In most cases, and in our scenario, a manifest is a list of images that are identical in function but built for different architectures. This is why Docker manifests are often referred to as “multi-architecture images”: they allow people to pull a Docker image and have the correct architecture selected for them automatically. Below you’ll find an example of how we create a multi-architecture manifest.

docker manifest create <repository>/<image>:<tag> <repository>/<image>:<tag>-amd64 <repository>/<image>:<tag>-arm64
docker manifest annotate --arch amd64 <repository>/<image>:<tag> <repository>/<image>:<tag>-amd64
docker manifest annotate --arch arm64 <repository>/<image>:<tag> <repository>/<image>:<tag>-arm64
docker manifest push <repository>/<image>:<tag>

In this example, we start off by creating a new manifest list from the constituent images that we’d like to include. We then annotate the individual images in the manifest with the correct architecture. And finally, we push the manifest to the registry. As you can see, at VoucherCodes we’ve adopted an approach in which all the images are tagged with the desired tag I mentioned earlier, suffixed with the architecture. The manifest then wraps this up nicely, allowing people to pull <repository>/<image>:<tag> instead of <repository>/<image>:<tag>-<architecture>.
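You can sanity-check the result before anyone depends on it; docker manifest inspect prints each constituent image along with its platform:

# Show the amd64 and arm64 entries behind the multi-architecture tag.
docker manifest inspect <repository>/<image>:<tag>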

How the finished build steps look in GitLab.

Depending on the technologies used, you may not be able to build specific images for the arm64 architecture just yet. A good example of this is the official MySQL Docker image, which only supports arm64 from version 8 onwards. This is why we currently still have separate make commands for these tasks, to allow for a bit of nuance where needed.

But it’s not just the external limitations that come into play. You might be operating with a particularly lean engineering team, and the effort and cost involved with transitioning to supporting both architectures — let alone switching every developer over to an M1 system — would simply be too great to perform in one go. That’s where this two-tiered, phased approach can help.

Dealing with edge cases

I already touched on a few edge cases in this article (the Lua VM and MySQL), but as you can imagine, we ran into a few more on our own transition journey.

Recently we ran into issues with running our automated test suites in a local Docker container because it was missing shared libraries that were present in the amd64 version of the image but not in the arm64 version. The workaround for this, for us, was to drop the following line into the relevant Dockerfile:

RUN dpkg --add-architecture amd64 && apt-get update && apt-get install -y libc6:amd64

When building the images, this ultimately won’t install anything in the amd64 version of our image, because the libraries are already present there, but it will install the missing standard libraries into the arm64 version of the image, albeit in the amd64 flavour.

The big benefit here is that we can still use the same Dockerfile for both architectures, which saves us from having to maintain multiple versions, and more importantly, from those multiple versions accidentally falling out of sync.
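If you ever need more per-architecture nuance than an effectively no-op install, BuildKit exposes a TARGETARCH build argument that lets a single Dockerfile branch explicitly. A minimal sketch, with the base image chosen purely for illustration:

FROM debian:bullseye

# TARGETARCH is populated automatically by BuildKit with the target
# platform's architecture, e.g. "amd64" or "arm64".
ARG TARGETARCH

# Only pull in the amd64 compatibility libraries on arm64 builds;
# on amd64 this branch is skipped entirely.
RUN if [ "$TARGETARCH" = "arm64" ]; then \
      dpkg --add-architecture amd64 \
      && apt-get update \
      && apt-get install -y libc6:amd64; \
    fi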

Lessons learned and main takeaways

  • Start with a small group of engineers who are sufficiently capable with technologies like Docker, and your CI/CD pipeline, to kick off your own transition journey. They will light the way, and help other engineers when they transition to ARM-based systems.
  • Avoid running amd64 architecture images on your ARM-based systems where possible. In some cases, the emulation comes with a significant performance hit, and it can really hurt the developer experience too.
  • Nobody in your team should be a second-class citizen for using a different architecture. Work to provide first-class support for everyone during the transition period, no matter how long it takes.
  • Use the same Dockerfile for both architectures. Because ultimately, supporting “one of something” is always easier than having to support “two of something”.
  • It’s okay if you end up with a bit of code duplication to achieve all of this initially, but diligently work towards reducing it to a bare minimum.

The end goal

In the above examples we’re still using separate commands for the amd64 and arm64 image builds, but realistically you should strive — if the dependencies allow you to do so — to use the same commands, and simply have the runner architecture dictate the target architecture of your images. This reduces the overhead of the separate make commands, ensures that every image is built for both architectures, and simplifies your setup. Ultimately we’re looking to limit the cognitive load to a minimum and to keep our engineers as happy as possible.
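A hypothetical unified build script could derive the architecture suffix from the machine it runs on, so the same make target works on both runner fleets. A sketch, with placeholder image names:

# Map the runner's kernel architecture to the Docker naming convention.
case "$(uname -m)" in
  x86_64)  SUFFIX="amd64" ;;
  aarch64) SUFFIX="arm64" ;;
  *)       echo "Unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac

# Build and push the image for whatever architecture this runner happens to be.
TAG="$(cat .env-docker-dev-tag)"
docker build -t "<repository>/<image>:${TAG}-${SUFFIX}" .
docker push "<repository>/<image>:${TAG}-${SUFFIX}"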

Wrapping up

If you’ll allow me to paraphrase the overused Wayne Gretzky quote: it’s all about skating towards where the puck is going, not where it has been. It is highly unlikely that Apple will ever release another Intel-based Mac, and with cloud providers like AWS fully embracing ARM-based systems because of the significant performance benefits, it definitely is a good moment to start planning your own transition journey. Hopefully, this article has illustrated that this is not necessarily an all-hands-on-deck, big-bang initiative, but something that you can gradually achieve in your organisation, as long as you keep working, steadily and diligently, towards that end goal.

Have you heard? We’re hiring at VoucherCodes! Check out our careers page here.
