Speeding up Docker builds in CI

Newton School Tech
Published in Newton School
4 min read · Oct 28, 2021

Docker layered caching

Docker caches the result of each step in the Dockerfile as an image layer while the image is being built. On subsequent builds, a layer is rebuilt only if that step, or any step before it, has changed since the last build. Docker relies heavily on this layer cache to reduce build time.

Let’s take a shortened Dockerfile for a Python service as an example. (Most of the actual steps are omitted here for brevity.)
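
The original listing is not available here, so what follows is only a minimal sketch of what such a Dockerfile might look like, written out with a shell heredoc. The base image, paths, port and start command are all assumptions, chosen so that the seven instructions line up with the step numbers used in the discussion.

```shell
# Hypothetical Dockerfile for a Python service; every name in it is an assumption.
cat > Dockerfile <<'EOF'
# Step 1: base image
FROM python:3.9-slim
# Step 2: copy only the dependency manifest
COPY requirements.txt /tmp/requirements.txt
# Step 3: fetch and install the dependencies (the slow step)
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# Step 4: working directory for the application
WORKDIR /app
# Step 5: copy the application source code
COPY . .
# Step 6: document the port the service listens on
EXPOSE 8000
# Step 7: start command
CMD ["python", "main.py"]
EOF
```

Because requirements.txt is copied on its own before the rest of the source, a change to the application code only invalidates the cache from Step 5 onwards.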

The very first Docker build for this can take several minutes to complete, with most of the time being spent at Step 3, where the dependencies listed in requirements.txt are fetched and installed for the service.

Docker caches each of these build steps as an image in its cache.

Now consider a scenario where we made some changes to the source code of the app. When another build is triggered, everything up to and including Step 4 can be reused from the cache, as nothing has changed up to that step (assuming requirements.txt was not changed).

A new build now will be faster than the first build, because the dependencies need not be downloaded again. The image layer that has all the dependencies downloaded (Step 4) will be used from cache. Steps 5, 6 and 7 are run on top of this layer, and would not take more than a couple of seconds to complete. This brings down the build time from several minutes to a couple of seconds for our subsequent builds.

The heavier steps that change less often, such as installing dependencies, should come earlier in the Dockerfile. Subsequent builds can then reuse the cache instead of rerunning the costly steps, and reap the full benefit of Docker's layer caching.

CI environment: AWS CodeBuild

Let’s take AWS CodeBuild as the CI environment for this discussion.

Whenever a build is triggered, CodeBuild acquires a fresh on-demand EC2 instance to run the build process. Once the build is completed, and if another build is not triggered within 15 minutes, CodeBuild terminates this newly acquired EC2 instance.

Because each build runs on a fresh instance, the local Docker layer cache is lost between builds. Every build then behaves like a first build, which pushes the build time back from a couple of seconds to several minutes.

Docker inline cache

In addition to the local build cache, the Docker builder can reuse the cache generated by previous builds when the --cache-from flag points to an image in the registry.

To use an image as a cache source, cache metadata needs to be written into the image on creation. This can be done by setting --build-arg BUILDKIT_INLINE_CACHE=1 when building the image with BuildKit enabled. After that, the built image can be used as a cache source for subsequent builds.

The following example builds an image with inline cache metadata and pushes it to a registry:
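
The command listing is not available here. A sketch along the lines of the example in the Docker documentation, wrapped in a shell function, might look like this. The function name is hypothetical and myname/myapp is a placeholder image name. DOCKER_BUILDKIT=1 is set explicitly on the assumption that BuildKit is not already the default builder.

```shell
# Build an image with inline cache metadata and push it to the registry.
# Usage: build_and_push_with_cache_metadata <image> [context]
build_and_push_with_cache_metadata() {
  local image="$1" context="${2:-.}"
  # BUILDKIT_INLINE_CACHE=1 embeds the cache metadata into the image itself.
  DOCKER_BUILDKIT=1 docker build -t "$image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 "$context" || return 1
  docker push "$image"
}
```

It would be called as `build_and_push_with_cache_metadata myname/myapp`.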

After the image is pushed, it can be used as a cache source on another machine. The cache metadata written into the image is what makes this possible.
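
As a rough sketch of the consuming side, again with a hypothetical function name and the placeholder image myname/myapp, a build on another machine could point --cache-from at the pushed image:

```shell
# Build on a different machine, reusing layers from the pushed image.
# Usage: build_from_registry_cache <image> [context]
build_from_registry_cache() {
  local image="$1" context="${2:-.}"
  # With BuildKit, the cache image is pulled from the registry if necessary.
  # BUILDKIT_INLINE_CACHE=1 keeps the new image usable as a cache source too.
  DOCKER_BUILDKIT=1 docker build -t "$image" \
    --cache-from "$image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 "$context"
}
```

Steps whose inputs are unchanged should then be reported as cached rather than executed again.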

Multi-stage Docker build

With the multi-stage builds, the intermediate stage images are discarded before the final image is created. Due to this, they can’t be used directly as inline cache for subsequent builds.

To overcome this, we need to build and push each stage of the build separately, and then use the image of each stage as a cache source for subsequent builds. The --target option can be used to build a specific stage of the multi-stage build.

Let’s take the same Python service, this time with a shortened multi-stage Dockerfile. (Most of the actual steps are omitted here for brevity.)
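
The original listing is not available here, so this is only a sketch of what such a multi-stage Dockerfile might look like. The stage name builder matches the text, while the base image, the virtual environment path and the start command are assumptions.

```shell
# Hypothetical multi-stage Dockerfile; every name in it is an assumption.
cat > Dockerfile <<'EOF'
# Builder stage: installs the dependencies into a virtual environment
FROM python:3.9-slim AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Final stage: copies the installed dependencies and the source code
FROM python:3.9-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
CMD ["python", "main.py"]
EOF
```

The expensive pip install step lives entirely in the builder stage, which is why that stage is the one worth caching.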

We can build the builder stage separately and push it to the repository using the following command.
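
A minimal sketch of that step, assuming the stage is named builder and using a hypothetical function name and image tag, could be:

```shell
# Build only the builder stage and push it under its own tag.
# Usage: build_and_push_builder <builder-image> [context]
build_and_push_builder() {
  local builder_image="$1" context="${2:-.}"
  # --target stops the build at the named stage.
  # --cache-from reuses the builder image pushed by a previous run, if any.
  DOCKER_BUILDKIT=1 docker build --target builder \
    --cache-from "$builder_image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    -t "$builder_image" "$context" || return 1
  docker push "$builder_image"
}
```

It would be called as `build_and_push_builder myname/myapp:builder`. On the very first run there is no previous builder image to import the cache from, so the stage is expected to build from scratch.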

Now the final build can be done with the builder image as an inline cache source, using the following command.
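
Sketched as a shell function with hypothetical names, the final build might pass the builder image as a cache source. Passing the previously pushed final image as a second --cache-from is an addition assumed here, so that the layers of the last stage can be reused as well.

```shell
# Build the final image, reusing the builder image (and the previous
# final image) as cache sources, then push it.
# Usage: build_and_push_final <builder-image> <final-image> [context]
build_and_push_final() {
  local builder_image="$1" final_image="$2" context="${3:-.}"
  DOCKER_BUILDKIT=1 docker build \
    --cache-from "$builder_image" \
    --cache-from "$final_image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    -t "$final_image" "$context" || return 1
  docker push "$final_image"
}
```

It would be called as `build_and_push_final myname/myapp:builder myname/myapp:latest`.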

The final code for using the Docker inline cache with a multi-stage build in CodeBuild, using ECR as the container registry, is added below.
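
The original listing is not available here. As a rough sketch only, the commands that such a CodeBuild build might run are shown below as a single shell function; in practice they would typically sit in the phases of a buildspec file. The function name, the builder and latest tags, and the environment variables AWS_ACCOUNT_ID and REPO_NAME are assumptions. AWS_DEFAULT_REGION is expected to be set in the build environment.

```shell
# Sketch of a CI build using ECR as the registry; all names are assumptions.
# Expects AWS_ACCOUNT_ID, AWS_DEFAULT_REGION and REPO_NAME in the environment.
ci_build() {
  local registry="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
  local builder_image="${registry}/${REPO_NAME}:builder"
  local final_image="${registry}/${REPO_NAME}:latest"
  local password

  # Log in to ECR so cache images can be pulled and new images pushed.
  password="$(aws ecr get-login-password --region "$AWS_DEFAULT_REGION")" || return 1
  printf '%s' "$password" |
    docker login --username AWS --password-stdin "$registry" || return 1

  export DOCKER_BUILDKIT=1

  # Builder stage, cached from the builder image of the previous build.
  docker build --target builder \
    --cache-from "$builder_image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    -t "$builder_image" . || return 1

  # Final image, cached from both the builder and the previous final image.
  docker build \
    --cache-from "$builder_image" \
    --cache-from "$final_image" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    -t "$final_image" . || return 1

  # Push both images so the next build on a fresh instance can reuse them.
  docker push "$builder_image" || return 1
  docker push "$final_image"
}
```

Pushing the builder image on every run is what lets the next build, on a fresh instance, skip the dependency installation.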
