Streamline Your Azure DevOps Pipelines: Advanced Docker Optimizations Unveiled

santthosh
The Mindbody Dev Report
7 min read · Apr 4, 2024

At Mindbody, managing over 4,400 Azure DevOps pipelines is a cornerstone of our software delivery process. With such an extensive array of pipelines, even minor inefficiencies can lead to major setbacks, making the optimization of these pipelines — particularly through Docker’s containerization technology — not just helpful but essential.

Docker Optimizations

For a setup as extensive as ours, fine-tuning Docker within our pipelines is not merely about keeping up; it is about setting the pace in a competitive tech landscape, ensuring we deliver superior, secure, and scalable applications faster than ever. Four years ago, we centralized our pipeline templates, a decision that significantly accelerated our ability to roll out changes across our extensive network of pipelines.
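To give a sense of what centralized templates look like in Azure DevOps, here is a minimal sketch of a pipeline extending a shared template. The repository name pipeline-templates and the file docker-build.yml are hypothetical placeholders for illustration, not our actual names:

resources:
  repositories:
  - repository: templates
    type: git
    name: DevOps/pipeline-templates   # hypothetical project/repo holding the shared templates
    ref: refs/heads/main

extends:
  # hypothetical shared template consumed by many pipelines
  template: docker-build.yml@templates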

In this post, we are excited to unveil the advanced Docker optimizations that have significantly sped up our build processes. Our journey through this demonstration will involve a straightforward Next.js project, highlighting how these enhancements can be integrated into an Azure DevOps Pipeline to streamline development.

Let us dive into the Dockerfile we will be using as our starting point.

# Start with the base image
FROM node:21-slim

# Set environment variables
ENV PNPM_HOME="/usr/local/pnpm"
ENV PATH="$PNPM_HOME:$PATH"

# Enable corepack for package management
RUN corepack enable

# Copy your application files into the image
COPY . /app

# Set the working directory inside the container
WORKDIR /app

# Cache and install dependencies
RUN pnpm install --frozen-lockfile

# Build the application
RUN pnpm run build

# Expose the port your app runs on
EXPOSE 3000

# Define the command to run your app
CMD ["pnpm", "start"]

Here is the Azure DevOps Pipeline Definition without any optimizations.

name: $(BuildDefinitionName)_$(SourceBranchName)$(Rev:.r)

variables:
  DOCKER_BUILDKIT: 1
  TAG: '$(Build.BuildNumber)'
  REPOSITORY: '<org/repository>'
  REGISTRY: '<ado_registry>'

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

stages:
- stage: build
  displayName: 'Main Build'
  jobs:
  - job: build
    displayName: 'Build'
    steps:
    - task: Docker@2
      displayName: 'Login to Docker Registry'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'login'

    - task: Docker@2
      displayName: 'Build'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'build'
        repository: '$(REPOSITORY)'
        dockerfile: '**/Dockerfile'
        tags: |
          $(TAG)
          latest

    - task: Docker@2
      displayName: 'Push'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'push'
        repository: '$(REPOSITORY)'
        dockerfile: '**/Dockerfile'
        tags: |
          $(TAG)
          latest

    - task: Docker@2
      displayName: 'Logout from Docker Registry'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'logout'

Optimizations on the Dockerfile

Parallel Processing with BuildKit

Docker BuildKit is an advanced build toolkit designed to provide improved performance, storage management, and security for Docker image builds. While there are several benefits to using BuildKit, the most useful one is its ability to parallelize building independent build stages.

BuildKit is available on the Ubuntu-based hosted VMs for Azure DevOps, which ship with Docker 24.x and have BuildKit enabled by default. If you are running a self-hosted agent with a Docker Engine version below 23, you can enable it by setting `DOCKER_BUILDKIT=1` as a global variable. BuildKit is not yet available on Windows-based agents.
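For those self-hosted agents, the variable can also be scoped as narrowly as a single step. Here is a minimal sketch of our own; the pipeline definitions later in this post simply set DOCKER_BUILDKIT as a pipeline-level variable instead:

- task: Docker@2
  displayName: 'Build'
  inputs:
    containerRegistry: $(REGISTRY)
    command: 'build'
    repository: '$(REPOSITORY)'
    dockerfile: '**/Dockerfile'
  env:
    # Opt this one docker build into BuildKit on agents where it is not the default
    DOCKER_BUILDKIT: 1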

Leveraging multi-stage builds for efficiency

The main goal here is to use several FROM statements in our Dockerfile to set up different build stages. Each stage is designed for a specific job, such as getting dependencies ready or putting together the final product.

By splitting these tasks into stages, we keep the final image clean and small because we leave out any tools or extra files that we do not need anymore. This way, the build process not only gets faster but also makes smarter use of cache. Next, we will show you a multi-stage Dockerfile for a Next.js application that follows this process.

# 1. Base Image
FROM node:21-slim AS base
ENV PNPM_HOME="/usr/local/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
RUN corepack enable
COPY . /app
WORKDIR /app

# 2. Stage that downloads the build dependencies
FROM base AS build-base
RUN pnpm install --frozen-lockfile

# 3. Stage that installs the runtime dependencies
FROM base AS runtime-dependencies
RUN pnpm install --prod --frozen-lockfile

# 4. Stage that builds the application
FROM build-base AS build
RUN pnpm run build

# 5. Stage that copies over the runtime dependencies and the built application
FROM base
COPY --from=runtime-dependencies /app/node_modules /app/node_modules
COPY --from=build /app/.next /app/.next
EXPOSE 3000
CMD [ "pnpm", "start" ]

It also makes the build quicker, because stages that have not changed since the last run can be reused from cache, and BuildKit can now build the independent build-base and runtime-dependencies stages in parallel, since the final stage is the only one that depends on both.

Use a dedicated RUN cache

Our multi-stage build process is designed to maximize efficiency when working with package managers like npm, yarn, or pnpm. By reusing packages downloaded in earlier builds and stages instead of fetching them again, we can reduce build times and improve performance.

Next, we will update our Dockerfile's pnpm commands to leverage the --mount=type=cache option, further streamlining our build process.

# 1. Base Image
FROM node:21-slim AS base
ENV PNPM_HOME="/usr/local/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
RUN corepack enable
COPY . /app
WORKDIR /app

# 2. Stage that downloads the build dependencies
FROM base AS build-base
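# Print the pnpm store location in the build log; the cache mounts below should target this directory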
RUN pnpm store path
RUN --mount=type=cache,id=pnpm,target=/usr/local/pnpm/store/v3 pnpm install --frozen-lockfile

# 3. Stage that installs the runtime dependencies
FROM base AS runtime-dependencies
RUN --mount=type=cache,id=pnpm,target=/usr/local/pnpm/store/v3 pnpm install --prod --frozen-lockfile

# 4. Stage that builds the application
FROM build-base AS build
RUN pnpm run build

# 5. Stage that copies over the runtime dependencies and the built application
FROM base
COPY --from=runtime-dependencies /app/node_modules /app/node_modules
COPY --from=build /app/.next /app/.next
EXPOSE 3000
CMD [ "pnpm", "start" ]

It is key to mount the cache at the folder where the package manager actually stores its files. For `pnpm`, you can find this location by running `pnpm store path`.
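If you would rather not look the path up at all, one alternative worth considering (a sketch on our part, not something the Dockerfile above requires) is to pin the store directory explicitly so the cache mount target is always known; the path /pnpm/store below is an arbitrary choice:

# Variation on stages 2 and 3 above: pin the pnpm store location up front
FROM base AS build-base
RUN pnpm config set store-dir /pnpm/store
RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile

FROM base AS runtime-dependencies
RUN pnpm config set store-dir /pnpm/store
RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile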

Here is a comparison of the build before (red) and after (cyan) adding --mount=type=cache to the Dockerfile. Clearly, far more packages are being reused instead of being downloaded all over again.

Performance improvements over Docker cache

Optimizations on the Azure DevOps Pipeline

Our Dockerfile is now fine-tuned for peak performance. Next, let us shift our attention to the Azure DevOps Pipeline. We will concentrate on leveraging Docker’s caching capabilities to expedite the build process, ensuring swift and efficient pipeline execution.

Container Registry cache

For streamlined multi-stage Docker builds, we have a straightforward and effective solution that integrates seamlessly into your current setup.

First, we will add a step to pull the latest Docker image. This ensures that we have the most up-to-date layers available for caching, which is crucial for the next step in the process.

# Image needs to be pulled and available on the agent for us to effectively use --cache-from
- task: Docker@2
  displayName: 'Pull'
  continueOnError: true
  inputs:
    containerRegistry: $(REGISTRY)
    command: 'pull'
    arguments: '$(REPOSITORY):latest'

Next, we incorporate the BUILDKIT_INLINE_CACHE=1 build argument. This argument directs BuildKit to embed cache metadata in the resulting image so that subsequent builds can use it as a cache source.

Finally, to capitalize on the caching mechanism, we will specify the --cache-from option during the build process. This points to the latest container image, allowing the current build to efficiently utilize layers from previous builds, which significantly reduces build time and resource consumption.

# BUILDKIT_INLINE_CACHE embeds cache metadata in the image for future builds,
# --cache-from lets the current build reuse layers from the previously pushed 'latest' image
arguments: '--build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=$(REPOSITORY):latest'

Here is the full pipeline definition after these changes:

name: $(BuildDefinitionName)_$(SourceBranchName)$(Rev:.r)

variables:
  DOCKER_BUILDKIT: 1
  TAG: '$(Build.BuildNumber)'
  REPOSITORY: '<org/repository>'
  REGISTRY: '<ado_registry>'

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

stages:
- stage: build
  displayName: 'Main Build'
  jobs:
  - job: build
    displayName: 'Build'
    steps:
    - task: Docker@2
      displayName: 'Login to Docker Registry'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'login'

    # Image needs to be pulled and available on the agent for us to effectively use --cache-from
    - task: Docker@2
      displayName: 'Pull'
      continueOnError: true
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'pull'
        arguments: '$(REPOSITORY):latest'

    - task: Docker@2
      displayName: 'Build'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'build'
        repository: '$(REPOSITORY)'
        dockerfile: '**/Dockerfile'
        # BUILDKIT_INLINE_CACHE embeds cache metadata in the image for future builds,
        # --cache-from lets the current build reuse layers from the previously pushed 'latest' image
        arguments: '--build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=$(REPOSITORY):latest'
        tags: |
          $(TAG)
          latest

    - task: Docker@2
      displayName: 'Push'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'push'
        repository: '$(REPOSITORY)'
        dockerfile: '**/Dockerfile'
        tags: |
          $(TAG)
          latest

    - task: Docker@2
      displayName: 'Logout from Docker Registry'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'logout'

Azure DevOps Agent cache with Docker@2 tasks

Another option is to use Azure DevOps Agent cache. Implementing a Cache@2 task within your Azure DevOps Pipeline can boost efficiency. This approach involves saving the Docker image to durable storage and then restoring it onto each agent right before the build starts.

This method is especially beneficial if you are dealing with large Docker images or if you encounter network bottlenecks while loading cache directly from the repository. Below, we provide a detailed example of how to set this up for optimal performance.

name: $(BuildDefinitionName)_$(SourceBranchName)$(Rev:.r)

variables:
  DOCKER_BUILDKIT: 1
  TAG: '$(Build.BuildNumber)'
  REPOSITORY: '<org/repository>'
  REGISTRY: '<ado_registry>'
  CACHE_FOLDER: $(Pipeline.Workspace)/docker

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

stages:
- stage: build
  displayName: 'Main Build'
  jobs:
  - job: build
    displayName: 'Build'
    steps:
    - task: Docker@2
      displayName: 'Login to Docker Registry'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'login'

    # Cache task that restores the cache folder onto the agent before the build starts
    # and saves it back once the job completes
    - task: Cache@2
      displayName: Cache task
      inputs:
        key: 'docker | "$(Agent.OS)" | Dockerfile'
        path: $(CACHE_FOLDER)
        restoreKeys: 'docker | "$(Agent.OS)"'
        cacheHitVar: CACHE_RESTORED

    # Docker script to load from previously saved cache file if available
    - script: |
        docker load -i $(CACHE_FOLDER)/cache.tar
      displayName: 'Docker restore'
      condition: and(not(canceled()), eq(variables.CACHE_RESTORED, 'true'))

    - task: Docker@2
      displayName: 'Build'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'build'
        repository: '$(REPOSITORY)'
        dockerfile: '**/Dockerfile'
        # BUILDKIT_INLINE_CACHE embeds cache metadata in the image we save below,
        # --cache-from reuses layers from the image restored by the 'Docker restore' step
        arguments: '--build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=$(REPOSITORY):latest'
        tags: |
          $(TAG)
          latest

    - task: Docker@2
      displayName: 'Push'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'push'
        repository: '$(REPOSITORY)'
        dockerfile: '**/Dockerfile'
        tags: |
          $(TAG)
          latest

    # Docker script to save the image to the cache folder
    - script: |
        mkdir -p $(CACHE_FOLDER)
        docker save -o $(CACHE_FOLDER)/cache.tar $(REPOSITORY):latest
      displayName: 'Docker save'
      condition: and(not(canceled()), not(failed()), ne(variables.CACHE_RESTORED, 'true'))

    - task: Docker@2
      displayName: 'Logout from Docker Registry'
      inputs:
        containerRegistry: $(REGISTRY)
        command: 'logout'

Between these two strategies, we recommend trying out the container registry cache first for its simplicity and ease of maintenance. Reserve the use of the Cache@2 task for those scenarios where you are dealing with exceptionally large image sizes.

We hope you have found this post valuable. Please share how you have been optimizing your Docker builds in the Azure DevOps pipeline — your insights could be incredibly helpful to the community!
