How to increase the speed of your AWS Lambda continuous deployment builds

Parallelize your Maven builds, make them reproducible, and crank the volume to 11 using Docker for dependency resolution

John Chapin
A Cloud Guru
11 min read · Aug 30, 2017


Continuous deployment practices unlock a world of benefits. Automating workflows into a unified pipeline significantly reduces risk while increasing the team’s overall productivity.

In comparison to the Bad Old Days™ of production rollouts, an automated end-to-end deployment time of 10 to 20 minutes is surely fast enough — right? Wrong.

These days, high-performing development teams use automated workflows to deploy code much more frequently and with shorter lead times.

Amazon engineers deploy code every 11.7 seconds, on average — reducing both the number and duration of outages at the same time. Netflix engineers deploy code thousands of times per day.

Speed to market is a major competitive advantage, and every second counts. Even a small reduction in the cycle time of an automated workflow can aggregate into massive savings — especially when applied to hundreds of daily deployments.

The Challenge of Ephemeral Build Systems

In the Bad Old Days™ of continuous deployment, build systems were designed as a shared resource. They had all of the advantages of shared caching and dependency management — and all of the disadvantages of inconsistent setup, accumulated state, and poorly isolated processes.

Today’s automated workflows are likely to be using a build system that spins up and tears down infrastructure on the fly — using services such as Travis CI or AWS’ CodeBuild. This approach closely mirrors the immutable and ephemeral characteristics of cloud native app deployments.

Since modern build systems treat the infrastructure and environment as immutable, the build process bootstraps itself from scratch for every build — without the advantages of setup or optimization. This isn’t a huge penalty for simple builds, but it can drastically slow down the feedback cycle for complex builds with many external library dependencies to download.

If you’ve ever used Maven (a common Java build tool), the observation about “downloading the entire Internet” should be familiar. Java developers simply add their external libraries to a pom.xml file, and then let Maven download them along with all their dependencies into a cache.

Since the downloaded files are cached by Maven in a local repository, traditional Bad Old Days™ build systems only incur the painful download process during the initial setup. For ephemeral build systems such as CodeBuild, Maven incurs the painful download process every time the build runs since the cache can’t be accessed from one build to the next.

Since every second counts, it’s critical that we learn how to speed up the continuous deployment process of modern build systems. In this article, we’ll examine some techniques using Docker and Amazon EC2 Container Registry (ECR) — as well as a few tricks with Maven.

The Basics of a Continuous Deployment Pipeline

Let’s quickly set the stage by reviewing a basic continuous deployment pipeline for a serverless application on AWS — using some Java-based Lambda functions and components like API Gateway and DynamoDB.

All the code for this blog can be found on the Symphonia GitHub site.

CodePipeline actions

Source
The continuous deployment pipeline is kicked off from our source code repository. Every time a new commit (or set of commits) is made on the master branch, it triggers AWS CodePipeline to start a new execution. In CodePipeline terms, this is referred to as the “Source” action.

Build
The “Source” action then triggers a “Build” action, which in this case is an AWS CodeBuild job. CodeBuild spins up a build container, loads the new version of our source code, and executes the build steps that are defined in the buildspec.yml file in the root of our project.

In our case that buildspec.yml file contains a Maven invocation, which executes the mvn package command. This compiles our code, executes unit tests, and then creates an uberjar for each AWS Lambda function. Each uberjar contains the Lambda code and all of the dependencies listed in the corresponding pom.xml file.

After Maven builds the uberjar files, and still within the pipeline’s “Build” action, the aws cloudformation package command uses our Serverless Application Model (a.k.a., SAM) file to determine which uberjars should be uploaded to S3 in order to update the code in our various Lambda functions.
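
Pulled together, a minimal buildspec.yml for this kind of project looks roughly like the sketch below (the sam.yaml and sam-output.yaml file names and the ARTIFACT_BUCKET variable are placeholders for whatever your project actually uses):

```yaml
version: 0.2

phases:
  build:
    commands:
      # Compile, run unit tests, and build an uberjar for each Lambda module
      - mvn package
      # Upload the uberjars to S3 and rewrite the SAM template to reference them
      - aws cloudformation package --template-file sam.yaml --output-template-file sam-output.yaml --s3-bucket "$ARTIFACT_BUCKET"

artifacts:
  files:
    # The rewritten SAM template is handed to the Deploy actions
    - sam-output.yaml
```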

Deploy
After the “Build” action, two related actions create and then apply changes to our infrastructure using CloudFormation. If you’ve used the Serverless Application Model before, this is equivalent to the aws cloudformation deploy command.

And after those four steps, our continuous deployment pipeline is complete! Our Serverless application is now updated with new code, new configuration, and potentially new infrastructure.

Not so fast

For a simple serverless application with only a few Lambda functions and minimal infrastructure changes, a normal pipeline execution duration might only take a few minutes. About 25–30% of that cycle time is spent within the build step — over which we have the most control.

For a larger or more complex serverless application, just the build step could take several minutes — especially if there are several dozen Maven modules with unit tests. Each of those modules must go through the dependency resolution process, which may involve downloading libraries from Maven Central or other repositories.

Once each module is compiled, the tests are executed, and finally, the module is potentially bundled into an uberjar. Any uberjars with different MD5 checksums from the last Lambda deployment are uploaded to S3, so that the Lambda functions can be updated.

Same code, different signature
One complicating and oft-overlooked issue with this checksum scheme is that builds are not strictly reproducible in a normal Maven build process. JAR files are simply zip archives with additional Java-specific information, and that information includes timestamps and other non-deterministic outputs.

What this means is that even with the same pom.xml file and source code, JAR files produced back-to-back will have different MD5 checksums. From the SAM perspective, that means those JAR files are new, and need to be uploaded to S3.

The opportunities to improve

During the overview of our continuous deployment pipeline, we’ve already identified three areas for improvement.

First, because our Maven build runs in a brand new CodeBuild container every time a new build is executed, it has to download all of the project’s dependencies from scratch. Not only is this unnecessary, but those HTTP calls to external repositories are slow and error-prone.

Secondly, even the lowest-spec CodeBuild container has two CPU cores. Maven isn’t taking advantage of that, so modules are compiled, tested, and packaged one after the other, serially.

Lastly, because our Maven build isn’t strictly reproducible, the aws cloudformation package command always uploads the uberjar files, even when it’s unnecessary.

So let’s dive in and discover how to speed up the build portion of our continuous deployment pipeline with two simple fixes and one complex — but worthwhile — change.

Two simple fixes

Parallelizing a Maven build is quite straightforward. Simply add the -T flag, along with either a numeric value for the number of threads Maven should use, or an argument like `1C` to instruct Maven to discover the number of available CPU cores and assign one thread per core.
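
For example, either of the following invocations enables a parallel build:

```bash
# Use exactly four threads
mvn -T 4 clean package

# Or let Maven detect the CPU count and use one thread per core
mvn -T 1C clean package
```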

You can read more about this Maven feature in the parallel builds documentation.

The second quick Maven fix is to set up reproducible builds. The reproducible-build-maven-plugin can be added to your pom.xml file alongside any other build plugins. It strips common non-deterministic or non-repeatable information from the artifacts that Maven produces.
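
Here’s roughly what that looks like in the <build><plugins> section of pom.xml (check the plugin’s documentation for the current version number and default goal bindings):

```xml
<plugin>
  <groupId>io.github.zlika</groupId>
  <artifactId>reproducible-build-maven-plugin</artifactId>
  <version>0.16</version>
  <executions>
    <execution>
      <goals>
        <!-- Strip timestamps and other non-repeatable metadata from the packaged JARs -->
        <goal>strip-jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```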

This means that for the same dependencies and source code, you’ll always get exactly the same JAR file, with the same MD5 checksum. As a result, the aws cloudformation package command won’t unnecessarily upload JAR files.

Turning it up to 11

Now we’re going to really crank up the volume, both in terms of impact to build time and, unfortunately, complexity.

At this point, we’ve already optimized the actual code building and deployment with parallel and reproducible builds. But, we haven’t dealt with the issue of dependency resolution. To address that, we’ll need some help from a true friend of serverless — Docker!

CodeBuild and the Whale
CodeBuild uses Docker containers as the basis for its underlying build processes. By default, it uses AWS-provided images, which don’t require any additional configuration or permissions.

With a little extra work, we can build our own Docker image that not only contains the Java SDK and Maven, but also all of the dependencies for our project. That way, when CodeBuild spins up the container, it already has everything necessary to build our project.

Furthermore, it’s apparent (although not necessarily documented) that CodeBuild caches the Docker images that it uses. This means that it downloads our special, larger image only every once in a while — and reuses it often and quickly.

Another benefit we get from using our own Docker image is the ability to update other software within the container. For example, the current image that CodeBuild uses for Maven projects contains an outdated version of the AWS command line interface. In order to properly support the Serverless Application Model, the AWS CLI needs to be updated. Before, we’d perform the update during the actual execution of our build process — which slowed things down. Now, we can simply bake that updated AWS CLI into the container image itself!

CloudFormation to the rescue!
If you’ve chatted with us for a few minutes at a conference, it should come as no surprise at all that our first approach to dealing with thorny infrastructure problems is to ask, “Can CloudFormation do it?” In this case, the answer is a resounding yes.

We already have our entire continuous delivery pipeline codified in a CloudFormation template, so we’ll just add the necessary components so that we can use a custom Docker image with CodeBuild.

First, we’ll add an ECR repository. ECR (EC2 Container Registry) is essentially AWS’ version of Docker Hub. There’s not much to configure here, but the permissions are important — and easy to get wrong.

The CodeBuild service itself will be pulling images from the repository, so we need a policy to that effect. Note that this is a resource policy — we’re setting these permissions on the AWS resource itself, not on an IAM role or user.
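
In CloudFormation, that looks something like this (the BuildImageRepository logical name is just a placeholder):

```yaml
BuildImageRepository:
  Type: AWS::ECR::Repository
  Properties:
    RepositoryPolicyText:
      Version: "2012-10-17"
      Statement:
        - Sid: AllowCodeBuildPull
          Effect: Allow
          # CodeBuild itself pulls the image, so the principal is the service, not a role
          Principal:
            Service: codebuild.amazonaws.com
          Action:
            - ecr:GetDownloadUrlForLayer
            - ecr:BatchGetImage
            - ecr:BatchCheckLayerAvailability
```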

Next, we’ll reconfigure the CodeBuild section of our CloudFormation template to use a custom Docker image, replacing the aws/codebuild/java:openjdk-8 value that was there before.
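
A trimmed-down version of that change, plus the export, might look like this (BuildProject, BuildImageRepository, and BuildImageName are placeholder names, and the rest of the project configuration is omitted):

```yaml
Resources:
  BuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      # ...source, artifacts, and service role omitted...
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        # Our custom image in ECR, instead of aws/codebuild/java:openjdk-8
        Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${BuildImageRepository}:latest"

Outputs:
  BuildImageName:
    Value: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${BuildImageRepository}:latest"
    Export:
      Name: BuildImageName
```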

We’re using the !Sub CloudFormation intrinsic function here to dynamically generate the name of the Docker image. We’ll also export this value so we can easily access it from outside of CloudFormation later.

With those two changes made, we can update our build pipeline stack:
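
Assuming the pipeline template lives in pipeline.yaml and the stack is named my-pipeline (substitute your own file and stack names):

```bash
aws cloudformation deploy \
    --template-file pipeline.yaml \
    --stack-name my-pipeline \
    --capabilities CAPABILITY_IAM
```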

And now we need to construct our custom Docker image, containing an updated AWS CLI, the Java SDK, Maven, and of course, all of the Maven dependencies for our project.

Here’s the Dockerfile that we’ll use to build our custom image:
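
What follows is a condensed sketch: the pip-based AWS CLI install and the Maven download URL here are reasonable choices rather than the only options.

```dockerfile
FROM openjdk:8-jdk

# Update the system and install an up-to-date AWS CLI (via pip, to get the latest release)
RUN apt-get update && \
    apt-get install -y curl python-pip && \
    pip install --upgrade awscli

# Download and install Maven 3.5.0
RUN curl -fsSL https://archive.apache.org/dist/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz \
      | tar -xz -C /opt && \
    ln -s /opt/apache-maven-3.5.0/bin/mvn /usr/local/bin/mvn

# Add the project and resolve all of its dependencies into the image's local Maven repository
ADD . /usr/src/app
WORKDIR /usr/src/app
RUN mvn verify clean
```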

There’s a lot going on here, so let’s start at the beginning and explain each item.

  1. First, we’re using the openjdk:8-jdk image as a base. This will give us the most recent version of OpenJDK 8 (a.k.a., Java 8).
  2. Next, we use apt to update the system, and install the latest version of the AWS CLI.
  3. The next several lines download and install Maven v3.5.0.
  4. Finally, and most importantly, we ADD our project code to the container and execute mvn verify clean. This process downloads all of the Maven dependencies that our project needs, and stores them within the Docker image itself. This is opposed to using an external VOLUME, as done by most pre-packaged Docker / Maven images.

Given that Dockerfile, we can build an image locally using the docker build . command. When the image finishes building, we need to tag it with the same image name we used in the CodeBuild configuration section of our CloudFormation file.

First, let’s grab the CodeBuild image name from CloudFormation:
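
Since we exported it as BuildImageName in the sketch earlier (your export name may differ):

```bash
aws cloudformation list-exports \
    --query "Exports[?Name=='BuildImageName'].Value" \
    --output text
```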

Now, given output that looks like this from the `docker build .` command:
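
The tail end of that output will look something like this (the image ID is, of course, made up):

```
Step 6/6 : RUN mvn verify clean
 ---> Running in 7c2d3f1b9a4e
 ---> 9b3f2a1c8d7e
Successfully built 9b3f2a1c8d7e
```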

We can tag the new image like this:
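
Here the image ID comes from the build output above, and the account ID, region, and repository name are placeholders for the value returned by CloudFormation:

```bash
docker tag 9b3f2a1c8d7e 123456789012.dkr.ecr.us-east-1.amazonaws.com/build-image:latest
```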

Before we can push the image to ECR, we need to authenticate our local Docker client to the remote repository. We can do that using the following AWS CLI command, which itself generates a Docker command:
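
With a reasonably recent AWS CLI (the region is a placeholder):

```bash
aws ecr get-login --no-include-email --region us-east-1
```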

Copy and paste the returned Docker command to authenticate to ECR:
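
The generated command looks roughly like this, with a long temporary password in place of the <password> placeholder:

```bash
docker login -u AWS -p <password> https://123456789012.dkr.ecr.us-east-1.amazonaws.com
```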

And now, we can push our custom Docker image to the repository, so it can be used by CodeBuild (this may take a while, given the size of the Docker image):
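
Again, substitute your own account ID, region, and repository name:

```bash
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/build-image:latest
```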

We performed these steps manually, but unsurprisingly this can be set up as a separate CodePipeline pipeline, to automatically build Docker images for use by CodeBuild. We have a CloudFormation template that sets this up for you, here.

Reaping the rewards

With these steps complete, we can kick off our continuous deployment pipeline with a commit to our source repository. Now, of course, the first execution of our pipeline is actually going to take longer. This is because CodeBuild is downloading our custom Docker image, which is larger than the standard AWS-provided image.

However, subsequent runs of our pipeline will be noticeably quicker, because the CodeBuild portion will be running 50–60% faster! For my simple two-Lambda Maven project, this means an improvement of 30–40 seconds.

For a more complex project, that could be minutes saved, every time your continuous deployment pipeline runs. With a few deployments a day across a few different projects, the savings adds up — and that means less time waiting for builds to finish and a shorter feedback cycle!

Need help with Lambda, or other Serverless technologies? We’re the experts! Contact us at Symphonia for expert advice, architectural review, training and on-team development.
