by Kender Elford
First, A Little Bit of Background
For some time now, my team has been well versed in building JVM-based services that run in EC2. We adopted continuous delivery years ago and are now comfortable deploying our software in a canary style, which has saved our bacon on several occasions. Recently, we’ve begun to dip our toes into newer AWS technologies, including Lambda and Amazon SQS FIFO (First-In-First-Out) queues. We have a long history of using SNS (Simple Notification Service) to deliver events to our various pipelines, but Amazon SNS lacks the ability for a subscription to write a “group id” for FIFO queues. We felt this was a good place for us to start experimenting with Lambdas. We ended up building a re-usable AWS Lambda component that takes care of these subscriptions for us and can be deployed using our existing deployment infrastructure.
One piece of the puzzle was still missing for us: how were we going to address canary deployments? Here, I’ll share how we solved this.
Canaries for Lambda Functions?
If you’ve been practicing continuous delivery for your AWS EC2 based services, you’re probably already deploying changes as canaries. In order to limit potential impact to your customers, you likely send a portion of your traffic to new code. This same process can be applied to Lambda Functions.
Canaries are based on two abstractions over the Function: Version and Alias. As presented in the AWS console, these can be a little confusing. Both appear under the “Qualifiers” menu.
A Version is basically exactly what it sounds like: a historical record of a code artifact. By default, whenever you upload new code to a function, it’s applied to a special Version called “$LATEST”. A new Version can be created by selecting the $LATEST version from Qualifiers, then selecting “Publish new version” from the “Actions” menu.
The Alias abstraction allows you to create a name that refers to one or two Versions. There is a default Alias called “Unqualified” that refers to the $LATEST Version. When you select two Versions for an Alias to use, you can also specify how much traffic the second one will receive from its event source (e.g., Amazon SNS, Amazon Kinesis, API Gateway).
An important aspect to understand about Aliases is that any of them can be configured to receive events, and they all operate independently of each other. For our use case, we create a single Alias to use for our production canary and configure only that Alias to receive events. Make sure that none of your Versions, nor the Unqualified Alias, are configured to receive events.
Once you understand how Aliases work, setting up a canary in the console is pretty self-evident:
- Select your production canary Alias.
- Pick the stable Version in the first box.
- Pick the canary Version in the second box.
- Select the traffic distribution.
- Click Save.
- Check it out in the logs! You’ll see the Version in the log stream name, in the square brackets.
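If you'd rather not eyeball stream names, here is a minimal Python sketch that pulls the Version out of the square brackets. It assumes stream names of the form date/[version]instance-id; the sample names and the helper name version_from_log_stream are made up for illustration.

```python
import re

# Assumed shape of a Lambda log stream name: "YYYY/MM/DD/[<version>]<instance id>".
_STREAM_VERSION = re.compile(r"\[([^\]]+)\]")


def version_from_log_stream(stream_name):
    """Return the Version found in the square brackets, or None if there is none."""
    match = _STREAM_VERSION.search(stream_name)
    return match.group(1) if match else None


# Hypothetical stream names, for illustration only.
streams = [
    "2018/03/14/[7]0123456789abcdef0123456789abcdef",
    "2018/03/14/[8]fedcba9876543210fedcba9876543210",
    "2018/03/14/[$LATEST]00112233445566778899aabbccddeeff",
]

for name in streams:
    print(version_from_log_stream(name), "<-", name)
```

Counting streams per Version over a time window gives a rough feel for whether the canary is actually receiving its share of events.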
Automate this Business!
On our team the AWS automation tool of choice is AWS CloudFormation. This allows you to declare resources and their configurations however you want and lets AWS figure out how to make that happen.
To make our usage of CloudFormation repeatable, we use Python and Troposphere with our deployment framework and reusable recipes. We have a re-usable Lambda Function for connecting an SNS topic to an SQS FIFO queue and a recipe for deploying, configuring and canarying it.
CloudFormation has three types of resources for deploying Lambda Functions in a canary style:
Function: This is where the “code” for your function is maintained. What “code” means in this case depends on the runtime type for your function. In the case of Java, the “code” is a reference to a .jar file in S3 that contains all of the assembled .class files and resources to execute in the JVM. When this resource is created for the first time, or updated with new code any subsequent time, the “$LATEST” Version is automatically updated, like in the UI.
Version: A version can be created or removed, but not updated. Whenever it’s created, it becomes a “copy” of whatever “$LATEST” is currently. CloudFormation is smart enough to know that an update to the Function comes first. The version also allows you to specify a base64-encoded SHA-256 hash of the expected function code, so you can’t accidentally create a version of something you didn’t expect (e.g., create a new Version without updating the Function, or upload the wrong artifact to the Function). CloudFormation will automatically create incrementing Version numbers for you.
Alias: An updatable Resource with a name, a reference to a Version, and a RoutingConfiguration (the RoutingConfig property in a template), which may contain a VersionWeight that describes how much traffic is routed to your canary.
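To make the Version and Alias resources concrete, here is a rough sketch in plain Python dictionaries rather than Troposphere, so it runs with nothing but the standard library. The logical IDs (Function, VersionA, VersionB, ProductionCanaryAlias), the helper names, and the placeholder hashes are assumptions for illustration and not our actual recipe.

```python
import json


def lambda_version(function_id, code_sha256):
    """Sketch of an AWS::Lambda::Version pinned to an expected code hash."""
    return {
        "Type": "AWS::Lambda::Version",
        "Properties": {
            "FunctionName": {"Ref": function_id},
            "CodeSha256": code_sha256,
        },
    }


def lambda_alias(name, function_id, stable_id, canary_id=None, canary_weight=0.0):
    """Sketch of an AWS::Lambda::Alias, optionally sending a share of traffic to a canary."""
    properties = {
        "Name": name,
        "FunctionName": {"Ref": function_id},
        "FunctionVersion": {"Fn::GetAtt": [stable_id, "Version"]},
    }
    if canary_id is not None:
        if not 0.0 <= canary_weight <= 1.0:
            raise ValueError("canary_weight must be between 0 and 1")
        properties["RoutingConfig"] = {
            "AdditionalVersionWeights": [
                {
                    "FunctionVersion": {"Fn::GetAtt": [canary_id, "Version"]},
                    "FunctionWeight": canary_weight,
                }
            ]
        }
    return {"Type": "AWS::Lambda::Alias", "Properties": properties}


# VersionA is assumed to exist from an earlier deploy; VersionB is the new canary.
resources = {
    "VersionA": lambda_version("Function", "<base64 sha256 of stable code>"),
    "VersionB": lambda_version("Function", "<base64 sha256 of canary code>"),
    "ProductionCanaryAlias": lambda_alias(
        "production-canary", "Function", "VersionA", "VersionB", canary_weight=0.05
    ),
}
print(json.dumps(resources, indent=2))
```

With canary_id left out, the Alias simply points all traffic at the stable Version, which is the shape you want for a first deployment.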
Here’s a diagram that describes how all of these Resources interact with one another:
Putting This All Together, In Practice
So, now that you understand what CloudFormation Resources to use and how they will behave, what does the process look like for working with Lambdas?
Deploy the Initial Version
So, you have your code ready to go.
- Upload your code artifact to S3 so that it can be referenced by your Function in your CloudFormation template. Make sure you capture the SHA256!
- Create your CloudFormation template with a Function, a Version, and an Alias. Make sure to name your Alias something obvious, like “production-canary.” The Alias is only going to reference the single Version that you’ve created.
- Create your CloudFormation stack with that template, and wait for all of the resources to be created.
- Use your method of choice to subscribe your Lambda Function’s “canary” Alias to your event source. We typically use the console for this step to “flip the switch” and connect our code to the ecosystem.
You’re done! Check out the logs, and notice all the things you messed up!
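The first two steps above can be sketched roughly as follows: hash the artifact, then build a template with one Function, one Version, and one Alias. This is plain Python rather than our Troposphere recipe, and the bucket, key, handler, role ARN, runtime, and logical IDs are all placeholders I made up; the stand-in file exists only so the sketch runs anywhere.

```python
import base64
import hashlib
import json
import os
import tempfile


def sha256_base64(path):
    """Base64-encoded SHA-256 of a file, the form the Version's CodeSha256 expects."""
    digest = hashlib.sha256()
    with open(path, "rb") as artifact:
        for chunk in iter(lambda: artifact.read(65536), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def initial_template(bucket, key, code_sha256, handler, role_arn, runtime="java17"):
    """One Function, one Version, and an Alias that references only that Version."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Function": {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Code": {"S3Bucket": bucket, "S3Key": key},
                    "Handler": handler,
                    "Role": role_arn,
                    "Runtime": runtime,
                },
            },
            "VersionA": {
                "Type": "AWS::Lambda::Version",
                "Properties": {
                    "FunctionName": {"Ref": "Function"},
                    "CodeSha256": code_sha256,
                },
            },
            "ProductionCanaryAlias": {
                "Type": "AWS::Lambda::Alias",
                "Properties": {
                    "Name": "production-canary",
                    "FunctionName": {"Ref": "Function"},
                    "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
                },
            },
        },
    }


# Stand-in artifact so the sketch is runnable; point this at your real .jar instead.
with tempfile.NamedTemporaryFile(suffix=".jar", delete=False) as fake_jar:
    fake_jar.write(b"stand-in for a real .jar")
try:
    code_hash = sha256_base64(fake_jar.name)
finally:
    os.remove(fake_jar.name)

template = initial_template(
    bucket="example-artifacts",                      # hypothetical bucket
    key="sns-fifo-bridge/1.0.0/bridge.jar",          # hypothetical key
    code_sha256=code_hash,
    handler="com.example.Bridge::handleRequest",     # hypothetical handler
    role_arn="arn:aws:iam::123456789012:role/example-lambda-role",
)
print(json.dumps(template, indent=2))
```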
Deploy an Update
Ok, so it wasn’t perfect. But now it is! . . . Right? Probably not. You’ll want to limit the impact to customers by deploying this as a canary.
- Upload your new code artifact to a new location in S3.
- Update your CloudFormation template. Change the reference to the code in your Function. This triggers CloudFormation to update the $LATEST Version. Add a new Version, leaving the existing Version alone. Reference the new Version in the existing Alias’s routing configuration, with an appropriately small weight set on the canary Version.
- Update your existing stack with the new template. You should create a change set and review it before executing the change.
- Observe the log streams for your Lambda. You’ll see that some of them are for your canary.
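As a sketch of the template change in the update step, the function below takes the current template as a dictionary and returns a copy with new code, a new Version, and a small canary weight on the Alias. The function name add_canary, the logical IDs, and the trimmed starting template are assumptions for illustration only.

```python
import copy
import json


def add_canary(template, new_key, new_sha256, canary_id, weight,
               function_id="Function", alias_id="ProductionCanaryAlias"):
    """Return a copy of the template that canaries new code at the given weight."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must be strictly between 0 and 1")
    updated = copy.deepcopy(template)
    resources = updated["Resources"]
    if canary_id in resources:
        raise ValueError("a new Version needs a new logical ID")
    # 1. Point the Function at the new artifact; this updates $LATEST.
    resources[function_id]["Properties"]["Code"]["S3Key"] = new_key
    # 2. Add a new Version and leave the existing one alone.
    resources[canary_id] = {
        "Type": "AWS::Lambda::Version",
        "Properties": {"FunctionName": {"Ref": function_id}, "CodeSha256": new_sha256},
    }
    # 3. Route a small share of the Alias's traffic to the new Version.
    resources[alias_id]["Properties"]["RoutingConfig"] = {
        "AdditionalVersionWeights": [
            {"FunctionVersion": {"Fn::GetAtt": [canary_id, "Version"]},
             "FunctionWeight": weight}
        ]
    }
    return updated


# Trimmed stand-in for the currently deployed template (other properties omitted).
current = {
    "Resources": {
        "Function": {
            "Type": "AWS::Lambda::Function",
            "Properties": {"Code": {"S3Bucket": "example-artifacts", "S3Key": "bridge-1.0.0.jar"}},
        },
        "VersionA": {
            "Type": "AWS::Lambda::Version",
            "Properties": {"FunctionName": {"Ref": "Function"}, "CodeSha256": "<old hash>"},
        },
        "ProductionCanaryAlias": {
            "Type": "AWS::Lambda::Alias",
            "Properties": {
                "Name": "production-canary",
                "FunctionName": {"Ref": "Function"},
                "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
            },
        },
    }
}

canaried = add_canary(current, "bridge-1.1.0.jar", "<new hash>", "VersionB", 0.05)
print(json.dumps(canaried["Resources"]["ProductionCanaryAlias"], indent=2))
```

Returning a copy keeps the previous template around, which is handy if you need to go back to it.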
Oh no! It’s still not right! Don’t panic!
- Remove the canary Version from your template along with its canary weight.
- After the stack has completed updating (should be fast), everything should be as it was before you deployed the canary.
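The rollback is small enough to sketch in a few lines of Python. Again, remove_canary and the logical IDs are hypothetical names, and the template is trimmed down to the resources that matter here.

```python
import copy


def remove_canary(template, canary_id, alias_id="ProductionCanaryAlias"):
    """Return a copy of the template without the canary Version or its weight."""
    rolled_back = copy.deepcopy(template)
    resources = rolled_back["Resources"]
    # Drop the canary weight first so nothing references the Version being removed.
    resources[alias_id]["Properties"].pop("RoutingConfig", None)
    resources.pop(canary_id, None)
    return rolled_back


# Trimmed stand-in for a template with a canary in flight (Function omitted).
with_canary = {
    "Resources": {
        "VersionA": {"Type": "AWS::Lambda::Version", "Properties": {}},
        "VersionB": {"Type": "AWS::Lambda::Version", "Properties": {}},
        "ProductionCanaryAlias": {
            "Type": "AWS::Lambda::Alias",
            "Properties": {
                "Name": "production-canary",
                "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
                "RoutingConfig": {
                    "AdditionalVersionWeights": [
                        {"FunctionVersion": {"Fn::GetAtt": ["VersionB", "Version"]},
                         "FunctionWeight": 0.05}
                    ]
                },
            },
        },
    }
}

stable_only = remove_canary(with_canary, "VersionB")
print(sorted(stable_only["Resources"]))
```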
Alright! Everything is fixed up!
- Change the Version reference in the Alias to the new Version in your template.
- The previous Version should no longer be referenced anywhere in your template, so you can remove that.
- Remove the RoutingConfiguration from the Alias, so that all traffic is now going to the new Version.
- Update your stack.
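Promotion is the mirror image of the rollback. Here is a minimal sketch under the same assumptions: promote_canary and the logical IDs are made up, and the template is trimmed to the Versions and the Alias.

```python
import copy


def promote_canary(template, stable_id, canary_id, alias_id="ProductionCanaryAlias"):
    """Return a copy of the template where the canary Version takes all traffic."""
    promoted = copy.deepcopy(template)
    resources = promoted["Resources"]
    alias = resources[alias_id]["Properties"]
    # Point the Alias at the new Version and drop the weighted routing.
    alias["FunctionVersion"] = {"Fn::GetAtt": [canary_id, "Version"]}
    alias.pop("RoutingConfig", None)
    # The previous Version is no longer referenced, so it can go.
    resources.pop(stable_id, None)
    return promoted


# Trimmed stand-in for a template with a healthy canary (Function omitted).
with_canary = {
    "Resources": {
        "VersionA": {"Type": "AWS::Lambda::Version", "Properties": {}},
        "VersionB": {"Type": "AWS::Lambda::Version", "Properties": {}},
        "ProductionCanaryAlias": {
            "Type": "AWS::Lambda::Alias",
            "Properties": {
                "Name": "production-canary",
                "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
                "RoutingConfig": {
                    "AdditionalVersionWeights": [
                        {"FunctionVersion": {"Fn::GetAtt": ["VersionB", "Version"]},
                         "FunctionWeight": 0.05}
                    ]
                },
            },
        },
    }
}

promoted = promote_canary(with_canary, stable_id="VersionA", canary_id="VersionB")
print(promoted["Resources"]["ProductionCanaryAlias"]["Properties"])
```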
Once you understand how the resources interact with one another, deploying a Lambda Function in a canary style with CloudFormation is a fairly simple matter.
Here is an example template which illustrates a complete canary stack: [here]