Safely Deploy AWS Lambdas with CloudFormation
by Kendra Elford
First, A Little Bit of Background
For some time now, my team has been well versed in building JVM-based services that run in EC2. We adopted continuous delivery years ago and are now comfortable deploying our software in a canary style, which has saved our bacon on several occasions. Recently, we’ve begun to dip our toes into newer AWS technologies, including Lambda and Amazon SQS FIFO (First-In-First-Out) queues. We have a long history of using SNS (Simple Notification Service) to deliver events to our various pipelines, but Amazon SNS lacks the ability for a subscription to write a “group id” for FIFOs. We felt this was a good place for us to start experimenting with Lambdas. We ended up building a re-usable AWS Lambda component to take care of these subscriptions for us that can be deployed using our existing deployment infrastructure.
One piece of the puzzle was still missing: how were we going to address canary deployments? Here, I’ll share how we solved this.
Canaries for Lambda Functions?
If you’ve been practicing continuous delivery for your AWS EC2 based services, you’re probably already deploying changes as canaries. In order to limit potential impact to your customers, you likely send a portion of your traffic to new code. This same process can be applied to Lambda Functions.
Canary deployments for Lambda are based on two abstractions over the Function: Version and Alias. As presented in the AWS console, these can be a little confusing. Both appear under the “Qualifiers” menu.
A Version is basically exactly what it sounds like: a historical record of a code artifact. By default, whenever you upload new code to a function, it’s applied to a special Version called “$LATEST”. A new Version can be created by selecting the $LATEST version from Qualifiers, then selecting “Publish new version” from the “Actions” menu.
The Alias abstraction allows you to create a name that refers to one or two Versions. There is a default Alias called “Unqualified” that refers to the $LATEST Version. When you select two Versions for an Alias to use, you can also specify how much traffic the second one will receive from its event source (e.g., Amazon SNS, Amazon Kinesis, API Gateway).
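To get a feel for what a weighted Alias does to traffic, here is a toy simulation. It assumes each event is routed independently at random according to the canary weight. The Version labels "1" and "2" and the function names are made up for illustration, and this is only a model of the behavior, not Lambda's actual implementation.

```python
import random
from collections import Counter


def pick_version(stable, canary, canary_weight, rng):
    """Pick the Version that handles one event under a weighted split."""
    return canary if rng.random() < canary_weight else stable


def simulate(events, canary_weight, seed=0):
    """Count how many of `events` invocations land on each Version."""
    rng = random.Random(seed)
    # "1" stands in for the stable Version, "2" for the canary (hypothetical labels).
    return Counter(pick_version("1", "2", canary_weight, rng) for _ in range(events))


counts = simulate(events=10_000, canary_weight=0.05)
print(counts)  # roughly 5% of the events should land on Version "2"
```

With a small weight, only a small slice of events ever reaches the canary, which is what limits the impact of a bad deploy.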
An important aspect to understand about Versions and Aliases is that any of them can be configured to receive events, and they all operate independently from each other. For our use case, we create a single Alias to use for our production canary and configure only that Alias to receive events. Make sure that none of your Versions, nor the Unqualified Alias, are configured to receive events.
Once you get to the point of understanding how Versions and Aliases work, setting up a canary in the console is pretty self-evident:
- Select your production canary Alias
- Pick the stable Version in the first box
- Pick the canary Version in the second box
- Select the traffic distribution.
- Click Save
- Check it out in the logs! You’ll see the Version in the log stream name, in the square brackets.
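If you would rather not eyeball log stream names, a small helper can pull the Version out of the square brackets. This is a minimal sketch that assumes stream names look like `YYYY/MM/DD/[version]<id>`. The sample names below are invented, and `version_from_log_stream` is a hypothetical helper, not part of any AWS SDK.

```python
import re

# Assumed layout: date prefix, then the Version in square brackets, then an id.
LOG_STREAM_PATTERN = re.compile(r"^\d{4}/\d{2}/\d{2}/\[(?P<version>[^\]]+)\]")


def version_from_log_stream(name):
    """Return the Version embedded in a log stream name, or None if absent."""
    match = LOG_STREAM_PATTERN.match(name)
    return match.group("version") if match else None


# Made-up stream names for illustration only.
streams = [
    "2019/03/14/[7]0123456789abcdef0123456789abcdef",
    "2019/03/14/[8]fedcba9876543210fedcba9876543210",
    "2019/03/14/[7]00112233445566778899aabbccddeeff",
]
canary_streams = [s for s in streams if version_from_log_stream(s) == "8"]
print(len(canary_streams), "of", len(streams), "streams belong to the canary")
```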
Automate this Business!
On our team the AWS automation tool of choice is AWS CloudFormation. This allows you to declare resources and their configurations however you want and lets AWS figure out how to make that happen.
To make our usage of CloudFormation repeatable, we use Python and Troposphere with our deployment framework and reusable recipes. We have a reusable Lambda Function for connecting an SNS topic to an SQS FIFO queue, and a recipe for deploying, configuring and canarying it.
CloudFormation has three types of resources for deploying Lambda Functions in a canary style:
- Function: This is where the “code” for your function is maintained. What “code” means in this case depends on the runtime type for your function. In the case of Java, the “code” is a reference to a .jar file in S3 that contains all of the assembled .class files and resources to execute in the JVM. When this resource is created for the first time, or updated with new code any subsequent time, the “$LATEST” Version is automatically updated, like in the UI.
- Version: A Version can be created or removed, but not updated. Whenever it’s created, it becomes a “copy” of whatever “$LATEST” is currently. CloudFormation is smart enough to know that an update to the Function comes first. The Version also allows you to specify a base64-encoded SHA256 hash of the expected function code, so you can’t accidentally create a Version of something you didn’t expect (e.g., create a new Version without updating the Function, or upload the wrong artifact to the Function). CloudFormation will automatically create incrementing Version numbers for you.
- Alias: The Alias is an updatable Resource with a name, a Version and a RoutingConfiguration, which may contain a VersionWeight that describes how much traffic is routed to your canary.
Here’s a diagram that describes how all of these Resources interact with one another:
Putting This All Together, In Practice
So, now that you understand what CloudFormation Resources to use and how they will behave, what does the process look like for working with Lambdas?
Deploy the Initial Version
So, you have your code ready to go.
- Upload your code artifact to S3 so that it can be referenced by your Function in your CloudFormation template. Make sure you capture the SHA256!
- Create your CloudFormation template with a Function, Version and Alias. Make sure to name your Alias something obvious, like “production-canary.” The Alias is only going to reference the single Version that you’ve created.
- Create your CloudFormation stack with that template, and wait for all of the resources to be created.
- Use your method of choice to subscribe your Lambda Function’s “canary” Alias to your event source. We typically use the console for this step to “flip the switch” and connect our code to the ecosystem.
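Capturing the SHA256 in the first step is easy to script. Here is a minimal sketch using only the standard library. It produces the base64-encoded SHA256 digest of a local artifact, which is the form the Version resource expects. The file name in the usage comment is a placeholder.

```python
import base64
import hashlib


def code_sha256(path, chunk_size=1024 * 1024):
    """Return the base64-encoded SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as artifact:
        # Read in chunks so large jars do not have to fit in memory.
        for chunk in iter(lambda: artifact.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


# Usage (placeholder path):
#   print(code_sha256("build/libs/lambda.jar"))
```

Compute the hash from the same file you upload to S3, so that a mismatch really does mean the wrong artifact ended up on the Function.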
You’re done! Check out the logs, and notice all the things you messed up!
Deploy an Update
Ok, so it wasn’t perfect. But now it is! . . . Right? Probably not. You’ll want to limit the impact to customers by deploying this as a canary.
- Upload your new code artifact to a new location in S3.
- Update your CloudFormation template. Change the reference to the code in your Function. This triggers CloudFormation to update the $LATEST Version. Add a new Version, leaving the existing Version alone. Reference the new Version in the existing Alias’s routing configuration, with an appropriately small weight set on the canary Version.
- Update your existing stack with the new template. You should create a change set and review it before executing the change.
- Observe the log streams for your Lambda. You’ll see that some of them are for your canary.
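The template change for a canary can be sketched as a small transformation over a template dictionary. This is an illustration under assumptions, not our actual recipe: the logical IDs (`Function`, `VersionA`, `VersionB`, `Alias`), the S3 keys and the hashes are placeholders, and the starting template is trimmed to the properties that matter here.

```python
import copy


def add_canary(template, new_key, new_sha256, weight, canary_id="VersionB"):
    """Return a copy of `template` that routes `weight` of traffic to a new Version."""
    updated = copy.deepcopy(template)
    resources = updated["Resources"]
    # 1. Point the Function at the new artifact (this updates $LATEST).
    resources["Function"]["Properties"]["Code"]["S3Key"] = new_key
    # 2. Add a new Version; the existing Version is left untouched.
    resources[canary_id] = {
        "Type": "AWS::Lambda::Version",
        "Properties": {"FunctionName": {"Ref": "Function"}, "CodeSha256": new_sha256},
    }
    # 3. Send a small share of the Alias's traffic to the canary Version.
    resources["Alias"]["Properties"]["RoutingConfig"] = {
        "AdditionalVersionWeights": [
            {"FunctionVersion": {"Fn::GetAtt": [canary_id, "Version"]},
             "FunctionWeight": weight}
        ]
    }
    return updated


# Trimmed-down stable template with placeholder values.
stable = {
    "Resources": {
        "Function": {
            "Type": "AWS::Lambda::Function",
            "Properties": {"Code": {"S3Bucket": "example-artifacts", "S3Key": "v1/lambda.jar"}},
        },
        "VersionA": {
            "Type": "AWS::Lambda::Version",
            "Properties": {"FunctionName": {"Ref": "Function"}, "CodeSha256": "<sha of v1>"},
        },
        "Alias": {
            "Type": "AWS::Lambda::Alias",
            "Properties": {
                "Name": "production-canary",
                "FunctionName": {"Ref": "Function"},
                "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
            },
        },
    }
}

canary = add_canary(stable, "v2/lambda.jar", "<sha of v2>", weight=0.05)
```

The Alias's main `FunctionVersion` still points at the stable Version, so the stable code keeps the traffic that is not explicitly weighted toward the canary.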
All Done?
Oh no! It’s still not right! Don’t panic!
- Remove the canary Version from your template along with its canary weight.
- Re-deploy.
- After the stack has completed updating (should be fast), everything should be as it was before you deployed the canary.
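The rollback amounts to deleting two things from the template. Here is a hedged sketch with the same kind of placeholder logical IDs (`VersionA`, `VersionB`, `Alias`). It only performs the steps listed above and leaves everything else in the template alone.

```python
import copy


def remove_canary(template, canary_id="VersionB"):
    """Return a copy of `template` without the canary Version or its weight."""
    rolled_back = copy.deepcopy(template)
    # Drop the canary Version resource itself.
    rolled_back["Resources"].pop(canary_id, None)
    # Drop the routing configuration so the Alias serves only the stable Version.
    rolled_back["Resources"]["Alias"]["Properties"].pop("RoutingConfig", None)
    return rolled_back


# Trimmed-down template mid-canary, with placeholder values.
with_canary = {
    "Resources": {
        "VersionA": {"Type": "AWS::Lambda::Version", "Properties": {"CodeSha256": "<sha of v1>"}},
        "VersionB": {"Type": "AWS::Lambda::Version", "Properties": {"CodeSha256": "<sha of v2>"}},
        "Alias": {
            "Type": "AWS::Lambda::Alias",
            "Properties": {
                "Name": "production-canary",
                "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
                "RoutingConfig": {
                    "AdditionalVersionWeights": [
                        {"FunctionVersion": {"Fn::GetAtt": ["VersionB", "Version"]},
                         "FunctionWeight": 0.05}
                    ]
                },
            },
        },
    }
}

rolled_back = remove_canary(with_canary)
```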
Alright! Everything is fixed up!
- Change the Version reference in the Alias to the new Version in your template.
- The previous Version should no longer be referenced anywhere in your template, so you can remove that.
- Remove the RoutingConfiguration from your Alias, so that all traffic is now going to the new Version.
- Update your stack.
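Promotion can be sketched the same way as a transformation over the template dictionary. The logical IDs (`VersionA` for the old stable Version, `VersionB` for the canary) are placeholders, and the example template is trimmed to the relevant resources.

```python
import copy


def promote_canary(template, stable_id="VersionA", canary_id="VersionB"):
    """Return a copy of `template` in which the canary Version takes all traffic."""
    promoted = copy.deepcopy(template)
    resources = promoted["Resources"]
    alias = resources["Alias"]["Properties"]
    # Point the Alias directly at the new Version.
    alias["FunctionVersion"] = {"Fn::GetAtt": [canary_id, "Version"]}
    # Without a routing configuration, all traffic goes to that Version.
    alias.pop("RoutingConfig", None)
    # The previous Version is no longer referenced, so remove it.
    resources.pop(stable_id, None)
    return promoted


# Trimmed-down template mid-canary, with placeholder values.
mid_canary = {
    "Resources": {
        "VersionA": {"Type": "AWS::Lambda::Version", "Properties": {"CodeSha256": "<sha of v1>"}},
        "VersionB": {"Type": "AWS::Lambda::Version", "Properties": {"CodeSha256": "<sha of v2>"}},
        "Alias": {
            "Type": "AWS::Lambda::Alias",
            "Properties": {
                "Name": "production-canary",
                "FunctionVersion": {"Fn::GetAtt": ["VersionA", "Version"]},
                "RoutingConfig": {
                    "AdditionalVersionWeights": [
                        {"FunctionVersion": {"Fn::GetAtt": ["VersionB", "Version"]},
                         "FunctionWeight": 0.05}
                    ]
                },
            },
        },
    }
}

promoted = promote_canary(mid_canary)
```

After this update the template has the same shape as the initial one, just with a different Version behind the Alias, so the next canary can follow the same steps.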
Once you understand how the resources interact with one another, deploying a Lambda Function in a canary style with CloudFormation is a fairly simple matter.
Here is an example template which illustrates a complete canary stack: [here]