We could also title this post: “How to order a pizza for the team after a successful release with CloudFormation.” This post is about why it now no longer matters to my sanity and my projects that CloudFormation is sometimes lacking functionality for programming constructs that advanced “infrastructure as code” (IaC from here on out) paradigms would require. I used to get frustrated that Terraform, CloudFormation, or all the other various CDKs/SDKs were running behind on features that I needed or wished they had: logic constructs like loops, integration with new services, integration with external services, support for non-IaC activities, etc.
Enter CloudFormation custom resources, my new best friend in my IaC adventures.
What are CloudFormation custom resources?
CloudFormation custom resources are a way that you can extend AWS CloudFormation functionality beyond the out-of-the-box feature set, allowing you to do things normally reserved for superheroes. They are a way to describe custom services and logic that actually gets executed during deployment of resources in a CloudFormation template.
These custom resources are resources described in snippets of JSON or YAML in your templates and work by firing off a custom webhook during the deployment of your CloudFormation template. A counterpart of your custom resource will be the actual event handler (Lambda or SNS) that acts as the webhook during deployment to run your custom logic. The handler is the actual magic and the way custom resources are built.
Because you are free to customize the actual handler logic (the libraries it loads, the SDKs it calls, the commands it executes) and the properties (in the actual CloudFormation template) it receives, there is significant power and extensibility inherent in what you can achieve in CloudFormation custom resources.
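To make the "webhook" idea a bit more concrete, here is a minimal sketch of the request a handler receives and the response body CloudFormation expects back. The field names follow the documented custom resource request and response format; every value, and the `build_response` helper itself, is a made-up illustration rather than code from my project.

```python
import json

# Hypothetical event, shaped like the request CloudFormation sends to a handler.
# All values are invented for illustration.
sample_event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.invalid/presigned-s3-url",
    "StackId": "arn:aws:cloudformation:us-east-1:111111111111:stack/demo/abc",
    "RequestId": "req-1234",
    "ResourceType": "Custom::ServiceControlPolicy",
    "LogicalResourceId": "DenyRootPolicy",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:us-east-1:111111111111:function:scp-handler",
        "PolicyName": "deny-root-user",
    },
}


def build_response(event, status, physical_id, reason="", data=None):
    """Build the JSON body the handler sends back to CloudFormation."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # should explain the failure when Status is FAILED
        "PhysicalResourceId": physical_id,
        # The next three values are echoed back from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},  # values the template can read with Fn::GetAtt
    }


response = build_response(sample_event, "SUCCESS", "p-examplepolicyid")
print(json.dumps(response, indent=2))
```

The handler's job boils down to reading `ResourceProperties`, doing its work, and sending a body like this one to the `ResponseURL`.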
Why would custom resources be needed?
Most of the time that I see teams using custom resources in CloudFormation, it’s because there is a gap in functionality related to provisioning resources not supported by CloudFormation yet, provisioning non-AWS resources in CloudFormation, or performing provisioning steps not related to infrastructure that might be more embedded in DevOps cycles to short-circuit feedback loops or business processes.
Provisioning AWS resources that are not supported by CloudFormation
CloudFormation is actually very solid. It’s more feature rich than many other IaC tools and it’s rare that I find something actually missing with respect to the AWS services it supports. However, there are still gaps, like my recent discovery that it still doesn’t support deploying IAM Service Control Policies in an AWS Organization.
Custom resources make it easy to add support for missing resources and gaps in functionality, allowing you to maintain IaC even where AWS doesn’t allow it. We’ll walk through the SCP approach in my real-world example further down.
Provisioning non-AWS resources with CloudFormation
Another reason that I frequently reach for custom resources is to provision non-AWS services through my IaC processes. AWS, more than any other cloud provider, usually has a native service for whatever my projects need, but gaps do appear from time to time.
Having the option of plugging non-AWS services into my IaC pipelines pushes CloudFormation a little closer to Terraform, Pulumi, and Stackery. Like Terraform, you can provision resources across providers in a multi-cloud strategy and environment yet still retain the service-based and stack-based nature of CloudFormation.
Performing provisioning steps not related to infrastructure
Another way of utilizing custom resources is a more recent use case for my projects. I’m finding more and more that provisioning steps need to be followed that aren’t strictly infrastructure-related.
A few recent examples are listed below; the list is not comprehensive:
- Running relational DB migration scripts after a successful deployment of Amazon Aurora or Redshift clusters
- Executing smoke tests as the final step in a deployment of new infrastructure
- Notifying a “red team” in the event of a new deployment needing to be validated by humans
- Firing off a new API call to incident management or ITIL tools like ServiceNow, Jira, etc. for some policy or procedural reason
With custom resources, you write a Lambda function in the language of your choice, deploy it, and have CloudFormation trigger it to perform any of the logic above, keeping it all managed in CloudFormation.
Really, the sky is the limit, but be careful. With such extensibility and freedom, it’s easy to start thinking too broadly and take on scope that makes no sense beyond being plain fun.
Real world: a working example to model future experimentation on
All of this is nice theory and information, but this kind of stuff doesn’t tend to help me if I can’t apply it to a real-world example in my day job or some consulting gig. So now that we know what custom resources are and when you might use them, let’s see how to actually use custom resources.
My experience and knowledge really formed around a brief disaster with AWS Control Tower. At work, I had a need that Control Tower seemed to satisfy, so I embarked on a week-long hype-love-frustration-hate-resolve kind of relationship with the service.
Design thinking led me to the simplest “why” of what I was trying to solve: design a secure, repeatable, simple framework to protect multi-account AWS environments, using best practices, by enforcing guardrails to stop insecure activities, combined with smart aggressors to watch for artifacts that may compromise the environments.
I won’t go too far into that, but will say that it didn’t meet my needs and led me down the path of instead opting for AWS Organizations combined with a simple suite of Lambda functions and IAM Service Control Policies. Service Control Policies (aka SCPs) are similar to IAM policies but are applied by a parent AWS account to a child AWS account via AWS Organizations. They can allow or deny services so that not even the root user or a full IAM administrator in the account can call the specified API actions.
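For a sense of what an SCP looks like, here is a minimal sketch of a policy document that denies every action to the root user of member accounts. It uses the standard policy grammar with the `aws:PrincipalArn` condition key; the `Sid` and the function name are my own placeholders, and this is not necessarily the exact policy I deployed.

```python
import json


def deny_root_user_policy():
    """Return an SCP document that blocks all actions for an account's root user."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRootUser",  # placeholder statement ID
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # Only requests made by the root principal match this statement.
                "Condition": {
                    "StringLike": {"aws:PrincipalArn": ["arn:aws:iam::*:root"]}
                },
            }
        ],
    }


# SCPs are passed to AWS Organizations as a JSON string.
policy_json = json.dumps(deny_root_user_policy())
print(policy_json)
```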
My pivot from AWS Control Tower led me to CloudFormation custom resources through fast experimentation and even faster failures. We’ll use the solution I landed on with AWS Organizations and IAM Service Control Policies for the rest of the post. To begin, I found there are a few things we need to do to use a CloudFormation custom resource:
- We’re first going to need to write the logic for our custom resource in the Lambda function that will be our handler.
- Then we’re going to make our custom resource logic available by deploying it as an AWS Lambda function or by subscribing it to an SNS topic.
- We’ll decide what properties need to be sent to our handler via our CloudFormation template.
- And finally we’ll use the custom resource and the properties defined above in our CloudFormation template, referencing the Lambda function or SNS topic so that CloudFormation is linked to our handler and our logic gets triggered during a stack deploy.
Real world: how did we use custom resources?
To use a custom resource in a CloudFormation stack, we need to create a resource somewhere in our template of either type AWS::CloudFormation::CustomResource or Custom::<YourName>. It’s much more common in GitHub examples and other blog posts to see the latter approach, so just stick with that syntax.
This syntax is shown in a snippet of my real-world example whose purpose is to deploy an IAM Service Control Policy to an AWS Organization that prevents/restricts the root user in an AWS account from taking any action, either directly as a command or through the console.
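As a rough sketch of what such a resource declaration could look like, the Python below builds the template fragment as a dictionary and prints it as JSON, which CloudFormation accepts alongside YAML. The names `Custom::ServiceControlPolicy`, `SCPPolicyLambdaFunc`, and `PolicyContents` come from this post; the logical ID `DenyRootPolicy` and the `PolicyName` and `PolicyDescription` properties are assumptions added for illustration, and the Lambda function resource itself is left out.

```python
import json

# Assumed policy body: deny every action to the root user of member accounts.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"StringLike": {"aws:PrincipalArn": ["arn:aws:iam::*:root"]}},
        }
    ],
}

template_fragment = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        # Hypothetical logical ID for the custom resource.
        "DenyRootPolicy": {
            "Type": "Custom::ServiceControlPolicy",
            "Properties": {
                # Points CloudFormation at the handler. SCPPolicyLambdaFunc would be
                # an AWS::Lambda::Function defined elsewhere in the same template.
                "ServiceToken": {"Fn::GetAtt": ["SCPPolicyLambdaFunc", "Arn"]},
                # Everything below is passed through untouched to the handler.
                "PolicyName": "deny-root-user",
                "PolicyDescription": "Blocks all actions by the root user",
                "PolicyContents": json.dumps(scp_document),
            },
        }
    },
}

print(json.dumps(template_fragment, indent=2))
```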
Notice that the resource type is Custom::ServiceControlPolicy, which is not a resource type provided natively by CloudFormation.
Also, note the SCPPolicyLambdaFunc reference in the ServiceToken property. As inputs to your custom resource, you must provide a ServiceToken property. The ServiceToken is an ARN of either an AWS Lambda function or an SNS topic that will receive your custom resource request.
You may also include additional properties to send into your custom resource for configuration. In the example above, properties such as PolicyContents are custom properties that only our handler function cares about. These properties get sent to our handler function with the values we assign to them.
Real world: what does the handler function look like?
Most of the tricky bits around custom resources are in actually writing the handler. There are a few “gotchas” which can leave your CloudFormation stack in a bad state, but we’re not going to go into too much detail here as this post isn’t about how to write a full handler. In the ServiceToken property above, we’re referencing the Lambda function that acts as our handler to process the properties we’re sending it. You can deploy the Lambda function in a separate stack or include the code in the CloudFormation template. For simplicity, we’re going to include it in the template, but I don’t typically take that approach for production code.
The complexity around custom resources and their handlers comes from the asynchronous, callback-based programming model they run under. Working asynchronously means that you’ll need to understand the various failure modes and wait states.
This means that when your custom resource is kicked off during a CloudFormation deployment, the service won’t hang out and wait for a synchronous response; it will move on. This makes it much easier and faster for CloudFormation to provision many resources in your stack in parallel, but it also adds complexity. It means that not only do you need to provide the correct response that CloudFormation will expect, but it also means you need to prepare for scenarios where your handler function hangs. This is why in the handler code above you see this:
# Register the handler, then set the alarm for the remaining runtime minus a second
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(int(context.get_remaining_time_in_millis() / 1000) - 1)
This snippet arms our handler with an alarm that fires one second before the Lambda function runs out of time. That gives the handler a chance to report the failure to CloudFormation so that the stack doesn’t sit around waiting for an hour before timing out.
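Here is a minimal sketch of how the alarm and the response could fit together, using only the standard library (Unix only, since it relies on SIGALRM). `timeout_handler` is the name used in the snippet above; `HandlerTimeout`, `failed_body`, `send_response`, and `run_with_deadline` are hypothetical helpers, and the fallback physical ID is an assumption of mine.

```python
import json
import signal
import urllib.request


class HandlerTimeout(Exception):
    """Raised by the alarm shortly before the Lambda runtime would expire."""


def timeout_handler(signum, frame):
    raise HandlerTimeout("handler ran out of time")


def failed_body(event, reason):
    """Build a FAILED response so CloudFormation can stop waiting."""
    return {
        "Status": "FAILED",
        "Reason": reason,
        # Assumption: fall back to the request ID when no physical ID exists yet.
        "PhysicalResourceId": event.get("PhysicalResourceId", event["RequestId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(event, body):
    """PUT the JSON body to the pre-signed URL that came with the request."""
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # An empty content type keeps urllib from adding a form-encoded default.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as reply:
        return reply.status


def run_with_deadline(event, context, work):
    """Run work(event); turn any error or timeout into a FAILED body."""
    signal.signal(signal.SIGALRM, timeout_handler)
    seconds_left = int(context.get_remaining_time_in_millis() / 1000) - 1
    # alarm(0) would cancel the alarm, so always leave at least one second.
    signal.alarm(max(1, seconds_left))
    try:
        return work(event)
    except Exception as exc:  # includes HandlerTimeout
        return failed_body(event, str(exc))
    finally:
        signal.alarm(0)  # disarm the alarm once we have a result
```

Inside a real Lambda handler, you would pass the returned body to `send_response(event, body)` as the last step, so that CloudFormation always hears back one way or the other.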
Real world: event types our handler will need to survive
CloudFormation custom resources are going to need to handle several types of events so they don’t hang and can react to CloudFormation deploy cycles. In your Lambda handler, be sure to include logic that can handle the following events:
- Create: A Create event is invoked whenever a resource is being provisioned for the first time, either because a new stack is being deployed or because it was added to an existing stack.
- Update: An Update event is invoked when the custom resource itself has a property that has changed as part of a CloudFormation deploy.
- Delete: A Delete event is invoked when the custom resource is being deleted, either because it was removed from the template as part of a deploy or because the entire stack is being removed.
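The three event types above could be dispatched along these lines. This is a sketch, not my production handler: `handle_event` is a hypothetical function, the property names `PolicyName`, `PolicyDescription`, and `PolicyContents` are assumed, and `org_client` is expected to behave like a boto3 Organizations client (`create_policy`, `update_policy`, `delete_policy`). Passing the client in keeps the function easy to test with a stand-in.

```python
def handle_event(event, org_client):
    """Dispatch on RequestType and return the policy's physical resource ID."""
    request_type = event["RequestType"]
    props = event.get("ResourceProperties", {})

    if request_type == "Create":
        created = org_client.create_policy(
            Name=props["PolicyName"],
            Description=props.get("PolicyDescription", "Managed by CloudFormation"),
            Content=props["PolicyContents"],
            Type="SERVICE_CONTROL_POLICY",
        )
        # The new policy ID becomes the physical ID CloudFormation tracks.
        return created["Policy"]["PolicySummary"]["Id"]

    if request_type == "Update":
        policy_id = event["PhysicalResourceId"]
        org_client.update_policy(
            PolicyId=policy_id,
            Name=props["PolicyName"],
            Description=props.get("PolicyDescription", "Managed by CloudFormation"),
            Content=props["PolicyContents"],
        )
        # Returning the same ID tells CloudFormation the resource was not replaced.
        return policy_id

    if request_type == "Delete":
        # Note: Organizations refuses to delete a policy that is still attached.
        org_client.delete_policy(PolicyId=event["PhysicalResourceId"])
        return event["PhysicalResourceId"]

    raise ValueError(f"Unsupported RequestType: {request_type}")
```

In a real Lambda function, `org_client` would be something like `boto3.client("organizations")`, and the returned ID would go into the `PhysicalResourceId` field of the response sent back to CloudFormation.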
Real world: our final working solution
You may be in a hurry and looking for the TL;DR approach to how you’d solve for a custom resource with the handler function code. I’m good with this, even have empathy for it, so am sharing this with you so you can accelerate. Make it an important goal for yourself to fully understand what’s going on here. If you do it right, CloudFormation custom resources can become your Swiss Army Knife for complex IaC provisioning.
CloudFormation custom resources are awesome for filling gaps in the AWS ecosystem or for bringing third-party resources under the CloudFormation umbrella. They’ve become a staple of my own IaC deployments when there are gaps in CloudFormation for functionality I need or third-party services that I want to keep close to my IaC.
In this post, we learned what custom resources are and when you would want to use them. Then, we learned about the workflow for creating and using CloudFormation custom resources, as well as some tips and tricks. I hope you can find some benefit from the post and continue on the mantra of being a #PerpetualLearner.