Infrastructure as Code: A case for Higher Level APIs
Having your infrastructure as part of your code is absolutely essential for keeping projects maintainable, but it can be a bit tedious. With the rise of higher level APIs, doing so can be fun again.
When talking about Infrastructure as Code, HashiCorp's Terraform has long been the prevalent choice, and for good reason: it offers solid integration for various cloud providers and helps you specify exactly what you want. But this can be a blessing and a curse. Having full control also means you have to be very exhaustive in writing your code.
Higher-level APIs like AWS CDK (Cloud Development Kit) take a more opinionated approach that hides complexity from you. And this is probably the complexity you don’t want to see at all.
What is CDK and how does it differ from Terraform?
CDK and Terraform both offer a declarative approach for specifying your application’s resources. While Terraform has its own DSL (HCL), CDK lets you write the specification in a general-purpose language such as TypeScript. For developers already using TypeScript to write their application, this means less context switching.
Every resource you want to create is declared as a TypeScript object in your script. Its properties can be accessed and used by other resources afterwards. All these resources are part of a stack, and stacks usually do not share any resources. Of course, you can still access resources outside of your stack via their ARN or other identifiers.
Writing the code in CDK might feel a bit more imperative at first, but what you’re writing is not a script whose statements are executed one after another during deployment. Instead, the output of this script is an AWS CloudFormation template that is then applied to your environment. Such a template is just as declarative as a Terraform configuration, and using CloudFormation offers some additional benefits when you want to access your resources in one place.
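To make the synthesis step concrete, here is a minimal, dependency-free sketch of the idea. The `ToyStack` class and its `add` and `synth` methods are made up for illustration and are not the aws-cdk-lib API, and the resource properties are trimmed far below what CloudFormation would actually accept. The point is only that running the script records declarations and emits a document; nothing is deployed while the code runs.

```typescript
// Toy model only: ToyStack is hypothetical and NOT part of aws-cdk-lib.
type ResourceSpec = { Type: string; Properties: Record<string, unknown> };

class ToyStack {
  private readonly resources: Record<string, ResourceSpec> = {};

  // Adding a resource only records a declaration; nothing is created yet.
  add(logicalId: string, spec: ResourceSpec): string {
    if (Object.prototype.hasOwnProperty.call(this.resources, logicalId)) {
      throw new Error("Duplicate logical ID: " + logicalId);
    }
    this.resources[logicalId] = spec;
    return logicalId;
  }

  // "Synthesis" turns the recorded declarations into a
  // CloudFormation-shaped document.
  synth(): { Resources: Record<string, ResourceSpec> } {
    return { Resources: { ...this.resources } };
  }
}

const toyStack = new ToyStack();

const bucketId = toyStack.add("ResizedBucket", {
  Type: "AWS::S3::Bucket",
  Properties: {},
});

toyStack.add("ResizeFunction", {
  Type: "AWS::Lambda::Function",
  // The link to the bucket ends up in the template as plain data ({ Ref: ... }).
  Properties: { Environment: { Variables: { BUCKET: { Ref: bucketId } } } },
});

// The result of running the script is a template object, not a deployment.
const toyTemplate = toyStack.synth();
```

Serializing `toyTemplate` with `JSON.stringify` gives a document with a `Resources` section, which is the same basic shape as the template that a real `cdk synth` run hands to CloudFormation.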
Size does matter
To illustrate how using a higher level API like CDK can simplify your work, let’s look at an actual example that I recently came across on a project. The goal was to deploy an AWS Lambda that writes resized images to an S3 bucket. What we need for that is the lambda, the bucket, and a role that allows the former to write into the latter.
In Terraform, creating those looks like this:
Meanwhile, the CDK variant is a “bit” shorter:
As you can see, most of the difference lies in how the lambda is granted access to the bucket. With CDK, you rarely have to create policies and roles by hand, as this is mostly hidden from you.
When I first saw this, I was really enthusiastic. Maintaining roles like in the TF example above is exhausting and error-prone after all. Getting it for free seemed really appealing to me.
Giving up control
Highly motivated to write less code, I told a colleague about this and was pretty taken aback by their reluctance to share my enthusiasm. The first question was, now that role and policy creation are implicitly done by the grantReadWrite function, who decides what those roles are called? Won’t your IAM be a mess after all, cluttered with roles and policies that nobody can maintain or match to the respective apps?
The short answer is: No. When digging into the elements that are actually created, you can see that only one role is created and visible in IAM: The ResizeServiceRole that bundles all permissions of the resize lambda (actually called something like ResizeServiceRoleABC1234 to avoid collisions). This role includes the ResizedBucketPolicy and anything else the lambda function needs, like access to DynamoDB or the like.
Also, CDK scopes the permissions it generates to the operation you specified and to the resources involved. In the example, the generated policy applies only to that one bucket, so an accidental wildcard resource that exposes more than you intended becomes far less likely.
Furthermore, as CDK is built on CloudFormation templates, the stack gives you a single place to look up everything that was created.
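To illustrate what a narrowly scoped grant looks like, here is a small sketch that builds an IAM-style policy statement for a single bucket. The helper names, the bucket name `resized-images` and the action list are assumptions made for this example; this is not CDK's `grantReadWrite` implementation, and the set of actions CDK really emits is different and longer.

```typescript
// Hypothetical helpers for illustration; NOT CDK's grantReadWrite implementation.
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Builds a read/write statement limited to one bucket and the objects in it.
function bucketReadWriteStatement(bucketName: string): PolicyStatement {
  const bucketArn = "arn:aws:s3:::" + bucketName;
  return {
    Effect: "Allow",
    // Assumed action list, chosen for readability.
    Action: ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
    // Bucket-level actions match the bucket ARN,
    // object-level actions match "<bucket ARN>/*".
    Resource: [bucketArn, bucketArn + "/*"],
  };
}

// A guard one could run over statements to catch a bare "*" resource.
function hasUnscopedResource(statement: PolicyStatement): boolean {
  return statement.Resource.indexOf("*") !== -1;
}

// "resized-images" is a made-up bucket name.
const resizedBucketStatement = bucketReadWriteStatement("resized-images");
const resizedStatementIsScoped = !hasUnscopedResource(resizedBucketStatement);
```

The statement names the bucket explicitly in `Resource`, which is the property that matters here: the lambda's role can touch this bucket and nothing else.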
This is actually one of Terraform’s weaknesses: As each element is created independently, you can only find them by accessing all the different services in the AWS console and searching for them by name.
On top of that, CDK (or rather CloudFormation) is really good at cleaning up after itself. If you delete a stack, its elements are deleted along with it, apart from resources that are deliberately retained (S3 buckets, for instance, are kept by default), so having IAM roles float around that don’t belong to any lambda is not a real risk. Of course, Terraform also cleans up everything it created when you use the destroy command. Be sure to use it whenever you want to remove a Terraform-deployed application, as cleaning up by hand will hardly be possible without forgetting something.
Tagging is key
Regardless of whether you use Terraform or CDK, you need labels in place that help you identify which application an object belongs to. For this purpose, AWS offers tags. Without tagging, your cloud can quickly become hard to maintain.
Thanks to CloudFormation, you only have to set the tags once for your stack and all the created elements will become tagged. Setting the environment name or an application ID is absolutely vital and a great help when trying to match objects to a stack.
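In CDK v2, stack-wide tags are typically set with a call along the lines of `Tags.of(stack).add("env", "prod")`. The sketch below models the propagation itself without any dependencies. `TaggedStack`, the tag keys and the rule that a resource's own tags win on conflict are assumptions of this toy model, not a description of CloudFormation internals.

```typescript
// Toy model only: TaggedStack is hypothetical and NOT part of aws-cdk-lib.
type TagMap = Record<string, string>;

interface TaggedResource {
  id: string;
  tags: TagMap;
}

class TaggedStack {
  private readonly stackTags: TagMap;
  private readonly resources: TaggedResource[] = [];

  constructor(stackTags: TagMap) {
    this.stackTags = stackTags;
  }

  // Resources may carry tags of their own, but do not have to.
  add(id: string, ownTags: TagMap = {}): void {
    this.resources.push({ id: id, tags: ownTags });
  }

  // Every resource inherits the stack tags; in this model its own tags
  // override a stack tag that uses the same key.
  resolve(): TaggedResource[] {
    return this.resources.map((resource) => ({
      id: resource.id,
      tags: { ...this.stackTags, ...resource.tags },
    }));
  }
}

// Tags are set once, at the stack level (keys and values are made up).
const taggedStack = new TaggedStack({ app: "resize-service", env: "prod" });
taggedStack.add("ResizedBucket");
taggedStack.add("ResizeFunction", { component: "resizer" });

const resolvedTags = taggedStack.resolve();
```

After `resolve()`, both resources carry `app` and `env` although they were written only once, which is what makes matching objects to a stack straightforward.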
As can be seen in the Terraform example above, Terraform supports tagging as well. As it has no grouping concept like a CDK stack, you will have to tag each element individually. This may feel a bit excessive at first, but it will pay off in the long run.
The best is yet to come
When working in the AWS Cloud, CDK is already a great solution for maintaining your infrastructure, and it keeps improving. Terraform still has the advantage that it can be used with various cloud providers and is not tied to a single one.
But more alternatives for higher-level APIs are currently on the rise. I would recommend having a closer look at Pulumi, which offers an approach that is very similar to CDK but can be used with multiple cloud providers.
Using higher level APIs can improve the readability of your code by reducing complexity, while at the same time improving the output, since a lot of thought has been put into it.
The lambda and bucket example is a very simple one, but I’ve already used CDK successfully in big projects and can fully recommend it.
When you’re starting your next project, be sure to give it a try. It noticeably improved my developer life, and I’m sure it can improve yours as well.
Do you already have experience with higher-level APIs, or are you still hesitating to use them? Let me know what you think in the comments or on Twitter. I’m curious about your thoughts.