Transitioning to AWS CDK

Alex Collier
Published in Version 1 · 8 min read · May 11, 2022

It is often said that

It is the journey, not the destination that matters

but to my mind, this expression overlooks the question of where the journey starts. Similarly, when we talk of a transition, the focus is often on the endpoint. Start typing ‘transitioning’ into a Google search, and matches like ‘… to grey hair’ or ‘… baby to own room’ appear near the top of the suggestions; the starting point is assumed, self-evident or possibly irrelevant. So when we talk of transitioning to CDK, where might we suppose we started our journey?

[Image: a narrow path through a dark forest]

My own DevOps path had emerged from the shady forests of Bourne shell scripting and JumpStart deployments on Solaris into the sunny uplands of Red Hat Enterprise Linux and its open-source stablemates.

Red Hat were actively promoting Ansible at this point, and as an avid fan of automation and scripting, I set about seeing what it could do. I gradually fell in love with Ansible’s simplicity, power and flexibility, applying it to more and more infrastructure components at the hosting provider where I then worked. The task was made easier by the availability of a vast array of technology-specific extensions, found either in the core modules or in community-contributed content.

Want a summary of the port utilisation across network gear from three different vendors? Ansible and Jinja2 will see you right.

Want to run a Proof of Concept of a clustered application on VMware, then deploy the application and its prerequisites in exactly the same way on physical hardware? Ansible roles will do it for you.

It was not until I turned Ansible’s attention in the direction of the Public Cloud (AWS in this case) that I started to see that there might be gaps in its capabilities. It is great at making and managing individual components, but it struggles to cope as the scale and complexity of the managed estate start to rise. The inventory is one of the key facets of Ansible: it tells Ansible what it is managing, how to log into it and what sub-groupings exist within it. When you enter a world where you can launch a thousand virtual instances at a keystroke, control of your inventory becomes crucial, but this is something that Ansible did not manage smoothly. Run a playbook to launch ten EC2 instances, and it will happily do it. Re-run the same playbook with the instance count set to 15 and you get another 15 instances on top of the original ten. This is because there is no state management and no recording of the individual unique identifiers (ARNs, UUIDs and so on) of the components that get deployed.

As I stood there ankle-deep in the squelchy bogland at the borders of Ansible and the Public Cloud, I spied in the near distance the welcoming landscape of the realm of Terraform. This was to be my next stopping-off point, and during my 12-month stay, I came to truly appreciate the benefits of a provisioning tool over a configuration management tool when it came to deploying Public Cloud infrastructure (AWS again). With Ansible, you specify a set of states that you want to exist, and to achieve them, things are created, destroyed or changed in the order you specify.

As I alluded to previously, Terraform does provisioning and in that respect sits alongside tools such as CloudFormation and OpenStack Heat. You specify a state that you wish your infrastructure to attain using Terraform’s own declarative language. Terraform then figures out what it needs to change to make that state a reality: it interrogates and stores the state of the infrastructure, compares that with the state defined in the code base, forms a plan of any required changes and then applies them. If there is no solution, or an error occurs, it digs in its heels and no plan is produced. No plan, no changes.

Terraform shares with Ansible the facility of using parameters or variables to define important, recurring values. It is this approach that enables us to guarantee that the same inbound TCP port is consistently open on the Network Access Control List, the Security Group, the host firewall and the web server listener. Terraform is very focused on reusability and the concept of keeping code “DRY” (Don’t Repeat Yourself), and parameters figure largely here too. Code is written as modules and generally lives in one repo; different sets of parameters, stored in a separate “live” repo, can then be fed to the same modules to produce separate but similar infrastructure (Dev/QA/Production, or Customer A, B & C, for example), all from one code base. Modules can also be pulled from a central repository, the Terraform Registry, which covers most current Cloud providers. Bespoke modules stored in a version-controlled repository allow different versions to be deployed to specific environments based on Git tags or Semantic Versioning. Terraform can be paired with Terragrunt to help with the management of the configuration or “live” code tree and to assist in assembling and running all the required modules in one place.

If you want to do lower-level configuration, such as setting EC2 user-data, then there is a mechanism available, but it is essentially limited to crafting a multi-line string variable with placeholders for your parameters and squirting this into the EC2 deployment configuration. This contrasts with Ansible, which has a whole arsenal of operating-system and application configuration options at its disposal.

Now, glittering on the horizon, I spy AWS CDK, a land of novelty and promise. CDK supports several languages, but they are familiar, everyday languages used by developers the world over. If you are fortunate enough to be strong in one of these languages, that is a big help. I had none of them to any degree and plunged in with TypeScript, a strongly typed reworking of JavaScript. The joy of using a programming language as the basis for your provisioning is that it is very powerful, but it conversely means that you are limited by any gaps in your own knowledge of that language. Many IDEs help out with TypeScript; Atom and VS Code, to name but two, have good error checking and auto-complete functionality.
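To give a flavour of what strong typing buys you, here is a tiny, purely illustrative sketch (the interface and function names are invented for the example): the compiler rejects a malformed specification before anything ever gets deployed.

```typescript
// Illustrative only: TypeScript catches configuration mistakes at
// compile time that plain JavaScript would only surface at runtime.
interface InstanceSpec {
  instanceType: string;
  count: number;
}

function describeFleet(spec: InstanceSpec): string {
  return `${spec.count} x ${spec.instanceType}`;
}

// The compiler rejects a misspelled key or a string where a number is expected:
// describeFleet({ instanceType: "t3.micro", count: "ten" }); // compile error
const summary = describeFleet({ instanceType: "t3.micro", count: 10 });
console.log(summary); // "10 x t3.micro"
```

In an IDE such as VS Code, the rejected call above is underlined as you type, which is exactly the kind of safety net you want when the “program” in question is your infrastructure.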

The CDK approach is similar to Terraform’s, but you describe the infrastructure you want in the language of your choice using pre-defined building blocks called constructs, and all configuration and control are done via AWS CloudFormation. A package manifest (package.json) defines which packages are needed for development and deployment, and there are mechanisms to fetch them automatically from a repository. CDK sits on top of Node.js, which does all the package management using the Node Package Manager (npm), so an understanding of that technology, and of JavaScript generally, is useful. Under the covers, your code is transpiled into JavaScript and a plan is constructed that will produce your desired configuration. There are handy sub-commands that allow you to see how your desired state differs from reality and what the CloudFormation change set will look like. As with Terraform, user-data configuration is available, but the mechanism is similarly simplistic.

All the heavy lifting around state management is done inside CloudFormation, so as a developer you just need access to the code repo and the npm registry. Once your code is deployed, in a pipeline for example, npm can fetch the required dependencies, so you don’t need to amass a vast code blob in your workspace, as can be the case with Terraform, especially when it has its friend Terragrunt round to play. Unlike Terraform, CDK does not appear to keep track of the unique identifiers of the resources that it deploys. If you change the name of a resource in your code, for example, it does not rename the existing resource, but rather pops out a new resource with the new name; depending on the removal policy, the original resource may be deleted or left behind, detached from the stack.

Getting started with coding for CDK is pretty straightforward. Running cdk init will lay out a directory structure for your chosen language and will even install a sample application to show you the true path. An application defines one or more stacks; each stack references one or more constructs, and the constructs roll out the resources. Built-in classes and constructs can be extended and adapted to make your own reusable components, helping to keep your code DRY.

Whilst you can create everything in one .ts file, the typical layout, as exemplified in CDK’s sample app, encourages a modular design, with a lib directory containing your resource definitions and a top-level stack definition file that imports the CDK constructs that you need (aws-s3, aws-lambda, etc.) as well as any of your own sub-components. Self-contained components such as Lambda function code or file assets can be stashed in a resources or lambda directory for referencing elsewhere in the codebase. Your application is launched from an “executable” in the bin directory, which assembles application inputs from the environment, command-line arguments and configuration files, and launches the application, as transpiled JavaScript, via the top-level stack file.

An interesting feature of CDK is its ability to test whether the code fulfils the brief, i.e. whether the synthesized stack will contain the required components. This functionality leverages the jest JavaScript testing tool, so tests can be as complex as required. It is set up via the test directory and will typically be used for checks such as verifying that the right number of DynamoDB tables will be deployed, or that specific configuration parameters will be passed to your stack. Running the tests does not require the installation of any external tools, as might be the case with Terraform and Terratest; it is done simply by running npm run test.

So is CDK worth the upskilling effort involved? If you are competent in one of its supported languages, it is certainly worth exploring. Even if you are not, AWS has been pushing CDK pretty hard since it featured at re:Invent 2020, and TypeScript is now working its way up the popularity charts [PYPL PopularitY of Programming Language index].

If you fancy giving CDK a try, there is a great starter resource at cdkworkshop.com. It may be the thing for you, so drop by and have a wander around. As Lao Tzu puts it:

A journey of a thousand miles starts with a single step

https://quoteinvestigator.com/2012/08/31/life-journey/

About the Author:
Alex is an AWS DevOps Engineer here at Version 1.
