Serverless CI/CD: 5 specific ways to level up your pipelines
A Bit of Background
I recently joined a growing fintech startup, and I learned pretty quickly that speed of delivery and reliability come at an absolute premium. In the startup world, a feature that ships too slowly, or a production issue that slips through, will drive customers away from the product.
We had a pretty basic workflow in place, but my instinct as I was onboarding was to invest some time in designing the ideal workflow from a conceptual standpoint and implementing it. The reasoning: do it once, properly, and every other repo can re-use it. We operate a serverless-first approach; all our services are built using AWS Lambda, EventBridge, API Gateway, SQS, the Serverless Framework, and so on. The resulting pipeline should reflect that and itself be serverless-oriented.
In this blog post, I share some of the things I thought long and hard about, along with implementation details that will help you level up your own Serverless CI/CD pipelines!
I. Use GitHub Actions if your repos are hosted on GitHub
The first pipeline I created for my AWS-based serverless applications used AWS CodeBuild and CodePipeline. They were okay tools; they did the job, but struck me as a bit flaky. Developer experience clearly isn't placed front and center in the AWS Code Suite: there's a lot of boilerplate involved in getting a pipeline up and running, and of course a lot of CloudFormation with lots of different configurations.
Contrast this with GitHub Actions: check a YAML file into the .github/workflows folder of your repository and you have a pipeline!
Furthermore, I found that for almost anything I wanted to do, someone had already built an action, with a really good community around it.
I personally use the Serverless Framework for all my serverless apps, so having the Serverless GitHub Action to use out of the box is delightful.
Deploying a serverless app is as easy as running these 4 lines of YAML:
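Here's the shape of it, using serverless/github-action (the version tag is just the one current at the time of writing, so pin whatever suits you):

```yaml
- name: serverless deploy
  uses: serverless/github-action@v3.2
  with:
    args: deploy --stage ${{ github.ref_name }}
```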
I use a little trick here: passing the branch name, via github.ref_name, as the stage input for this action, which makes feature-branch deployments a breeze. Bear in mind that branch names have to conform to the Serverless Framework's stage-name regular expression:
`^[a-zA-Z0-9-]+$`
One last thing on GitHub Actions: make sure to use the new AWS OIDC GitHub integration rather than long-lived credentials. Not only is it much safer, it took me only about 15 minutes to set up! I found a super helpful blog post on this, so make sure to check that out if you're setting up your GitHub-to-AWS integration.
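For reference, the credentials step then looks roughly like this, using the aws-actions/configure-aws-credentials action (the role ARN and region below are placeholders for whatever role you create for GitHub):

```yaml
permissions:
  id-token: write   # required so the job can request an OIDC token
  contents: read

steps:
  - name: Configure AWS credentials via OIDC
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy  # placeholder ARN
      aws-region: eu-west-1
```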
II. Make use of the best serverless feature: Free when not in use
In other words, deploy and try things out often. If a failure is going to occur, you want to know about it as soon as possible. Typical culprits here are IAM roles, stage-dependent variables that might resolve incorrectly in different stages, and pieces of infrastructure that work in one environment but not another.
This is where a crucial concept comes into play: Feature-branch deployments. The first part of any effective developer workflow, and especially in a serverless context, is a short feedback loop on whether a newly added feature works or not. The most effective way I have found to do that is two-fold:
- Developers should deploy and test against an isolated stage that represents a dedicated environment for their new feature, for us, it looks something like feature-0000
- A GitHub Action (or equivalent) that listens for pushes to the branch, and then validates the feature-branch environment itself
Diving a bit deeper into the automated action: every time code is pushed to any branch other than main, a sequence of automated tests runs and the feature branch is automatically deployed to a dev account (you should, of course, use a different AWS account for every environment). And here's the catch: developers should be able to merge into the main branch if, and only if, this action succeeds.
This might look something like this for a TypeScript project:
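A sketch of such a workflow, assuming npm scripts like test:unit and test:e2e and a deploy role in the dev account (all names and ARNs here are illustrative):

```yaml
# .github/workflows/feature-branch.yml
name: feature-branch

on:
  push:
    branches-ignore:
      - main

permissions:
  id-token: write   # for the OIDC credentials exchange
  contents: read

jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - run: npm run test:unit          # hypothetical script name
      - name: Configure AWS credentials (dev account)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/github-deploy  # placeholder
          aws-region: eu-west-1
      - name: Deploy the feature-branch stage
        uses: serverless/github-action@v3.2
        with:
          args: deploy --stage ${{ github.ref_name }}
      - run: STAGE=${{ github.ref_name }} npm run test:e2e   # hypothetical script name
```

Pair this with a branch protection rule on main that requires this check to pass, and merging a red branch becomes impossible.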
III. Favour end-to-end tests over both integration and unit tests
I know, I know, this is a bit of a hot take. But bear with me.
Now you and I both know that if a feature we deliver doesn't have tests written against it, it shouldn't be deemed complete. Even more than that, it will almost certainly regress in some way in the future, and we won't have anything to catch that regression.
In a serverless context, a lot of what we do is put together pieces of the AWS puzzle: whether that's Lambda fronted with API Gateway or AppSync for synchronous APIs, or asynchronous processing with EventBridge, SQS, Kinesis, or SNS. So if we want to truly unit test a Lambda function, and really test it in isolation (the definition of a unit test), we have to mock out everything else. In my experience, this takes an awful lot of time. Furthermore, the return on investment is pretty minimal, as you will still need to write more comprehensive tests: unit tests will not catch service-related errors like missing IAM permissions. So be strategic with your unit tests; reserve them for functions or methods that perform complex computations, or anything that is not AWS-related.
Integration tests, on the other hand, involve less mocking, as we use real AWS services to run them. But by definition, integration tests do just that: test the integration between two pieces of your AWS puzzle. Say a Lambda writing an item to a DynamoDB table, or an AppSync resolver fetching an item from a DynamoDB table. That presents a much better return on investment than a unit test, and should you have the time, you should almost always opt to write integration tests, like the sketch below. But bear in mind that those still take time, and if speed of delivery is of the essence, they might not always be practical.
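For the Lambda-writes-to-DynamoDB example, an integration test might look roughly like this; the handler import path, event shape, and table name are all hypothetical. The handler runs in-process, but against the real table in your dev account:

```typescript
// integration/create-order.test.ts
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
import { handler } from "../src/create-order"; // hypothetical handler module

const dynamo = new DynamoDBClient({});

it("writes the order item to DynamoDB", async () => {
  const orderId = `it-${Date.now()}`;

  // Invoke the handler in-process with a hand-rolled event,
  // but let it talk to the real table: no AWS mocks involved.
  await handler({ body: JSON.stringify({ orderId }) } as any);

  const result = await dynamo.send(
    new GetItemCommand({
      TableName: process.env.ORDERS_TABLE ?? "orders-dev", // hypothetical table
      Key: { pk: { S: orderId } },
    })
  );
  expect(result.Item).toBeDefined();
});
```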
E2E tests, on the other hand, will give you the confidence you need: confidence that your service actually works as intended. Granted, you will sacrifice some granularity. With E2E tests, there's no need to mock anything; you want them to run as close to real life as possible. Say you're building a synchronous REST API with API Gateway: you make a real API request in a testing environment. With asynchronous services, like EventBridge, it's a bit trickier. Because of the asynchronous nature, there's no way of knowing when processing finishes, so typically I will put an event on an EventBridge bus and use setTimeout to wait a couple of seconds before running my assertions.
So a typical E2E test might look something like this:
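Here's a sketch of the asynchronous case, assuming a hypothetical orders event bus and a DynamoDB table that the downstream consumer writes to (all names are illustrative):

```typescript
// e2e/order-processing.test.ts
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

const eventBridge = new EventBridgeClient({});
const dynamo = new DynamoDBClient({});
const stage = process.env.STAGE ?? "dev";

const wait = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

it("processes an OrderPlaced event end to end", async () => {
  const orderId = `e2e-${Date.now()}`;

  // Put a real event on the bus, exactly as an upstream producer would
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: `orders-${stage}`, // hypothetical bus name
          Source: "e2e.tests",
          DetailType: "OrderPlaced",
          Detail: JSON.stringify({ orderId }),
        },
      ],
    })
  );

  // There's no callback for asynchronous processing,
  // so give the consumer a few seconds to do its work
  await wait(5000);

  // Then assert on the observable side effect
  const result = await dynamo.send(
    new GetItemCommand({
      TableName: `orders-${stage}`, // hypothetical table name
      Key: { pk: { S: orderId } },
    })
  );
  expect(result.Item).toBeDefined();
}, 15_000); // a jest timeout generous enough to cover the wait
```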
IV. Have a robust workflow for merging that utilizes multiple AWS accounts
Back to less contentious topics: this is as close to unanimous advice as it gets in the AWS serverless world!
When the on-push action has succeeded against the dedicated feature environment running in the dev AWS account, and the branch has passed a couple of reviews from devs on the team, it's ready to be merged into main. When that happens, it will trigger a big sequence of actions.
Let’s go over what they are!
First, you'll want to deploy the newly merged code to a dev environment and run all your tests against it. Once that passes, you're ready to progress further: the workflow should now deploy to a different AWS account that hosts your staging environment. You'll want to persist that environment so that, if need be, people like Product Owners and other devs always have a dedicated environment that mirrors production. You also run your test suites against the staging environment; you'd be surprised how often things pass in one AWS account but not the other!
Always make sure that staging and prod look and feel exactly the same. That includes data: make sure you don't have meaningless data in your staging environment; you want it to feel like the real thing.
Now that everything has passed in dev and staging, you're ready to deploy to prod, with the confidence that this deployment will go smoothly and won't cause any issues. And of course, the production workload runs in a separate AWS account, meaning you will have three accounts (or more; some teams have a dev account per developer!), as in the workflow sketch after this list:
- A dev account
- A staging account
- A production account
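Chained together in GitHub Actions, the merge workflow might look something like this: three jobs, each assuming a deploy role in its own account (account IDs, role names, and scripts are all placeholders):

```yaml
# .github/workflows/release.yml
name: release

on:
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  dev:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/github-deploy  # dev account
          aws-region: eu-west-1
      - run: npm ci
      - uses: serverless/github-action@v3.2
        with:
          args: deploy --stage dev
      - run: STAGE=dev npm run test:e2e    # hypothetical script

  staging:
    needs: dev   # only runs once dev has deployed and passed its tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::222222222222:role/github-deploy  # staging account
          aws-region: eu-west-1
      - run: npm ci
      - uses: serverless/github-action@v3.2
        with:
          args: deploy --stage staging
      - run: STAGE=staging npm run test:e2e

  prod:
    needs: staging   # prod only deploys once staging is green
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::333333333333:role/github-deploy  # prod account
          aws-region: eu-west-1
      - run: npm ci
      - uses: serverless/github-action@v3.2
        with:
          args: deploy --stage prod
```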
Oh, one last thing here. You'll want a cleanup step to remove your feature-branch deployments; otherwise you'll start running into all sorts of AWS quotas, on IAM roles, S3 buckets, API Gateway resources, and a whole lot more!
This is what this might look like:
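One way to do it, assuming the feature stage is named after the branch as above, is a small workflow that runs serverless remove when the pull request is closed; github.head_ref gives you the feature branch name:

```yaml
# .github/workflows/cleanup.yml
name: cleanup

on:
  pull_request:
    types: [closed]   # fires on merge as well as on a plain close

permissions:
  id-token: write
  contents: read

jobs:
  remove-feature-stage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/github-deploy  # dev account placeholder
          aws-region: eu-west-1
      - run: npm ci
      - name: Tear down the feature-branch stage
        uses: serverless/github-action@v3.2
        with:
          args: remove --stage ${{ github.head_ref }}
```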
V. Bonus Item: the fewer persistent branches, the better!
This might be another contentious one, but again bear with me.
I have worked in multiple environments where we had multiple persistent branches, usually corresponding to the environments where they exist: dev, staging, and main. In my experience, that has always caused friction, and the branches tend to drift from each other very easily.
What if we only had one persistent branch at all times: main?
The way this works is that a developer creates a feature branch from main, gets a separate environment where they make their changes, and then goes through the sequence we talked about above before merging back into main. Once their branch is merged and successfully deploys into all three main environments, it is removed.
This might seem radical to some; I know I got a little healthy pushback when I suggested the idea to my current team! But once you see it in action, you start wondering why you ever needed other persistent branches to begin with!