AWS Lambda Deployment using Terraform
Recently, a question on Hacker News caught my attention.
I wanted to know how others’ experience has been. At ACL, we have multiple Lambda functions that we heavily rely on, and we use Terraform to deploy them. It’s been working well for us, so I was curious to know how others have approached Lambda. However, it seemed that others were struggling:
But, Terraform modules can abstract that away…
But, Terraform updates only the necessary changes…
But, if you use Terraform, it’d be like developing any other piece of infrastructure…
But, you can already support multiple environments using Terraform…
In short, I was left wondering, “why don’t people simply use Terraform?” That’s why I’m writing this post. I want to share how you can overcome almost all of the concerns raised on Hacker News when you use Terraform to its full capacity.
Let’s get started.
Don’t be Fooled
You may be thinking that all it takes to deploy Lambda with Terraform is to:
- Apply Terraform.
Those four steps would technically deploy Lambda, yes. But by those standards, we’d also be “deploying” if we SSH’d onto servers and copy-and-pasted our source code onto them.
It’s not enough to just create an AWS Lambda resource. There’s much more to it than that.
The Misunderstanding around AWS Lambda
If you’ve ever watched a presentation about AWS Lambda, you’ve probably noticed that the presenter quickly glosses over the diligence Lambda requires because it’s “just a function”. Unfortunately, that’s a gross simplification of Lambda and the architecture underlying it. A small codebase still needs the same diligence as any other codebase.
Necessary Components of an AWS Lambda Function
I’d argue that, even with a Lambda function, you still need:
- Testing: unit tests and integration tests are still required to validate your logic and prevent regressions.
- 3rd-Party Packages: unless you are building something truly basic, you’ll often need a 3rd-party library to help you out. For example, internally we have a Lambda function that parses CIDR blocks. Should I write the logic to parse CIDR blocks myself, or should I just use ipaddr.js?
- Module Bundler: now that you have 3rd-party packages for testing and for your runtime, you need something to package your codebase for production. You need a module bundler, such as Webpack (or a Gulp-based build) for Node.js.
- Supporting AWS Resources: Lambda functions seldom operate in isolation. At a minimum, they require an IAM Role and IAM Policy. Often, they’ll also need supporting resources such as KMS, CloudWatch Logs, etc. that you cannot get with AWS SAM or the Serverless framework alone.
- Multiple Environments: so you can verify your code in Integration and Staging before you deploy to Production. You’ll also want a sufficiently capable local development environment for rapid development and testing.
- Monitoring: so you can monitor the health of your Lambda functions and whether they are failing or timing out.
You should have a mechanism for all of those concerns before you go to production with Lambda. Fortunately, this is where Terraform shines.
Let’s now break down each point and see how you can address them using Terraform.
Development and Testing
Make sure your Lambda function is easy to test and develop.
Having a development and a testing environment is not strictly related to Terraform, but I feel compelled to mention this point anyway. Why? Because it is these two environments that are central to developers having confidence in their code before deployment. And if you don’t have confidence in your code functioning properly, then you shouldn’t be planning to deploy your code yet! You need these two environments for a rapid and familiar development experience.
Internally, we use a Node.js Lambda function that monitors CloudTrail for security-related events. Despite its small codebase, this Lambda function has 39 tests and a JS linter to ensure we can develop it quickly and with quality. It uses popular 3rd-party Node.js packages for testing (Gulp, Mocha, Chai, Sinon) and for runtime (Lodash, ipaddr.js). That’s how we maintain a familiar Node.js development and testing experience, even though the code runs in the unfamiliar production environment of Lambda.
The core philosophy behind these decisions is to not restrict our application’s architecture to AWS Lambda.
AWS Lambda should be viewed only as one entry point into your application. Your application logic should remain distinct and separate from AWS Lambda itself.
This philosophy gives us the flexibility to leave AWS Lambda if required. It also lets us think of our Lambda function like any other application we’ve developed.
In our CloudTrail function, we’ve abstracted Lambda away from our application by having the handler immediately call an object we’ve created that represents our domain. With this abstraction, we can test our application, starting at the Monitor class, like any other Node.js application.
This is how we’ve kept the development experience pleasant and avoided being locked in to AWS Lambda. If ever required, we can put Express.js in front of our logic and run it in a Docker container, using something like AWS’s EC2 Container Service (ECS).
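Here is a minimal sketch of that thin-handler pattern. It is not our production function. The `Monitor` class shape, its `findSuspicious` method, and the watch list of CloudTrail event names are hypothetical. The sketch also assumes the event already carries CloudTrail records in a `Records` array, which is the layout of a CloudTrail log file. It leaves out decoding a real CloudWatch Logs subscription payload, which arrives base64-encoded and gzipped.

```javascript
// Domain logic: knows nothing about Lambda, so it can be unit-tested directly
// or reused behind Express.js later. (Hypothetical shape for illustration.)
class Monitor {
  constructor(suspiciousEvents) {
    this.suspiciousEvents = new Set(suspiciousEvents);
  }

  // Returns the CloudTrail records whose eventName is on the watch list.
  findSuspicious(records) {
    return records.filter((record) => this.suspiciousEvents.has(record.eventName));
  }
}

// Illustrative watch list; a real one would come from your security requirements.
const SUSPICIOUS_EVENTS = ["StopLogging", "DeleteTrail", "PutBucketPolicy"];

// Lambda entry point: unpack the event and hand off to the domain object immediately.
async function handler(event) {
  const monitor = new Monitor(SUSPICIOUS_EVENTS);
  const records = event.Records || [];
  return monitor.findSuspicious(records);
}

module.exports = { handler, Monitor };
```

`Monitor` receives its watch list through the constructor and never touches Lambda's `event` or `context` objects. That means Mocha tests can exercise it directly, and an Express.js route could call it the same way.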
Make sure it’s easy to create everything your Lambda function needs.
I’ll be honest with you. I don’t understand why I would use AWS’s Serverless Application Model (SAM) to deploy a Lambda function. At its core, AWS SAM is a specification that…
…supports AWS resource types that simplify how to express functions, APIs, mappings, and DynamoDB tables for serverless applications.
However, from my perspective, it can’t create the variety of AWS resources that I often need, which Terraform can.
- Supported by SAM: AWS Lambda, DynamoDB, API Gateway
- Not supported by SAM: IAM Role, IAM Policy, ACM Certificate for an HTTPS API Gateway, CloudWatch Events scheduler, KMS, Route 53 CNAME records, etc.
In fact, our CloudTrail security service Lambda function needs: a CloudTrail trail, a KMS key to encrypt CloudTrail, S3 buckets for CloudTrail, a CloudWatch Log group, and Lambda integration with CloudWatch Logs.
You get all of that with Terraform out of the box. If I were to deploy with SAM, I’d still need a separate infrastructure deployment tool like CloudFormation or Terraform. So why not simply use Terraform from the beginning?
You still need separate environments for your Lambda functions.
AWS suggests splitting a Lambda function into multiple environments within a single account. That is not what we recommend at ACL.
In fact, it’s a best practice to have “Logical Account Sharding” with your AWS accounts. In other words, you should have a separate AWS account for each environment. Consequently, you should also deploy your Lambda function into each of those accounts, not into a single account. Here again, Terraform comes to the rescue.
Terraform lets you easily deploy your Lambda function (plus all required AWS resources) into Integration and Staging accounts before applying to Production. That gives you a much safer way to promote your code. It’s also much closer to the workflow you’d have (or should have) with your infrastructure.
Even a simple Slack bot will have a sensitive token you need to keep secure.
Lambda supports encrypted environment variables out of the box. But… that means little. Let me rephrase that. Lambda encrypts environment variables at rest out of the box, but it decrypts them before your function sees them, so the plaintext values are still visible in the Lambda console. If you encrypt a secret yourself with KMS before storing it, Lambda doesn’t decrypt it for your application at runtime; you need to do that yourself. So what can we do if our Lambda functions have secrets? Well, you have two options:
- KMS Encrypt/Decrypt: you can use KMS to encrypt your secrets into ciphertext blobs and either pass them in as environment variables or store them as files in your codebase. You can then use the AWS SDK at runtime to decrypt them.
- Parameter Store: you can store your secrets in AWS Systems Manager (SSM) Parameter Store and use the AWS SDK to fetch them at runtime.
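For the KMS option, decrypting a value at runtime might look like the sketch below. It assumes the ciphertext blob is stored base64-encoded in an environment variable; the `SLACK_TOKEN` name used in the usage note is hypothetical. It also assumes `kms` is a client from the v2 `aws-sdk` package (`new AWS.KMS()`) or any object with the same `decrypt(params).promise()` shape. Injecting the client keeps the function easy to unit-test. The v3 SDK uses a different, command-based call style.

```javascript
// Decrypts a base64-encoded KMS ciphertext stored in an environment variable.
// Assumption: `kms` is a v2 aws-sdk KMS client (new AWS.KMS()) or a test double
// exposing the same decrypt(params).promise() interface.
async function decryptEnv(kms, name, env = process.env) {
  const encrypted = env[name];
  if (!encrypted) {
    throw new Error(`Environment variable ${name} is not set`);
  }
  // KMS expects the raw ciphertext bytes, not the base64 text.
  const result = await kms
    .decrypt({ CiphertextBlob: Buffer.from(encrypted, "base64") })
    .promise();
  // Plaintext comes back as binary data; turn it into a UTF-8 string.
  return Buffer.from(result.Plaintext).toString("utf8");
}

module.exports = { decryptEnv };
```

In a handler you would call something like `decryptEnv(new AWS.KMS(), "SLACK_TOKEN")`. The function's IAM role, which Terraform can manage, also needs `kms:Decrypt` on the key.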
Nowadays, I tend to recommend Parameter Store since it’s the simpler of the two options. You can easily add your secret through the AWS Console.
Then you can easily fetch your secrets using the AWS SDK.
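A minimal sketch of that fetch follows. It assumes `ssm` is a client from the v2 `aws-sdk` package (`new AWS.SSM()`) or any object with the same `getParameter(params).promise()` shape. The parameter name `/slack-bot/token` is hypothetical. Setting `WithDecryption: true` asks Parameter Store to decrypt a SecureString value before returning it.

```javascript
// Fetches a single parameter from Parameter Store, decrypting SecureString values.
// Assumption: `ssm` is a v2 aws-sdk SSM client (new AWS.SSM()) or a test double
// exposing the same getParameter(params).promise() interface.
async function getSecret(ssm, name) {
  const result = await ssm
    .getParameter({ Name: name, WithDecryption: true })
    .promise();
  return result.Parameter.Value;
}

module.exports = { getSecret };
```

Usage would look like `await getSecret(new AWS.SSM(), "/slack-bot/token")`. The function's IAM role needs `ssm:GetParameter`, plus `kms:Decrypt` if the parameter uses a custom KMS key. That is another resource Terraform can define next to the function.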
Here, Terraform can help if there are any necessary KMS configurations required. For example, if you want to create a custom key, you can define it in Terraform. If you want to fetch parameters beforehand and pass them in via environment variables, that’s easy to do too. Terraform gives you the flexibility.
Unfortunately, secrets bring their own set of challenges for Lambda. A decryption call has an overhead of ~150–200ms (at least from my own benchmarks).
So how can we keep our Lambda functions speedy if we require secrets on each invocation? Well, that requires us to peek under the hood of AWS Lambda.
AWS Lambda Under the Hood
Surprise! It’s a pool of containers.
If you’ve considered using Lambda, you’ve probably also been curious about how it works underneath. Lambda runs your functions within containers. That’s how it is able to spin up quickly and scale out broadly.
On the first invocation, Lambda spins up a container with your application code in it and invokes your handler function (a cold invocation). However, if a second invocation happens within a short time period, the same container is reused (a warm invocation). When it is reused, your codebase and global variables are also reused. This reuse lets us optimize our code execution by performing setup logic only once.
Because global variables can be reused between invocations, we can decrypt our secrets once and store them in a global variable for future invocations to reuse. This helps us overcome the ~150ms decryption time. It also helps us avoid the “boot up” time required for packages to be loaded and the runtime to be optimized.
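A minimal sketch of caching secrets in a module-scope variable follows. `loadSecrets` is a hypothetical stand-in for the slow KMS or Parameter Store call. Here it only counts how many times it runs, so you can see that warm invocations skip it.

```javascript
// Module-scope variable: created once per container, so it survives between
// warm invocations and is reset only on a cold start.
let cachedSecrets = null;

async function getSecrets(loadSecrets) {
  if (cachedSecrets === null) {
    // Only the first (cold) invocation pays for the slow call.
    // If loadSecrets throws, nothing is cached and the next invocation retries.
    cachedSecrets = await loadSecrets();
  }
  return cachedSecrets;
}

// Hypothetical loader standing in for a KMS or Parameter Store call.
let loadCount = 0;
async function loadSecrets() {
  loadCount += 1;
  return { slackToken: "example-token" };
}

async function handler(event) {
  const secrets = await getSecrets(loadSecrets);
  // ... use secrets.slackToken to do the real work ...
  return { hasToken: Boolean(secrets.slackToken) };
}

module.exports = { handler };
```

Lambda sends one event at a time to a given container, so there is no race on `cachedSecrets`. The trade-off is that a rotated secret isn't picked up until the container is recycled, unless you add an expiry.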
Now, let’s see why this detail matters with an example. If you’ve built a Slack command using AWS Lambda, you’ve probably noticed that you must reply within 3 seconds; otherwise, Slack treats it as a timeout. In our case, with an infrequently called company-wide Slack command, if the first invocation takes ~2–3 seconds, then we risk many employees getting timeout errors. So what can we do to make the first call a fast call?
Once again, Terraform comes to the rescue.
To keep Lambda warm, you need to keep invoking it. A simple trick in AWS is to create a CloudWatch Events scheduled rule that invokes your Lambda function every minute.
Without Terraform (or CloudFormation), you would have to set up the rule, its target, and the permission for CloudWatch Events to invoke your function by hand.
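On the Lambda side, the scheduled pings shouldn't do any real work. Here is a minimal sketch. It assumes the Terraform-managed rule sends the default scheduled-event payload (`"source": "aws.events"`, `"detail-type": "Scheduled Event"`) rather than a custom input.

```javascript
// True for the events a CloudWatch Events schedule rule sends by default.
function isWarmUpEvent(event) {
  return Boolean(event) &&
    event.source === "aws.events" &&
    event["detail-type"] === "Scheduled Event";
}

async function handler(event) {
  if (isWarmUpEvent(event)) {
    // The container (and any cached secrets) is now warm; skip the real work.
    return { warmUp: true };
  }
  // Real work, e.g. answering a Slack slash command, would go here.
  return { warmUp: false };
}

module.exports = { handler, isWarmUpEvent };
```

One ping per minute generally keeps a single container warm. If several people run the Slack command at the same moment, extra containers can still start cold.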
Is it even working?
Lastly, now that you’ve got your Lambda function up and running, how can you monitor its health and keep it healthy? There are a couple of options available, although none are unique to Lambda. Fortunately, some of these options are, yet again, easy to implement with the help of Terraform.
Using Terraform, you can create CloudWatch alarms on your Lambda function’s metrics, such as Errors (which includes timeouts), and have them notify an SNS topic that you monitor. This strategy has the benefit of simply being more Terraform code alongside your Lambda deployment.
If you want a more human-friendly way to read your Lambda logs, you can use the awslogs tool. It significantly improves the UX of reading CloudWatch Logs and works directly from the command line.
However, awslogs has its limits. If you want an even more user-friendly tool, you can create an AWS Elasticsearch cluster and stream your CloudWatch Logs into it. You then have the Kibana interface to quickly search through your logs and even do basic log analytics.
You can continue to use third-party tools such as New Relic, Airbrake, Rollbar, etc. for exception management. In this case, similar to any other application, you just need to include the respective 3rd-party package and configure your Lambda function to use it. Alternatively, you can simply use logs and metrics monitoring if you can accept that experience.
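Each exception tracker has its own client library. The sketch below therefore hides it behind a hypothetical `reportError(err, details)` function, which you would implement with, for example, the Rollbar or Airbrake package. The wrapper rethrows the error so Lambda still records the invocation as failed. That keeps the CloudWatch error metrics and any alarms on them working.

```javascript
// Wraps a Lambda handler so unexpected errors are sent to an exception tracker.
// `reportError` is a hypothetical adapter around your tracker's client library.
function withErrorReporting(handler, reportError) {
  return async (event, context) => {
    try {
      return await handler(event, context);
    } catch (err) {
      try {
        await reportError(err, { event });
      } catch (reportingFailure) {
        // Never let a reporting problem hide the original error.
        console.error("Error reporting failed:", reportingFailure);
      }
      // Rethrow so Lambda still counts this invocation as an error.
      throw err;
    }
  };
}

module.exports = { withErrorReporting };
```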
If you are a Datadog or New Relic user, then monitoring the performance of your Lambda function will not be too different. You simply have to set it up once. And guess what? You can even set up Datadog and New Relic monitoring via Terraform itself.
There you have it. A complete picture of what it takes to deploy your Lambda function to production with the same diligence you apply to any other codebase using Terraform.
I hope I’ve made the point clear that there are many cases where frameworks such as SAM or Serverless are not enough. You need more than that for a highly integrated Lambda function. In such cases, it’s easier to simply use Terraform. At ACL, we have our entire infrastructure codified in Terraform, so this was a natural step for us.
I hope this detailed post gives you a complete and viable approach for deploying Lambda effectively. Personally, I rarely find thought leadership around the entire development experience of AWS Lambda. People are individually figuring out a lot of details for themselves. Hopefully this post saves you some of that legwork and gets you up and running faster.
Please share with me your own experiences or alternative techniques. I’d love to know about them!