Complete End-to-end Guide for Developing Dockerized Lambda in Typescript, Terraform and SAM CLI
This is a full guide to setup, develop and deploy a backend app with a recently released container image feature for lambda on AWS.
Needless to say, if you are a great fan of Docker, you would know how amazing it is. What you test on local is what you get when you deploy it, at least at the container level.
Since this feature is quite new, there have been lots of rabbit holes I fell into, and I’m sure others will too, so I will break every piece of rabbit hole down so that nobody gets trapped into it. This guide starts from the real basics like making a user or setting up terraform, so feel free to skip to the section you need.
Reason to use Terraform and SAM CLI together
Well, it seems that Terraform supports building a Docker image and deploying it to ECR out of the box, but after lots of digging, I noticed that things would get simpler if I just build docker image in another pipeline and deploy it with a few lines of shell script. So Terraform will used to define resources excluding the build and deployment process. There’s no problem with that.
And, what SAM CLI? Terraform cannot replace SAM CLI and vice versa. SAM cli is useful in developing local lambdas because it automatically configures endpoints for each lambda and greatly removes barriers to the initial setup. Since lambda functions are ‘special’ in the way that they only get ‘booted up and called’ when they are invoked (unlike EC2 or Fargate), just doing ts-node my-lambda.ts
would not make it work. Of course there are many other solutions to this matter (such as sls
) but in this guide I will just use SAM CLI. But for many reasons SAM makes me want to use other better solutions if any... The reason follows right below.
Disclaimer for the people who are looking for how to ‘hot-reload’ docker container for typescript or javascript based lambda: it won’t work smoothly as of now. The best bet is to use nodemon
to watch a certain directory to trigger sam build
every single time, and in another shell launch sam local start-api
. It works as expected, but the current problem I see from here is that every single time it sam build
s, it would make another Docker image and another and so on, so there will be many useless dangling images stacked up in your drive, which you will need to delete manually because SAM CLI does not support passing in a argument that's equivalent to docker run --rm
. Anyways that's the story, so this is the reason I might want to try some other solutions. More on this on the relevant issue on Github. Please let me know if any of you had a good experience with sls
because I haven't used it much yet.
Ok. Now let’s write some code.
Setup AWS for Terraform
First, make sure that you’ve installed and authorized on your AWS CLI. Installing AWS CLI is kind of out of scope here, so please follow the guide on AWS.
After successful installation, run:
You will be prompted to enter Access Key ID and Secret Access Key. Depending on your situation, there are different ways of how you can handle this, but for the sake of simplicity we can make one user (from AWS console. You will probably only use it for ‘Programmatic access’) that would have these policies for applying Terraform codes.
This one for setting S3 bucket as a backend:
And this one for locking the state:
And the next one is quite tricky; because we will temporarily enable permissions related to managing IAM because we will first need to make a role from which we could assumeRole
whenever we try to plan and apply our IaC.
For now we can just go onto AWS console and make this policy:
Make sure you will need to narrow down to specific actions and resources used after everything is done (for production usage).
Now, now that you’ve made three distinct policies (or all in one, depending on your preferences), attach them to the user that you’ve just crated for running aws configure
.
Setup Terraform
If you haven’t already, install terraform by following an instruction from the official website. Just download the binary and move it to the bin
folder.
Now verify version of terraform
And then make main.tf
file in your project directory (I personally put it into IaC
folder because there will another folder for the 'real' .ts
codes for the backend):
Now, run terraform init
:
Then, we will need to add s3 backend and state locking. But before then, make a table on Dynamodb and also a bucket on S3, each for hosting IaC backend and locking the state.
Now we will need to add more to the policy on DynamoDB we created because we want to create a table:
Then you could write this code (by the way, it may be a good idea to put this below IaC in a different general-purpose repository because the current repository is meant to be only used for lambda-related resouces. But for the sake of this article I will just write it away here):
And then, also add S3 backend (you will need to add relevant IAM policies too here, but since we know how to do it, I will cut the explanation):
Now, run terraform apply
, verify the changes, and enter yes
. DynamoDB table and S3 Bucket should have been created. Here's the code so far:
Now, add s3 backend and state lock:
We are also going to use Docker provider, so add that too:
Now because you’ve added a backend and another provider, we will need to run terraform init
again, and then terraform apply
. Run it.
Setting up lambda
Now we will need to develop lambda on the local machine. Install SAM CLI:
Note that the outdated versions would not support running Docker containers, so make sure that your version is the latest.
Now, we won’t run sam --init
, because it will make it difficult to make the server into a monorepo structure. The reason that we will want to make it into a monorepo is that it will make it much easier to propery dockerize every single lambda and deploy it with dependencies that each of them only require to have. Instead we will use lerna to initialize the server folder.
And as usual:
Then it will give you this layout:
Then, add your first function package. For the sake of this example, let’s assume that we want to make a REST API, composed of many lambdas, each returning ‘hello’ as a response in different languages (which is totally useless in reality, but at least useful here). Our first lambda will be an English one.
Now the directory structure will look like this:
Now, under server
, we will need to add some utils to build and invoke the function locally. Add modify the server/package.json
as follows and of course, run npm i
again:
To add some explanation to what we are trying to do: these devDependencies
are going to be package-wide dependencies. These are not specific to any one of the functions that we are going to build; They will help in tooling general stuffs. That's what we put them here.
Dependencies:
@types/node
: we will need this to give proper type definitions for 'built-in node' modules likefs
orpath
.concurrently
: just a script runner.lerna
: you know it.nodemon
: this will help us watch a directory and build Docker image again.
Scripts:
Now, you need to create template.yml
for SAM cli to consume and run what we want to run.
We won’t be able to run sam build
or sam local start-api
yet, because we still need to setup Dockerfile
and ECR repository.
So far we have added template.yml
for running SAM CLI:
Now we will add Dockerfile
in packages/hello/
.
This will be the content for Dockerfile:
To go through it line by line:
amazon/aws-lambda-nodejs:14
is the official amazon image for lambda. Current LTS of nodejs is 14, so we are using this.AS builder
is related to multi-stage builds in Docker; it helps reduce the final Docker image size. Basically in thisbuilder
stage, we will only build the output to be included in the final image, and any dependencies installed in this step won't be included in the final output image.WORKDIR /usr/app
: inside the docker image, set the working directory as/usr/app
. There isn't anyapp
folder in a normal docker image, so it will makeapp
directory. We will put the compiled js code there.COPY package*.json tsconfig.json ./
: we need these files for compiling typescript into javascriptt files.npm install
: it will install dependencies.npm run build
: it will compile typescript code into js.RUN ls -la # for debugging
: it is merely for debugging. While building, docker will output what's inside there at that time, for you to verify if you are doing what you intended to do.FROM amazon/aws-lambda-nodejs:14
: this is the second build stage in Docker. All outputs from the previous stage will be discarded in this stage unless explicitly specified to be included.RUN npm install --only=prod
: it will only installdependencies
butdevDepdencies
.COPY --from=builder /usr/app/lib /usr/app/lib
: it explicitly refers to the previousbuilder
stage to copy whatever that was inside/usr/app/lib
to the current/usr/app/lib
. In this case, it will copy all compiled javascript code.CMD [ "/usr/app/lib/index.handler" ]
: the command should bepath-to-lambda-handler-without-extension.handler
. That's just how it works.
Now we’ve added a Dockerfile. Now let’s setup basic environment for lambda:
You will need to modify tsconfig
to use modern javascript features; Most prominently, add the following. This will allow you to use Promise
API. I recommend turning other options too, especially those related to strict-type checking:
Modify packages/hello/package.json
too. Be noted that any dependencies you add to be included in the final, compiled output code (javascript) will need to be added to dependencies
, not devDependencies
:
Now, add a really simple lambda:
So far, we have created these:
Now, create nodemon.json
under server/
to watch and build files:
After creating nodemon.json
, you can start running npm run watch
or npm start
. It would do two things: build Dockerfile as you make any changes under packages/
directory, and host a local endpoint for the lambda. It will be similar to hot-reload although it seems more like a hack; you won't need to cancel and run sam local start-api
again once you make a change. If it does not work, try again after creating ECR first.
Oh, and you can delete __tests__
and lib/hello.js
because we are not using them. Anyways, now we are kind of ready to build this function into a docker image. Let's try it:
Everything’s cool, docker build succeeded. You can try running the image and test the request:
This is where SAM CLI should start to come in. But before then, we will need to make a ECR repository with terraform. Let’s go back to terraform for a while.
Back to Terraform: assuming role and ECR
Now, we will need to create a role first because we will relay on that role to get required permissions to create whatever resource we want to. This is called ‘assuming a role’, and the reason why it’s deemed to be a good practice is that you won’t have to create multiple credentials (probably multiple users) to do certain thing that requires permissions. Instead, you borrow the permission for the period of time when you plan and apply the changes in the resources.
So how do we do it? First, let’s create hello_role.tf
:
For this article, we won’t be diving deep into specific policies, so we will just allow almost all resources without specifying them in detail. For real-world usage, you will have to define exact statements giving just the right permissions.
What we are doing here, essentially, is that we are allowing localtf
user to assume the role of hello_role
that possesses all policies to run the hello server stack. This is called 'creating a trust relationship' (you will see this if you do this process on AWS). This way, localtf
won't always have to hold all permissions it needs. It only acquires them only when needed (i.e. deploying)
Once you are done writing hello_role.tf
, run terraform apply
to make changes.
Now, go back to main.tf
and add:
Once you add assume_role
, now you can create any resources you want, using the permissions given by that role. Let's now make an ECR repository. Make ecr.tf
:
Run apply
and terraform will soon make ECR.
Building and pushing the image to ECR
Now that we’ve made an ECR, we can go back to our server and write a little script to login, build and push the image of the lambda we are writing.
It’s pretty much straightforward; just get your AWS cli ready for using; and authenticate to be able to use ECR from Docker CLI:
Next, tag your Docker image as latest
and separate timestamp at the time of the build, so that latest tag will always be the latest built image, and then another image will remain for the record, which you can use later to revert back or do other things in some cases.
Now, you can test it out yourself as such:
After this, you will be able to see on AWS ECR:
as you can see, the images are going to be tagged by timestamp, and the latest built image will be always tagged as latest
, and you are going to reference this tag in Terraform to apply newly built Docker image to lambda.
So far, we’ve made changes like so:
Now the majority of the prepartion is done, so we can move onto creating actual lambda and API gateway.
First, the most important part: you want to create lambda itself from Docker.
There is already a module just made for this, so use it to make lambda. Once you are done, apply
the changes.
The key here is that you are going to reference an image URI from ECR where the tag is the latest. If you have previously built and pushed a new docker image, the hash of the image is going to be different, thus causing a redeployment of the lambda function. Otherwise it has no idea if the docker image is new or not.
Again, after running the change, you can see the lambda on AWS console:
as you can see, compared to other lambdas, it has the package type of ‘Image’, which means it’s not from a Zip, but a Docker image.
You should be able to see the image URI (including the hash) of the image at the bottom of the information about lambda. If you click on that image URI, you will be navigated to the latest image on ECR that you just built and pushed.
Now, you will be able to test lambda on AWS Lambda console, but what we want in the end is something like sending GET /hello
to a certain domain and receiving a response. To be able to do that, we need to setup API Gateway.
For this example, we will setup a domain at api.hello.com
.
Here’s how:
I will explain the code one by one.
aws_api_gateway_rest_api
will create a REST api. It usually does not include a single endpoint; it usually contains multiple, like: api.hello.co/hello
, api.hello.co/bonjour
, api.hello.co/nihao
, and so on. On AWS, it is equivalent to a single row in APIs tab:
aws_api_gateway_resource
: to put it very simply, you can think of this as a single API endpoint that has not yet been deployed. In this case we create a single endpoint ending with /hello
.
In the below example, we have created two different resources with OPTIONS
and POST
, to the same path (hidden by black overlay). We will talk about creating OPTIONS resource to handle preflight requests later. For now, it would suffice to know that creating a REST resource means creating a certain endpoint.
aws_api_gateway_method
: Now, you want to create a REST method for that resource. Our /hello
endpoint will just require no auth (out of scope of this article), and be a GET method.
aws_lambda_permission
: By default, there's no permission for API gateway to invoke lambda function. So we are just granting a permission to it so that it can be executed.
aws_api_gateway_integration
: API gateway supports transforming (filtering, preprocessing, ...) a request before it reaches client or a response from the client before it reaches the actual lambda. We are not doing any special things here for this example, but you may want to use it in the future. For more, read relevant AWS docs.
aws_api_gateway_stage
: API gateway supports separating API into different stages out of the box. You should use this to separate your API across production, staging and development environments. For now, we will only make a stage for current terraform workspace, which is assumed to be dev
across all examples in this article. Once you apply your changes, you are going to be able to see this on your AWS console:
aws_api_gateway_deployment
: This is equivalent to clicking on 'Deploy API' on AWS console.
Once resources are created in API gateway, they must be deployed in order to be reachable from external clients. One little problematic thing is redeployment
; Even if you make a change to your REST API resources, it will not get deployed if redeployment
argument does not change. There are mainly two ways of getting around this:
- Use
timestamp()
to trigger redeployment for every singleapply
. Using this approach, lambda may be down for few seconds while redeployment. But it is for sure that it always deploys, so I would just go with this one if my service does not handle many users. - Use
md5(file("api_gateway.tf"))
to trigger redeployment whenever this file changes. But you need to always make sure that EVERYTHING related to API Gateway deployment only stays inside this file.
Ok. So far we have set up lambda and basic API Gateway configurations. Right now, you can test your API on Postman like this: first, go to AWS API Gateway console, and find a specific endpoint that is deployed to a certain stage. There should be an ‘Invoke URL’ at the top of the page.
Now, open up Postman, and
- Insert your invoke URL
- Click ‘Authorization’, and choose the type ‘AWS Signature’, and enter
AccessKey
andSecretKey
. These keys should be coming from one of user's credentials from AWS IAM Console. If you do not have one specialized for calling an API set up with lambda and API gateway from local environment, make one user for that and get the keys. - Insert your AWS Region.
- If your API requests any more query parameters or body, insert them.
- Click send, then it should work.
If you do not provide AWS Credentials in your request header, it won’t work, because so far your API can be only used by IAM users known to the API gateway, and if you are just sending a request from your local computer without providing any access and secret keys, it won’t know it’s you. To make it work even without providing credentials for any public APIs intended to be exposed to client applications, you should now configure custom domain.
So far we have made changes to make lambda and API Gateway resources. The list of files we should have by now is the following.
Next, we will see how to create a custom domain and relate that domain to the REST API we just made.
Creating Terraform resources for custom domain
Now, the problem is that we have the API, but it’s not callable from any external client applications, which is a common case for many projects. So we want to register a domain first to represent our endpoints.
Before making a change for custom domain, we need to setup another AWS provider because we will need to use us-east-1
region for Edge-optimized custom domain name ( that's the only region that supports creating an ACM certificate for Edge-optimized custom domain name).
There are two choices available for an API endpoint: 1. edge; 2. regional. If your endpoint should be accessed by worldwide clients, use edge; if your endpoint is specifically confined to be used in one specific region in the world, use regional. If you don’t know what to do, it totally safe to go with edge for now. First, add another aws provider:
Then, you write the actual code to make a certificate and custom domain.
Make sure you get the existing hosted zone ID (or any other zone ID that you intend to use) from AWS Route53 Console:
Use that ID to create Route53 Record for the custom domain name.
After you apply your change, you will be able to see ACM certificate being created on AWS Certificate Manager:
Just make sure you verified the status to be ‘Issued’ and the validation status to be ‘success’. You may need to wait for several minutes before this completes. Also, if your certificate is not showing up, make sure that you are on us-east-1
, not anywhere else.
After you are done with this, now you can go back to API Gateway again, and configure custom domain. Now that you’ve registered a domain, you can see it right up from API Gateway console:
In the certificate dropdown, you should be able to see the domain that you have just created. Do not make create domain name there on the console. Now come back to terraform and let’s write the equivalent code for that.
You are just going to fill out the options that you just saw from the API Gateway console. Just fill out the relevant info about certificate, domain name, and endpoint config.
Now, this is important: you need to create another Route53 Record to map your custom domain to cloudfront. Once you create your custom domain, AWS creates ‘API Gateway domain name’, circled with red in the below picture:
You need to route the traffic to api.hello.com
to this API Gateway domain name (an example of an API Gateway domain name would be asdfasdfasdf.cloudfront.net
as long as you are using EDGE
). That's what we are doing with aws_route53_record.custom_domain_to_cloudfront
. Otherwise, the response to your API will keep showing some weird errors that are really hard to guess the causes of. I found AWS really lacking a documentation on this part, so please be advised on this one. You need to create another Route53 Record.
You will be able to verify by entering Route53 console and looking for api.hello.com
. It should appear as the following:
After that, you don’t have to do many things; just add base path mapping resource, in api_gateway.tf
. Even if you do not have an additional trailing path to your endpoint, you must create a base path mapping. Otherwise your API won't be exposed to the public.
After applying this change, verify that your API mapping has been created:
Now, you can go back to Postman, and test your api by requesting GET api.hello.com/hello
. What may be confusing here is that you are not adding any path
in base path mapping. If you add hello
as a path, your API endpoint will be configured as api.hello.com/hello/hello
, which is obviously not what we want. So do not add any path mapping if you already have configured your path in aws_api_gateway_resource
). Anyways, request and response to the API endpoint should work as expected if everything has been setup correctly so far.
Enabling OPTIONS (preflight request)
Now, our client application, of course, is not Postman, so usually clients will request OPTIONS api.hello.com/hello
first, and then request GET api.hello.com/hello
, if they intend to send CORS requests, which is a very common case (read more about that from MDN docs)
If you have not done anything related to handling OPTIONS request, it’s very likely that you will get some error in your client application, like this:
So let’s do it! There’s already a handy module written by a great dev, so we will just use that:
Note that if you have any custom headers, you must define it in your config. Next, verify that OPTIONS requests are now allowed on the console:
Also, you will need to change your lambda’s response header too. Just adding Access-Control-Allow-Origin": "*"
will work:
Now, apply the changes, and go back to your client app and retry the request. It should be working.
Summing up
If you have followed everything, you will have created these files:
So far, we have looked at how to setup, develop and deploy a dockerized lambda application with Typescript, Terraform and SAM CLI. There are tonnes of things to cover on lambda.. maybe next time, it will be on using resources inside VPC from lambda. I hope you enjoyed this and found some valuable insights. Thank you.