David Kinder Jr.
Fender Engineering
May 9, 2023


Roll your own AWS Lambda Container

Lambda functions have temporary instance storage at /tmp with 512MB to 10GB of capacity. But this storage is ephemeral, so it isn't a practical place to install additional packages for your code.

AWS offers several solutions for installing large packages: lambda layers, EFS, and building a customized lambda container. In this article I will not be covering EFS.

In our environment, we have a build and deployment process that generates lambda layers based on Python requirements files. This process works well for package bundles that stay within the layer limits: 50MB zipped per layer upload, a maximum of 5 layers per lambda, and 250MB total unzipped. As soon as your package installation size exceeds ~250MB, you have to either figure out which packages can be considered bloat or begin considering EFS or custom lambda containers.
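If you want to check whether a bundle will fit before wiring up layers, a quick local check works (the requirements file path is whatever your project uses):

# Install the bundle into a scratch directory and measure its unpacked size
pip install -r requirements.txt --target ./package_check
du -sh ./package_check  # compare against the 250MB unzipped limit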

When faced with this issue, I decided to try out lambda containers. If you want a detailed walkthrough of how to get started, visit AWS's documentation here. I'm going to cover why and how I implemented custom lambda containers.

Our data engineers use lambdas to help with ETL processes. Those lambdas make good use of data-related packages such as numpy, pyarrow, pyathena, and pandas. With just these packages we were easily breaking the 250MB total limitation of lambda layers, and that's before counting the many other packages that also have to be installed.
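For a sense of scale, the requirements file for one of these ETL lambdas looks roughly like this (the list is illustrative, not our actual file; including awslambdaric here is my assumption for how the runtime interface client gets installed, since the container's entry point, shown later, runs it):

#requirements.txt (illustrative)
numpy
pandas
pyarrow
pyathena
awslambdaric  # assumed: the runtime interface client the ENTRYPOINT runs later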

I did some research on lambda containers and decided to use this feature to solve the issue for a few reasons:

  1. I was already familiar with Docker containers.
  2. We (the DevOps Team) had already been customizing containers for our CI/CD system and had examples of build and deployment methods.
  3. You get 10GB of space to play around with.
  4. I just thought it would be fun to do.

The first thing I did was write a Dockerfile on my local machine that installs all of the required Python packages into a base image. Below is an example of what that Dockerfile looks like (I've removed company-related secrets):

#python-lambda-container-base
FROM public.ecr.aws/sam/build-python3.8:latest

#Install required packages
COPY requirements.txt /var/task/requirements.txt
RUN python --version  # sanity check the runtime version at build time
RUN pip install --upgrade pip
RUN pip install -r /var/task/requirements.txt

Very simple, right?

After getting the kinks worked out, I implemented a build and deployment process that does the following:

  1. checks out the repository that has the Docker code in it
  2. builds the image from the Dockerfile
  3. tags the image with the commit SHA1 and pushes it to ECR (see the sketch below)
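A rough shell equivalent of those steps, with the repository name and region as placeholders (our actual pipeline runs this in CI):

# Assumes AWS_ACCOUNT_ID is set and the ECR repository already exists
REPO="${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/python-lambda-container-base"
TAG="$(git rev-parse HEAD)"  # the commit SHA1 becomes the image tag

# Build the base image from the Dockerfile above
docker build -t "${REPO}:${TAG}" -t "${REPO}:latest" .

# Authenticate Docker to ECR and push both tags
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "${REPO%%/*}"
docker push "${REPO}:${TAG}"
docker push "${REPO}:latest"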

In order for the final container to work in AWS, I had to set up a second image, layered on top of the base, that includes the necessary entry point. The Dockerfile looks something like this:

# The account ID must be declared as an ARG before FROM to be usable in FROM
ARG AWS_ACCOUNT_ID
FROM ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/python-lambda-container-base:latest

# Copy function code
ARG FUNCTION_NAME
COPY functions/${FUNCTION_NAME}/ /var/task/

# Start the runtime interface client, pointing it at the function's handler
ENTRYPOINT [ "/var/lang/bin/python", "-m", "awslambdaric" ]
CMD [ "main.handler" ]

This Dockerfile lives with the Python code that needs to be deployed to the lambda. Now there is something you should note here: the code is deployed with the container. Once the lambda container is deployed, you cannot update the code by uploading a zip or editing it in the console. The entire container has to be rebuilt. In practice, only this second image needs to be rebuilt unless you are installing new packages into the base.
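Rebuilding that second image is cheap because the base layer is cached. A sketch of the build, using a hypothetical function name:

# Rebuild only the function image; the base is reused from cache
docker build \
  --build-arg AWS_ACCOUNT_ID="${AWS_ACCOUNT_ID}" \
  --build-arg FUNCTION_NAME=etl_example \
  -t etl_example-lambda:latest .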

Building one container in our build tools takes less than 5 seconds; deploying that container, however, can take a minute or more depending on how large the final image is. We use terraform to manage installing the latest containers that are built. Because we have a somewhat complicated terraform module structure, I won't be posting all of the code that it takes to build the lambda, but I will give you the logic.

  1. We have a generic lambda terraform module that is used to configure all of our lambdas.
  2. We pass parameters to this module via the terraform lambda module definition. One of the parameters is a boolean that indicates whether or not the lambda is container based.
  3. If that boolean is true, the module code checks for an existing ECR image that matches the naming convention and pulls down the latest image ARN.
  4. The lambda is updated with the latest image.

The above is oversimplified, but you should have a good idea of the process.
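Purely for illustration (this is not our module's code), the same lookup-and-update logic can be expressed as two AWS CLI calls, with the repository and function names as placeholders:

# Find the most recently pushed image tag in the function's ECR repository
IMAGE_TAG=$(aws ecr describe-images \
  --repository-name etl_example-lambda \
  --query 'sort_by(imageDetails,&imagePushedAt)[-1].imageTags[0]' \
  --output text)

# Point the lambda at that image
aws lambda update-function-code \
  --function-name etl_example \
  --image-uri "${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/etl_example-lambda:${IMAGE_TAG}"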

Using lambda containers is great because, with the help of mock APIs and local databases, you can iterate on your code much faster on your local machine. No need to deploy to a remote dev environment. You can even share your images to aid with troubleshooting. The possibilities are whatever you can imagine. And when your code is right, you check it in and watch your CI/CD move it to where it needs to be. Easy.
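Local iteration works through the AWS Lambda Runtime Interface Emulator (RIE). A minimal sketch, assuming you have downloaded aws-lambda-rie to ~/.aws-lambda-rie and are using the hypothetical image and handler names from above:

# Run the container locally, wrapping the runtime client with the emulator
docker run -p 9000:8080 \
  -v ~/.aws-lambda-rie:/aws-lambda-rie \
  --entrypoint /aws-lambda-rie/aws-lambda-rie \
  etl_example-lambda:latest \
  /var/lang/bin/python -m awslambdaric main.handler

# Invoke the function with a test event
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"test": "event"}'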
