Creating New AWS Lambda Layer For Python Pandas Library
Lambda Layer was introduced last week during AWS re:Invent 2018 as one of the two major updates to the popular AWS Lambda. The leading Function-as-a-Service platform has now established a stronger foothold in the “no-sweat-scaling” “pay-per-(sub)-second” Serverless world. Lambda Layer promotes code reuse, drives further industry standardization and opens the door to thousands of shared “AWSome Layers”.
With the announcement, AWS included a new publicly-available Layer containing NumPy and SciPy, two well-known Python libraries. For any data scientists using Python, the next important library is Pandas.
In this post, I will explain the steps to create a new Lambda Layer package for Pandas; the same steps can be applied to any other Python library. I will also show how layers can be integrated into an AWS Lambda function with ease via the Serverless Framework.
Before we go into details, please note that this post assumes the reader already understands AWS Lambda and knows how an AWS Lambda function works. For a quick introduction, follow the Quick Start Guide by AWS.
A. Create a new Lambda Layer package
Let’s dive into it.
1. Libraries and versions
First, create a text file named requirements.txt with the following content:
As you can see, the file contains the pandas and pytz libraries. pandas depends on numpy as well, but since AWS already provides a Layer for numpy, we should make use of that in our functions instead of including numpy in this custom pandas layer. That is essentially one of the beauties of using layers: code reuse.
For any other libraries, update this file with appropriate dependencies.
2. Packaging script
Next, create a script called get_layer_packages.sh.
This script uses Docker to get Lambda-compatible versions of the libraries in requirements.txt. lambci/lambda:build-python3.6 is a public Docker image created by Michael Hart.
If you don’t have Docker Desktop installed, get it from the official website.
Note the special subfolder named PKG_DIR, which is used as the location for all packages instead of the main folder. The reason for this is the way Lambda Layers handle libraries.
When a Layer zip file is loaded into an AWS Lambda container, it is unzipped into the /opt folder along with the contents of any other layers attached to the function (the function's own code is deployed separately, under /var/task). For the function to locate the libraries contained in a layer, the libraries' files have to be under the python sub-directory of the /opt folder.
To read more about this, check out AWS Lambda Layers guide.
If you are building a Lambda Layer that supports both Python 3.6 and 3.7, you can put the 3.6-compatible versions of the libraries under python/lib/python3.6/site-packages and the 3.7-compatible versions under python/lib/python3.7/site-packages.
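To make the layout rule concrete, here is a minimal sketch that lists which entries of a layer zip a given Python runtime would not be able to import. The function names and the in-memory archive are hypothetical, made up for illustration, and the check only encodes the two locations described above (python/ and python/lib/pythonX.Y/site-packages/).

```python
import io
import zipfile


def is_importable(entry, python_version="3.6"):
    """Return True if a zip entry lands on the runtime's import path under /opt."""
    versioned = f"python/lib/python{python_version}/site-packages/"
    if entry.startswith(versioned):
        return True
    # Other entries under python/ are visible to every Python runtime, except
    # those that sit in a different version's site-packages folder.
    return entry.startswith("python/") and not entry.startswith("python/lib/python")


def not_importable(zip_file, python_version="3.6"):
    """List the file entries of a layer zip that the given runtime could not import."""
    with zipfile.ZipFile(zip_file) as archive:
        return [
            name
            for name in archive.namelist()
            if not name.endswith("/") and not is_importable(name, python_version)
        ]


# Build a tiny in-memory archive with made-up entries to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("python/pytz/__init__.py", "")
    archive.writestr("python/lib/python3.6/site-packages/pandas/__init__.py", "")
    archive.writestr("python/lib/python3.7/site-packages/pandas/__init__.py", "")
    archive.writestr("requirements.txt", "")

skipped_on_36 = not_importable(buffer, "3.6")
skipped_on_37 = not_importable(buffer, "3.7")
```

Stray files at the root of the archive, such as requirements.txt, are harmless; they simply end up in /opt where Python does not look for modules.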
Take note of the --no-deps flag as well, which tells pip to install only the libraries in requirements.txt and not their dependencies. Without this flag, the numpy library would be added to our layer, which, as explained earlier, is not what we want.
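For readers who would rather drive the build from Python, the sketch below assembles an equivalent Docker command. It is not the get_layer_packages.sh script itself: the mount point /var/task and the value "python" for PKG_DIR are assumptions on my part, and the function names are hypothetical. Only the image name and the --no-deps flag come from the text above.

```python
import subprocess
from pathlib import Path

IMAGE = "lambci/lambda:build-python3.6"  # image named in the text
PKG_DIR = "python"                       # assumed value; matches the /opt/python layout
MOUNT_POINT = "/var/task"                # assumed mount point inside the container


def build_install_command(project_dir, pkg_dir=PKG_DIR, image=IMAGE):
    """Build the docker command that installs requirements.txt into pkg_dir."""
    project_path = Path(project_dir).resolve()  # docker -v needs an absolute path
    return [
        "docker", "run", "--rm",
        "-v", f"{project_path}:{MOUNT_POINT}",
        "-w", MOUNT_POINT,
        image,
        "pip", "install",
        "-r", "requirements.txt",
        "--no-deps",      # skip dependencies such as numpy
        "-t", pkg_dir,    # install into the layer's package folder
    ]


def install_layer_packages(project_dir):
    """Run the command; requires Docker and a requirements.txt in project_dir."""
    subprocess.run(build_install_command(project_dir), check=True)


# Only build the command here; nothing is executed until
# install_layer_packages() is called.
command = build_install_command(".")
```

Keeping the command construction separate from its execution makes it easy to print and inspect the command before running it.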
3. Layer package file
The next step is to package the zip file for the layer.
Execute the following commands from the same folder, resulting in the file my-Python36-Pandas23.zip, mimicking AWS's aptly-named AWSLambda-Python36-SciPy1x layer.
chmod +x get_layer_packages.sh
./get_layer_packages.sh
zip -r my-Python36-Pandas23.zip .
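Note that zip -r with "." also archives the script and requirements.txt, which is harmless. As an alternative, here is a minimal sketch that archives only the package folder using Python's standard library. The function name is hypothetical and the folder name "python" is an assumed value for PKG_DIR; the temporary project is made up purely to show the resulting entries.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def build_layer_zip(project_dir, zip_stem, pkg_dir="python"):
    """Create <zip_stem>.zip in project_dir containing only the pkg_dir folder."""
    project_path = Path(project_dir)
    archive_path = shutil.make_archive(
        base_name=str(project_path / zip_stem),  # ".zip" is appended automatically
        format="zip",
        root_dir=str(project_path),
        base_dir=pkg_dir,                        # keeps the "python/" prefix in entries
    )
    return Path(archive_path)


# Exercise the function on a throwaway project with a fake package.
with tempfile.TemporaryDirectory() as workdir:
    project = Path(workdir)
    (project / "python" / "pandas").mkdir(parents=True)
    (project / "python" / "pandas" / "__init__.py").write_text("")
    (project / "requirements.txt").write_text("")

    layer_zip = build_layer_zip(project, "my-Python36-Pandas23")
    with zipfile.ZipFile(layer_zip) as archive:
        entries = archive.namelist()
```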
Take note of the resulting folder structure below once the above commands complete:
Further optimizations, such as removing the info folders or dealing with .pyc files, can be done to reduce the package size, but these are not the main interest of this post.
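If you do want to try those optimizations, the following is one possible sketch. I am reading "info folder" as the *.dist-info and *.egg-info metadata folders that pip creates, which is an assumption, and the function name and demo files are hypothetical. Be aware that some libraries read their own metadata at runtime, so test the layer after pruning.

```python
import shutil
import tempfile
from pathlib import Path

METADATA_SUFFIXES = (".dist-info", ".egg-info")


def prune_package_dir(pkg_dir):
    """Delete metadata folders, __pycache__ folders and .pyc files under pkg_dir."""
    root = Path(pkg_dir)
    # Snapshot the tree first so nothing is deleted while it is being walked.
    doomed = []
    for path in sorted(root.rglob("*")):
        if path.is_dir() and (
            path.name.endswith(METADATA_SUFFIXES) or path.name == "__pycache__"
        ):
            doomed.append(path)
        elif path.is_file() and path.suffix == ".pyc":
            doomed.append(path)
    for path in doomed:
        if path.is_dir():
            shutil.rmtree(path)
        elif path.exists():  # may already be gone along with its parent folder
            path.unlink()
    return sorted(path.relative_to(root).as_posix() for path in doomed)


# Exercise the function on a throwaway folder with made-up contents.
with tempfile.TemporaryDirectory() as workdir:
    pkg = Path(workdir) / "python"
    (pkg / "pandas" / "__pycache__").mkdir(parents=True)
    (pkg / "pandas" / "__init__.py").write_text("")
    (pkg / "pandas" / "__pycache__" / "__init__.cpython-36.pyc").write_bytes(b"")
    (pkg / "pandas-0.0.0.dist-info").mkdir()
    (pkg / "pandas-0.0.0.dist-info" / "METADATA").write_text("")

    removed = prune_package_dir(pkg)
    kept = sorted(p.relative_to(pkg).as_posix() for p in pkg.rglob("*") if p.is_file())
```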
4. Creating and testing
That is it, you are ready to create a new Layer in AWS Console with the generated zip file. This is quite straightforward, so I will not cover it here. Do refer to this guide if you have not done so before: https://medium.com/devopslinks/how-to-use-aws-lambda-layers-f4fe6624aff1
If you prefer to use AWS CLI, use AWS’s own guide here.
Test the new Layer with this sample code. Remember to add both the new custom layer and AWS's provided numpy+scipy layer:
You should see an output similar to this when running the function:
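Separately from the sample code referred to above, a minimal smoke-test handler along the following lines can confirm that both layers are attached correctly. This is a sketch under my own assumptions: the handler name and the report format are hypothetical. It tries to import each library and reports where it was loaded from, so a missing layer shows up as a readable error rather than a crash.

```python
import importlib

# Libraries expected from the two layers: numpy from the AWS-provided layer,
# pandas and pytz from the custom layer.
LIBRARIES = ("numpy", "pandas", "pytz")


def handler(event, context):
    """Report, for each library, whether it can be imported and from where."""
    report = {}
    for name in LIBRARIES:
        try:
            module = importlib.import_module(name)
        except ImportError as error:
            report[name] = {"loaded": False, "error": str(error)}
        else:
            report[name] = {
                "loaded": True,
                "version": getattr(module, "__version__", "unknown"),
                "location": getattr(module, "__file__", None),
            }
    return report


# Local dry run; inside Lambda the service calls handler() for you.
result = handler({}, None)
```

When run inside Lambda with both layers attached, each location should point somewhere under /opt/python.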
B. Use Serverless Framework to integrate layers into functions
Managing AWS Lambda functions and layers via the AWS Console or AWS CLI is fine if you have a small setup and you work mostly alone.
When you work in a team on a comprehensive set of functions and layers, a deployment tool like the Serverless Framework can help tremendously, especially when combined with a CI/CD tool.
1. Preparation
Install node.js if you don’t have it yet.
Run the following scripts to prepare the environment:
# Install the Serverless CLI
npm install -g serverless

# Configure AWS profile (method 1 with AWS CLI)
aws configure --profile <profile_name>

# Configure AWS profile (method 2 with Serverless CLI)
serverless config credentials --provider aws --key <access_key_id> --secret <secret_access_key> --profile <profile_name>

# Create a new working folder and set up subfolders
mkdir serverless-with-layer
cd serverless-with-layer
mkdir -p layers/pandas
2. Serverless configuration
Create a new file named serverless.yaml with the content below and copy the files data_analysis.py, get_layer_packages.sh and requirements.txt to the appropriate folders, achieving the following structure:
serverless-with-layer
|__ data_analysis.py
|__ serverless.yaml
|__ layers
    |__ pandas
        |__ get_layer_packages.sh
        |__ requirements.txt
I call this service sample-service and name the layer sample-service-Python36-Pandas23x, but you can choose any name you like simply by updating the name parameter of the Pandas layer.
The path layers/pandas is used here, but it can be changed to any other path with a corresponding change in serverless.yaml.
For more information on how Layers are supported by Serverless Framework, check out their guide.
3. Layer package file
Package the Lambda function and layer with the following commands:
pushd layers/pandas && chmod +x get_layer_packages.sh && ./get_layer_packages.sh && popd
serverless package
Compared with the previous method, we no longer zip the package ourselves. Instead, both the layer's and the function's package files are created using serverless package. The result is the output below.
If you want to know exactly what will be created by the Serverless Framework, here is what cloudformation-template-update-stack.json looks like: https://gist.github.com/qtangs/5bd6d84dc6e4839b1c87665fa48ccd3f
4. Deployment and testing
Finally, deploy to AWS simply by running:
serverless deploy --package .serverless
For deployment, no manual operations on AWS Console are required.
Test the new function and compare with the above output.
C. Summary
We have created a new Lambda Layer package with a few lines of code and seen how tools like the Serverless Framework further streamline packaging and deployment.
Lambda Layer is an exciting addition alongside Custom Runtime. Already, there is a Github repository to collect the popular ones: https://github.com/mthenw/awesome-layers.
With these improvements being added to AWS Lambda, we benefit immensely from a Serverless ecosystem that continuously enhances infrastructure and lets developers focus on core innovation.
Thank you for reading. Do comment below to share your thoughts.
The complete project for this article is hosted on Github.