Creating New AWS Lambda Layer For Python Pandas Library

Simplifying serverless data analysis with AWS Lambda

Quy Tang
5 min read · Dec 9, 2018

Lambda Layer was introduced last week during AWS re:Invent 2018 as one of the two major updates to the popular AWS Lambda. The leading Function-as-a-Service platform has now established a stronger foothold in the “no-sweat-scaling” “pay-per-(sub)-second” Serverless world. Lambda Layer promotes code reuse, drives further industry standardization and opens the door to thousands of shared “AWSome Layers”.

With the announcement, AWS included a new publicly-available Layer containing NumPy and SciPy, two well-known Python libraries. For any data scientists using Python, the next important library is Pandas.

In this post, I will explain the steps to create a new Lambda Layer package for Pandas. The same steps apply to any other Python library. I will also show how layers can be integrated into an AWS Lambda function with ease via the Serverless Framework.

Before we go into details, please note that this post assumes the reader already understands AWS Lambda and knows how an AWS Lambda function works. For a quick introduction, follow the Quick Start Guide by AWS.

A. Create a new Lambda Layer package

Let’s dive into it.

1. Libraries and versions

First, create a text file named requirements.txt with the following content:


As you can see, the file contains the pandas and pytz libraries. pandas depends on numpy as well, but since AWS already provides a Layer for numpy, we should use that in our functions instead of including numpy in this custom pandas layer. That is one of the beauties of using layers: code reuse.

For any other libraries, update this file with appropriate dependencies.

2. Packaging script

Next, create a script called get_layer_packages.sh.

This script uses Docker to get Lambda-compatible versions of libraries in requirements.txt. lambci/lambda:build-python3.6 is a public Docker image, created by Michael Hart.

If you don’t have Docker Desktop installed, get it from the official website.

Note the special subfolder named PKG_DIR, which is used as the location for all packages instead of the main folder. This is because of the way Lambda Layers handle libraries.

When a Layer zip file is loaded into an AWS Lambda container, it is unzipped to the /opt folder along with the contents of any other layers attached to the function. The function’s own code is unpacked separately, under /var/task. For the function to locate the libraries contained in a layer, the libraries’ files have to be under the python sub-directory of /opt.

To read more about this, check out AWS Lambda Layers guide.

If you are building a Lambda Layer that supports both Python 3.6 and 3.7, you can put the 3.6-compatible versions of the libraries under python/lib/python3.6/site-packages and the 3.7-compatible versions under python/lib/python3.7/site-packages.
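To see this lookup in action, a small diagnostic handler can report which entries of Python's module search path sit under /opt. This is only a sketch: the names `layer_entries` and `handler` are hypothetical, and the exact entries returned depend on the runtime.

```python
import sys


def layer_entries(search_path=None, layer_root="/opt"):
    """Return the module search path entries that live under the layer root."""
    entries = sys.path if search_path is None else search_path
    prefix = layer_root.rstrip("/") + "/"
    return [
        entry for entry in entries
        if entry == layer_root or entry.startswith(prefix)
    ]


def handler(event, context):
    # Hypothetical diagnostic handler: shows where Python will look for layer code.
    return {"layer_entries": layer_entries()}
```

Run inside a function that has a layer attached, the result should include the python sub-directory of /opt. Run locally, the list will most likely be empty.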

Take note of the --no-deps flag as well, which tells pip to install only the libraries in requirements.txt and not their dependencies. Without this flag, the numpy library would be added to our layer, which, as explained earlier, is not what we want.
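For readers who prefer to drive the packaging step from Python, the sketch below assembles an equivalent Docker invocation. It is an approximation of what such a script does, not a copy of get_layer_packages.sh: the /var/task mount point, the `build_command` and `fetch_packages` names, and the value "python" for PKG_DIR are assumptions.

```python
import os
import subprocess

PKG_DIR = "python"  # assumed value; Lambda looks for Python layer code under /opt/python
IMAGE = "lambci/lambda:build-python3.6"


def build_command(work_dir, pkg_dir=PKG_DIR, image=IMAGE):
    """Build the docker invocation that installs requirements.txt into pkg_dir."""
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(work_dir)}:/var/task",  # assumed mount point
        "-w", "/var/task",
        image,
        "pip", "install",
        "-r", "requirements.txt",
        "--no-deps",  # install only the listed libraries, not their dependencies
        "-t", pkg_dir,
    ]


def fetch_packages(work_dir="."):
    """Run the build; requires Docker to be installed and running."""
    subprocess.run(build_command(work_dir), check=True)
```

Keeping the command construction separate from its execution makes it easy to print and inspect the command before running it.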

3. Layer package file

The next step is to package the zip file for the layer.

Execute the following commands from the same folder, resulting in the file my-Python36-Pandas23.zip, mimicking AWS’s aptly-named AWSLambda-Python36-SciPy1x layer.

chmod +x get_layer_packages.sh
./get_layer_packages.sh
zip -r my-Python36-Pandas23.zip .

Take note of the resulting folder structure once the above commands complete:

Further optimizations, such as removing the info folders or dealing with .pyc files, can reduce the package size, but they are beyond the scope of this post.
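As one possible take on the .pyc optimization, the sketch below builds the zip with Python's zipfile module and leaves out bytecode caches. Unlike the zip command above, it archives only the python subfolder. The function name `build_layer_zip` and its parameters are hypothetical.

```python
import os
import zipfile


def build_layer_zip(source_dir, zip_path, pkg_dir="python"):
    """Zip only the pkg_dir subfolder of source_dir, skipping bytecode caches."""
    root = os.path.join(source_dir, pkg_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, subfolders, files in os.walk(root):
            # Prune __pycache__ in place so os.walk does not descend into it.
            subfolders[:] = [name for name in subfolders if name != "__pycache__"]
            for name in files:
                if name.endswith(".pyc"):
                    continue
                full_path = os.path.join(folder, name)
                # Store paths relative to source_dir so entries start with "python/".
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                archive.write(full_path, arcname)
    return zip_path


# Usage: build_layer_zip(".", "my-Python36-Pandas23.zip")
```

Every entry in the resulting archive starts with python/, which is the layout the layer needs once it is unzipped under /opt.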

4. Creating and testing

That is it. You are now ready to create a new Layer in the AWS Console with the generated zip file. This is quite straightforward, so I will not cover it here. Refer to this guide if you have not done so before: https://medium.com/devopslinks/how-to-use-aws-lambda-layers-f4fe6624aff1

If you prefer to use AWS CLI, use AWS’s own guide here.
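A third option is to script this step in Python with boto3, whose Lambda client exposes `publish_layer_version`. The sketch below is a minimal illustration, assuming configured AWS credentials. The function name `publish_layer`, the description text, and the layer name in the usage comment are hypothetical.

```python
def publish_layer(client, zip_path, layer_name, runtimes=("python3.6",)):
    """Publish zip_path as a new layer version and return its ARN."""
    with open(zip_path, "rb") as handle:
        response = client.publish_layer_version(
            LayerName=layer_name,
            Description="pandas and pytz for Python 3.6",  # illustrative text
            Content={"ZipFile": handle.read()},
            CompatibleRuntimes=list(runtimes),
        )
    return response["LayerVersionArn"]


# Usage (needs boto3 and configured AWS credentials):
#   import boto3
#   arn = publish_layer(boto3.client("lambda"),
#                       "my-Python36-Pandas23.zip", "my-Python36-Pandas23")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.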

Test the new Layer with this sample code. Remember to add both the new custom layer and AWS’s provided numpy+scipy layer:
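A minimal handler for this kind of smoke test could look like the sketch below. The function name `handler`, the sample data, and the returned fields are illustrative assumptions rather than the contents of data_analysis.py. It imports numpy from AWS's layer and pandas from the custom layer, then runs a small aggregation.

```python
import numpy as np
import pandas as pd


def handler(event, context):
    # Illustrative data only; any small DataFrame works for a smoke test.
    frame = pd.DataFrame({
        "city": ["Singapore", "Hanoi", "Singapore", "Hanoi"],
        "sales": [5, 10, 7, 20],
    })
    totals = frame.groupby("city")["sales"].sum()
    return {
        "numpy_version": np.__version__,
        "pandas_version": pd.__version__,
        # Convert numpy integers to plain ints so the result is JSON-serializable.
        "totals": {city: int(total) for city, total in totals.items()},
    }
```

If either layer is missing, the import at the top fails, which makes the problem obvious in the function's logs.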

You should see an output similar to this when running the function:

B. Use Serverless Framework to integrate layers into functions

Managing AWS Lambda functions and layers via the AWS Console or AWS CLI is fine if you have a small setup and you work mostly alone.

When you work in a team for a comprehensive set of functions and layers, a deployment tool like Serverless Framework can help tremendously, especially when combined with a CI/CD tool.

1. Preparation

Install node.js if you don’t have it yet.

Run the following commands to prepare the environment:

# Installing the serverless cli
npm install -g serverless
# Configure AWS profile (method 1 with AWS CLI)
aws configure --profile <profile_name>
# Configure AWS profile (method 2 with Serverless CLI)
serverless config credentials --provider aws --key <access_key_id> --secret <secret_access_key> --profile <profile_name>
# Create a new working folder and set up subfolders
mkdir serverless-with-layer
cd serverless-with-layer
mkdir -p layers/pandas

2. Serverless configuration

Create a new file named serverless.yaml with the content below, then copy the files data_analysis.py, get_layer_packages.sh and requirements.txt to the appropriate folders to achieve the following structure:

serverless-with-layer
|__ data_analysis.py
|__ serverless.yaml
|__ layers
    |__ pandas
        |__ get_layer_packages.sh
        |__ requirements.txt

I call this service sample-service and name the layer sample-service-Python36-Pandas23x, but you can choose any name you like simply by updating the name parameter of the Pandas layer.

The path layers/pandas is used here, but it can be changed to any other path with a corresponding change in serverless.yaml.

For more information on how Layers are supported by Serverless Framework, check out their guide.

3. Layer package file

Package the Lambda function and layer with the following commands:

pushd layers/pandas && chmod +x get_layer_packages.sh && ./get_layer_packages.sh && popd
serverless package

Compared with the previous method, we no longer zip the package ourselves. Instead, both the layer’s and the function’s package files are created by serverless package. The output is shown below.

If you want to know exactly what will be created by Serverless Framework, here is what cloudformation-template-update-stack.json looks like: https://gist.github.com/qtangs/5bd6d84dc6e4839b1c87665fa48ccd3f
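To check the generated template locally, a short helper can list the resources of a given CloudFormation type, for example AWS::Lambda::LayerVersion. This is a sketch: the name `resources_of_type` is hypothetical, and the path in the usage comment assumes the template sits in the .serverless folder.

```python
import json


def resources_of_type(template_path, resource_type):
    """Return {logical_id: properties} for resources of the given type."""
    with open(template_path) as handle:
        template = json.load(handle)
    return {
        logical_id: resource.get("Properties", {})
        for logical_id, resource in template.get("Resources", {}).items()
        if resource.get("Type") == resource_type
    }


# Usage (path assumed):
#   resources_of_type(".serverless/cloudformation-template-update-stack.json",
#                     "AWS::Lambda::LayerVersion")
```

The same helper works for AWS::Lambda::Function if you want to confirm which layers each function references.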

4. Deployment and testing

Finally, deploy to AWS simply by running:

serverless deploy --package .serverless

For deployment, no manual operations on AWS Console are required.

Test the new function and compare with the above output.
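One way to run this test from a script is boto3's `invoke` call on the Lambda client. The sketch below assumes configured AWS credentials. The function name `invoke_function` is hypothetical, and the deployed function's name must be supplied by the reader.

```python
import json


def invoke_function(client, function_name, event=None):
    """Invoke a Lambda function synchronously and return its decoded JSON result."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event or {}).encode("utf-8"),
    )
    # The response payload is a stream; read it fully before decoding.
    return json.loads(response["Payload"].read())


# Usage (needs boto3 and configured AWS credentials):
#   import boto3
#   print(invoke_function(boto3.client("lambda"), "<deployed_function_name>"))
```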

C. Summary

We have created a new Lambda Layer package with a few lines of code and experienced how tools like Serverless Framework further accelerate the transition.

Lambda Layer is an exciting addition alongside Custom Runtime. Already, there is a GitHub repository that collects popular layers: https://github.com/mthenw/awesome-layers.

With these improvements being added to AWS Lambda, we benefit immensely from a Serverless ecosystem that continuously enhances infrastructure and lets developers focus on core innovation.

Thank you for reading. Do comment below to share your thoughts.

The complete project for this article is hosted on Github.
