Setting up Python 3.6 AWS Lambda deployment package with numpy, scipy, pillow and scikit-image

Use an Amazon Linux docker container to build your own Python 3.6 lambda deployment package in 8 steps.

AWS Lambda with Python 3.6 is a great new tool on the serverless computing scene. However, the vanilla Python 3.6 environment provided by AWS is not well suited to developing lambdas for image processing, numerical calculations and scientific analysis. This is because it does not provide an easy and straightforward way of installing the Python packages that these types of applications usually require, namely numpy, scipy, pillow and scikit-image.

This issue can be overcome by creating your own custom lambda deployment package containing the aforementioned Python libraries. This article provides the 8 steps I personally used to create such a lambda package for my own use.

Prerequisite

A basic understanding of Linux and Docker is assumed, as the following steps were executed on a Linux operating system (the specific distribution should not matter) and inside an Amazon Linux docker container. It is also assumed that you have Docker installed.

1. Setting up Amazon Linux docker container

We are going to build the lambda deployment package inside the official Amazon Linux docker container. However, we cannot use the latest release of Amazon Linux, since AWS lambdas are executed inside an older version of it. Specifically, the Amazon Linux version used to execute lambda functions is (at the time of writing) amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2. For this reason, we are going to use the same version inside our docker container.

# create mylambdapackage folder in your home directory
mkdir ~/mylambdapackage
# start the docker container and share the folder created
docker run -ti -v ~/mylambdapackage:/mylambdapackage amazonlinux:2017.03.1.20170812

The above docker command will download the given version of Amazon Linux, start it in interactive mode, and mount the folder ~/mylambdapackage on your host operating system to the folder /mylambdapackage inside the docker container. Once the command completes, you should already be inside the container.

To verify that you are using the correct version of the Amazon Linux, the cat /etc/os-release command can be used. It should produce the following output:

NAME="Amazon Linux AMI"
VERSION="2017.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2017.03"
PRETTY_NAME="Amazon Linux AMI 2017.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2017.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"
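If you would rather script this check (for instance from a build script, once python36 has been installed in the next command), the key=value format of /etc/os-release is easy to parse. The following is only a minimal sketch; the function names parse_os_release and is_expected_release are illustrative and not part of any tool.

```python
def parse_os_release(text):
    """Parse the KEY="value" lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_expected_release(text, expected_id="amzn", expected_version="2017.03"):
    """Return True when ID and VERSION_ID match the release used in this article."""
    info = parse_os_release(text)
    return (info.get("ID") == expected_id
            and info.get("VERSION_ID") == expected_version)


# Example use inside the container:
# with open("/etc/os-release") as handle:
#     print(is_expected_release(handle.read()))
```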

Next, we are going to update Amazon Linux and install the dependencies required for building the four Python packages later.

yum update -y && yum install -y gcc48 gcc48-c++ python36 python36-devel atlas-devel atlas-sse3-devel blas-devel lapack-devel zlib-devel libpng-devel libjpeg-turbo-devel zip freetype-devel findutils libtiff libtiff-devel

2. Creating Python 3.6 virtual environment

The lambda deployment package will be created in a Python 3.6 virtual environment inside the docker container’s /mylambdapackage folder.

# go into /mylambdapackage folder
cd /mylambdapackage
# create Python environment in mylambda folder and activate it
python36 -m venv --copies mylambda && source mylambda/bin/activate

The --copies parameter will “try to use copies rather than symlinks, even when symlinks are the default for the platform”. Note: to deactivate the Python 3.6 environment, just use the deactivate command.

The last thing to do before we move on to installing the Python packages is to upgrade pip.

pip3 install -U pip
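Before installing anything, it can be worth confirming that the active interpreter really is the Python 3.6 one from the virtual environment, since packages installed elsewhere will not end up in the deployment package. The following is a minimal sketch; the function names are illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def matches_runtime(version_info=sys.version_info, expected=(3, 6)):
    """Return True when the major.minor version equals the targeted runtime."""
    return tuple(version_info[:2]) == expected
```

With the environment from this step activated, both in_virtualenv() and matches_runtime() would be expected to return True.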

3. Installing numpy, scipy and pillow

Assuming you are still in /mylambdapackage:

pip3 install --no-binary :all: numpy scipy pillow

Since we are using the --no-binary :all: option, binary versions of these packages will not be used. Instead, they will all be compiled from source.

4. Installing scikit-image

The scikit-image package depends on cython, so cython needs to be installed first. For some reason, installing both cython and scikit-image with a single pip3 command was failing. However, installing them separately worked.

pip3 install --no-binary :all: cython

And now we can install scikit-image:

pip3 install --no-binary :all: scikit-image

Note: This will also install matplotlib, which is a dependency of scikit-image. So we should have matplotlib also available in our custom lambda package.
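At this point a quick check that every package can be located by the interpreter may save a failed deployment later. The sketch below assumes the usual import names (pillow is imported as PIL and scikit-image as skimage); the function name missing_modules is illustrative. It only looks the modules up without importing them, so it does not prove that the compiled extensions load correctly.

```python
import importlib.util

# Import names of the packages installed in steps 3 and 4.
EXPECTED_MODULES = ["numpy", "scipy", "PIL", "skimage", "matplotlib"]


def missing_modules(names=EXPECTED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]
```

Running missing_modules() inside the activated environment should return an empty list if all installations succeeded.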

5. Copying required shared libraries

The compiled Python packages will depend on a number of system libraries (e.g. for BLAS, LAPACK). We have them in our own Amazon Linux docker container, but they will not be available in the AWS environment in which our lambdas are going to be executed. Thus we are going to ship these libraries with our lambda deployment package. Based on the instructions here, this can be done as follows:

# specify where the shared libraries will be stored
libdir="$VIRTUAL_ENV/lib/python3.6/site-packages/lib/"
mkdir -p $libdir
# copy the libraries
cp -v /usr/lib64/atlas/*.so.3 $libdir
cp -v /usr/lib64/libquadmath.so.0 $libdir
cp -v /usr/lib64/libgfortran.so.3 $libdir
cp -v /usr/lib64/libpng.so.3 $libdir
cp -v /usr/lib64/libjpeg.so.62 $libdir
cp -v /usr/lib64/libtiff.so.5 $libdir

These libraries, and those that were compiled with the Python packages, can be large. A good way to reduce their size is the strip command, which is used to “discard symbols from object files”.

find $VIRTUAL_ENV/lib/python3.6/site-packages/ -name "*.so*" | xargs strip -v
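To see how much space the shared libraries take (for example before and after stripping), the same file pattern can be walked from Python. This is a minimal sketch with illustrative function names; it matches any file name containing ".so", like the find pattern above, and skips symbolic links.

```python
import os


def shared_library_sizes(root):
    """Return (size_in_bytes, path) pairs for files matching *.so*, largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if ".so" in filename:
                path = os.path.join(dirpath, filename)
                # Skip symlinks so the same library is not counted twice.
                if not os.path.islink(path):
                    results.append((os.path.getsize(path), path))
    return sorted(results, reverse=True)


def total_size(sizes):
    """Sum the byte counts returned by shared_library_sizes."""
    return sum(size for size, _path in sizes)
```

Calling total_size(shared_library_sizes(path_to_site_packages)) before and after the strip command gives a rough idea of the savings.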

6. Compressing the site-packages into lambda deployment package

Once all our packages are installed in our Python virtual environment, we can package them as a compressed zip file named mylambda.zip.

# compress site-packages content into mylambda.zip
pushd $VIRTUAL_ENV/lib/python3.6/site-packages/
zip -r -9 /mylambdapackage/mylambda.zip *
popd

The created zip file should be located in /mylambdapackage/mylambda.zip.
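It is useful to know the size of the package both zipped and unzipped, because AWS Lambda limits both. The sketch below assumes the commonly documented limits of 50 MB for a directly uploaded zip and 250 MB for the unzipped content; check the current AWS documentation, as these values are assumptions here. The function name package_report is illustrative.

```python
import os
import zipfile

MB = 1024 * 1024
DIRECT_UPLOAD_LIMIT = 50 * MB   # assumed limit for direct zip upload
UNZIPPED_LIMIT = 250 * MB       # assumed limit for unzipped package size


def package_report(zip_path):
    """Summarise zipped and unzipped sizes of a deployment package."""
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size of each archive member.
        unzipped = sum(info.file_size for info in archive.infolist())
    zipped = os.path.getsize(zip_path)
    return {
        "zipped_bytes": zipped,
        "unzipped_bytes": unzipped,
        "needs_s3": zipped > DIRECT_UPLOAD_LIMIT,
        "fits_unzipped_limit": unzipped <= UNZIPPED_LIMIT,
    }
```

For example, package_report("/mylambdapackage/mylambda.zip") would report needs_s3 as True for a zip larger than the assumed direct upload limit.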

7. Adding your lambda function to the zipped package

Assuming your lambda function is located in ~/mylambdapackage on the host (and is therefore available in /mylambdapackage inside the docker container), you can add it to mylambda.zip from within the docker container, working in the /mylambdapackage folder, as follows:

zip -ur mylambda.zip lambda_function.py

If you don’t have a specific lambda_function.py yet, you can use the one I quickly prepared just to check whether the installed packages work on AWS:

This assumes the default names for the lambda file and handler function, i.e. lambda_function.lambda_handler.
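The test function itself is not reproduced in this text. As a stand-in, the following is a minimal sketch of what such a check could look like; it is an assumption, not the original file. It tries to import each package under its usual import name (PIL for pillow, skimage for scikit-image) and reports the version or the import error, so a missing shared library shows up in the response rather than crashing the lambda.

```python
import importlib
import json

# Import names of the packages bundled in the deployment package.
MODULES = ["numpy", "scipy", "PIL", "skimage"]


def lambda_handler(event, context):
    """Report, per package, its version or the reason the import failed."""
    report = {}
    for name in MODULES:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown version")
        except Exception as exc:  # e.g. ImportError for a missing .so file
            report[name] = "import failed: {}".format(exc)
    return {"statusCode": 200, "body": json.dumps(report)}
```

Saved as lambda_function.py, this matches the default handler name mentioned above.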

8. Deploying the lambda package to S3

Due to the size of mylambda.zip (~76MB), you will not be able to upload it through the lambda creation website on AWS. Instead, you need to upload it to an S3 bucket and then, in the lambda creation website, provide the link to the uploaded mylambda.zip.
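The upload itself can be done through the S3 console or scripted. As one possible approach (not the method used in this article), the sketch below takes a boto3-style S3 client and calls its upload_file method; the function name upload_package and the bucket name in the comment are hypothetical. Passing the client in as an argument keeps the function free of credentials handling.

```python
def upload_package(s3_client, zip_path, bucket, key):
    """Upload zip_path to the given bucket/key and return its s3:// location."""
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client call
    # for uploading a local file.
    s3_client.upload_file(zip_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)


# Example use, assuming boto3 is installed and AWS credentials are configured
# (the bucket name is a placeholder):
# import boto3
# upload_package(boto3.client("s3"),
#                "/mylambdapackage/mylambda.zip",
#                "my-example-bucket", "mylambda.zip")
```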

Conclusions

This article provided detailed steps for building a custom Python 3.6 lambda deployment package for AWS with numpy, scipy, pillow and scikit-image. Hopefully the article will be useful to you. If not, I hope that it will at least get you closer to building your own custom lambda packages.

If you have any questions or issues regarding the steps described, please feel free to ask in the comments. I will do my best to help.

Appendix 1: pandas

pandas is also a very popular package that many would like to run in their Python lambdas. Installing it, in the context of the steps provided, is as easy as executing the following command after step 3 or 4:

pip3 install --no-binary :all: pandas

Appendix 2: requests-toolbelt

I found requests-toolbelt very handy for processing binary data submitted to a lambda through an API Gateway. Its installation (together with the requests package itself, for completeness) can be performed in the following way:

pip3 install --no-binary :all: requests requests-toolbelt