Edit: Amazon recently updated the execution environment for Lambda, so I strongly recommend building your layer on the Amazon Linux AMI 2018.03.0.
Recently, Amazon released a new feature for its Lambda service, called Layers.
Previously you had to upload your dependencies with every single Lambda package, and when working with serverless machine learning and data pipelines, this can quickly get out of hand.
This tutorial aims to help you create your own layer package with the dependencies you use the most across your Lambdas, so you can take advantage of this feature and reduce the size of your Lambda functions.
First of all, let’s talk about limits. AWS currently limits any Lambda package to 50MB of COMPRESSED DATA and 250MB of UNCOMPRESSED DATA. The first restriction is lifted, however, if you upload the package to Amazon S3 instead, and this can make the difference between a package fitting into your Lambda or not.
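As a quick way to check a package against these limits before uploading, you can inspect both sizes locally. The following is a self-contained sketch that builds a throwaway demo archive under /tmp rather than a real layer; the paths and file names are made up for illustration:

```shell
# Build a throwaway demo archive (1MB of random data), not a real layer
mkdir -p /tmp/limit_demo/python;
head -c 1000000 /dev/urandom > /tmp/limit_demo/python/dummy.bin;
cd /tmp/limit_demo;
python3 -m zipfile -c pkg.zip python/;
# Compressed size in bytes - must stay under 50MB for a direct upload
stat -c %s pkg.zip;
# Uncompressed size of the archive contents - must stay under 250MB either way
python3 -m zipfile -l pkg.zip;
```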
This tutorial assumes that you know how to start an EC2 Amazon Linux instance, and that you have an IAM user with access to Lambda and S3.
Setting the layer up
First of all, after creating and connecting to your EC2 instance, you need to set up your AWS credentials.
aws configure set aws_access_key_id <MY_ACCESS_KEY>;
aws configure set aws_secret_access_key <MY_SECRET_ACCESS_KEY>;
aws configure set default.region <MY_REGION>;
After that, we need to install gcc: for the purposes of this tutorial we are going to install the XGBoost package, and we have to compile it from source, because otherwise the package would be too big to fit into our Lambda. The easiest way to get gcc is to install the ‘Development Tools’ group. We also install the desired Python version, which for this tutorial is 3.6.
We also need to upgrade the AWS CLI, since at the time of publishing this tutorial, the AWS CLI shipped with EC2 instances does not include any Lambda Layers functionality.
sudo yum -y update;
sudo yum -y upgrade;
sudo yum -y groupinstall 'Development Tools';
sudo yum -y install python36;
pip install awscli --upgrade --user;
Now we define the S3 bucket and key where the layer will be stored, as well as the layer name.
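Since these variables are referenced by the commands further down, here is one possible way to define them; the bucket, key and layer name below are placeholders, so substitute your own values:

```shell
# Placeholder values - substitute your own bucket, key and layer name
S3_BUCKET="<MY_BUCKET>";
S3_KEY="layers/base_pkg.zip";
S3_PATH="s3://$S3_BUCKET/$S3_KEY";
LAYER_NAME="base_pkg";
echo $S3_PATH;
```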
Then we set up our virtual environment, containing all the dependencies we want to include in our layer; edit the list below to suit your needs:
virtualenv -p python3 ~/base_pkg/base_pkg;
source ~/base_pkg/base_pkg/bin/activate;
sudo $VIRTUAL_ENV/bin/pip install numpy;
sudo $VIRTUAL_ENV/bin/pip install pandas;
sudo $VIRTUAL_ENV/bin/pip install pymysql;
sudo $VIRTUAL_ENV/bin/pip install sqlalchemy;
sudo $VIRTUAL_ENV/bin/pip install scipy;
sudo $VIRTUAL_ENV/bin/pip install requests;
sudo $VIRTUAL_ENV/bin/pip install scikit-learn;
git clone --recursive https://github.com/dmlc/xgboost;
cd xgboost; make -j4;
cd python-package; sudo $VIRTUAL_ENV/bin/python setup.py install; cd;
Now we exclude some unnecessary packages, like pip, wheel and setuptools, along with the .pyc files, which are not needed to run the packages, and we pack everything into the file “base_pkg.zip” using the structure Amazon specifies (under a “python” folder).
rsync -a --prune-empty-dirs $VIRTUAL_ENV/lib*/python*/site-packages/ ~/base_pkg/python/
cd ~/base_pkg;
zip -r -9 -q ~/base_pkg/base_pkg.zip . -x \*.pyc ./python/pip\* ./python/setuptools\* ./python/wheel\* ./base_pkg\*;
After this, we just upload the package to the S3 bucket and publish the layer.
aws s3 cp ~/base_pkg/base_pkg.zip $S3_PATH;
aws lambda publish-layer-version --layer-name $LAYER_NAME --content S3Bucket=$S3_BUCKET,S3Key=$S3_KEY --compatible-runtimes python3.6
And done, the layer is ready to be used by your Lambda functions. The package should be around 62MB compressed and 210MB uncompressed; since the compressed size exceeds the 50MB direct-upload limit, uploading through S3 is what makes publishing this layer possible.
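To attach the layer to an existing function, you can use the ARN that publish-layer-version prints as LayerVersionArn. The region, account ID and function name below are placeholders for your own values:

```shell
# Placeholder ARN - use the LayerVersionArn printed by publish-layer-version
LAYER_ARN="arn:aws:lambda:<MY_REGION>:<ACCOUNT_ID>:layer:$LAYER_NAME:1";
# Note: --layers replaces the function's whole layer list, so include any
# layers the function already uses
aws lambda update-function-configuration --function-name <MY_FUNCTION> --layers $LAYER_ARN;
```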
Thanks for reading, and if you have any questions, I will be glad to help.
The following is the script in full; make any changes and adjustments you might need, like changing the packages or the Python version: