Deploying PyTorch & TensorFlow Models as Serverless Functions on AWS Lambda

Madhu Sanjeevi ( Mady )
dataDL.ai
15 min read · Apr 1, 2020

Recently I started an AI startup called bitsoup.ai, and I have had a couple of fun products to build through it.

So far I have been the kind of guy who trains/fine-tunes/improves deep learning models and is done with it, but for the first time, after starting a startup, I needed to deploy the models I built to the whole world.

Initially I thought, well, it's okay, I prefer AWS with its 12-month free tier (yeah, I quit my job and am in debt), but after building small services in Flask and Django (hoping to deploy them safe and sound) I quickly realised that it was super tough to maintain the servers and write the cloud applications by myself.

Soon after, I discovered that we could serve the whole world with our models through serverless applications, and I built two of my startup products using AWS Lambda functions within 3 days (40 hours). That was amazing and fun.

So in this story I will walk you through the deployment procedure for PyTorch/TensorFlow/Keras models.

I wanna share my experiences and learnings, which I am sure will help you speed up the process.

I used AWS Lambda for my needs, although there are many other options (one of them might fit better based on your requirements).

Let's get started.

Word cloud for this story.

Let's talk about these individual pieces one by one.

Serverless framework

→ It is an open-source, easy-to-use framework for building serverless applications.

→ Serverless applications are applications that run without us managing the servers needed to run our services (well, users don't need to think about or build servers; the cloud provider, here AWS, takes care of them, so to sound cool we say "without servers").

Ex: If you wanna run a simple Python script and serve it as a POST service, or run it when an event is triggered, you don't need to create a server where the code runs; you simply write/upload the code to the cloud (AWS Lambda), which runs it and handles the auto scaling for you.
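For instance, a minimal Lambda handler for that kind of POST service could look like the sketch below (the field names are just illustrative, and an API Gateway POST trigger is assumed):

    import json

    def lambda_handler(event, context):
        # 'event' carries the trigger payload, e.g. the POST body from API Gateway
        body = json.loads(event.get("body") or "{}")
        name = body.get("name", "world")
        # whatever we return here becomes the HTTP response
        return {
            "statusCode": 200,
            "body": json.dumps({"message": "hello " + name})
        }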

→ No server management, auto scaling, and pay only for code execution. Bam.

→ With the Serverless framework, we can develop, test and deploy serverless apps with just one .yml file in our code folder.

→ It also provides a command-line interface (CLI) that makes it easy to develop, deploy and test serverless apps across different cloud providers.

Ex: A simple terminal command, "serverless deploy", does all the work with the cloud provider; within minutes our apps are ready and functional on the cloud.

The installation guide and setting up AWS are super easy and quick.

You need to install Node.js and its modules and set up the AWS keys to use the Serverless framework (which is out of the scope of this story).

AWS Lambda and API Gateway

→ AWS Lambda is a fully managed serverless computing service by AWS; our code runs in Lambda functions.

→ Each Lambda function runs statelessly in its own container, and AWS automatically spins up many containers of the same function depending on the load on that function.

→ Sometimes, instead of creating a new container, it reuses an existing container when a new request/event comes in, to improve performance (kinda like a cache).

Ex: say we downloaded a file from S3 into the Lambda function's /tmp directory; when a new request comes, the file might still be available, but that's not always the case.

→ Each function has its code and the dependencies/libraries needed to run that code (for example, if we use NumPy in our code, the numpy library folder should be in the code folder; if not, Lambda throws a package-not-found error).

→ We can create a Lambda layer for each library (or for several, depending on the size) and use it across all our Lambda functions (ex: create a Lambda layer for the numpy library and use that layer in all the functions, instead of manually placing the numpy folder in each function).

Pros

  1. Fully managed infrastructure
  2. Pay for code execution
  3. Automatic scaling
  4. Seamless integration with other AWS services (S3, API Gateway, etc.)

While it has many pros, when it comes to production there are a few cons to deal with.

Some of them can be ignored when deploying deep learning models, but some should be taken seriously (I only talk about those).

Cons

  1. Code package size → The zipped Lambda code package (the libs and code uploaded to Lambda as a zip file) should not exceed 50 MB. If the zip file is larger than 50 MB, we must upload the code via S3, and the unzipped code shouldn't be larger than 250 MB (including layers).

This is a big limitation for us because, as you know, the PyTorch/TensorFlow packages are big.

2. /tmp directory storage → Each function/instance has local storage of 512 MB.

This is a problem when we have big model files or other files to load from S3.

3. Cold start time → When your function runs for the first time in a while (after 3–15 minutes of inactivity, depending on various factors), AWS Lambda takes some time to spin up a container to run the code; this latency also depends on factors like the deployment package size.

The latency can be between 3 and 15 seconds (in my experience), so this is a problem for the first users waiting to get predictions.

Here are the full AWS Lambda limits.

→ Amazon API Gateway is also a fully managed service, used to create, publish, maintain, monitor, and secure APIs at any scale.

→ We build RESTful APIs with it and integrate them with AWS Lambda as an event trigger, so that when a GET/POST call is made, API Gateway triggers the Lambda function.

You can find more about it in the AWS docs.

The only thing I wanna mention here is that API Gateway has a 29-second timeout, so if we integrate it with AWS Lambda, the Lambda function should return its response within this limit, which is quite difficult, especially at cold starts.

This is also a big problem when the model is pretty big and takes more Lambda CPU time.

There is no way to get around all of these problems entirely, so we need to build/deploy our models on AWS Lambda carefully.

PyTorch Deployment

First of all, let me just say that there are a couple of ways to deploy on AWS Lambda; I won't compare all of them, but I will talk about the ways I find simple, fast and effective.

Me asking for support on Twitter recently.

So, having said that, for this PyTorch deployment I don't use the Serverless framework; instead, I directly use the AWS Lambda console to write the code and deploy it very fast, literally in under 5–10 minutes if you have the code ready.

I took an image segmentation problem where the input is an RGB image and the model returns the segmented output for that input.

I took the trained model (UNet_MobileNetV2) and a piece of prediction code from this repository and modified it, for demonstration purposes, to return the result for a single image like below.

Input (left), Output (right)

First, let's set up the environment; as we discussed, AWS Lambda needs a zip of all the libs and code.

In our case we need the following libraries to run the code:

  1. Torch >1.0.1
  2. Torchvision >0.1.0
  3. Numpy
  4. cv2

These libs are enough to run many deep learning vision models in production.

Since we talked about Lambda layers: AWS provides Lambda layers for these libs, except cv2.

AWS currently offers these two layers for PyTorch.

You need to replace AWS_REGION with the region you need, like us-west-2.

Here we take the second one, which has more stuff than the first.

For cv2, we create a Lambda layer by taking the cv2 package from python3.6/lib/site-packages, placing it in a folder called python, zipping that folder, and uploading it to AWS Lambda layers.

We've sorted the libraries out, so let's go ahead and create a Lambda function and add these two layers.

Step 1: Create a Lambda function

Step 2: Add the PyTorch layer to the function (arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:2)

Step 3: Add our custom cv2 layer

Save the changes, and there you go.

To test the versions of these libraries, add some version-printing code to the lambda_handler.

The AWS Lambda function needs to know which piece of code should be executed first, i.e. the entry point, so we define the entry point by creating a Python function with two input parameters (event, context) and setting python-filename.methodname in the handler section of the AWS Lambda function editor.
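A minimal sketch of such a version-check handler (the exact return format is my own illustration, not the original code):

    import torch
    import torchvision
    import cv2
    import numpy as np

    def lambda_handler(event, context):
        # print the versions pulled in from the two layers
        versions = {
            "torch": torch.__version__,
            "torchvision": torchvision.__version__,
            "cv2": cv2.__version__,
            "numpy": np.__version__,
        }
        print(versions)
        return versions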

After that, create a test event from the AWS console and click the Test button (before you do that, make sure to increase the timeout, found below the function code, to 30+ seconds, since the default is 3 seconds, and you might want to increase the RAM too).

Cool, now we've got our dependencies ready.

Now let me explain how to save/load the model and make predictions.

  1. We first create an S3 bucket where we store the models and any other files that might be needed for the model to run; then we load them when we run the Lambda function.

These are the Python functions that load the model and the input image from S3 into the /tmp directory if they are not already present there (as we discussed above, AWS Lambda sometimes keeps the container, and therefore /tmp, warm).
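Here is a minimal sketch of such helpers (the bucket and key names are placeholders, not the ones from my repo):

    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "your-bucket-name"  # placeholder bucket name

    def download_if_missing(key, local_path):
        # only hit S3 if the warm container doesn't already have the file in /tmp
        if not os.path.exists(local_path):
            s3.download_file(BUCKET, key, local_path)
        return local_path

    def load_files():
        model_path = download_if_missing("models/unet_mobilenetv2.pth", "/tmp/unet_mobilenetv2.pth")
        image_path = download_if_missing("inputs/test.jpg", "/tmp/test.jpg")
        return model_path, image_path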

When a Lambda function is created, AWS also creates an IAM role with CloudWatch Logs permissions.

In order to access S3 from Lambda functions, we need to add S3 read/write permissions to that IAM role.

If everything is in place, we can proceed to write the code in the function.

Note: I won't go through the code line by line, but I provide all the code as a zip file; here is my code structure.

And that's it. When you test the model by clicking the Test button, or when an event is triggered by API Gateway, the following things happen (a rough sketch of the whole handler follows after this list):

  1. First the Lambda function unzips the requirements from the layers, then it looks for the entry point (here the lambda_handler function) and starts executing the code line by line.
  2. It loads the model and the input from S3 into the local /tmp directory.
  3. It runs the prediction code we wrote and produces the output image.
  4. Finally, it saves the output image locally and then uploads it to S3 (the same bucket, in my case).
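Put together, the handler body looks roughly like this (a simplified sketch: the real pre/post-processing comes from the repo's prediction code, and the helper names, paths and shapes here are placeholders):

    import cv2
    import torch

    def lambda_handler(event, context):
        # load_files, s3 and BUCKET come from the helper sketch above
        model_path, image_path = load_files()
        model = torch.load(model_path, map_location="cpu")  # assumes the whole model object was saved
        model.eval()

        image = cv2.imread(image_path)
        # hypothetical pre-processing: resize, scale to [0, 1], NCHW float tensor
        x = cv2.resize(image, (224, 224)).astype("float32") / 255.0
        x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)

        with torch.no_grad():
            mask = model(x)  # run the segmentation model

        # hypothetical post-processing: threshold to a binary mask scaled to 0-255
        out = (mask.squeeze().numpy() > 0.5).astype("uint8") * 255

        out_path = "/tmp/output.png"
        cv2.imwrite(out_path, out)  # save locally, then push to S3
        s3.upload_file(out_path, BUCKET, "outputs/output.png")
        return {"output_key": "outputs/output.png"}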

Of course you can add other stuff, like creating a POST endpoint in API Gateway and connecting it to this function, so that we can pass in the input image and get the output image back over a POST call.

Although it seems like a lot of steps to follow, trust me, once you have done it, the next time it takes 10–15 minutes to set everything up.

I found this way easy and fast for simple services (the models themselves can be complicated), and of course there are a couple of other ways to do the same thing:

we can use either the Serverless framework or the SAM framework by AWS to achieve the same result.

TF-LITE Deployment

As we know, TensorFlow models can be exported into low-level formats (.pb/.tflite files), so if we have a model as a .tflite file we can run that model and make predictions at inference time.

For that we don't need the tensorflow package at all; the TensorFlow team provides a separate, dedicated tflite runtime package.

This package is very lightweight and is used to execute tflite models without the tensorflow package (which is huge).

So deploying tflite files is super easy; let's see how.

Here we can use the Serverless framework to deploy it, although we don't fully utilize its power (since the packages required to run our code are small, we can simply zip those packages along with our code).

Steps

  1. We first put the model files and some additional files in an S3 bucket.
  2. Write the function code to execute the model (from reading the tflite model from S3 to making the predictions).
  3. Place the Python packages that are required to run the code (tflite-runtime, cv2, numpy and boto3) in the same folder as the code (all of them should be in the same folder).
  4. Create a serverless configuration file (serverless.yml) where we write the instructions that let AWS set up the deployment.
  5. Simply run this command in the terminal: "serverless deploy".

For this deployment, I took an image classification problem and used the Mobilenet_V1_1.0_224_quant.tflite model from the TensorFlow docs, along with its labels.txt; the model was trained to classify 1000 objects.

I placed them in an S3 bucket.

I wrote a couple of helper functions and a main function that reads the model and the test image from S3, makes the class prediction, and returns it.
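A condensed sketch of that handler (the bucket, key names and pre-processing are placeholders; the interpreter calls follow the standard tflite_runtime API):

    import os
    import boto3
    import cv2
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    s3 = boto3.client("s3")
    BUCKET = "your-bucket-name"  # placeholder bucket name

    def download_if_missing(key, local_path):
        if not os.path.exists(local_path):
            s3.download_file(BUCKET, key, local_path)
        return local_path

    def lambda_handler(event, context):
        model_path = download_if_missing("mobilenet_v1_1.0_224_quant.tflite", "/tmp/model.tflite")
        labels_path = download_if_missing("labels.txt", "/tmp/labels.txt")
        image_path = download_if_missing("test.jpg", "/tmp/test.jpg")

        labels = [line.strip() for line in open(labels_path)]

        interpreter = Interpreter(model_path=model_path)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()

        # the quantized MobileNet expects a 1x224x224x3 uint8 input
        img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (224, 224)).astype(np.uint8)[np.newaxis, ...]

        interpreter.set_tensor(input_details[0]["index"], img)
        interpreter.invoke()
        scores = interpreter.get_tensor(output_details[0]["index"])[0]

        return {"prediction": labels[int(np.argmax(scores))]}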

I placed the packages (tflite-runtime, cv2, numpy) that are needed to run the code in the same folder as the code.

I extracted these packages with the following steps:

  1. Create an Amazon EC2 instance and pip install these packages for a specific Python environment (like python3.7/3.6/3.5 or python2.7).
  2. Copy the folders of those packages, which end up in amazonlinux:/usr/local/lib64/python3.7/site-packages, into our code folder on our local machine.

You might wonder, "Why can't I copy the same folders from my local machine (Ubuntu or Windows)?"

You can't! AWS Lambda runs on Amazon Linux, so these packages should be compiled on that operating system.

There are a couple of other ways too, like using Docker (here is the link) or using the Serverless framework to build the zip of libraries (which again uses Docker internally).

Anyway, I am gonna provide these packages in my GitHub.

Let's create the serverless.yml file.
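A minimal sketch of what the file can look like (the service name, region, memory and bucket are placeholders you would adapt):

    service: tflite-classifier            # placeholder service name

    provider:
      name: aws
      runtime: python3.7
      region: us-west-2                   # your region
      memorySize: 1024
      timeout: 30
      iamRoleStatements:                  # let the function read/write the S3 bucket
        - Effect: Allow
          Action:
            - s3:GetObject
            - s3:PutObject
          Resource: arn:aws:s3:::your-bucket-name/*

    functions:
      classify:
        handler: handler.lambda_handler   # python-filename.methodname
        events:
          - http:                         # API Gateway POST trigger
              path: classify
              method: post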

After this, run the command "serverless deploy", which does all the jobs required.

After this process you will see a .serverless folder in your directory, which has the AWS config files and the zip of the code + packages.

That's it. Now if you test the function or call the POST service, you get the output.

This is the test photo and the results.

You could do the same thing by creating AWS Lambda layers for these packages, or by using the SAM framework, or with the other type of deployment which I am about to show below.

Based on the requirements, we can choose which process works best. You will only understand once you have tried all the ways of doing the deployment.

This is one of the ways that worked well for my requirements.

Keras Deployment

This deployment is similar to the tflite one, except that here we load not only the models from S3 but also all the Python packages required to run the model.

As we know, there is a limit on the deployment package size in AWS Lambda, so bundling these packages directly into the Lambda package is gonna cause size issues. By using the Serverless framework, we can get away with it by reducing the size of the packages, uploading them to S3, and loading them from S3.

So all we have to do is define all of this in the serverless.yml file.

But before that, install a Serverless plugin called "serverless-python-requirements", which automatically bundles dependencies from requirements.txt and makes them available on your PYTHONPATH.

Here is the link with the instructions for doing that.

Steps:

  1. We first put the model files (.hdf5/.pb) and some additional files in an S3 bucket.
  2. Create the requirements file, say requirements.txt, where we list the package names.
  3. Write the function code and the logic for running the model, and keep it as a Python file in the same folder.
  4. Create a serverless configuration file (serverless.yml) where we write the instructions that let AWS set up the deployment.
  5. Simply run this command in the terminal: "serverless deploy".

The deployment steps are similar to the previous ones, except for step 2 (here we just store the zip of the Python packages and code in S3 and load it from there).

Here I again took an image segmentation problem, but only person segmentation.

The Keras (.hdf5) model comes from this GitHub repo, where the author has already trained the model.

The following packages are required to run the model:

tensorflow==1.0.0
Keras==2.0.9
numpy==1.18.1
h5py==2.9.0

Put them in the requirements.txt file.

The function code is the same as the previous one, except for the model prediction part.
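The prediction part looks roughly like this (a sketch only: the try/except import at the top is the pattern the serverless-python-requirements plugin expects when dependencies are zipped, and the input size and paths are placeholders):

    try:
        import unzip_requirements  # unpacks the zipped dependencies at cold start
    except ImportError:
        pass

    import cv2
    import numpy as np
    from keras.models import load_model

    def predict(model_path, image_path):
        model = load_model(model_path)  # the .hdf5 file downloaded to /tmp
        img = cv2.imread(image_path)
        # hypothetical pre-processing: resize and scale to the model's input size
        x = cv2.resize(img, (256, 256)).astype("float32") / 255.0
        mask = model.predict(x[np.newaxis, ...])[0]  # person-segmentation mask
        return (mask > 0.5).astype("uint8") * 255    # binary mask as an image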

That’s the code part!

Let's look at the serverless.yml file.

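Roughly, it follows the shape below (again a sketch with placeholder names; the custom.pythonRequirements block is what tells the plugin to zip and slim the packages, and the layers entry reuses the cv2 layer from the PyTorch deployment, with a placeholder ARN):

    service: keras-segmentation           # placeholder service name

    plugins:
      - serverless-python-requirements

    custom:
      pythonRequirements:
        zip: true                         # zip the requirements so the package stays under the size limit
        slim: true                        # remove tests, caches, etc. to shrink the packages

    provider:
      name: aws
      runtime: python3.6
      region: us-west-2
      memorySize: 2048
      timeout: 30

    functions:
      segment:
        handler: handler.lambda_handler
        layers:
          - arn:aws:lambda:AWS_REGION:YOUR_ACCOUNT_ID:layer:cv2:1   # the custom cv2 layer
        events:
          - http:
              path: segment
              method: post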

When all of these pieces are in place, then comes the last part: deploying.

Once this is done, our model is safely deployed and ready to test.

One additional thing I did here: in the code I used OpenCV, but I did not mention that library in the requirements. The reason is that I already have the cv2 AWS Lambda layer (if you remember, I added it during the PyTorch deployment), so I use that layer for this function as well.

Okay! If you test it and get a positive response, you will see this output.

Input(left), Output(right)

And that's it, that's how you can deploy Keras models.

Since Keras uses TensorFlow as the backend, the TensorFlow package is also present, so we can run TensorFlow models as well using the same procedure (with or without the keras package).

The only difference is the prediction code and the model file (instead of .hdf5, we deploy .pb files).

And that’s it, that’s all I needed to share.

My statement

Since it's only been 2 weeks of exploring serverless and AWS for me, my understanding is limited, so there could be things that speed up these processes and make some things better.

If you know any, let me know and I will try to update the story.

I really wanna share this as I had to go through many blogs and videos to understand it all (and spent a lot of time in frustration); I believe it will help others move faster.

If you have any questions/thoughts/suggestions/improvements, feel free to reach out to me over LinkedIn or Twitter.

All the code and packages can be found on my GitHub; I hope you can make use of the repo along with this story.



Writes about Technology (AI, Blockchain) | interested in Programming || Science || Math https://www.linkedin.com/in/madhusanjeeviai