Serving PyTorch Models on AWS Lambda with Caffe2 & ONNX

michaelulin
Oct 8, 2017 · 10 min read


Code available here: https://github.com/michaelulin/pytorch-caffe2-aws-lambda

Having worked with PyTorch, I love its flexibility and ease of development compared with other frameworks. Because PyTorch is still early in its development, I was unable to find good resources on serving trained models, so I've written up a method here that uses ONNX, Caffe2 and AWS Lambda to serve predictions from a trained PyTorch model. I hope you find it useful.

Problem

How to effectively deploy a trained PyTorch model

Solution

Using ONNX, Facebook and Microsoft's recently released platform for neural network interoperability, we can convert a model trained in PyTorch to Caffe2 and then serve predictions with that model from AWS Lambda.

ONNX enables models trained in PyTorch to be used in Caffe2 (and vice versa). Eventually ONNX will also support Microsoft's CNTK framework, but as of October 2017 that support hasn't been released yet.

AWS Lambda is AWS's serverless compute platform. After converting our PyTorch model to Caffe2, we can serve predictions from AWS Lambda, which makes it easy to scale and expose them via an API. AWS Lambda does have a number of limitations we have to work within, including a 50 MB cap on the zipped deployment package containing all of our code and libraries.

Converting a Trained PyTorch Model to Caffe2 using ONNX

The first step is to train and save a PyTorch model that you want to serve predictions from. For this example, we can just use one of the pretrained models that’s included with torchvision. For a good tutorial on training a PyTorch model, see the PyTorch site here: http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

import torch
from torchvision import models

model = models.resnet18(pretrained=True)
torch.save(model,"model.p")

After you’ve trained your model, save it so that we can convert it to an ONNX format for use with Caffe2.

Next, we’ll need to set up an environment to convert PyTorch models into the ONNX format. We’ll need to install PyTorch, Caffe2, ONNX and ONNX-Caffe2. I strongly recommend just using one of the docker images from ONNX. It has everything you need already set up and makes it very simple to execute the script below. The ONNX docker image is available here: https://github.com/onnx/onnx

I won’t go into setting up everything on your own here, but if you’re feeling up to it, there are a couple things that I would note:

  1. Caffe2 currently only works with Python 2.7, so make sure you have a 2.7 environment set up. You can follow the installation instructions for Ubuntu here: https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile
  2. As of October 2017, you need to install ONNX-Caffe2 from a particular commit. See issue here: https://github.com/pytorch/pytorch/issues/2913
  3. PyTorch needs to be compiled from source. As of October 2017, the versions of PyTorch on pip and conda do not have the ONNX modules.

Once you have everything set up, either through docker (highly recommended) or through your own setup, save your pre-trained model from the previous step somewhere you can access it and run the following script.

This loads the model into PyTorch, converts the model to an ONNX format, tests loading the model via ONNX-Caffe2 and tests whether the output from the converted Caffe2 model matches the PyTorch model.

For a more complete tutorial on converting models from PyTorch to Caffe2, see the PyTorch website: http://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html

from torch.autograd import Variable
import torch.onnx
import torch

import onnx
import onnx_caffe2.backend as backend
import numpy as np

# load model
model = torch.load("model.p")
model.cpu()

# Evaluation Mode
model.train(False)

# Create dummy input
dummy_input = Variable(torch.randn(1, 3, 224, 224))
output_torch = model(dummy_input)

# Export ONNX model
torch.onnx.export(model, dummy_input, "model.proto", verbose=True)

# Load ONNX model
graph = onnx.load("model.proto")

# Check Formation
onnx.checker.check_graph(graph)

# Print the graph to get blob names
print(onnx.helper.printable_graph(graph))

# Check model output
rep = backend.prepare(graph, device="CPU")
output_onnx = rep.run(dummy_input.cpu().data.numpy().astype(np.float32))

# Verify numerical correctness up to 3 decimal places
np.testing.assert_almost_equal(output_torch.data.cpu().numpy(), output_onnx[0], decimal=3)

Once you have the .proto file, upload it to S3 and we can start serving predictions using AWS Lambda.
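If you'd rather script the upload than use the console, a one-line AWS CLI command works. The bucket name mybucket is a placeholder and matches the one used in the Lambda script later on:

aws s3 cp model.proto s3://mybucket/model.proto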

Setting up an AWS Lambda Function

Once you have your model converted to ONNX, we can set up our AWS Lambda function to serve predictions with it. The current limitations of AWS Lambda require that all of our code and the libraries needed to run it fit into a zip file of 50 MB or less. This is why we're serving predictions with Caffe2 rather than PyTorch: its footprint is much smaller.

A PyTorch installation takes up over 300 MB of disk space, but Caffe2 and all of the necessary libraries fit into a 38 MB zip file.

If you want, you can download the deps.zip file from the github repo, add your own python script and run it on AWS Lambda. You can find that here: https://github.com/michaelulin/pytorch-caffe2-aws-lambda

To create your own zip file for AWS Lambda, launch a new EC2 instance based on the AWS Lambda AMI located here: http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

This is the base image that your Lambda function will run on, so you can test how the Lambda function will work and what libraries/packages are available.
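If you prefer launching the instance from the command line, a rough sketch with the AWS CLI is below. The AMI ID is a placeholder; use the Amazon Linux AMI listed on the page above, along with your own key pair and security group.

# Launch a build instance from the Lambda execution environment AMI (placeholder IDs)
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.large \
    --key-name my-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --count 1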

Once that’s up, run the following script to install all the necessary packages and libraries on the instance, install Caffe2 and add it all to a zip file.

# Install necessary packages and update libraries
sudo yum update -y
sudo yum -y upgrade
sudo yum -y groupinstall "Development Tools"

sudo yum install -y \
automake \
cmake \
python-devel \
python-pip \
git

# Install necessary packages for Pillow. Not necessary if you don't need the Pillow library in Python for working with images
sudo yum install -y gcc zlib zlib-devel openssl openssl-devel
sudo yum install -y libjpeg-devel

# Install protobuf
git clone https://github.com/google/protobuf.git
cd protobuf
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

# Install python virtualenv, set up a new environment and install necessary python packages
pip install virtualenv
virtualenv ~/env && cd ~/env && source bin/activate
pip install numpy

pip install --use-wheel --no-index -f http://dist.plone.org/thirdparty/ -U PIL --trusted-host dist.plone.org
pip install protobuf
pip install future
pip install requests
pip install onnx
cd ~

# Clone and install Caffe2 using cmake
mkdir cf2

git clone --recursive https://github.com/caffe2/caffe2.git && cd caffe2

mkdir build && cd build

cmake -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX="/home/ec2-user/cf2/" -DCMAKE_PREFIX_PATH="/home/ec2-user/cf2/" -DUSE_GFLAGS=OFF ..
make -j4
make install/fast

cd ~

# Clone and install ONNX-Caffe2. As noted above, a specific commit is required at this time for ONNX-Caffe2 to function properly
git clone --recursive https://github.com/onnx/onnx-caffe2
cd onnx-caffe2
git reset --hard f7509f293d781638ef14ac3d232de0c140ed8277
python setup.py install

cd ~

# Add python packages to zip file
for dir in $VIRTUAL_ENV/lib64/python2.7/site-packages \
           $VIRTUAL_ENV/lib/python2.7/site-packages
do
    if [ -d $dir ] ; then
        pushd $dir; zip -9 -q -r ~/deps.zip .; popd
    fi
done

# Add protobuf to zip file
cd protobuf
zip -9 -q -r ~/deps.zip python

cd ~

# Add Caffe2 to zip file
cd cf2
zip -9 -q -r ~/deps.zip caffe2
cd ~

# Add protobuf .so files to zip
mkdir local
mkdir local/lib

cp /usr/lib64/libprotobuf.so* local/lib/

zip -9 -q -r ~/deps.zip local/lib

After creating the zip file, you just need to add the python script that will be run by AWS Lambda. You can use the following test script. The script downloads the trained ONNX model from S3 and saves it to the /tmp space available to the Lambda function. 512 MB of /tmp space is available, so we should be able to load most trained models.

import boto3
import os

# Check if the model is available
# Download model from S3 if model is not already present
if not os.path.isfile('/tmp/model.proto'):
    s3 = boto3.client('s3')
    s3.download_file('mybucket', 'model.proto', '/tmp/model.proto')


# Load .so files before launching Caffe2

import ctypes

for d, dirs, files in os.walk(os.path.join(os.getcwd(), 'local', 'lib')):
    for f in files:
        if f.endswith('.a'):
            continue
        ctypes.cdll.LoadLibrary(os.path.join(d, f))

import numpy as np
import json
import onnx
import onnx_caffe2.backend as backend

# Load ONNX model
graph = onnx.load("/tmp/model.proto")

# Load model into Caffe2
model = backend.prepare(graph, device="CPU")


def handler(event, context):
    # Create dummy input for model
    x = np.random.randn(1, 3, 224, 224)

    # Get model output
    output = model.run(x.astype(np.float32))

    # Return results formatted for AWS API Gateway
    return {"statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(str(output))}

This script loads the pre-trained ONNX model, loads it into Caffe2 and runs a test prediction. The handler function is the one that will be run by AWS Lambda. Inputs will be available from the event variable. In this example, we generate a random numpy array with the same size as the input, run it through the model and return the prediction as a string. In a real deployment, you could fetch the input from the event variable.
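For example, with API Gateway passing the request through, the body arrives as a JSON string under the event's 'body' key. A minimal, hypothetical helper for pulling it out might look like the sketch below; adapt it to whatever input format your model actually expects:

import json

def parse_event(event):
    # API Gateway passes the request body as a JSON string under 'body';
    # fall back to an empty dict if no body was sent.
    return json.loads(event.get('body') or '{}')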

Save this script and add it to the deps.zip file (or whatever your zip file is called). In this example, I’ve added the above script (test.py) to the deps.zip file via the following command:

zip -9 -q -r ~/deps.zip test.py

Once you’ve added your script to the zip file, save the zip file to S3 or your local computer so that you can upload it to AWS Lambda.
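For instance, to copy the zip file to S3 with the AWS CLI (substituting your own bucket name):

aws s3 cp ~/deps.zip s3://mybucket/deps.zip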

Deploying the AWS Lambda Function

Here's how to deploy the Lambda function via the AWS Management Console.

Navigate to Lambda in the Management Console, then click on Create function.

When selecting the blueprint, select “Author from Scratch”.

On the next step, name your function and then select a role. For this example, you’ll need to select or create a role that has the ability to read from the S3 bucket where your ONNX model is saved as well as the ability to create logs and log events (for writing the AWS Lambda logs to Cloudwatch). The following IAM policy should work, but you may need to modify it for your purposes.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::*"
    }
  ]
}

After setting up the role, upload the function code either by uploading the zip file from S3 or from your computer. Select the Python 2.7 runtime and set the Handler to match your script and function name (for the test script above, that's test.handler).

Finally, configure a test event and save the function. For this example, it doesn’t matter what the test event is as the function does not use information from the event.

That's it. Run the test event and you should see the model's output returned in the execution results.
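If you'd rather skip the console, a rough AWS CLI equivalent is sketched below. The function name, role ARN and bucket are placeholders, and test.handler assumes the script above was saved as test.py:

# Create the function from the zip file stored in S3 (placeholder names and ARNs)
aws lambda create-function \
    --function-name pytorch-caffe2-test \
    --runtime python2.7 \
    --role arn:aws:iam::123456789012:role/my-lambda-role \
    --handler test.handler \
    --code S3Bucket=mybucket,S3Key=deps.zip \
    --memory-size 1024 \
    --timeout 60

# Invoke it once to make sure everything loads
aws lambda invoke --function-name pytorch-caffe2-test --payload '{}' output.json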

Enabling Triggering via AWS API Gateway

Finally, you’ll likely want to be able to trigger your Lambda function via API. AWS API Gateway makes it easy to enable this functionality. To set it up, click on triggers and then Add Trigger. Click on the empty box and select API Gateway from the dropdown menu.

Once selected, give your API a name; in this example, it's Test. Enter a value for the deployment stage and configure security. You can leave the API open so that anyone with the URL can use it, but I recommend securing it with an API key or IAM access. Here I'll set up the API with a key, so select Open with access key for the security option.

Once that's set up, navigate to the API Gateway console. On the left side of the page, select the Stages tab and then select your deployment stage (in this case "prod"). This will display the Invoke URL for calling your API.

The default setup is to call your API from any method (GET, POST, etc.). In this example, we’ll stick with the default.

After you’ve noted this, select Usage Plans on the left and create a new usage plan. You can select the settings for throttling this API key or setting a quota.

Next, add your API and stage to the usage plan.

Finally, you can create a new API key and associate it with the Usage Plan and API or use a pre-existing API key for this API.
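If you want to script this part as well, the rough AWS CLI equivalents are below; the API ID, usage plan ID and key ID are placeholders (the earlier calls return the real values):

# Create an API key, a usage plan tied to the prod stage, and attach the key to the plan
aws apigateway create-api-key --name test-key --enabled
aws apigateway create-usage-plan --name test-plan \
    --api-stages apiId=abc123defg,stage=prod \
    --throttle burstLimit=10,rateLimit=5
aws apigateway create-usage-plan-key --usage-plan-id u1a2b3 \
    --key-id k1a2b3 --key-type API_KEY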

Once you've set that up, you're done. You can now serve predictions via API. The code below demonstrates how to call the API via the requests library in Python. Since API Gateway is set up to accept any method, you can call the API with any method; here I'll use POST. The URL is the Invoke URL followed by the stage name, "prod", followed by the resource name "test".

You can pass the API key in the headers by providing a value for the 'x-api-key' header, and you can pass values to the event variable in the Lambda function via the json= argument. The event variable in the Lambda function is a dictionary, with the values passed here available to the function under the 'body' key.

import requests

r = requests.post('INVOKE_URL/STAGE/RESOURCE_NAME',
                  headers={'x-api-key': 'API KEY'},
                  json={'test': 'test'})
print(r.text)

That’s it. I hope you’ve found this to be helpful. Please let me know if you have any feedback. Good luck with your Lambda functions!
