Serving TensorFlow predictions with Python and AWS Lambda

Jacopo Tagliabue
Tooso
Feb 22, 2017 · 9 min read

A quick and pure Pythonic way to power endpoints with pre-trained TF models

Intro

“Build something idiot-proof and somebody will come up with a better idiot” (Anonymous DevOps)

Congratulations! You’ve built your model with TensorFlow, you’ve trained it and now you are ready to use it. If you would like a quick and easy way to set up an endpoint on AWS and start serving predictions through HTTP requests, you’ve come to the right place!

We are gonna leverage the Serverless framework and AWS Lambda to deploy a TF-powered endpoint in minutes using pure Python code.

While there are already well-documented, production-level ways to serve TensorFlow models at scale, sometimes you may just want to play around with your model and build POCs quickly, cheaply and with a few lines of well-understood Python code.

We decided to share a working toy project since we could not find a detailed working example online and we figured we could save some time for other data scientists with the same need.

We shall assume no knowledge of AWS Lambda or Serverless but expect you to understand the basics of a TensorFlow model.

Prerequisites

To follow this tutorial you will need the following:

  1. Our GitHub repo, available here.
  2. An AWS account, which will be used to deploy our code to AWS Lambda. Lambdas are “serverless functions”: the underlying infrastructure is completely abstracted away, allowing you to focus only on the code. Lambdas get triggered by events: an HTTP call, an event on Kinesis, an S3 bucket that gets updated, etc. Lambdas provide virtually “infinite” scalability thanks to AWS magic, fit perfectly into the microservice paradigm and, last but not least, cost money only when they are actually triggered (there is really no need to pay for a 24/7 EC2 instance for a function that gets called once in a while in your data pipeline!).
  3. Serverless, which can be installed in a breeze following the instructions here (remember to set up your AWS credentials as well; see the sketch right after this list). Independently of this article, we suggest downloading and using Serverless as it simplifies a lot of operations with AWS Lambda, turning deployment into a neat and easy scripting job. However, nothing in this tutorial strictly depends on it: you are free to use another framework with the same basic approach or manually deploy your function through the AWS web interface.
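
If you have never used Serverless before, a minimal install-and-configure sequence looks roughly like this (the key and secret are of course placeholders for your own AWS credentials):

npm install -g serverless
serverless config credentials --provider aws --key {YOUR_AWS_KEY} --secret {YOUR_AWS_SECRET}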

Workflow

There are just a few steps to cover to get a working endpoint from a TensorFlow model: let’s have an overview before going deeper into the details.

  1. Train the model and save it to disk. This is just standard TensorFlow routine: you build the model, train it with data and persist it with TensorFlow’s built-in functions for later use. Make sure to expose a “prediction” function decoupling the code necessary to restore the model and initialize the graph from the code necessary for the actual prediction.
  2. Setup a Lambda function to load the model and serve the predictions upon requests. Whether you are already familiar with Lambdas or not, setting up a function is extremely easy. We’ll go into the details below.
  3. Deploy the function to AWS. When the code is done, we just need to build the dependencies into the project folder and use Serverless to deploy our function with just one command.

Project structure

If you cloned the repo, you should find the following structure under the parent directory, tensorflow_to_lambda_serverless:

tensorflow_to_lambda_serverless
--model
--vendored
.gitignore
README.md
requirements.txt
settings.ini
settings.ini.template
handler.py
run_model.py
tf_regression.py
serverless.yml

Let’s group the files by functionality:

  1. Boilerplate: README.md, .gitignore and requirements.txt are just what they look like. No surprises here.
  2. Model: tf_regression.py contains a TensorFlow model. Our toy model is a linear regression, so our prediction function accepts one number as input (the x-value) and produces the predicted y-value as output. run_model.py can be used to re-train or run the model locally for testing purposes. Finally, we uploaded some pre-trained model files in the model folder to get you started immediately: for the sake of this tutorial, we are going to pretend this model and the stored weights are the results of your handsomely rewarded data science work and your goal is to share this awesomeness with the world through an endpoint.
  3. Config files: settings.ini.template is a template for a settings.ini that is required to run the project. The project comes with a sample settings.ini but you can change it or create a new one.
  4. Serverless: serverless.yml contains the information needed to deploy our function to AWS.
  5. Lambda: handler.py contains the Lambda function that AWS will run when triggered.

(1)-(3) should be pretty straightforward. The only thing we would like to point out re: (2) is that we wrapped all TensorFlow code in a class exposing a train and a predict method: as mentioned above, the prediction step does not involve any setup, as that is done when the class is initialized in “prediction mode” (the class is indeed a toy example: it is simple enough to be immediately understandable but structured enough to simulate the main pain points of deployment).
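
To make the pattern concrete, here is a rough sketch of what such a class can look like in TF 1.x style; it is not the repo’s exact implementation, and the ConfigParser key used for the checkpoint path is made up for illustration:

import tensorflow as tf

class TensorFlowRegressionModel(object):

    def __init__(self, config, is_training=True):
        # build the (tiny) linear regression graph once
        self.x = tf.placeholder(tf.float32, [None, 1])
        W = tf.Variable(tf.zeros([1, 1]))
        b = tf.Variable(tf.zeros([1]))
        self.y = tf.matmul(self.x, W) + b
        self.saver = tf.train.Saver()
        self.session = tf.Session()
        if not is_training:
            # "prediction mode": restore the stored weights right away,
            # so that predict() only has to run the session
            self.saver.restore(self.session, config.get('model', 'checkpoint_path'))

    def train(self, x_data, y_data):
        # standard loop: define a loss, run an optimizer over the data,
        # then persist the weights with self.saver.save(self.session, ...)
        pass

    def predict(self, x_value):
        return self.session.run(self.y, feed_dict={self.x: [[x_value]]})[0][0]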

(4) and (5) are Serverless and Lambda-specific points, and we are gonna explain them in detail now.

Anatomy of a Lambda function

As we briefly described above, Lambdas are functions that get executed by AWS when a specified triggering event happens, so that we don’t have to worry about the underlying infrastructure (availability, scalability, etc.). In our case, we would like to deploy an endpoint, so the triggering event for us will be an HTTP request: every time a client makes a GET request to our endpoint, our function needs to be triggered with the associated parameters in the request.

First, we set up our serverless.yml to instruct Serverless about our trigger. The provider section is self-explanatory and allows us to specify an AWS region and a deployment stage (dev, in our case). The functions section collects all the functions in the project to be deployed. In our case, we declare one predict function, specify the handler as handler.predict (the predict method in handler.py) and list the triggering event, a GET request with route /predict. As with any GET request, we wish to use query parameters (e.g. /predict?x=89.9, where the parameter is our input value for the regression model), but there is no need to do anything in the yml file (parameters will be handled inside the function as explained below).
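
For reference, a stripped-down serverless.yml implementing this setup might look roughly like the following (the service name, runtime and region are illustrative; check the file in the repo for the exact values):

service: tensorflow-to-lambda-serverless

provider:
  name: aws
  runtime: python2.7
  region: us-east-1
  stage: dev

functions:
  predict:
    handler: handler.predict
    events:
      - http:
          path: predict
          method: get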

While all this can be achieved through the pretty intuitive AWS web interface, it’s a tedious and repetitive job. We strongly suggest adopting Serverless and we encourage you to learn more about the capabilities of this framework (as we can barely scratch the surface here).

Second, now that the deployment instructions are done, we need to write the actual Lambda function: from the configuration above, we know the function should be called predict and it should live inside the handler.py file.

The signature of the method is characteristic of Lambda functions:

def predict(event, context):

The event and context arguments contain useful information about the event triggering the call and the context of the execution (for our purposes, event is the only useful variable). The ten-or-so lines of our function are pretty obvious: we first use some helper functions to retrieve the x-value specified by the client in the request and validate the input; if the input is valid, we use the model (see below for the initialization) to make the prediction and return a nice JSON to the client.
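
As a rough sketch (not the repo’s exact code), assuming the default lambda-proxy integration where query parameters arrive under event['queryStringParameters'], the body of such a handler could look like this (the response field names are illustrative):

import json

def predict(event, context):
    # retrieve and validate the x-value from the query string
    params = event.get('queryStringParameters') or {}
    try:
        x = float(params.get('x'))
    except (TypeError, ValueError):
        return {'statusCode': 400,
                'body': json.dumps({'error': 'please provide a numeric "x" parameter'})}
    # tf_model is the module-level object initialized outside the handler (see below)
    y = tf_model.predict(x)
    return {'statusCode': 200,
            'body': json.dumps({'x': x, 'predicted_y': float(y)})}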

What is a bit less obvious is what is going on at the top of handler.py, so we will break it down into parts.

It’s important to remember that while the Lambda function will be called every time a triggering event happens, the code outside the scope of the handler runs just once per container, when the function is first loaded (the so-called “cold start”). Please note that if more Lambda containers get spun up (for example, to serve concurrent requests), that initialization code will be executed again in each of them (that is one of the reasons why we suggest looking into other deployment strategies for production code!)

First, standard Python import statements:

import os
import sys
import json
import ConfigParser

Before importing non-standard libraries we need to tell the interpreter to look for Python packages in a specific folder within the project (in our case vendored):

HERE = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(HERE, "vendored"))

Lambda packages are meant to be self-contained, so it is imperative that all non-standard dependencies are shipped with the project (see the details in the next section for the actual content of the vendored folder). Now that the script knows where to find the packages, we can import our custom classes:

from tf_regression import TensorFlowRegressionModel

Finally, following Lambda best practices, we initialize outside the handler all the objects we would like to re-use across calls, in particular our TensorFlow model:

tf_model = TensorFlowRegressionModel(Config, is_training=False)

By instantiating tf_model outside the handler, we make sure that expensive setup operations (restoring the model, re-building the graph) are done just once per container and the predictions can then be served seamlessly at each request.

Deploying our model

The deployment of a Lambda function through Serverless typically involves two steps:

  1. Build the Python dependencies into a project folder
  2. Run the deployment script to upload the project to AWS Lambda

While (1) is pretty straightforward with “normal” dependencies, it is trickier with TensorFlow, since you need to package the Linux-compatible version, which may not be the one you’re running locally. You can download a ready-made zip file with the necessary packages here (if you want to go through the process yourself, read the Appendix): unzip the content of the archive into the vendored directory. Your project structure should now look like this, with dependencies inside the vendored folder:

tensorflow_to_lambda_serverless
--model
--vendored
--external
--google
--mock
...
handler.py
README.md
...
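
Assuming you saved the archive as tf_env.zip (the name used in the Appendix), the unzip step from the project root boils down to something like:

unzip tf_env.zip -d vendored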

Finally, open a terminal, cd into the project directory and type:

serverless deploy

Serverless will use the yml file in the project to deploy the package to AWS and set up the GET endpoint exposing the model. At the end, you will get a recap of the deployment in the terminal, including the endpoint URL:

Serverless does all the work for us!

If you open your browser and hit the endpoint printed in the terminal with a GET request like:

https://{LambdaURL}/dev/predict?x=4.668

you will get the promised response:

Eureka: the endpoint is serving our model predictions in a convenient JSON response.

Have fun!

When you’re done experimenting with your Lambda function, type

serverless remove

to remove it from AWS and clean up the resources.

Conclusions

We have described a quick and effective way to turn any TensorFlow model into an endpoint that serves predictions: hopefully, you can now adapt this example to your use cases and ship a Lambda function in minutes!

Some concluding remarks are in order:

  1. While we presented an endpoint-based scenario, the same basic structure can be used to deploy models in all the stages of the data pipeline: for example, you may want to read events from a queue, run a classification algorithm and dump the result on a second queue. All this can be achieved in minutes with small changes to the yml file and some code to deal with the event parameter (which will now carry information about queue messages, not HTTP requests). Once you get the gist of Lambdas + Serverless + TensorFlow, the possibilities are endless: while Lambdas may not always be the optimal choice for production-level pipelines, it’s incredibly easy, quick and cheap to run experiments with this setup.
  2. The code presented here is anything but perfectly engineered. A cool improvement would be to store and load models with an S3 bucket, not locally (see the sketch right after this list). In fact, decoupling the model parameters from the rest of the code enables cool use cases with just a few changes: for example, you could have overnight training sessions that automatically update the model weights with fresh data, or you could host different models and decide at query time which one to serve.
  3. While we love Lambdas, sometimes dealing with dependencies gets a bit tiring: as we have seen, dependencies must always be bundled together with the code and when packages don’t work across operating systems (like TensorFlow) you need to prepare and ship the Lambda-compatible version. We recently started using Docker more often across our stack and the AWS integration is amazing: Docker + TensorFlow sounds like a promising topic for a future tutorial.
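
As a rough illustration of the S3 idea in point (2), the following snippet downloads model files into Lambda’s writable /tmp folder at cold start (bucket and file names are made up, and it assumes boto3 is available in the package):

import os
import boto3

MODEL_BUCKET = 'my-model-bucket'  # hypothetical bucket name
MODEL_FILES = ['model.ckpt.meta', 'model.ckpt.index', 'model.ckpt.data-00000-of-00001']

s3 = boto3.client('s3')
for name in MODEL_FILES:
    local_path = os.path.join('/tmp', name)
    if not os.path.exists(local_path):
        s3.download_file(MODEL_BUCKET, name, local_path)

# the model class can then restore the weights from '/tmp/model.ckpt'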

See you, space cowboys

For comments and feedback, please feel free to reach out directly at jacopo.tagliabue@tooso.ai.

For updates on Tooso, follow us on Twitter.

Appendix: preparing TensorFlow for Lambda upload

For the curious reader, we present here a description of the building process with copy-and-paste commands (inspired by the slides here).

First, launch an EC2 instance (launch one in the free tier if you don’t want to incur charges!). Then, ssh into the instance and run the following commands to install Python tools, set up a virtual environment and get TensorFlow installed:

sudo apt-get update
sudo apt-get install python-dev
sudo apt install python-pip
pip install virtualenv
virtualenv tf_env
source tf_env/bin/activate
pip install tensorflow

After TensorFlow is installed, add an empty __init__.py to the google package folder (so that it is importable as a regular package), then cd into the site-packages folder and zip everything (excluding *.pyc files):

touch ~/tf_env/lib/python2.7/site-packages/google/__init__.py
cd ~/tf_env/lib/python2.7/site-packages
zip -r ~/tf_env.zip . --exclude \*.pyc

Finally, from the local terminal we can just copy over the zipped file with scp:

scp -i "{your_pem}" {user}@{your_ec2_address}:~/tf_env.zip /{your_local_path}/tf_env.zip

Unzip the archive and move the folder into the project: make sure the folder name matches the name specified in the handler.py file (default vendored).

To avoid incurring charges, don’t forget to terminate the EC2 instance when done.
