Running TensorFlow on AWS Lambda using Serverless

For smaller workloads, serverless platforms such as AWS Lambda can be a fast and low-cost option for deploying machine learning models. As the application grows, pieces can then be moved to dedicated servers, or PaaS options such as AWS Sagemaker, if necessary.

Although it is instructive to first use Lambda by uploading code directly, it is best to select a framework so that you can leverage additional integrations, such as API Gateway with AWS. For this example we will use TensorFlow as the machine learning library, and so we will look for frameworks that can deploy Python applications.

Zappa is well known for easily deploying existing Flask or Django apps. However, since we are building with serverless in mind from the start, we will select the ubiquitous and powerful Serverless framework.

When treating infrastructure configuration as a first-class citizen it is advisable to first create a shell of the application and deploy it, and then write the actual code. This allows for rapid iterations that are close to the end-state, and avoids costly surprises down the road.

Structuring the project

For machine learning most of the work can be categorized into three critical steps:

  • Retrieving, cleaning, and uploading the input data
  • Training the model and saving the results
  • Inferring (i.e. predicting) a new result based on a new set of data

At its core, designing for serverless platforms means thinking of how to segment your code by individual deployable functions. In light of the categories above, we will structure our project like so:

├── TfLambdaDemo
│ ├── upload.py
│ ├── train.py
│ └── infer.py

Be sure to also create a new virtualenv:

$ pyenv virtualenv 3.6.5 tflambdademo
$ pyenv activate tflambdademo

Adding Lambda handlers

A “handler” is the term used for the function that will actually be invoked by Lambda, and is always called with two parameters, event and context. From the docs:

event – AWS Lambda uses this parameter to pass in event data to the handler. This parameter is usually of the Python dict type. It can also be list, str, int, float, or NoneType type.
context – AWS Lambda uses this parameter to provide runtime information to your handler. This parameter is of the LambdaContext type.

If you were invoking the functions directly, event would be whatever type you passed in that call. However, we plan to invoke them with an HTTP POST request through API Gateway, which means the data will be contained in event['body'] and we will need to return a compatible response.

To get started, add boilerplate functions into each of the .py files mentioned above:

import json

def uploadHandler(event, context):
    body = json.loads(event.get('body'))
    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }
    return response
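
Before deploying anything, the request/response contract can be exercised locally. The following is a minimal sketch, assuming a hand-built dictionary is an acceptable stand-in for the event API Gateway sends; echo_handler and fake_event are illustrative names, and the fallback for a missing body is an addition that the boilerplate above does not have.

```python
import json


def echo_handler(event, context):
    """Echo the JSON request body back, mirroring the boilerplate handler."""
    # The request body arrives as a string under the "body" key. It is
    # missing or None when no body was sent, so fall back to JSON null.
    body = json.loads(event.get("body") or "null")
    return {"statusCode": 200, "body": json.dumps(body)}


# Hypothetical stand-in for the event produced by an HTTP POST.
fake_event = {"body": json.dumps({"foo": "bar"})}
response = echo_handler(fake_event, None)
print(response["statusCode"], response["body"])
```

Calling the handler with an empty event, echo_handler({}, None), returns a body of "null" rather than raising, which is convenient when invoking a function without any input data.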

Installing and configuring Serverless

If you are not familiar with Serverless, the first thing to note is that it is actually based on Node.js. This may seem odd since we are building a Python app, but it makes sense once you realize that this framework is really a developer CLI tool and not something that ships with your product.

On a Mac you can install via Homebrew:

$ brew update
$ brew install node

Verify that you have node installed, as well as the package manager npm:

$ node --version
$ npm --version

Install Serverless as a global package (-g) and verify that the serverless command is now available on your CLI:

$ npm install -g serverless
$ serverless

Create a new Serverless project in the TfLambdaDemo directory:

$ serverless create --template aws-python

Notice the new serverless.yml file and how your .gitignore file was auto-populated. Serverless also created a handler.py template file, but you can delete this.

When you open serverless.yml you will see a lot of boilerplate information, which is good to familiarize yourself with. First update the service name to tflambdademo, and then update the provider section to AWS running Python 3.6 in the region of your choice. Defining the stage is useful when managing production deployments, but for now we will leave it as dev.

In the functions section list out upload, train, and infer with a handler format of <filename>.<function>. The events section contains the information for how the function will be called. Since we plan to use API Gateway, we will add the http trigger and set each timeout to 30 seconds to match the API Gateway request limit.

In AWS Lambda the allocated memory can be configured, and CPU is then scaled accordingly. Since the machine learning training operation will be computationally intensive, change the train function from the default of 1024 MB to the maximum of 3008 MB (we can optimize later).

Your serverless.yml file should now look like:

service: tflambdademo
provider:
  name: aws
  region: us-east-1
  runtime: python3.6
  stage: dev
functions:
  upload:
    handler: upload.uploadHandler
    timeout: 30
    events:
      - http:
          path: upload
          method: post
  train:
    handler: train.trainHandler
    timeout: 30
    memorySize: 3008
    events:
      - http:
          path: train
          method: post
          async: true
  infer:
    handler: infer.inferHandler
    timeout: 30
    events:
      - http:
          path: infer
          method: post

Since we already added boilerplate functionality into the handlers, we can deploy with the following command:

Note: To deploy you will need an AWS account and your credentials properly configured. For details see the docs.

$ serverless deploy -v

At the end you should see the following information, where <id> will be custom to your deployment:

Service Information
service: tflambdademo
stage: dev
region: us-east-1
stack: tflambdademo-dev
resources: 22
api keys:
  None
endpoints:
  POST - https://<id>.execute-api.us-east-1.amazonaws.com/dev/upload
  POST - https://<id>.execute-api.us-east-1.amazonaws.com/dev/train
  POST - https://<id>.execute-api.us-east-1.amazonaws.com/dev/infer
functions:
  upload: tflambdademo-dev-upload
  train: tflambdademo-dev-train
  infer: tflambdademo-dev-infer
layers:
  None

What just happened? Serverless took care of all the heavy lifting by first creating a new S3 bucket and uploading your code, and then creating a CloudFormation template that executed the following:

  • Create the Lambda functions
  • Create API Gateway with the defined endpoints configured to integrate with the handlers
  • Create a new IAM role and the proper permissions
  • Create a new log group viewable in CloudWatch

Test out the new endpoint by verifying that a request body is sent back (remember to replace <id>):

$ curl -X POST https://<id>.execute-api.us-east-1.amazonaws.com/dev/infer -d '{"foo": "bar"}'
{"foo": "bar"}

Magic!

Adding in TensorFlow

Before doing anything else, let’s see if we can successfully add TensorFlow to our project. To each of the .py files add import tensorflow as tf. Then install via pip and re-deploy.

$ pip install tensorflow
$ pip freeze > requirements.txt
$ serverless deploy -v

Everything looks fine, but when we try to test the endpoint we get an error:

$ curl -X POST https://<id>.execute-api.us-east-1.amazonaws.com/dev/infer -d '{"foo": "bar"}'
{"message": "Internal server error"}

If we go to CloudWatch we can see the following error:

Unable to import module 'infer': No module named 'tensorflow'

This seems surprising since invoking the function locally is successful:

$ serverless invoke local --function infer
{
    "statusCode": 200,
    "body": "null"
}

The reason is that even though we installed TensorFlow to our virtual environment, there was no mechanism to add these libraries to our Lambda package. The only content in that package was our raw code, which up until now depended only on the pre-bundled library json.

Adding the serverless-python-requirements plugin

One great thing about Serverless is the extensibility via plugins. In order to bundle dependencies from our requirements.txt file we will use serverless-python-requirements.

Installing this plugin will add a package.json file, the node_modules directory (be sure to add this to your .gitignore!), and a plugins section to your serverless.yml file.

$ serverless plugin install -n serverless-python-requirements

To give us the most flexibility we will use the Dockerize option. This avoids some complexity with using the binaries from our virtual environment, and also allows for compiling non-pure-Python libraries. To select this option, add the following section to your serverless.yml file:

Note: You will need Docker installed on your machine to continue.

custom:
  pythonRequirements:
    dockerizePip: true

If we now run serverless deploy -v we can see additional upfront actions to create the Docker image. However, it still fails!

An error occurred: UploadLambdaFunction - Unzipped size must be smaller than 262144000 bytes (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: c3b94dc7-6a06-11e9-8823-bb373647997a).

Our zipped payload for Lambda balloons from 6.5KB to 126.9MB, but more importantly the unzipped size is 509MB, which is nowhere near the 262MB limit! If we download the zip file from S3 we can see that 399MB comes from the tensorflow folder.
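
The unzipped size can also be checked before deploying, rather than discovered from the error message. The following is a minimal sketch using only the standard library; the limit is the byte count quoted in the error above, while the function names and the tiny in-memory archive are illustrative stand-ins for the real package.

```python
import io
import zipfile

# Limit quoted in the deployment error above, in bytes.
LAMBDA_UNZIPPED_LIMIT = 262144000


def unzipped_size(zip_source):
    """Return the total uncompressed size, in bytes, of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits_in_lambda(zip_source, limit=LAMBDA_UNZIPPED_LIMIT):
    """Check whether the archive would unpack to less than the limit."""
    return unzipped_size(zip_source) < limit


# Build a small in-memory archive as a stand-in for the real package,
# which would normally be the .zip file downloaded from S3.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("infer.py", "x" * 1000)
    archive.writestr("train.py", "y" * 2500)

print(unzipped_size(buffer), fits_in_lambda(buffer))
```

Passing a file path instead of the in-memory buffer works the same way, since zipfile.ZipFile accepts either.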

How do I fit all this stuff in that box?

To get everything down to size we will employ three techniques available in the serverless-python-requirements plugin:

  • zip: Compresses the libraries into an additional .requirements.zip file and adds unzip_requirements.py to the final bundle.
  • slim: Strips debug symbols from *.so files and removes unneeded files and directories such as *.pyc, __pycache__, dist-info, etc.
  • noDeploy: Omits certain packages from deployment. We will use the standard list that includes those already built into Lambda, as well as TensorBoard.

The custom section in your serverless.yml file should now look like:

custom:
  pythonRequirements:
    dockerizePip: true
    zip: true
    slim: true
    noDeploy:
      - boto3
      - botocore
      - docutils
      - jmespath
      - pip
      - python-dateutil
      - s3transfer
      - setuptools
      - six
      - tensorboard

You will also need to add the following as the first four lines in the .py files. This step will unzip the requirements file on Lambda, but will be skipped when running locally since unzip_requirements.py only exists in the final bundle.

try:
    import unzip_requirements
except ImportError:
    pass

Running deploy will now succeed, and we can again test our endpoint to verify the function works.

$ serverless deploy -v
...
$ curl -X POST https://<id>.execute-api.us-east-1.amazonaws.com/dev/infer -d '{"foo": "bar"}'
{"foo": "bar"}

Inspecting the file on S3, we can see that our semi-unzipped package is now 103MB (under the 262MB limit), and the fully unzipped package with all of the libraries is 473MB (narrowly under the 500MB total local storage limit). Success!


It’s important to recognize that we haven’t actually written a real line of machine learning code yet. If infrastructure configuration is critical, the above validates why it is important to start with a deployable shell first. It helps reveal what restrictions you may face as you start to build out the application (e.g. you will not be able to use another large library in combination with TensorFlow), or whether the approach is even possible.


Creating the Machine Learning functions

For this demonstration we will leverage a Linear Classifier example from TensorFlow, which uses the higher-level Estimator API:

Using census data which contains data a person’s age, education, marital status, and occupation (the features), we will try to predict whether or not the person earns more than 50,000 dollars a year (the target label). We will train a logistic regression model that, given an individual’s information, outputs a number between 0 and 1 — this can be interpreted as the probability that the individual has an annual income of over 50,000 dollars.

Specifically we will clone census_data.py from that project, which provides the functions for downloading and cleaning the data, as well as the input function.

upload.py

Since we will be using S3 to store our data, we need to add this resource into the serverless.yml file. First add an environment variable to define the bucket name and a new IAM role. Note that we can now refer to BUCKET inside this file as well as our application.

provider:
  name: aws
  region: us-east-1
  runtime: python3.6
  stage: dev
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:*
      Resource:
        Fn::Join:
          - ""
          - - "arn:aws:s3:::"
            - ${self:provider.environment.BUCKET}
            - "/*"
  environment:
    BUCKET: tflambdademo

Next add a new resource section which will actually create the S3 bucket:

resources:
  Resources:
    SageBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:provider.environment.BUCKET}

Update the upload handler to download the new data to the local Lambda storage location /tmp/, and then upload to S3. We will use an epoch prefix to separate each data-model pair.

Since no data needs to be read in from the request body we will delete that line, and also modify the response to return the epoch.

try:
    import unzip_requirements
except ImportError:
    pass

import os
import json
import time

import boto3
import tensorflow as tf

import census_data

FILE_DIR = '/tmp/'
BUCKET = os.environ['BUCKET']

def uploadHandler(event, context):
    # Download data to local tmp directory
    census_data.download(FILE_DIR)

    # Upload files to S3 under a prefix named after the current epoch
    epoch_now = str(int(time.time()))

    boto3.Session(
        ).resource('s3'
        ).Bucket(BUCKET
        ).Object(os.path.join(epoch_now, census_data.TRAINING_FILE)
        ).upload_file(FILE_DIR + census_data.TRAINING_FILE)

    boto3.Session(
        ).resource('s3'
        ).Bucket(BUCKET
        ).Object(os.path.join(epoch_now, census_data.EVAL_FILE)
        ).upload_file(FILE_DIR + census_data.EVAL_FILE)

    response = {
        "statusCode": 200,
        "body": json.dumps({'epoch': epoch_now})
    }

    return response

At this point we can re-deploy the functions and trigger the upload function:

$ serverless deploy -v
...
$ curl -X POST https://<id>.execute-api.us-east-1.amazonaws.com/dev/upload
{"epoch": "1556995767"}

If you navigate to the new S3 bucket you should see the CSV files adult.data and adult.test under a folder prefix defined by the epoch in the response.
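
The epoch prefix scheme amounts to a small naming convention for object keys. As a rough sketch of that convention, assuming the file names used in this walkthrough (the helper names epoch_prefix and s3_key are illustrative, not part of the project):

```python
import posixpath
import time


def epoch_prefix(now=None):
    """Return the Unix time in whole seconds as a string."""
    return str(int(time.time() if now is None else now))


def s3_key(epoch, filename):
    """Join an epoch prefix and a file name into an S3 object key."""
    # S3 keys use forward slashes, so posixpath is used here instead of
    # os.path, which would produce backslashes on Windows.
    return posixpath.join(epoch, filename)


epoch = epoch_prefix(1556995767.9)
print(s3_key(epoch, "adult.data"))    # 1556995767/adult.data
print(s3_key(epoch, "adult.test"))    # 1556995767/adult.test
print(s3_key(epoch, "model.tar.gz"))  # 1556995767/model.tar.gz
```

Because every artifact for one run shares the same prefix, passing the epoch string between functions is enough to locate both the data and the model.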

train.py

The train function will download the data from S3 based on an epoch passed in the POST body.

# Assumes the same module header as upload.py (unzip_requirements, os,
# json, time, boto3, tensorflow as tf, census_data, FILE_DIR, BUCKET), plus:
import functools
import tarfile

def trainHandler(event, context):
    time_start = time.time()

    body = json.loads(event.get('body'))

    # Read in epoch
    epoch_files = body['epoch']

    # Download files from S3
    boto3.Session(
        ).resource('s3'
        ).Bucket(BUCKET
        ).download_file(
            os.path.join(epoch_files, census_data.TRAINING_FILE),
            FILE_DIR + census_data.TRAINING_FILE)

    boto3.Session(
        ).resource('s3'
        ).Bucket(BUCKET
        ).download_file(
            os.path.join(epoch_files, census_data.EVAL_FILE),
            FILE_DIR + census_data.EVAL_FILE)

In order to set up the estimator, a set of feature columns must be provided. These columns can be thought of as placeholders that tell the model how to handle raw inputs. The census_data.py module provides a function to create two sets of columns for a wide and deep model. For this simple example we will only use the wide columns.

To actually execute the training we must provide an input function to the newly configured estimator. The input function reads data (in our case from a CSV file), and converts it into a TensorFlow tensor. However, the estimator expects an input function that has no arguments, and therefore we will use a partial function to create a new callable.

def trainHandler(event, context):
    ...

    # Create feature columns
    wide_cols, deep_cols = census_data.build_model_columns()

    # Set up estimator
    classifier = tf.estimator.LinearClassifier(
        feature_columns=wide_cols,
        model_dir=FILE_DIR + 'model_' + epoch_files + '/')

    # Create callable input function and execute train
    train_inpf = functools.partial(
        census_data.input_fn,
        FILE_DIR + census_data.TRAINING_FILE,
        num_epochs=2, shuffle=True,
        batch_size=64)

    classifier.train(train_inpf)
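
The role of functools.partial here can be seen in isolation. The following is a small sketch in which input_fn is a hypothetical stand-in for census_data.input_fn that simply returns its arguments, so the effect of the binding can be inspected without TensorFlow installed.

```python
import functools


def input_fn(data_file, num_epochs, shuffle, batch_size):
    """Hypothetical stand-in for census_data.input_fn.

    It only records its arguments so the binding can be inspected.
    """
    return {
        "data_file": data_file,
        "num_epochs": num_epochs,
        "shuffle": shuffle,
        "batch_size": batch_size,
    }


# Bind every argument up front; the result can be called with none.
train_inpf = functools.partial(
    input_fn,
    "/tmp/adult.data",
    num_epochs=2,
    shuffle=True,
    batch_size=64,
)

print(train_inpf())
```

The estimator can then call train_inpf() on its own schedule without knowing anything about file paths or batch sizes.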

We will then repeat this with the test data we held back in order to evaluate the model, and print the results to the logs.

def trainHandler(event, context):
    ...

    # Create callable input function and execute evaluation
    test_inpf = functools.partial(
        census_data.input_fn,
        FILE_DIR + census_data.EVAL_FILE,
        num_epochs=1, shuffle=False,
        batch_size=64)

    result = classifier.evaluate(test_inpf)
    print('Evaluation result: %s' % result)

In order to save the model to re-use for creating predictions, we will zip the files up and save to the same S3 folder.

def trainHandler(event, context):
    ...

    # Zip up model files and store in S3.
    # tarfile drops the leading '/' from member names, so the files are
    # stored in the archive under tmp/model_<epoch>/.
    with tarfile.open(FILE_DIR + 'model.tar.gz', mode='w:gz') as arch:
        arch.add(FILE_DIR + 'model_' + epoch_files + '/', recursive=True)

    boto3.Session(
        ).resource('s3'
        ).Bucket(BUCKET
        ).Object(os.path.join(epoch_files, 'model.tar.gz')
        ).upload_file(FILE_DIR + 'model.tar.gz')
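
One detail worth checking is how tarfile names the members when it is given an absolute path, because that determines where the files land when infer.py extracts them. The following is a small sketch using a temporary directory; the model directory and its single placeholder file are illustrative, not real TensorFlow output.

```python
import os
import tarfile
import tempfile


def archive_names(src_dir, archive_path):
    """Archive src_dir the way trainHandler does and list member names."""
    with tarfile.open(archive_path, mode="w:gz") as arch:
        arch.add(src_dir, recursive=True)
    with tarfile.open(archive_path, mode="r:gz") as arch:
        return arch.getnames()


with tempfile.TemporaryDirectory() as work_dir:
    # Hypothetical model directory with a single placeholder file.
    model_dir = os.path.join(work_dir, "model_1556995767")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "checkpoint"), "w") as fh:
        fh.write("placeholder")

    names = archive_names(model_dir, os.path.join(work_dir, "model.tar.gz"))

# Member names are stored as relative paths, without a leading slash.
for name in names:
    print(name)
```

Since the absolute path is stored in relative form, extracting into /tmp/ recreates the directory one level deeper, which is why the inference code looks for the model under FILE_DIR + 'tmp/model_...'.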

Finally, we will prepare the result data for JSON serialization in the response, and also add in a runtime calculation.

def trainHandler(event, context):
    ...

    # Convert result from float32 for json serialization
    for key in result:
        result[key] = result[key].item()

    runtime = round(time.time() - time_start, 1)

    response = {
        "statusCode": 200,
        "body": json.dumps({'epoch': epoch_files,
                            'runtime': runtime,
                            'result': result})
    }

    return response
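
The conversion step exists because json.dumps only understands built-in Python types. The following sketch illustrates the idea without requiring NumPy: FakeFloat32 is a hypothetical stand-in for a NumPy scalar, and to_builtin is an illustrative variant that returns a copy and leaves plain values alone, rather than the in-place loop used above.

```python
import json


class FakeFloat32:
    """Hypothetical stand-in for a NumPy scalar such as numpy.float32."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # NumPy scalars expose item() to return the plain Python value.
        return self._value


def to_builtin(result):
    """Return a copy of result with scalar values converted via item()."""
    return {
        key: value.item() if hasattr(value, "item") else value
        for key, value in result.items()
    }


raw = {"accuracy": FakeFloat32(0.8363736867904663), "global_step": 1018}

try:
    json.dumps(raw)
except TypeError as err:
    print("not serializable:", err)

print(json.dumps(to_builtin(raw)))
```

The first call fails with a TypeError, while the converted dictionary serializes cleanly.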

Assuming that the test run after updating upload.py was successful, you can now deploy and test the function with that epoch key.

$ serverless deploy -v
...
$ curl -X POST https://<id>.execute-api.us-east-1.amazonaws.com/dev/train -d '{"epoch": "1556995767"}'
{"epoch": "1556995767", "runtime": 11.6, "result": {"accuracy": 0.8363736867904663, "accuracy_baseline": 0.7637737393379211, "auc": 0.8843450546264648, "auc_precision_recall": 0.697192907333374, "average_loss": 0.35046106576919556, "label/mean": 0.23622627556324005, "loss": 22.37590789794922, "precision": 0.698722243309021, "prediction/mean": 0.23248426616191864, "recall": 0.5403016209602356, "global_step": 1018}}

83.6% accuracy…not too bad, and in line with the results from the TensorFlow official example! If you visit the S3 bucket you should also see a saved file model.tar.gz in the epoch folder.

infer.py

The first step to building out the inference function is to think about how the data is coming in. The raw data should look just like the CSV input file we used above, except now it will come in the POST body.

The input function in census_data.py was built to stream CSV data from disk, which is scalable for larger volumes. In our application we only expect to make a small number of predictions at once, which will easily fit in memory, so we can use a simpler input method.

To infer.py add a new function that will take in a dictionary where the keys represent the same columns that were in the CSV file and map to lists of values. Each “column” will then be converted to a numpy array with its datatype specified according to census_data.py.

Being able to use numpy makes it easy to convert to tensors, and there is no cost to our package size since it is a dependency for TensorFlow already.

# Assumes the same module header as upload.py (unzip_requirements, os,
# json, boto3, tensorflow as tf, census_data, FILE_DIR, BUCKET), plus:
import tarfile
import numpy as np

def _easy_input_function(data_dict, batch_size=64):
    """
    data_dict = {
        '<csv_col_1>': ['<first_pred_value>', '<second_pred_value>'],
        '<csv_col_2>': ['<first_pred_value>', '<second_pred_value>'],
        ...
    }
    """
    # Convert input data to numpy arrays
    for col in data_dict:
        col_ind = census_data._CSV_COLUMNS.index(col)
        dtype = type(census_data._CSV_COLUMN_DEFAULTS[col_ind][0])
        data_dict[col] = np.array(data_dict[col], dtype=dtype)

    labels = data_dict.pop('income_bracket')

    ds = tf.data.Dataset.from_tensor_slices((data_dict, labels))
    ds = ds.batch(batch_size)

    return ds
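
Because the input now arrives from an HTTP request rather than a trusted file, it may be worth checking its shape before handing it to TensorFlow. The following is a sketch of such a check, not part of the original project; validate_input is a hypothetical helper, and the column names are copied from the example request used later in this article.

```python
# Column names taken from the example request in this walkthrough.
EXPECTED_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "gender",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income_bracket",
]


def validate_input(data_dict, expected=EXPECTED_COLUMNS):
    """Return the number of rows, or raise ValueError on a bad payload."""
    missing = [col for col in expected if col not in data_dict]
    unknown = [col for col in data_dict if col not in expected]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")

    # Every column must supply one value per row to be predicted.
    lengths = {len(values) for values in data_dict.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns have differing lengths: {sorted(lengths)}")
    return lengths.pop()


payload = {col: ["0", "1"] for col in EXPECTED_COLUMNS}
print(validate_input(payload))
```

A failed check could be turned into a 400 response in the handler instead of surfacing as an internal server error.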

Back to the main handler function we will read in the prediction data and epoch identifier, and then download and extract the model file.

def inferHandler(event, context):
    body = json.loads(event.get('body'))

    # Read in prediction data as dictionary
    # Keys should match _CSV_COLUMNS, values should be lists
    predict_input = body['input']

    # Read in epoch
    epoch_files = body['epoch']

    # Download model from S3 and extract
    boto3.Session(
        ).resource('s3'
        ).Bucket(BUCKET
        ).download_file(
            os.path.join(epoch_files, 'model.tar.gz'),
            FILE_DIR + 'model.tar.gz')

    # Use a context manager so the archive is closed after extraction
    with tarfile.open(FILE_DIR + 'model.tar.gz', 'r') as arch:
        arch.extractall(FILE_DIR)

As in train.py we need to set up the estimator, but now warm_start_from is specified, which tells TensorFlow to load the previously trained model. To set up the prediction we will use the predict() method, and pass in the previously created input function wrapped in a lambda to make it callable with no arguments.

The output of this method is an iterable, which we will convert to lists and store in a new result variable. Each item in the list will represent the result corresponding to the index of items in the lists from the input data.

def inferHandler(event, context):
    ...

    # Create feature columns
    wide_cols, deep_cols = census_data.build_model_columns()

    # Load model
    classifier = tf.estimator.LinearClassifier(
        feature_columns=wide_cols,
        model_dir=FILE_DIR + 'tmp/model_' + epoch_files + '/',
        warm_start_from=FILE_DIR + 'tmp/model_' + epoch_files + '/')

    # Set up prediction
    predict_iter = classifier.predict(
        lambda: _easy_input_function(predict_input))

    # Iterate over prediction and convert to lists
    predictions = []
    for prediction in predict_iter:
        for key in prediction:
            prediction[key] = prediction[key].tolist()
        predictions.append(prediction)

    response = {
        "statusCode": 200,
        "body": json.dumps(predictions,
                           default=lambda x: x.decode('utf-8'))
    }

    return response
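
The default argument passed to json.dumps deserves a note: it is only called for objects the encoder cannot handle, and the decode call in the handler suggests that some prediction values are bytes. The following is a sketch of that mechanism with a named function; the sample prediction is illustrative, and unlike the lambda above this version raises a clear error for any non-bytes value.

```python
import json


def decode_bytes(value):
    """Fallback for json.dumps: turn bytes into str, reject anything else."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    raise TypeError(
        f"Object of type {type(value).__name__} is not JSON serializable")


# Shaped like one prediction after tolist(); values are illustrative.
prediction = {"class_ids": [1], "classes": [b"1"]}

print(json.dumps([prediction], default=decode_bytes))
```

Without the fallback, the bytes value would cause json.dumps to raise a TypeError and the request would end in an internal server error.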

Building on invoking upload and train in the previous steps, we can pass in a row from the CSV file to test the function after re-deploying.

$ serverless deploy -v
...
$ curl -X POST https://<id>.execute-api.us-east-1.amazonaws.com/dev/infer -d '{"epoch": "1556995767", "input": {"age": ["34"], "workclass": ["Private"], "fnlwgt": ["357145"], "education": ["Bachelors"], "education_num": ["13"], "marital_status": ["Married-civ-spouse"], "occupation": ["Prof-specialty"], "relationship": ["Wife"], "race": ["White"], "gender": ["Female"], "capital_gain": ["0"], "capital_loss": ["0"], "hours_per_week": ["50"], "native_country": ["United-States"], "income_bracket": [">50K"]}}'
[{"logits": [1.088104009628296], "logistic": [0.7480245232582092], "probabilities": [0.25197547674179077, 0.7480245232582092], "class_ids": [1], "classes": ["1"]}]

Since this was taken from the original dataset we can see that the correct label was >50K which the model successfully predicts at 74.8% versus 25.2% for <=50K.
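
The fields in the response are related in a simple way for a logistic regression model: the logistic value is the sigmoid of the logit, and the two probabilities sum to one. The following sketch checks this against the numbers in the response above; small differences in the last digits are expected because TensorFlow reported 32-bit floats.

```python
import math


def sigmoid(logit):
    """Map a logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


logit = 1.088104009628296  # "logits" value from the response above
p_over_50k = sigmoid(logit)
probabilities = [1.0 - p_over_50k, p_over_50k]

print(round(p_over_50k, 4))
print([round(p, 4) for p in probabilities])
```

A positive logit therefore corresponds to a probability above 50% for the >50K class, which is why class_ids is 1 in the response.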

Feel free to now declare yourself a psychic reader!

Final thoughts

There are distinct boundaries to this type of deployment for TensorFlow, specifically around duration, and it is always good to check whether a serverless infrastructure is actually cost effective.

If serverless is the right choice, there are a few steps that you can take to help expand the duration boundaries.

Call the functions asynchronously…

Serverless allows you to add async: true to the function configuration, which gives you access to the full 900 second limit on Lambda, rather than the 30 second limit through API Gateway. In this case the request only invokes the function and does not wait for a response. The downside is that you will need to set up other mechanisms to determine which epoch key you should use to train or invoke the model.

…or don’t use HTTP to trigger the training function at all

For many use cases, you may really only need to provide API Gateway integration for the infer function. One pattern that could be used for the train function is to configure Serverless to trigger Lambda in response to S3 events. For example, when a new epoch partition is created with CSV files, train.py is automatically invoked to update the model.
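
With an S3 trigger the epoch no longer arrives in a POST body, so the handler has to recover it from the event. The following is a sketch of that step, assuming the standard S3 notification layout in which each record carries the object key; epochs_from_s3_event is a hypothetical helper and the sample event is trimmed down to the fields it reads.

```python
from urllib.parse import unquote_plus


def epochs_from_s3_event(event):
    """Collect the epoch prefixes of all objects named in an S3 event."""
    epochs = set()
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        prefix, _, filename = key.partition("/")
        if filename:  # skip keys that are not under a prefix
            epochs.add(prefix)
    return sorted(epochs)


# Trimmed-down stand-in for an S3 "object created" notification.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "tflambdademo"},
                "object": {"key": "1556995767/adult.data"}}},
        {"s3": {"bucket": {"name": "tflambdademo"},
                "object": {"key": "1556995767/adult.test"}}},
    ]
}

print(epochs_from_s3_event(sample_event))
```

In practice the trigger would need to fire only once both CSV files exist for an epoch, for example by reacting to the second file only; that coordination is left out of this sketch.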

Warm your functions

When invoking the train function you may have noticed that the request took much longer than the reported runtime. This is because when you invoke the function for the first time Lambda must load your code and set up the environment (i.e. a cold start). There is a great Serverless plugin available that allows you to configure automatic warming in the serverless.yml file.


To view the final code, visit https://github.com/mikepm35/TfLambdaDemo