Serverless Vision Inference using AWS Lambda and TFLite

Published in SmellsLikeML, Oct 1, 2019

In many programming languages, when you want to quickly whip up a generic function to process data, you turn to the ‘lambda’ function. In AWS, this use case has been supported through the AWS Lambda service.

Using AWS Lambda, you can consume records from a Kinesis stream, objects from S3, or items from a database. However, this flexibility comes with constraints.

Specifically, the zipped lambda package must satisfy a 50MB size constraint, which can be prohibitive for some applications. Because of package size, memory, and timeout constraints, running an image classifier or detector on Lambda can be difficult.
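To make the size constraint concrete, here is a minimal sketch that checks a zip archive against the 50MB figure mentioned above. The function name `package_report` and the default limit are illustrative assumptions, not part of any AWS tooling.

```python
import os
import zipfile

# Assumed limit, taken from the 50MB figure in the text above.
MAX_ZIPPED_BYTES = 50 * 1024 * 1024


def package_report(zip_path, limit=MAX_ZIPPED_BYTES):
    """Return the zipped size, the unzipped size, and whether the zip fits."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Sum the uncompressed size of every entry in the archive.
        unzipped = sum(info.file_size for info in archive.infolist())
    return {"zipped": zipped, "unzipped": unzipped, "fits": zipped <= limit}
```

Calling `package_report("lambda_function.zip")` before uploading gives a quick sense of how much room is left for other libraries.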

Some have shown how to hack older, smaller versions of the TensorFlow library to meet the package size limit. Even so, we still face memory and timeout challenges when loading large models, and we will not have space in the package for many other libraries.

Running inference with TensorFlow on embedded devices has been enabled by libraries like TFLite, which is designed for the memory and compute constraints of mobile and IoT devices. More recently, the TFLite interpreter has been made available as a standalone library to reduce the disk space needed.

This means we don’t need to take the full TensorFlow library and strip away unnecessary parts. Instead, we can take only the parts necessary for running inference on small TFLite models. We also get many newer ops not supported by the older TF versions, and faster inference on smaller, quantized models.
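As a rough illustration of what running inference with only the standalone interpreter looks like, here is a minimal sketch. It assumes the `tflite_runtime` wheel is installed and that the model has a single input and a single output. The helper names `load_interpreter`, `run_inference`, and `top_k` are made up for this example.

```python
def load_interpreter(model_path):
    """Create a TFLite interpreter from a .tflite file."""
    # Imported lazily so the helpers below work without the wheel installed.
    from tflite_runtime.interpreter import Interpreter
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def run_inference(interpreter, input_data):
    """Feed one input tensor through the model and return the first output.

    input_data is assumed to already match the shape and dtype reported by
    interpreter.get_input_details()[0].
    """
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    interpreter.set_tensor(input_index, input_data)
    interpreter.invoke()
    return interpreter.get_tensor(output_index)


def top_k(scores, labels, k=3):
    """Pair each score with its label and return the k highest-scoring pairs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [(label, float(score)) for score, label in ranked[:k]]
```

For a classifier, the output usually has a leading batch dimension, so something like `top_k(run_inference(interpreter, batch)[0], labels)` would give the best guesses for a single image.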

The official TFLite docs provide pip wheels that contain just the TFLite interpreter. Although they support the x86_64 architecture, these wheels don’t work quite right on Amazon Linux, the platform on which Lambda functions run.

Thankfully, the TensorFlow team has also provided documentation on how to build your own pip wheel for the standalone library. After slightly modifying their build_pip_package.sh script by deleting the chunks that build Debian packages, the script runs smoothly and creates a usable pip wheel in the /tmp/ directory.

To create the lambda package, make a python3.7 virtual environment and install your package dependencies.

(lambda_env) $ pip install /tmp/tflite_pip/python3/dist/tflite_runtime-1.14.0-cp37-cp37m-linux_x86_64.whl
(lambda_env) $ pip install <any-other-python-packages>

Then, deactivate your environment and zip all of the python dependencies.

$ cd lambda_env/lib/python3.7/site-packages/
$ zip -r9 ../../../../lambda_function.zip .
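If you prefer to script this step, the same kind of archive can be built from Python. The sketch below is an assumption-laden alternative to the zip command above, not part of the original workflow. It also places one extra script at the archive root, which is where the function script needs to end up. The name `build_package` is hypothetical.

```python
import os
import zipfile


def build_package(site_packages, handler_path, zip_path):
    """Zip a site-packages tree plus one handler script at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as archive:
        for root, _dirs, files in os.walk(site_packages):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to site-packages so imports resolve
                # from the top level of the unpacked archive.
                archive.write(full, os.path.relpath(full, site_packages))
        archive.write(handler_path, os.path.basename(handler_path))
    return zip_path
```

Keep `zip_path` outside of `site_packages`, otherwise the walk would try to add the archive to itself.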

If you only installed the TFLite interpreter wheel, this package should be <30MB, which leaves room for installing other python libraries like Pillow for image processing.

To this package, you’ll also zip a python script that will contain your lambda function. An example script for this setup downloads an image classification model from an S3 bucket, listens to a Kinesis stream of images, runs inference and prints out predicted labels.

After following AWS’s instructions on deploying your lambda package, you can see the predicted labels for your Kinesis image stream in your CloudWatch logs.
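A function script along the lines described above might look like the following. This is a sketch under several assumptions, not the original script: the bucket and key names are placeholders read from hypothetical environment variables, the model is assumed to be a quantized classifier with a single NHWC image input, each Kinesis record is assumed to carry one encoded image, and the handler prints class indices rather than label names because no label file is specified here.

```python
import base64
import os

# Hypothetical settings: replace with your own bucket and model key.
MODEL_BUCKET = os.environ.get("MODEL_BUCKET", "my-model-bucket")
MODEL_KEY = os.environ.get("MODEL_KEY", "model.tflite")
MODEL_PATH = "/tmp/model.tflite"  # /tmp is writable inside a Lambda function

_interpreter = None  # cached so warm invocations skip the download


def decode_records(event):
    """Return the raw bytes carried by each Kinesis record in the event."""
    return [base64.b64decode(r["kinesis"]["data"]) for r in event.get("Records", [])]


def get_interpreter():
    """Download the model once per container and build the interpreter."""
    global _interpreter
    if _interpreter is None:
        import boto3
        from tflite_runtime.interpreter import Interpreter
        if not os.path.exists(MODEL_PATH):
            boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, MODEL_PATH)
        _interpreter = Interpreter(model_path=MODEL_PATH)
        _interpreter.allocate_tensors()
    return _interpreter


def preprocess(image_bytes, input_detail):
    """Resize an encoded image to the model's input size (assumes NHWC)."""
    import io
    import numpy as np
    from PIL import Image
    _, height, width, _ = input_detail["shape"]
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB").resize((width, height))
    return np.expand_dims(np.asarray(image, dtype=input_detail["dtype"]), axis=0)


def lambda_handler(event, context):
    interpreter = get_interpreter()
    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]
    predictions = []
    for image_bytes in decode_records(event):
        interpreter.set_tensor(input_detail["index"], preprocess(image_bytes, input_detail))
        interpreter.invoke()
        scores = interpreter.get_tensor(output_detail["index"])[0]
        best = int(scores.argmax())
        print("predicted class index:", best)  # shows up in CloudWatch Logs
        predictions.append(best)
    return {"predictions": predictions}
```

Caching the interpreter in a module-level variable is a design choice aimed at the timeout constraint: the S3 download and model load happen only on a cold start, and later invocations in the same container reuse them.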


Written by Salma Mayorquin