Deploying a Machine Learning Model to a Serverless Backend with SAM CLI

Joshua Jarvis
Carnegie Mellon Robotics Academy
10 min read · Aug 10, 2020

Introduction

Over the last year at CMRA, we have been incorporating more machine learning models into our applications. There are plenty of data science blogs about developing models. However, there are very few tutorials about deploying models into production. Fortunately, there are options available for this and it continues to get easier.

Popular options for deploying models developed in Python include using services written in Flask or Django for client applications to interface with. AWS, Azure, and Google Cloud each offer their own brand of Machine Learning as a Service (MLaaS) products. These services offer convenient workflows, cloud-based notebooks, and marketplaces for models. As nice as both options are, there are drawbacks when it comes to cost and scaling.

We chose to create a Serverless application using the popular Lambda service from AWS. A Serverless Architecture offers several compelling advantages over the other options:

  1. Out-of-the-box integration with AWS services like S3 and DynamoDB (to name a few).
  2. Automated deployments with SAM CLI.
  3. Pay as you go model.
  4. Automatic scaling.
Serverless Vs. Traditional Cloud Computing Solution

Prerequisites

You will need to have the following technologies installed to follow along with this tutorial:

  1. Python 3.8.3 64 Bit (Preferably with Anaconda)
  2. A free AWS Account
  3. SAM CLI by following the instructions here

Important note — Make sure that you configure your AWS CLI with the IAM account that you created as part of the SAM CLI instructions. You can configure the AWS CLI by running $ aws configure and filling in all of the prompts with your IAM credentials.

Background

Earlier this year, CMRA was working on developing several virtual activities designed to teach learners how ML technologies will work in an Advanced Robotics Manufacturing context.

One such activity was a game where you are in charge of a factory with machines that periodically break down. Failing to predict when a machine will break down and to assign a technician to fix it results in lost money.

Machine Factory Screenshot

Fortunately, there is a dataset available which contains machine sensor data. The sensor values have a simple linear relationship to the state of the machine — breaking or working. Predicting the breaking state allows the player to put a technician on a machine before it goes offline and requires even more time to repair.

The dataset pictured below is pretty simple. Each data sample has a labeled state, and most of the individual features have a linear relationship to that state. The features of the dataset are temperature, vibration, current, and noise. Noise, no pun intended, has no linear relationship to the working or breaking state of the machines.

[
  {
    "state": "Breaking",
    "temp": "88",
    "vibration": "79",
    "current": "12",
    "noise": "84"
  },
  {
    "state": "Working",
    "temp": "27",
    "vibration": "47",
    "current": "59",
    "noise": "48"
  },
  ...
  {
    "state": "Working",
    "temp": "73",
    "vibration": "11",
    "current": "84",
    "noise": "29"
  }
]

Training a Model

We won’t spend a lot of time on this topic, since the primary focus of this article is deploying a model rather than training one.

Pictured below is the script that trains on the machine factory dataset located here. A simple Python script loads the data, processes it, trains a model, and saves a pickled version of that model. I recommend running the training script with Python 3.8 in a Jupyter Notebook. The linked repository at the end of this article includes the script in notebook form. I also recommend using VS Code’s wonderful Python extension to take advantage of built-in notebooks with linting and VS Code keybindings.

Factory Data Training Script
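A minimal sketch of such a training script, assuming the field names from the dataset above (the file path, train/test split, and divide-by-100 normalization are illustrative assumptions, not the exact code in the repository):

import json
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the factory dataset (file name is an assumption) into a DataFrame.
with open("factory_data.json") as f:
    df = pd.DataFrame(json.load(f))

# Encode the two states as 0 and 1 and keep the mapping for decoding later.
encoding = {"Working": 0, "Breaking": 1}
df["state"] = df["state"].map(encoding)

features = ["temp", "vibration", "current", "noise"]
X = df[features].astype(float) / 100.0   # simple normalization (assumed scale)
y = df["state"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(solver="liblinear")
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))

# Pickle the model together with the encoding map for the Lambda function to use.
with open("model.pkl", "wb") as f:
    pickle.dump({"model": model, "encoding": encoding}, f)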

This script converts the JSON data into a Pandas DataFrame object, normalizes the features, and encodes the two states as 0 and 1.

The model is built using Scikit-Learn’s LogisticRegression with the ‘liblinear’ solver. The accuracy comes in at an impressive 97%! As we previously said, the machine states are pretty predictable and probably wouldn’t require an ML model in the real world.

The model and the encoding (a mapping of the state labels to the encoded values) are pickled and saved to a local file. The resulting pickle file can be set aside for now. This is the last time we will use the training script; assume that this was work completed by your talented data scientist.

Deploying the Model with SAM CLI

Meet SAM

Sam

Now that we have SAM CLI installed, let’s take it for a quick test drive. We will start by initializing our application:

$ sam init

Which template source would you like to use?
    1 - AWS Quick Start Templates
    2 - Custom Template Location
Choice: 1

Which runtime would you like to use?
    1 - nodejs12.x
    2 - python3.8
    ...
Runtime: 2

Project name [sam-app]: your_app_name

Cloning app templates from https://github.com/awslabs/aws-sam-cli-app-templates.git

AWS quick start application templates:
    1 - Hello World Example
    ...
Template selection: 1

This will be our actual app. To start out, we choose the AWS Quick Start Templates, select the python3.8 runtime, name our project, and use the ‘Hello World Example’ template.

We will start by invoking the HelloWorld Lambda function. The function can be invoked by simply running $ sam local invoke in the project’s root directory. If everything is wired up properly, SAM CLI will run the function in a Docker container that simulates the AWS Lambda environment and return a 200 response with the message ‘hello world’.

Opening up HelloWorld/app.py reveals the Lambda function. All Lambda functions have an event and a context parameter. The event parameter is usually a Python dict with data about the event that triggered the function. This can be data from an API request, a direct invocation, or any other event source configured to trigger the Lambda function (e.g. an S3 file upload). The context parameter will go unused in our application; it provides a Context object with data about the environment the function is running in.

The HelloWorld function uses neither parameter. Instead, it returns the response that every Lambda function invoked by API Gateway must return: an HTTP status code and a JSON-encoded message body.

import json


def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "hello world",
            # "location": ip.text.replace("\n", "")
        }),
    }

Creating Our Own Lambda Function

We are ready to write our own function now that we have a basic feel for how to invoke Lambda functions locally. Let’s rename the folder containing the HelloWorldFunction to model_inference and replace the app.py code with the following snippet:

Model Inference app.py
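Here is a sketch of what this model_inference/app.py could look like, assuming the pickle holds the model and encoding map saved by the training step; the bucket name, key, and divide-by-100 normalization are placeholders/assumptions rather than the exact repository code:

import json
import pickle

import boto3

# Bucket and key are placeholders; use the bucket you uploaded the pickle to.
BUCKET = "your-model-bucket"
KEY = "model.pkl"


def load_pickle(bucket, key):
    """Download the pickled model from S3 and load it into memory."""
    s3 = boto3.client("s3")
    local_path = "/tmp/" + key          # /tmp is the only writable path in Lambda
    s3.download_file(bucket, key, local_path)
    with open(local_path, "rb") as f:
        return pickle.load(f)


def parse_event(event):
    """Handle both API Gateway requests and direct invocations."""
    if "body" in event.keys():
        return json.loads(event["body"])["data"]
    return event["data"]


def lambda_handler(event, context):
    artifact = load_pickle(BUCKET, KEY)
    model, encoding = artifact["model"], artifact["encoding"]

    data = parse_event(event)
    features = [float(data[name]) / 100.0   # same normalization as training (assumed)
                for name in ("temp", "vibration", "current", "noise")]

    prediction = model.predict([features])[0]
    label = {v: k for k, v in encoding.items()}[prediction]

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": label}),
    }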

There’s a lot to unpack here. First of all, we want to take the pickled model that we uploaded to S3 and load it into memory. The Lambda runtime includes Boto3, the AWS SDK for Python, which can directly interface with any service that the Lambda function is authorized to use. We will set up the Lambda’s policy shortly, but for now assume that the load_pickle function returns the pickled model and encoding map when given the S3 bucket and key (file name).

The load_pickle function instantiates an S3 client and calls its download_file method, which saves the pickle to a specified local path.

Once the model is loaded, the event is processed and the JSON request payload is converted into a Python dict. The rest of the code mirrors the training script: the features are extracted by name, normalized, and fed into the model’s predict method. The prediction is mapped back to the ‘Working’ or ‘Breaking’ state and passed to the JSON response message.

Configuring the Template

Getting this all to work requires a few more steps. SAM CLI projects include a template.yaml file at the root of the project. This template has 3 major sections — Globals, Resources, and Outputs.

The Globals section holds the settings that apply to all Lambda functions in your project. The first (warmup) request takes a little bit of time, so to be on the safe side we increased the Timeout for all functions to 60 seconds.

Resources specifies each function in the project. Every function has a type; the ModelInferenceFunction has a type of AWS::Serverless::Function. Properties for each function are set below its resource. The CodeUri matches the function’s parent directory name (model_inference). Policies for the Lambda function are set under Policies. We would like this function to read from S3 and download the pickled file that we uploaded, so we gave it an S3ReadPolicy and specified the bucket name. Other policies can be added to allow access to other AWS services as needed.

All functions are triggered by an event. In our case, the API Gateway service triggers the Lambda invocations. That is to say, once deployed, API Gateway will give us an endpoint that we can send HTTP requests to in order to trigger our Lambda function. We also specify the API’s properties, which include the path name and the HTTP method.

The Outputs section specifies the values for your application’s components once the stack is deployed. Calling the aws cloudformation describe-stacks command will return the API endpoint, Lambda ARN, and Lambda IAM Role ARN. SAM CLI applications are deployed using the CloudFormation service, which acts as an orchestration tool for spinning up and connecting the multiple AWS services that work together.
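Putting those pieces together, a template.yaml along these lines would work. This is a sketch, not the generated file: the bucket name and output names are assumptions, and your own template will differ in the details.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless model inference application

Globals:
  Function:
    Timeout: 60            # allow time to download the pickled model on cold start

Resources:
  ModelInferenceFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: model_inference/
      Handler: app.lambda_handler
      Runtime: python3.8
      Policies:
        - S3ReadPolicy:
            BucketName: your-model-bucket      # bucket holding the pickled model
      Events:
        Inference:
          Type: Api
          Properties:
            Path: /inferences
            Method: post

Outputs:
  InferenceApi:
    Description: API Gateway endpoint URL for the inference function
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/inferences/"
  ModelInferenceFunctionArn:
    Description: Model inference Lambda function ARN
    Value: !GetAtt ModelInferenceFunction.Arn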

Installing Dependencies

If you run the updated function by calling sam local invoke, you will be in for a disappointing surprise: none of the required dependencies are available. Fortunately, installing dependencies for a SAM CLI application is pretty straightforward.

We need to create a requirements.txt file in the root directory for all of the requisite dependencies that will be used by our function. In our case we just need to install Pandas and Sklearn.

It works best if we create a pristine environment and freeze only the dependencies that we need for the project into the root directory. We can create a virtual environment by running python3 -m venv env. This will create an env/ folder that holds all of the data for our virtual environment. We can activate the environment by running source env/bin/activate.

You can tell the environment is activated when its name, (env), appears to the left of the command prompt. We can now install our dependencies and ‘freeze’ them into a requirements.txt file.

(env) $ pip install sklearn pandas
(env) $ pip freeze > requirements.txt

After we are done, we can exit the environment by simply calling the deactivate command.

Now we can build our Lambda function with the included dependencies by pointing the build command at the requirements.txt manifest. This will create a build directory under .aws-sam/ that will be packaged and sent to AWS when we are ready to deploy.

$ sam build -m requirements.txt

Invoking Custom Events

We still need to pass inference data to our Lambda function. We can do this by creating a custom event called single_inference.json and saving it in the events folder. This JSON file is passed to the Lambda function upon invocation by calling sam local invoke -e events/single_inference.json.

{
  "data": {
    "temp": "10",
    "vibration": "1.0",
    "current": "0",
    "noise": "78"
  }
}

Testing the API Endpoint Locally

SAM CLI also offers a convenient local server that allows us to perform a full integration test of our API endpoint and Lambda function. We can start the server at localhost:3000 by calling sam local start-api.

When testing the API endpoint, you will notice that the event data looks slightly different from a direct invocation. This is why we process all incoming events with the parse_event function. For API invocations, the request body arrives as a JSON string under the event’s ‘body’ key, so the ‘data’ payload needs to be extracted from it. A direct invocation only requires us to select the ‘data’ key of a Python dict. Other services that can trigger functions likely have different shapes to their event data; it’s best to log the event and become familiar with what a particular service’s event looks like.

def parse_event(event):
    if 'body' in event.keys():
        return json.loads(event['body'])['data']
    else:
        return event['data']
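For illustration, here are the two event shapes this function handles; the values mirror the sample event above, and the snippet assumes parse_event from the block above is in scope.

import json

sample = {"temp": "10", "vibration": "1.0", "current": "0", "noise": "78"}

# Direct invocation: the event is already a dict with a top-level "data" key.
direct_event = {"data": sample}

# API Gateway invocation: the request body arrives as a JSON string under "body".
api_event = {"body": json.dumps({"data": sample})}

assert parse_event(direct_event) == parse_event(api_event) == sample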

The local endpoint can now be called with a simple curl request that includes the factory data in JSON format.

$ curl --request POST \
  --url http://localhost:3000/inferences \
  --header 'content-type: application/json' \
  --data '{
    "data": {
      "temp": "1",
      "vibration": "1.0",
      "current": "88",
      "noise": "23"
    }
  }'

=> {"prediction": "Working"}

Deploy to AWS

We are now ready to deploy our little ML Serverless application to AWS! Deploying an application couldn’t be simpler. Simply call sam deploy --guided and follow the prompts to deploy to your AWS account.

Once the deploy completes, you can sign in to the AWS console and see your newly created Lambda function, API Gateway endpoint, and CloudFormation stack. At this point we can also test our production endpoint. The invocation URL can be found under API Gateway -> API -> Stages -> Prod. On this screen you will see the ‘Invoke URL’, and you can test the endpoint by sending another curl request.

$ curl --request POST \
  --url https://<your-endpoint-url>/Prod/inferences \
  --header 'content-type: application/json' \
  --data '{
    "data": {
      "temp": "1",
      "vibration": "1.0",
      "current": "88",
      "noise": "23"
    }
  }'

=> {"prediction": "Working"}

Conclusion

Congratulations! We have successfully deployed a simple logistic regression model that can now be invoked on demand without running a server.

Sam is Impressed

In review, we used SAM CLI to quickly set up an AWS-connected project, then transplanted the provided CMRA inference code into it so that it loads a saved (pickled) model and runs it. We then configured the YAML template to point to the right function and installed the necessary dependencies. We set up a single-inference test event, tested the endpoint locally, and finally deployed it to AWS.

We can add more Lambda functions that handle other tasks. For instance, it wouldn’t take a lot of effort to create a TrainingFunction that is invoked each time a new dataset is uploaded to a particular S3 bucket. The dataset can be loaded within the function and the retrained model saved for the InferenceFunction to use, as sketched below.
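As a sketch, the SAM template for such a function could declare an S3 event source. The resource and folder names here are hypothetical, and with this event type the bucket must be defined in the same template.

Resources:
  TrainingDataBucket:
    Type: AWS::S3::Bucket

  TrainingFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: model_training/        # hypothetical folder
      Handler: app.lambda_handler
      Runtime: python3.8
      Events:
        NewDataset:
          Type: S3
          Properties:
            Bucket: !Ref TrainingDataBucket
            Events: s3:ObjectCreated:*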

I hope this article demystifies what it takes to deploy an ML model to a serverless backend. Do not hesitate to respond in the comments with any feedback or questions.

This material is based upon work supported by the National Science Foundation under Grant Number 1937063.

GitHub Repository
