Serverless TensorFlow Model in AWS — Deploying a Trained TensorFlow Model to AWS Lambda — pt 2

Adam Burek
7 min read · Oct 8, 2023


Written by: Adam Burek and Gustavo Jurado

While developing our machine learning workflow, we were faced with the challenge of how to host our trained model. Our workload required multiple predictions to be generated for less than a minute each day. However, the out-of-the-box solutions only offered model hosting that runs 24/7.

This article is part of a two part series:

  • In the first article, we discussed the steps we took to train, optimize, and package our model to be uploaded to S3. You can click this link to jump to the first article in this series.
  • In the second article, we will use the SAM framework to deploy a lambda function that will retrieve our trained model.

Architectural Overview

Architecture for Hosting TensorFlow Model in AWS

The model prediction can be encapsulated in a single lambda function that receives all the necessary features in JSON format. These features are normalized to be consistent with the training data set. Then the zipped model, which was exported and uploaded to S3 in the first article, is retrieved. Because the model name is not hard coded and is instead set as an environment variable on the lambda function, the chosen model can easily be swapped out.

We chose this design because the model prediction would only be needed for a very short period of time, in our case about 1 minute per day. Hosting the model on a SageMaker endpoint running 24/7 would not be cost effective. Another benefit of not hard coding the model name is that we can change the prediction model without redeploying the entire container, which is a fairly slow process.

Extending this approach, as long as all the models use the same transformation operations for our features, we can quickly deploy multiple versions using the SAM framework and then aggregate their predictions. This strategy can be used to reduce the risk of overfitting when we have trained multiple models on slightly different training data.
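
As one possible sketch of the aggregation step (this is not code from our repository), the snippet below invokes several deployed model functions with the same features and averages their predictions. The function names are hypothetical placeholders.

import json

import boto3

# hypothetical function names; each deployed stack hosts one model version
MODEL_FUNCTIONS = [
    "titanic-prediction-prod-model-engine-a",
    "titanic-prediction-prod-model-engine-b",
]

lambda_client = boto3.client("lambda")


def ensemble_predict(features: dict) -> float:
    """Invoke each model lambda with the same features and average the predictions."""
    predictions = []
    for function_name in MODEL_FUNCTIONS:
        response = lambda_client.invoke(
            FunctionName=function_name,
            Payload=json.dumps(features),
        )
        # the handler returns json.dumps(...), so the payload decodes to a JSON string
        result = json.loads(json.loads(response["Payload"].read()))
        predictions.append(result["prediction"])
    return sum(predictions) / len(predictions)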

Components of the SAM Framework

Using a serverless framework like AWS SAM makes it much easier to work with lambda functions that import many Python packages and require additional configuration.

You will need to create the samconfig.toml and template.yaml files in your project directory. You should also create a subdirectory to store your lambda function (a possible layout is sketched after the list). The lambda subdirectory should contain the following files:

  1. A Python code file, which in this case we named app.py
  2. __init__.py
  3. Dockerfile
  4. requirements.txt
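
Putting this together, the project layout looks roughly as follows; the subdirectory name model_prediction is taken from the DockerContext setting used later in template.yaml.

.
├── samconfig.toml
├── template.yaml
└── model_prediction/
    ├── __init__.py
    ├── app.py
    ├── Dockerfile
    └── requirements.txt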

samconfig.toml

The samconfig.toml file is created when you run the ‘sam init’ command, and this is where the deployment parameters are stored. I find it useful to create two environments, which in this case I named ‘dev’ and ‘prod’. The main difference between them is the image_repositories entry. I also added the “StagingEnv” variable, which controls the logging level of my lambda functions. Lastly, I included the staging environment in the stack name to avoid naming conflicts between staging environments.

version = 0.1
[dev]
[dev.deploy]
[dev.deploy.parameters]
s3_bucket = "aws-sam-cli-titanic"
s3_prefix = "titanic-prediction-dev"
region = "eu-central-1"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
stack_name = "titanic-prediction-dev"
image_repositories = ["TitanicSurvivorEngine=092583465186.dkr.ecr.eu-central-1.amazonaws.com/titanicpredictiondev421cbe24/titanicsurvivorengine60a3beferepo"]
StagingEnv = "dev" # Sets the logging level to debug in the lambda function

[prod]
[prod.deploy]
[prod.deploy.parameters]
s3_bucket = "aws-sam-cli-managed-default-samclisourcebucket-wd2xoprgtnee"
s3_prefix = "titanic-prediction-prod"
region = "ca-central-1"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
stack_name = "titanic-prediction-prod"
image_repositories = ["DockerEngine=092583465186.dkr.ecr.ca-central-1.amazonaws.com/titanicpredictionprod/prod"]
StagingEnv = "prod" # Sets the logging level to info in the lambda function
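
With both environments defined, the stack can be built and deployed against either configuration by passing the environment name to the SAM CLI:

sam build
sam deploy --config-env dev    # use --config-env prod for the production stack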

template.yaml

In this file we define the lambda function along with its configuration. There are two ways to deploy a lambda function: as a zip archive or as a Docker container image. At the time of writing, the zip package is too large once all the necessary libraries are included, so we need to deploy the lambda function as a container image.

Resources:
  TitanicSurvivorEngine:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub "${AWS::StackName}-model-engine"
      PackageType: Image
      Architectures:
        - x86_64
      MemorySize: 1000
      Policies:
        - AWSLambdaExecute
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - s3:GetObject
              Resource:
                - arn:aws:s3:::titanic-models/*  # object-level access requires the /* suffix
      Environment:
        Variables:
          ModelBucket: titanic-models
          ModelName: titanic_model-Mar30.zip
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./model_prediction
      DockerTag: v1

To define the lambda function as a Docker container, we need to add “PackageType: Image”. We also need to set the metadata items: the container build context via “DockerContext” and the name of the “Dockerfile”.
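
The Dockerfile itself is not shown here, but a minimal sketch for a container-image lambda based on the public AWS Lambda Python base image could look like the following; the Python version and handler name are assumptions, not taken from our repository.

FROM public.ecr.aws/lambda/python:3.9

# install TensorFlow and the other dependencies listed in requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# copy the function code into the lambda task root
COPY app.py ${LAMBDA_TASK_ROOT}

# point the lambda runtime at the handler function
CMD ["app.lambda_handler"]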

The memory size is workload dependent; in our case we set it to 1,000 MB. Finally, the lambda function needs the appropriate permissions to retrieve the model stored in S3. Here we grant the lambda function the permission to retrieve objects in the S3 bucket where our models are stored.

Since we want to manage our application as code, we also define the S3 bucket where our models are stored, along with the name of our model object, in the template. With this method we can still manage the application through the SAM framework, but changing just these parameters makes deployments significantly faster than uploading a new Docker container image each time we want to change the model.
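
For example, swapping to a newer model only requires pointing the ModelName environment variable at a different object, either by editing template.yaml and redeploying, or directly through the console or CLI. The function and object names below are illustrative; note that update-function-configuration replaces the whole variable map, so every variable must be included.

aws lambda update-function-configuration \
    --function-name titanic-prediction-dev-model-engine \
    --environment "Variables={ModelBucket=titanic-models,ModelName=titanic_model-Apr15.zip}"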

Diving into the Lambda Function

The model prediction function is where our core functionality comes to life. This function performs the following actions in sequence: it loads all configuration variables, retrieves the desired trained model, transforms the features, generates the prediction and returns the result.
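
For reference, the snippets that follow assume the imports below at the top of app.py; the exact list in the repository may differ slightly.

import json
import logging
import os
import zipfile

import boto3
import numpy as np
import tensorflow as tf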

temp_zip = '/tmp/model.zip'
model_contents = '/tmp/model_contents'

bucket = os.getenv('ModelBucket')
model_name = os.getenv('ModelName')
staging = os.getenv('StagingEnv')

Our model is stored as a zip file to simplify the retrieval process. Consequently, we declare the temporary model locations as hard-coded variables, since they will not change. The settings that tend to change more frequently are stored as environment variables so they can be easily edited in the console without having to upload a new container image.

log = logging.getLogger()
if staging == 'dev':
    log.setLevel(logging.DEBUG)
    tf.compat.v1.logging.set_verbosity(20)  # 20 corresponds to INFO in TensorFlow's logging levels
elif staging == 'prod' or staging == 'preprod':
    log.setLevel(logging.INFO)
    tf.compat.v1.logging.set_verbosity(40)  # 40 corresponds to ERROR, silencing routine TensorFlow output
else:
    raise Exception("Unknown staging env")

Next, we define the logging configuration according to the staging environment. In the development environment detailed logs are required for debugging. However, in the production environment such detailed logs are not needed, and given the volume of invocations, reducing the operational logging helps minimize log retention costs. Therefore, setting different logging levels for the two staging environments is useful.

s3_client = boto3.client('s3', use_ssl=False) # create object for s3 service


# download the zipped model from s3 and save it to a temporary location
# the /tmp directory location is ephemeral and only exists during the
# invocation of the function
s3_client.download_file(Bucket=bucket, Key=model_name, Filename=temp_zip)
log.debug(os.listdir('/tmp'))


# unzip the model
with zipfile.ZipFile(temp_zip, 'r') as zip_ref:
    zip_ref.extractall(model_contents)


# create a string to store the location of the saved model
model_folder = model_contents + '/' + os.listdir(model_contents)[0]
log.debug(f"Model folder = {os.listdir(model_folder)}")

To allow the lambda function to interact with other AWS services, we use the boto3 Python library. In our case, we want to interact with the S3 service, so we define our S3 client object with “s3_client = boto3.client(‘s3’, use_ssl=False)” and use the “download_file” method to retrieve the desired zipped model. Next we unzip the model using the Python zipfile library.

# transform Sex
if event["Sex"] == "male":
    event["Sex"] = 0
elif event["Sex"] == "female":
    event["Sex"] = 1
else:
    return f"ERROR: input for Sex with value {event['Sex']} is invalid"

During training we encoded two of our features, ‘Sex’ and ‘Embarked’. For the “Sex” feature we expect only two possible values, but we shouldn’t trust this assumption completely. Therefore, we should always add an else statement to safely terminate the lambda function and return an error message if an unexpected value is encountered.
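
The ‘Embarked’ transformation is handled in the same spirit. The sketch below assumes the event carries the raw port code and expands it into the one-hot columns that appear in the feature list further down; the exact implementation in our repository may differ.

# transform Embarked into the one-hot columns used during training
if event.get("Embarked") in ("C", "Q", "S"):
    event["Embarked_C"] = 1 if event["Embarked"] == "C" else 0
    event["Embarked_Q"] = 1 if event["Embarked"] == "Q" else 0
    event["Embarked_S"] = 1 if event["Embarked"] == "S" else 0
else:
    return f"ERROR: input for Embarked with value {event.get('Embarked')} is invalid"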

model = tf.keras.models.load_model(model_folder)
# this is also important to ensure the order of the features is consistent
# when being fed to the model
feature_keys = [
    "Pclass",
    "Sex",
    "Age",
    "SibSp",
    "Parch",
    "Fair",
    "Embarked_C",
    "Embarked_Q",
    "Embarked_S"
]


# we are extracting just the feature values and transforming them into the proper format
feature_values_list = []
for key in feature_keys:
    feature_values_list.append(event[key])
feature_values_for_predication = np.expand_dims(feature_values_list, axis=0)

Once the features are prepared, we create our trained model object. We need one more step before we can send our transformed features to the model. Our features are in a Python dictionary, which is easy for us to work with, but the model requires a NumPy array. To ensure consistent feature order, we iterate over a hard-coded list of feature names to collect the values, then use np.expand_dims to turn the list into an array with the batch dimension the model expects.

model_prediction = {
    "input_features": event,
    "prediction": float(model.predict(x=feature_values_for_predication, verbose=0)[0])
}

return json.dumps(model_prediction)

We got a bit fancy with our prediction statement, so let’s break it down. For the prediction key, we generate a prediction using the “model.predict” method. The result comes back as a nested array, so we grab the first and only value it contains and cast it to a float. We then return the originally passed features and the resulting prediction to the caller as a JSON string.
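
To test the function end to end, the SAM CLI can invoke it locally with a sample event. The payload below is illustrative; the exact keys depend on how the ‘Embarked’ feature is passed in, and the values are simply plausible Titanic passenger data.

{
  "Pclass": 3,
  "Sex": "male",
  "Age": 22,
  "SibSp": 1,
  "Parch": 0,
  "Fair": 7.25,
  "Embarked": "S"
}

Saving this as events/event.json, the function can be run locally with ‘sam local invoke TitanicSurvivorEngine --event events/event.json’; Docker and access to the model bucket are required.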

Conclusion

As demonstrated, our solution offers a cost-efficient way to host a TensorFlow model on AWS that can be invoked infrequently while still providing low-latency predictions.

To read the first article in this series, click this link.

Code Repository: https://github.com/SouthernYoda/Serverless-TensorFlow-Model-in-AWS
