Create Your Own Docker Container for Model Serving in Sagemaker
In our previous post, we created our own custom Docker image for model training on AWS Sagemaker. To serve a custom model with our own code logic, we need to introduce some additional settings and modules to our Docker image so that Sagemaker can use it to serve the model. Sagemaker provides two main modes of model serving: (1) Batch Transform and (2) Hosting Services (such as Realtime Inference Endpoints). There are also Serverless Inference and Asynchronous Inference, which sit somewhere between those two main modes and cover more specific use-cases. Batch Transform, as its name implies, is suitable for making predictions in bulk, in large batches, when needed, without much focus on latency. A Realtime Inference Endpoint, on the other hand, is more appropriate for serving real-time or near-real-time models with constant availability and low latency. In this post, we will cover how to prepare our code and Docker container to serve a trained model on Sagemaker. In the next post, we will cover how to perform Batch Transform as well as how to deploy a Realtime Inference Endpoint.
General Overview of the Interaction Between Sagemaker and the Docker Container
Figure 1 shows the general overview of how Sagemaker will consume our Docker container for model serving purposes. The figure shows only one EC2 instance, but Sagemaker can spin up multiple instances if we configure it to do so.
Step 1:
We provide our own Docker container to Sagemaker for it to use while setting up a special EC2 instance for model serving.
Step 2:
Sagemaker uses that Docker container to spin up an EC2 instance for serving the model.
Step 3:
After spinning up the instance, Sagemaker sends a ping request with the GET method to the instance to check whether everything is OK.
Step 4:
Sagemaker will wait for a response to the ping request, so the instance must return a success message if everything is OK; otherwise, it should return a failure message. If Sagemaker receives a success message, it will proceed to Step 5; otherwise, it will halt the instance and return a proper failure message.
Step 5:
Sagemaker will send invocation requests with the input data to the “/invocations” endpoint of the instance using the POST method.
Step 6:
Sagemaker will wait for a response from the instance that includes proper predictions. In case of any error, the endpoint should return a proper error message to Sagemaker.
For Batch Transform option:
When all the input data is processed and all the predictions are retrieved, Sagemaker will halt the instance, save the predictions to a specified S3 folder, and then return a proper message.
For Realtime Inference Endpoint option:
Sagemaker will accept invocation requests continuously, keeping the inference instance up at all times.
As you may notice, our Docker container must serve a /ping endpoint with the GET method and an /invocations endpoint with the POST method, both specifically on port 8080. For the Batch Transform option, it may additionally serve an /execution-parameters endpoint with the GET method to provide some special parameters for batch transform. In other words, we must have a web server running inside our Docker container that serves those endpoints. The simplest way to achieve this is by using Flask, Gunicorn and Nginx. Let’s see how we can configure our container with those.
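To make this contract concrete, the traffic between Sagemaker and a running container looks roughly like the following sketch (it assumes the requests package and a container already running locally on port 8080; the CSV payload is the same example row we will use later in this post):
import requests

# Health check: Sagemaker sends GET /ping and expects HTTP 200 if the container is healthy.
ping = requests.get("http://localhost:8080/ping")
print(ping.status_code)

# Inference: Sagemaker POSTs the input payload to /invocations and reads the predictions back.
response = requests.post(
    "http://localhost:8080/invocations",
    data="5.1,3.5,1.4,0.2",
    headers={"Content-Type": "text/csv"},
)
print(response.status_code, response.text)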
1 - Create One Docker Image For Both Training and Model Serving On Sagemaker
In our previous post, we created our Docker container only for model training in Sagemaker. Now, besides training, we will see how to modify the same Dockerfile so that the container can also serve models. So, we will have one Docker container for both training and model serving. To continue with the next steps in this post, we must have the setup from our previous post, where we created our Dockerfile, Pipfile and Pipfile.lock in the project root, and a /src folder in the project root with the train file in it:
├── Dockerfile
├── Pipfile
├── Pipfile.lock
└── src
    └── train
1.1 — Installing Relevant Packages Using Pipenv
We need to install the following packages to be able to start a web server with Python in the Docker container. Go to the root of the project folder that we created in our previous post, and activate the virtual environment using:
# In the root of project folder where our Pipfile and Pipfile.lock resides
pipenv shell
Then, in the activated virtual environment, install the following packages with:
# Within activated virtual environment
pipenv install gunicorn gevent flask
It will install the packages, and update Pipfile and Pipfile.lock files accordingly.
A brief info about those packages:
Gunicorn is a production-grade WSGI HTTP server for Python that runs multiple worker processes of a web application.
Gevent is a coroutine-based Python networking library that lets us write simple, sequential code while still getting the scalability of asynchronous IO and lightweight multi-threading; it is commonly used as a Gunicorn worker class.
Flask is a popular, extensible web microframework for building web applications with Python.
1.2 — Creating Necessary Files That Don’t Require Many Changes
For model serving, we need to add four extra files into our Docker container: nginx.conf, predictor.py, serve and wsgi.py.
The explanation of those files from the AWS documentation is as follows:
nginx.conf is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
predictor.py is the program that actually implements the Flask web server and the prediction codes for this app. You’ll want to customize the actual prediction parts to your application. Since this algorithm is simple, we do all the processing here in this file, but you may choose to have separate files for implementing your custom logic.
serve is the program started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in predictor.py. You should be able to take this file as-is.
wsgi.py is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.
All of the files except predictor.py are pretty standard and require no or very few changes:
- The file nginx.conf can be downloaded from HERE.
- The file serve can be downloaded from HERE.
- The file wsgi.py can be downloaded from HERE.
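In case the link is not accessible, wsgi.py is essentially a one-line wrapper; a minimal sketch, assuming the Flask app object is named app in predictor.py (as it is in the code later in this post), looks like this:
# wsgi.py - a tiny wrapper so that gunicorn can find the Flask app.
# It only re-exports the app object defined in predictor.py.
import predictor as myapp

app = myapp.app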
⚠️ In serve file, you may need to change the following code:
nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
to:
nginx = subprocess.Popen(['nginx', '-c', '/opt/app/nginx.conf'])
because in our Dockerfile, we defined /opt/app as the app folder, not /opt/program.
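If you cannot access the link, here is a condensed sketch of what a typical serve program looks like, based on Sagemaker's standard bring-your-own-container example, with /opt/app already substituted as described above. The socket path, worker count and timeout are illustrative values and must match your nginx.conf:
#!/usr/bin/env python
# A condensed sketch of what serve does: start nginx and gunicorn, then exit when either dies.
import multiprocessing
import os
import subprocess
import sys

model_server_timeout = os.environ.get("MODEL_SERVER_TIMEOUT", 60)
model_server_workers = int(os.environ.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))


def start_server():
    print("Starting the inference server with {} workers.".format(model_server_workers))

    # Nginx listens on port 8080 and proxies requests to gunicorn through a unix socket (see nginx.conf).
    nginx = subprocess.Popen(["nginx", "-c", "/opt/app/nginx.conf"])
    gunicorn = subprocess.Popen(
        [
            "gunicorn",
            "--timeout", str(model_server_timeout),
            "-k", "gevent",
            "-b", "unix:/tmp/gunicorn.sock",
            "-w", str(model_server_workers),
            "wsgi:app",
        ]
    )

    # If either process exits, terminate the other one and exit as well.
    pids = {nginx.pid, gunicorn.pid}
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    for process in (nginx, gunicorn):
        try:
            process.terminate()
        except OSError:
            pass

    print("Inference server exiting")
    sys.exit(0)


if __name__ == "__main__":
    start_server()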
Now, we need to implement our own predictor.py code.
1.3 — Creating predictor.py File
This is the critical file in which we will define our endpoints and code logic for inference. This file must have certain implementations:
- /ping (required) endpoint with GET method to return a proper message to Sagemaker
- /invocations (required) endpoint with POST method to accept incoming payloads from Sagemaker and to return the predictions/outputs to Sagemaker
- /execution-parameters (only for batch transform, optional) — Allows the algorithm to provide the optimal tuning parameters for a job during runtime. Based on the memory and CPUs available for a container, the algorithm chooses the appropriate MaxConcurrentTransforms, BatchStrategy, and MaxPayloadInMB values for the job.
# This is the file that implements a flask server to do inferences. It's the file that you will modify to
# implement the scoring for your own algorithm.
from __future__ import print_function

import io
import json
import os
import pickle
import signal
import sys
import traceback

import flask
import pandas as pd

prefix = os.environ.get("ARTEFACT_PATH", "/opt/ml/")
model_path = os.path.join(prefix, "model")

FEATURES = os.environ.get("FEATURES", "")
FEATURES = FEATURES.split(",")


# A singleton for holding the model. This simply loads the model and holds it.
# It has a predict function that does a prediction based on the model and the input data.
class ScoringService(object):
    model = None  # Where we keep the model when it's loaded

    @classmethod
    def get_model(cls, model_path):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            with open(os.path.join(model_path, "model.pckl"), "rb") as inp:
                cls.model = pickle.load(inp)
        return cls.model

    @classmethod
    def predict(cls, data):
        """For the input, do the predictions and return them.

        Args:
            data (a pandas dataframe): The data on which to do the predictions. There will be
                one prediction per row in the dataframe"""
        clf = cls.get_model(model_path=model_path)
        if hasattr(clf, "predict_proba"):
            return clf.predict_proba(data)[:, 1]
        if hasattr(clf, "predict"):
            return clf.predict(data)
        raise AttributeError("Model does not have predict_proba or predict methods")


# The flask app for serving predictions
app = flask.Flask(__name__)


@app.route("/ping", methods=["GET"])
def ping():
    """Determine if the container is working and healthy. In this sample container, we declare
    it healthy if we can load the model successfully."""
    print(model_path)
    health = ScoringService.get_model(model_path) is not None  # You can insert a health check here

    status = 200 if health else 404
    return flask.Response(response="\n", status=status, mimetype="application/json")


@app.route("/invocations", methods=["POST"])
def transformation():
    """Do an inference on a single batch of data. In this sample server, we take data as CSV, convert
    it to a pandas data frame for internal use and then convert the predictions back to CSV (which really
    just means one prediction per line, since there's a single column).
    """
    data = None

    # Convert from CSV to pandas
    if flask.request.content_type == "text/csv":
        data = flask.request.data.decode("utf-8")
        s = io.StringIO(data)
        data = pd.read_csv(s, header=None)
        data.columns = FEATURES
        print('Columns:', data.columns)
        print(data)
    else:
        return flask.Response(
            response="This predictor only supports CSV data", status=415, mimetype="text/plain"
        )

    print("Invoked with {} records".format(data.shape[0]))

    # Do the prediction
    predictions = ScoringService.predict(data)

    # Convert from numpy back to CSV
    out = io.StringIO()
    pd.DataFrame({"results": predictions}).to_csv(out, header=False, index=False)
    result = out.getvalue()

    return flask.Response(response=result, status=200, mimetype="text/csv")
We can change the insides of the ping and transformation functions as we wish, as long as they handle the input properly and return the kind of response that Sagemaker requires.
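If you plan to use Batch Transform and want to support the optional /execution-parameters endpoint described earlier, a minimal sketch could be added to predictor.py like this (the returned values are illustrative placeholders; choose them based on your container's memory and CPUs):
@app.route("/execution-parameters", methods=["GET"])
def execution_parameters():
    """Suggest Batch Transform tuning parameters to Sagemaker. The values below are
    illustrative placeholders, not tuned recommendations."""
    params = {
        "MaxConcurrentTransforms": 1,
        "BatchStrategy": "MULTI_RECORD",
        "MaxPayloadInMB": 6,
    }
    return flask.Response(response=json.dumps(params), status=200, mimetype="application/json")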
1.4 — Updating Our Dockerfile to Install Nginx
The only additions to the Dockerfile from our previous post are adding nginx to the apt-get install list and adding execution permission to the serve file:
FROM --platform=linux/x86-64 python:3.8

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    libusb-1.0-0-dev \
    libudev-dev \
    build-essential \
    ca-certificates \
    # --------------------> HERE IS THE CHANGE
    nginx && \
    rm -fr /var/lib/apt/lists/*

# Keep python from buffering the stdout - so the logs flushed quickly
ENV PYTHONUNBUFFERED=TRUE

# Don't compile bytecode
ENV PYTHONDONTWRITEBYTECODE=TRUE

ENV PATH="/opt/app:${PATH}"
ENV PYTHONPATH=.

RUN pip3 install pipenv==2022.7.4

# Install packages
WORKDIR /opt/app
COPY Pipfile Pipfile.lock ./
RUN pipenv install --deploy --system --dev

# Add src code
COPY src ./
RUN chmod +x train
RUN chmod +x serve # --------------------> HERE IS THE CHANGE
A brief info about nginx: Nginx is a lightweight, high-performance web server and reverse proxy. In our container it sits in front of Gunicorn, listens on port 8080 and forwards the /ping and /invocations requests to the Flask app.
2 - Testing The Docker Container Locally
We have made all the changes we need in our code base to train and serve the model using a Docker container with Sagemaker. We can now test it locally before actually testing it on the Sagemaker side.
2.1 — Testing the Docker Container Locally For Training
First, let’s create a docker-compose.yml file with the following content in the root folder:
---
version: "3.3"

services:
  training:
    build: .
    container_name: byoc_training
    command: train
    volumes:
      - ./ml_data:/opt/ml/
    env_file:
      - .env
In that docker-compose.yml file, we mount our local ./ml_data folder to the container’s /opt/ml/ path. So when we run the container using docker compose, any files/folders under ./ml_data will be accessible under the container’s /opt/ml/ path, and any changes under the container’s /opt/ml/ path will be reflected back to our local ./ml_data folder. For this purpose, let’s create an ml_data folder in the project root with the proper folders/files in it:
├── Dockerfile
├── Pipfile
├── Pipfile.lock
├── build_and_push.sh
├── docker-compose.yml
├── ml_data
│ ├── input
│ │ ├── config
│ │ │ └── hyperparameters.json
│ │ └── data
│ │ └── train
│ │ └── master_df.csv
│ ├── model
│ └── output
└── src
    ├── nginx.conf
    ├── predictor.py
    ├── serve
    ├── train
    └── wsgi.py
- Since we are reading the hyperparameters in the train file from /opt/ml/input/config/hyperparameters.json, we must place that file under ml_data/input/config/hyperparameters.json.
- Since we are reading the training data in the train file from /opt/ml/input/data/train/master_df.csv, we must place that file under ml_data/input/data/train/master_df.csv.
In our train file, we read TRAINING_FILE_NAME and FEATURES as environment variables. So when we run our Docker container for local training, we must pass those environment variables into the container. The easiest way to do this is to create a .env file, with the environment variables defined in it, in the same folder as the docker-compose.yml file, and then pass that file to the container. The last two lines in the docker-compose.yml file do this. Our .env file looks like this:
TRAINING_FILE_NAME=master_df.csv
FEATURES="sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)"
ARTEFACT_PATH=/Users/mertatli/medium/ml_data #Change it to your local ml_data path
Then let’s create a Makefile with the following targets in the root folder:
SHELL := /bin/bash

train:
	docker-compose build training \
	&& docker-compose run training

inference:
	cd src/ \
	&& pipenv run flask run # if a .env file is present, pipenv run will automatically load it

ping:
	curl --location --request GET 'http://127.0.0.1:5000/ping'
Up to this point, we must have the following folder structure:
├── .env
├── Dockerfile
├── Makefile
├── Pipfile
├── Pipfile.lock
├── build_and_push.sh
├── docker-compose.yml
├── ml_data
│ ├── input
│ │ ├── config
│ │ │ └── hyperparameters.json
│ │ └── data
│ │ └── train
│ │ └── master_df.csv
│ ├── model
│ └── output
└── src
    ├── nginx.conf
    ├── predictor.py
    ├── serve
    ├── train
    └── wsgi.py
Then open up a shell in the root folder and run the following command:
make train
If it runs successfully, we should see feature_importance.pckl and model.pckl under the ml_data/model folder:
ml_data
├── input
│ ├── config
│ │ └── hyperparameters.json
│ └── data
│ └── train
│ └── master_df.csv
├── model
│ ├── feature_importance.pckl
│ └── model.pckl
└── output
Congrats, we successfully trained our model and produced the model artefacts. Next, we will run the container for model serving.
2.2 — Testing the Docker Container Locally For Model Serving
Open up a shell in the root folder, then run the following command:
make inference
It should start a local Flask server that is ready to accept calls to our endpoints.
Then open the Postman application and send a ping request with the GET method to http://127.0.0.1:5000/ping.
Then send an /invocations request with the POST method to http://127.0.0.1:5000/invocations with the text body ‘5.1,3.5,1.4,0.2’.
Note that you must send the request with the Content-Type header set to text/csv, because our API only accepts that content type; if you want to support other content types, you need to change the following check in predictor.py:
if flask.request.content_type == "text/csv":
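If you prefer the command line to Postman, the same local test can be done with a short Python snippet (a sketch assuming the requests package is installed; the local Flask dev server listens on port 5000, whereas the container behind nginx listens on 8080):
import requests

# Send one CSV row to the locally running Flask app with the required content type.
response = requests.post(
    "http://127.0.0.1:5000/invocations",
    data="5.1,3.5,1.4,0.2",
    headers={"Content-Type": "text/csv"},
)
print(response.status_code, response.text)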
3 - Conclusion
In this post, we covered how to prepare our code for training and serving purposes in Sagemaker using Docker. In the next post, we will cover how to run Batch Transform jobs on Sagemaker using our Docker container, as well as how to deploy a Realtime Inference Endpoint using the same Docker container. While serving the model on the Sagemaker side, we may need to introduce some minor changes/settings in the predictor.py file, as we will see shortly.