MLOps with SageMaker — Part III

Inference 🤖

Nick Sorros
MantisNLP
8 min read · May 3, 2023


In the first two parts (1, 2) of our MLOps with SageMaker series, we showed how you can use SageMaker to run your local training script in the cloud. We went from training a model with SageMaker’s preconfigured containers to using a fully custom environment of our own design. In this post, we will cover how to deploy our models using SageMaker, as well as how to use them for inference.

As a reminder, this is the project structure we have been using in the previous posts.
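
Based on the files used in this post, the relevant parts look roughly like this:

    .
    ├── models/
    │   └── model.pkl
    ├── scripts/
    │   ├── deploy_sklearn_sagemaker.py
    │   ├── deploy_sagemaker.py
    │   ├── predict_sagemaker.py
    │   └── delete_endpoint.py
    ├── src/
    │   ├── predict_sklearn.py
    │   └── sklearn_api.py
    └── Dockerfile.custom-sklearn-inference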

The models we will be deploying should live inside the models folder. We are using a scripts folder to store all our code that interacts with SageMaker. The idea is that you can take some of those scripts and use them in your project with minimal changes. All other code is located under the src directory.

We will be working with an sklearn model, but all the examples can easily be modified to work with any framework. We assume we have a trained model.pkl inside the models folder: this is the model we will deploy. Here is a predict function, modified so that it works with SageMaker.

https://github.com/MantisAI/sagemaker_examples/blob/main/src/predict_sklearn.py

We can run this locally as follows: python src/predict_sklearn.py "hi" --model-path models/. In order for our prediction function to work with SageMaker, we need to provide a function called model_fn that loads the model.

In cases where our data is not a list or numpy array, we also need to define a function that converts the request data into one of these objects, which is what our sklearn model expects. This function must be named input_fn, and it should accept a content_type argument that matches the content type of the POST request we will send to our endpoint later.
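
As a rough sketch, and assuming the model is a pickled sklearn pipeline stored as model.pkl and that we send JSON, these two functions could look something like this:

    import json
    import os
    import pickle


    def model_fn(model_dir):
        # Called once at startup: load the pickled model from the model directory
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            return pickle.load(f)


    def input_fn(request_body, content_type):
        # Convert the raw request body into the list our sklearn model expects
        if content_type == "application/json":
            return json.loads(request_body)
        raise ValueError(f"Unsupported content type: {content_type}")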

In most cases using SageMaker only requires having a model_fn function in your predict script that loads your model. When that’s not enough, input_fn is usually the only other thing we need to define in order to convert the request data to the format sklearn expects. Now that our prediction function is ready, let’s look at the script to deploy our model.

https://github.com/MantisAI/sagemaker_examples/blob/main/scripts/deploy_sklearn_sagemaker.py

This script will create a REST API endpoint that is ready to accept inference requests, pass them to our model and return predictions. The only things we need to define are the path to our prediction function (the entry_point argument), the version of sklearn we are using (the framework_version argument) and the path to our model. As always, we need a SageMaker role to interact with SageMaker (instructions to set up the role).
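
The full script is in the link above; a minimal sketch of what it does, with the role ARN as a placeholder and an illustrative sklearn version, might look like this:

    import sys

    from sagemaker.sklearn import SKLearnModel

    ROLE = "arn:aws:iam::<account_id>:role/<sagemaker_role>"  # placeholder role ARN


    def deploy(model_path, instance_type="local"):
        model = SKLearnModel(
            model_data=model_path,      # file://models locally, s3://... in the cloud
            role=ROLE,
            entry_point="src/predict_sklearn.py",
            framework_version="1.0-1",  # should match the sklearn version used for training
        )
        predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
        print(predictor.endpoint_name)  # needed later to query and delete the endpoint


    if __name__ == "__main__":
        deploy(sys.argv[1])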

We can run this script locally by passing the path to our models folder: python scripts/deploy_sklearn_sagemaker.py file://models. Note that we need to prepend the model path with file:// or s3:// depending on whether the path is local or in s3.

We can now send requests to our local API, using curl for example.
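
Assuming a local deployment and JSON input, the request could look something like this (the exact payload depends on what our prediction function expects):

    curl -X POST http://localhost:8080/invocations \
        -H "Content-Type: application/json" \
        -d '["hi"]'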

Notice that the content type is the same as the argument content_type in our prediction function. This ensures SageMaker will use the right function to prepare our data for the model.

SageMaker uses an invocations endpoint for inference, and it accepts only POST requests. Deploying in the cloud is as simple as changing the model_path to an s3 path and choosing an instance type to host the inference API (list of available instances).

In order to query the cloud endpoint we need to use authentication, unless we take extra steps to make the API available on a public IP (guide). The easiest way to interact with our API is through the sagemaker or boto3 libraries, which handle authentication for us. Here is how:

https://github.com/MantisAI/sagemaker_examples/blob/main/scripts/predict_sagemaker.py

The only information we need about our endpoint is its unique name, which is why we print the endpoint name in our deploy script. All that remains is to define how we want to serialize the data we send to the endpoint. In our case we are sending the data as JSON, but there are serializers for all common formats like numpy, CSV, etc. You can find a list of all the available serializers here.
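
A minimal sketch of that predict script, with the endpoint name as a placeholder, might look like this:

    from sagemaker.deserializers import JSONDeserializer
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import JSONSerializer

    # The endpoint name printed by the deploy script
    predictor = Predictor(
        endpoint_name="<endpoint_name>",
        serializer=JSONSerializer(),      # how we encode the data we send
        deserializer=JSONDeserializer(),  # how we decode the predictions we get back
    )

    print(predictor.predict(["hi"]))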

After we are done with our endpoint, we can delete it if we do not plan to use it immediately. Here is a small script to do that.

https://github.com/MantisAI/sagemaker_examples/blob/main/scripts/delete_endpoint.py
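
In its simplest form, all the script needs is the endpoint name, along these lines:

    from sagemaker.predictor import Predictor

    # Deleting the endpoint stops the instance and, with it, the hourly cost
    predictor = Predictor(endpoint_name="<endpoint_name>")
    predictor.delete_endpoint()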

All it took to deploy our sklearn model was:

  1. modifying our predict script to include a function that loads our model and, when needed, one that reads the request data,
  2. writing a deploy script that creates an endpoint from our trained model.

We also created two helper scripts to call our endpoint and later delete it. Note how we did not have to explicitly create the API or define the environment that runs our code. In fact, for most common frameworks, like TensorFlow, PyTorch, transformers, sklearn and more, we can rely on preconfigured classes and containers that do most of the heavy lifting. Here is a list of all the supported frameworks.

There are some cases where we want a bit more control over our environment or the API we are creating but we would still like to use SageMaker. Let’s see how we can do that next. We can start by creating a custom API using FastAPI that works with SageMaker.

https://github.com/MantisAI/sagemaker_examples/blob/main/src/sklearn_api.py

In order for the API to work with SageMaker, we need to define two endpoints. The ping endpoint is used to check that our API is “alive”: it only needs to accept a GET request and return a 200 status code, which is the default. The invocations endpoint is the one responsible for inference, as we saw before, and it needs to accept POST requests.

The data from the POST request is passed in as a parameter. Note that we are using pydantic to deserialize and validate that data. We also load the model from the models folder; the location of that folder inside the container is stored in the SM_MODEL_DIR environment variable. Our API script now takes care of both loading the model and deserializing the data, which we previously had to do by defining the model_fn and input_fn functions.
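
Putting these pieces together, a minimal sketch of such an API, with an illustrative request schema, could look like this:

    #!/usr/bin/env python3
    import os
    import pickle
    from typing import List

    import uvicorn
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # SageMaker copies the model artifacts into the directory stored in SM_MODEL_DIR
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)


    class PredictionRequest(BaseModel):
        texts: List[str]


    @app.get("/ping")
    def ping():
        # Health check: SageMaker only needs a 200 response
        return {"status": "ok"}


    @app.post("/invocations")
    def invocations(request: PredictionRequest):
        predictions = model.predict(request.texts)
        # tolist() converts numpy types into JSON-serializable Python types
        return {"predictions": predictions.tolist()}


    if __name__ == "__main__":
        # SageMaker sends traffic to port 8080 by default
        uvicorn.run(app, host="0.0.0.0", port=8080)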

Now that we have taken care of the API, let’s also customize the container that is running the inference.

https://github.com/MantisAI/sagemaker_examples/blob/main/Dockerfile.custom-sklearn-inference

Our container needs to install all the libraries required to run our API and the inference code, in this case sklearn, fastapi and uvicorn. We also need to copy our API script to /usr/bin/serve and make it executable: /usr/bin/serve is the file that SageMaker will try to run when we deploy our model. Note that the first line of our API script is a shebang that tells the system to run it with Python, which is essential for our API to work. Finally, we expose port 8080, the same port that SageMaker uses by default and that is defined in our API script.

The line COPY src/ /opt/ml/code copies all of our code into the container, which is helpful in cases where our API depends on other files. We also update PYTHONPATH so that those files are importable from /usr/bin/serve as if they were in the same directory.
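
As a sketch, a Dockerfile along these lines could look as follows (the base image and installed libraries are illustrative):

    FROM python:3.9-slim

    # Libraries needed to load the model and serve the API
    RUN pip install scikit-learn fastapi uvicorn pydantic

    # Make the rest of our code importable from anywhere in the container
    COPY src/ /opt/ml/code
    ENV PYTHONPATH=/opt/ml/code

    # SageMaker starts the container by running /usr/bin/serve,
    # so we place our executable API script there
    COPY src/sklearn_api.py /usr/bin/serve
    RUN chmod +x /usr/bin/serve

    # The port SageMaker sends inference requests to by default
    EXPOSE 8080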

We can build our container with docker build -t sklearn-inference -f Dockerfile.custom-sklearn-inference .

In order to deploy our model using our custom container we need to modify the deploy script slightly. Here is our new, more general, script.

https://github.com/MantisAI/sagemaker_examples/blob/main/scripts/deploy_sagemaker.py

The main difference is that we use the generic Model class and introduce the image_uri argument, which points to our local or ECR container (ECR is the AWS hub for storing containers, similar to Docker Hub).
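
A minimal sketch of this more general deploy script, again with the role ARN as a placeholder, might look like this:

    import sys

    from sagemaker.model import Model
    from sagemaker.predictor import Predictor

    ROLE = "arn:aws:iam::<account_id>:role/<sagemaker_role>"  # placeholder role ARN


    def deploy(model_path, image_uri, instance_type="local"):
        model = Model(
            image_uri=image_uri,      # local image tag or an ECR image URI
            model_data=model_path,    # file://models locally, s3://... in the cloud
            role=ROLE,
            predictor_cls=Predictor,  # so deploy() returns a Predictor we can query
        )
        predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
        print(predictor.endpoint_name)


    if __name__ == "__main__":
        deploy(sys.argv[1], sys.argv[2])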

We can test the script locally first by running python scripts/deploy_sagemaker.py file://models sklearn-inference and then using curl (or another method for sending HTTP requests) to send POST requests. When we are ready, we can switch to an s3 model path and an ECR container, which will create an endpoint in AWS using our custom container. You can test the endpoint with the same predict script we saw before (the one that uses the Predictor class), and when you are done you can delete it in the same way as well.

As a reminder, you can easily push your container to ECR in a few commands.
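
Roughly, with <account_id> and <region> as placeholders for your AWS account and region, the commands look like this:

    aws ecr create-repository --repository-name sklearn-inference
    aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account_id>.dkr.ecr.<region>.amazonaws.com
    docker tag sklearn-inference:latest <account_id>.dkr.ecr.<region>.amazonaws.com/sklearn-inference:latest
    docker push <account_id>.dkr.ecr.<region>.amazonaws.com/sklearn-inference:latest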

In order to create a custom environment for inference we only had to:

  1. create a custom API that implements an invocations and a ping endpoint,
  2. create a custom container that installs the necessary libraries and contains our API script.

We used the same helper scripts for sending requests to the API and deleting our endpoint.

In this post we talked about how you can deploy your trained models behind an API using SageMaker and a few additional custom scripts. In practice, we often create predict functions, API scripts and Dockerfiles anyway, so this post showed the modifications needed to make them compatible with SageMaker. Once you have done that, deploying to SageMaker only requires one additional script.

In the next posts we will cover how to use SageMaker with frameworks such as Rasa and spaCy, as well as how to tune hyperparameters using SageMaker AutoML and no-code options. Stay tuned. All the code can be found in our SageMaker examples repository: https://github.com/MantisAI/sagemaker_examples.
