Deploy a Custom Machine Learning Model with AWS SageMaker
SageMaker is a fully managed machine learning service that lets you build models using built-in algorithms, with native support for bring-your-own algorithms and ML frameworks such as Apache MXNet, PyTorch, SparkML, TensorFlow, and scikit-learn.
In this post, I'll walk through how to deploy a machine learning model that is built and trained locally with a custom algorithm, as a REST API, using SageMaker, Lambda, and Docker.
I'll break the process into 5 steps:
- Step 1: Building the model and saving the artifacts.
- Step 2: Defining the server and Inference code.
- Step 3: Building a SageMaker container.
- Step 4: Creating the model, endpoint configuration, and endpoint.
- Step 5: Invoking the model using a Lambda with an API Gateway trigger.
Step 1: Building the model and saving the artifacts.
Build the model and serialize the object that will be used for prediction. In this post, I'm using a simple linear regression with one independent variable.
Once you serialize the Python object to a pickle file, package that artifact (the pickle file) as a tar.gz archive and upload it to an S3 bucket.
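Here is a minimal sketch of this step, assuming a toy dataset and placeholder file and bucket names (model.pkl, model.tar.gz, my-sagemaker-bucket):

import pickle
import tarfile
import boto3
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a toy one-variable linear regression model.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

# Serialize the fitted model to a pickle file.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# SageMaker expects model artifacts packaged as a tar.gz archive.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.pkl")

# Upload the archive to S3 (bucket and key are placeholders).
boto3.client("s3").upload_file("model.tar.gz", "my-sagemaker-bucket", "linear-regx/model.tar.gz")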
Step 2: Defining the server and inference code.
When an endpoint is invoked, SageMaker interacts with the Docker container, which runs the inference code that processes the request and returns the response. The container must implement a web server that responds to /invocations and /ping on port 8080.
The /ping endpoint receives GET requests from the infrastructure and should respond with an HTTP 200 status code and an empty body, indicating that the container is ready to accept inference requests.
/invocations is the endpoint that receives POST requests with inference data and responds in the format specified by the algorithm.
To expose the model as a REST API, you need Flask, a WSGI (Web Server Gateway Interface) application framework; Gunicorn, the WSGI server; and nginx, the reverse proxy and load balancer.
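Here is a minimal sketch of the Flask app in predictor.py; the JSON request shape ({"input": ...}) and the artifact file name are assumptions, and the linked repo below has the full version. SageMaker extracts model.tar.gz into /opt/ml/model when the container starts.

# predictor.py (sketch): Flask app implementing /ping and /invocations.
import pickle
import flask

app = flask.Flask(__name__)

# SageMaker extracts model.tar.gz into /opt/ml/model at container startup.
with open("/opt/ml/model/model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: HTTP 200 with an empty body signals the container is ready.
    return flask.Response(response="", status=200, mimetype="application/json")

@app.route("/invocations", methods=["POST"])
def invocations():
    # Assumed request shape: {"input": <number>}.
    data = flask.request.get_json(force=True)
    prediction = model.predict([[float(data["input"])]])
    return flask.jsonify({"prediction": float(prediction[0])})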
Code : https://github.com/NareshReddyy/Sagemaker_deploy_own_model.git
Step 3: Building a SageMaker container.
SageMaker uses Docker containers extensively. A container packages your scripts, algorithms, and inference code together with the runtime, system tools, libraries, and other code needed to deploy your models, which gives you the flexibility to run your own model.
You create Docker containers from images that are saved in a repository. You build the images from scripted instructions provided in a Dockerfile.
The Dockerfile describes the image you want to build, starting from a complete operating system installation. Use a standard Ubuntu installation as the base image and run the usual tools to install the things your inference code needs. Then copy the folder (Linear_regx) containing nginx.conf, predictor.py, serve, and wsgi.py to /opt/code and make it the working directory.
The Amazon SageMaker Containers library places the scripts that the container will run in the /opt/ml/code/ directory.
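Putting that together, the Dockerfile might look roughly like this sketch; the base image, package list, and serve entry point are assumptions, so check the linked repo for the exact version.

FROM ubuntu:18.04
# Install Python, nginx, and the Python web stack the inference code needs.
RUN apt-get update && apt-get install -y python3 python3-pip nginx && \
    pip3 install flask gunicorn numpy scikit-learn
# Copy the folder with nginx.conf, predictor.py, serve, and wsgi.py into the image.
COPY Linear_regx /opt/code
WORKDIR /opt/code
# SageMaker starts the hosting container as "docker run <image> serve",
# so the serve script must be executable and on the PATH.
ENV PATH="/opt/code:${PATH}"
RUN chmod +x /opt/code/serve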
To build a local image, use the following command.
docker build -t <image-name> .
Create a repository in Amazon ECR and tag the local image with that repository.
The repository URI has the following structure:
<account number>.dkr.ecr.<region>.amazonaws.com/<image name>:<tag>
docker tag <image-name> <repository-name>:<image-tag>
Before pushing the image, you need to configure your AWS CLI and log in to ECR:
aws ecr get-login --no-include-email
Executing the above command prints a docker login command of the form docker login -u AWS -p xxxxx https://<account number>.dkr.ecr.<region>.amazonaws.com; run that command to log in to ECR.
docker push <repository name>:<image tag>
Step 4: Creating the model, endpoint configuration, and endpoint.
You can create the model through the API or the AWS Management Console. Provide a model name and an IAM role.
In the container definition, choose the option to provide model artifacts and an inference image, then enter the S3 location of the artifacts and the ECR image URI.
After creating the model, create an endpoint configuration and add the model you just created.
When you have multiple models to host, instead of creating an endpoint for each one you can choose the Multiple models option to host them all under a single endpoint, which is more cost effective.
You can change the instance type and instance count, and enable Elastic Inference (EI), based on your requirements.
You can also enable data capture, which saves prediction requests and responses to an S3 bucket and gives you the option to set alerts for deviations in model quality, such as data drift.
Finally, create the endpoint using the configuration defined above.
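If you prefer the API to the console, the same three steps look roughly like the following boto3 sketch; all names, the image URI, and the role ARN are placeholders:

import boto3

sm = boto3.client("sagemaker")

# 1. Create the model from the ECR image and the S3 artifacts.
sm.create_model(
    ModelName="linear-regx",
    ExecutionRoleArn="arn:aws:iam::<account number>:role/<sagemaker-role>",
    PrimaryContainer={
        "Image": "<account number>.dkr.ecr.<region>.amazonaws.com/<image name>:<tag>",
        "ModelDataUrl": "s3://my-sagemaker-bucket/linear-regx/model.tar.gz",
    },
)

# 2. Create an endpoint configuration that points at the model.
sm.create_endpoint_config(
    EndpointConfigName="linear-regx-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "linear-regx",
        "InstanceType": "ml.t2.medium",
        "InitialInstanceCount": 1,
    }],
)

# 3. Create the endpoint from that configuration.
sm.create_endpoint(EndpointName="linear-regx-endpoint", EndpointConfigName="linear-regx-config")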
Step 5: Invoking the model using a Lambda with an API Gateway trigger.
Create a Lambda function with an API Gateway trigger.
In the API Gateway trigger configuration, add a REST API to your Lambda function to create an HTTP endpoint that invokes the SageMaker endpoint.
In the function code, read the request received from API Gateway, pass the input to invoke_endpoint, and return the captured response to API Gateway.
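A minimal sketch of such a handler, assuming the JSON payload shape from Step 2 and a placeholder endpoint name:

import json
import os
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # API Gateway delivers the request body as a JSON string.
    payload = json.loads(event["body"])

    # Forward the input to the SageMaker endpoint (name is a placeholder).
    response = runtime.invoke_endpoint(
        EndpointName=os.environ.get("ENDPOINT_NAME", "linear-regx-endpoint"),
        ContentType="application/json",
        Body=json.dumps(payload),
    )

    # Return the model's prediction to API Gateway.
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}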
Now open API Gateway, where you can see the API created for your Lambda function. Create the required method (POST), integrate it with the Lambda function, and test by providing input in the request body and checking the output.
You can test your endpoint either from a SageMaker notebook or from Lambda.
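For example, a quick test from a notebook, using the same placeholder endpoint name and payload shape as above:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="linear-regx-endpoint",
    ContentType="application/json",
    Body=json.dumps({"input": 5.0}),
)
print(json.loads(response["Body"].read()))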