Deploy A Locally Trained ML Model In Cloud Using AWS SageMaker

A Beginner’s Guide With A Step-by-Step Hands-On Example.

Rajaram Suryanarayanan
Geek Culture
13 min read · May 26, 2021


In this article, I will share an example of how to deploy a locally trained Machine Learning model in the cloud using the AWS SageMaker service. By “locally trained”, I mean an ML model trained on our own laptop, i.e. outside the AWS cloud. I will take you through the various steps, starting from training a model, to deploying it in the AWS cloud, to invoking the deployment from a local client to get predictions.

Introduction

If we google for ways to deploy an ML model in AWS, we will find quite a few videos and articles about deploying an ML model on an Amazon EC2 instance. They talk of launching an EC2 instance to host our own Flask app and ML model; a client (browser) sends the test data to the Flask server hosted on EC2, which in turn invokes the model hosted on the same EC2 instance to get the prediction, and then sends the prediction result back to the client.

But this approach is just another generic use-case of launching an application on a cloud VM (EC2), and has nothing much specific to do with ML model deployment. It does not use any fully managed / serverless facilities and benefits, like on-demand scalability (Auto Scaling), that AWS SageMaker endpoints can provide for ML inference. Moreover, an EC2 instance can also end up costly, both in resource usage charges and in the management work involved.

AWS SageMaker provides more elegant ways to train, test and deploy models, with tools like inference pipelines, batch transform, multi-model endpoints, A/B testing with production variants, hyper-parameter tuning, auto scaling, etc.

A Few Fundamentals

Before getting to the deployment example, I will give some basic details on the AWS services used, so that readers who are completely new to AWS can do some further basic study on their own if needed, before trying out the steps and services discussed in this article.

AWS SageMaker

Amazon SageMaker is a fully managed machine learning service. It helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. We can use S3 to store any files, models, input data for training models, etc.

Lambda

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. Just upload your code as a ZIP file or container image, and Lambda automatically and precisely allocates compute execution power and runs your code based on the incoming request or event, for any scale of traffic.

AWS API Gateway

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the “front door” for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications.

Boto3

Boto3 is the AWS SDK for Python. You use it to create, configure, and manage AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). The SDK provides an object-oriented API as well as low-level access to AWS services.
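As a quick illustration, here is a minimal Boto3 snippet that lists the S3 buckets in your account (assuming AWS credentials are already configured locally, e.g. via aws configure):

import boto3

# List all S3 buckets in the account; a quick check that credentials work.
s3 = boto3.client('s3')
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])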

Amazon SageMaker Python SDK

The SageMaker Python SDK provides several high-level abstractions for working with Amazon SageMaker.

Deployment of Models in SageMaker

In short, SageMaker is the ML service provided by AWS for coding, training and deploying ML models in the cloud. If we go through the SageMaker developer guide, we can see how vast and varied the service is, with built-in support for a number of popular algorithms, ranging from Linear Learner, XGBoost, k-NN and k-Means to deep learning frameworks like TensorFlow and MXNet.

Bringing the focus back to our current topic of model deployment on SageMaker, note that there are various ways to train and deploy ML models in SageMaker:

  • Training and deploying inside SageMaker, both using SageMaker’s own built-in algorithm containers (note these are AWS-managed containers).
  • Training our model locally / outside SageMaker, then using SageMaker’s built-in algorithm container just to deploy the locally trained model (Bring Your Own Model).
  • Using SageMaker’s (AWS-managed) built-in algorithm containers, but customizing the training as needed with our own scripts (Bring Your Own Script, also called script mode).
  • Training our model however we want, with our own algorithms, locally in a container we build and manage, and then bringing that container to SageMaker and deploying it (BYOC: Bring Your Own Container).

As we go from the top to the bottom of the above list, the ML engineer’s flexibility increases, but so do the complexity and the effort needed; for example, a BYOC deployment requires considerably more work.

Deploying a locally trained Model on AWS cloud

In this article, we are going to try out the second method in the list above. I was particularly interested in learning how to take a model trained locally on my laptop and deploy it on the AWS cloud (without having to worry much about using SageMaker to train the model).

Let’s get into the exercise!

I have taken a well-worn, simple, yet popular classification example from the ML world: the prediction of Iris flower type.
As our focus here is only on deployment, we are not going to spend time on dataset EDA, data preparation, hyper-parameter tuning, model selection, and so on. Hence the simple Iris flower data set!

Pre-requisite for trying out the exercise -

You need a free-tier AWS account to avail yourself of Amazon’s cloud services, including SageMaker, Lambda and S3, and some familiarity with launching these services from the AWS console. Details on how to create a free-tier AWS account and launch these services can easily be found on the net.

Important: Please note that SageMaker is NOT a free service; it incurs cost based on how long you run and use resources like notebook instances. You have to properly clean up all SageMaker resources, Lambda functions, S3 buckets, etc. after you try this deployment exercise. I have shared the details on how to clean up at the end of this article.

Our step-by-step approach to the exercise will be as below, in brief.

  1. Load the dataset and train a model on the local laptop, without using any cloud library or SageMaker.
  2. Upload the trained model file to AWS SageMaker and deploy it there. Deployment includes placing the model in an S3 bucket, creating a SageMaker model object, configuring and creating the endpoint, and setting up a few serverless services (API Gateway and Lambda) to trigger the endpoint from the outside world.
  3. Use a local client (we use Postman) to send sample test data to the deployed model in the cloud and get the prediction back. RESTful HTTP methods come to our help here.

Let us go through the detailed step-by-step exercise below.

Training the Model

  1. On the local laptop, use a Jupyter notebook to train an XGBoost classification model on the popular Iris flower data set.
  2. Test the model and save the model file locally using joblib.

For the above two steps, please refer to the iris-model-creation.ipynb notebook in my GitHub repository.

Download iris-model-creation.ipynb from GitHub to your laptop and run it to create the model and test_point.csv files.

In the iris-model-creation notebook, we basically download the Iris flower data set, train a simple XGBoost model on it, test it, and save the model as a local file with a joblib dump. We also save some sample flower data in test_point.csv for testing purposes.
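For reference, the core of that notebook looks roughly like the sketch below. It is a paraphrase, not the exact notebook code: test_point.csv matches the article, while the model file name, the train/test split and the hyper-parameters are illustrative assumptions.

import joblib
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris data set and hold out a test split.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Train a simple XGBoost classifier with default hyper-parameters.
model = xgb.XGBClassifier()
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))

# Save the trained model locally with joblib, and keep one sample row
# aside in test_point.csv for testing the deployed endpoint later.
joblib.dump(model, 'xgboost-model')  # file name is an assumption
np.savetxt('test_point.csv', X_test[:1], delimiter=',')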

Deploying the Model in SageMaker

For the next two steps, please refer to the iris-model-deployment-2024.ipynb notebook, available in my GitHub repository.

We have to upload and run this notebook in SageMaker, not locally.

3. In the AWS console, create a SageMaker notebook instance and open a Jupyter notebook.

Upload the locally trained model, the test_point.csv and the iris-model-deployment-2024.ipynb files to the SageMaker notebook instance.

4. Run the iris-model-deployment-2024 notebook in SageMaker.

Important: Run all the cells in the notebook except the last one, ‘Delete the Endpoint’.

Select and set conda_python3 as the kernel if you see a “Kernel not found” pop-up.

The notebook code does the following.

  • Load the model file, test it, and upload it to an S3 bucket (from where SageMaker will take the model artifacts).
  • Create a SageMaker model object from the model stored in S3. We use the SageMaker built-in XGBoost container for this purpose, as the model was trained locally with the XGBoost algorithm. Depending on the algorithm you use for modelling, you have to pick the corresponding built-in container and deal with its nuances; the SageMaker developer guide should help there.
  • Create an endpoint configuration. An endpoint is the interface through which the outside world can use a deployed model for predictions. More details about endpoints can be found in the SageMaker documentation.
  • Create an endpoint for the model.
  • Invoke the endpoint from within the deployment notebook to confirm the endpoint and the model are working fine (a condensed sketch of these steps follows this list).
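The notebook steps above condense to roughly the following SageMaker Python SDK code. Treat it as a sketch under assumptions, not the exact notebook contents: the archive name model.tar.gz, the S3 key prefix, the XGBoost container version and the instance type are illustrative, and the actual notebook may differ in these details.

import boto3
import sagemaker
from sagemaker import image_uris

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# 1. Upload the model artifact to S3. The built-in XGBoost container
# expects the model packed as a model.tar.gz archive (created beforehand,
# e.g. with: tar -czf model.tar.gz <your-model-file>).
model_url = session.upload_data('model.tar.gz', key_prefix='iris-model')

# 2. Create a SageMaker model object from the built-in XGBoost container.
container = image_uris.retrieve('xgboost', region, version='1.7-1')
model = sagemaker.model.Model(image_uri=container,
                              model_data=model_url,
                              role=role)

# 3 and 4. deploy() creates both the endpoint configuration and the endpoint.
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.t2.medium')

# 5. Invoke the endpoint with one CSV row from test_point.csv.
runtime = boto3.client('sagemaker-runtime')
with open('test_point.csv') as f:
    payload = f.readline().strip()
response = runtime.invoke_endpoint(EndpointName=predictor.endpoint_name,
                                   ContentType='text/csv',
                                   Body=payload)
print(response['Body'].read().decode())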

After running the notebook up to this point, you can see the endpoint created under
SageMaker -> Inference -> Endpoints in the AWS console.

Note down the endpoint name displayed; it will be used while creating the Lambda function (described in the following section).

Launching necessary AWS Services for End-to-End Communication

After completing the above steps, we have the model deployed and a SageMaker endpoint ready to be invoked from the outside world for real-time predictions.

The deployed model can be called through a serverless AWS architecture, which works as follows. A client script calls an Amazon API Gateway API action and passes parameter values. API Gateway is the layer that provides the API to the client; it passes the parameter values to the Lambda function. The Lambda function parses the values and invokes the SageMaker model endpoint, passing the parameters on to it. The model performs the prediction and returns the result to Lambda, which parses the returned value and sends it back to API Gateway. API Gateway then responds to the client with that value.

We will use Amazon’s REST API Gateway for our purpose. Instead of a web browser as the client, we will use the Postman app to keep things simple (if you want a browser web interface, you need a Flask app, which would have to be packed into a container and run inside SageMaker). In our example, Postman sends a RESTful POST request to the API Gateway and gets the response (prediction) back.

So we need to set up API Gateway and Lambda. Let us go through the remaining few steps.

[Update as of Jul-2024: Some UI details in AWS have changed from what they looked like in 2021, when I first wrote this blog. The following steps are still valid and work, but look for the equivalent options and screens when you set up the IAM role, policy, Lambda, etc. in the AWS console.]

5. Create an IAM role that includes the following policy, which gives your Lambda function permission to invoke a model endpoint.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "*"
        }
    ]
}

While creating the role, select Lambda as the use case under AWS service, and attach the policy to the role.

6. Create a Lambda function with the Python code below, which calls the SageMaker runtime invoke_endpoint and returns the prediction.

import os
import json
import boto3

# Grab the endpoint name from the Lambda environment variables.
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    data = json.loads(json.dumps(event))
    payload = data['data']
    # print(payload)

    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)
    # print(response)
    result = json.loads(response['Body'].read().decode())

    classes = ['Setosa', 'Versicolor', 'Virginica']

    # This was working earlier.
    # res_list = [float(i) for i in result]
    # return classes[res_list.index(max(res_list))]

    # This is working as of July-2024.
    return classes[int(result)]

Select “Author from scratch”, give a function name, and select Python 3.8 as the runtime.

Select “Use an existing role” and pick the role you created in the previous step.

Under the code section of the Lambda function, enter the Python code given at the beginning of this step. Remember to click “Deploy” after entering the code.

Go to the Configuration tab of the Lambda function and add an environment variable ENDPOINT_NAME, setting its value to the name of the endpoint created in the preceding steps. Note that this environment variable is read by the Lambda function’s code.

We have completed setting up the Lambda function.
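Optionally, before wiring up API Gateway, you can sanity-check the Lambda function from your laptop with Boto3. The function name below is a placeholder; use the one you actually chose.

import json
import boto3

lam = boto3.client('lambda')
event = {'data': '5.1,3.3,1.7,0.5'}  # one sample Iris row in CSV form

# 'iris-prediction-fn' is a placeholder function name.
resp = lam.invoke(FunctionName='iris-prediction-fn',
                  Payload=json.dumps(event))
print(resp['Payload'].read().decode())  # expect: "Setosa"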

7. Create a REST API and integrate it with the Lambda function.

Select the API Gateway service in the AWS console, and select REST API.

Click Build and select “New API”. In the next window, select “Create Resource” from the Actions drop-down menu, and enter a resource name.

Note down the resource name you choose; it will be part of the URL created by this service and will be used later when we test the deployment from Postman. Here we have chosen the resource name “irisprediction”. After creating the resource, select “Create Method” from the Actions drop-down menu.

Select the POST method and “Lambda Function” as the integration type. Enter the name of the Lambda function you created in the previous steps. Then select “Deploy API” from the Actions drop-down menu, select “New Stage” as the deployment stage, and give it a stage name; I chose “test”.

Then, finally, when you click “Deploy”, you will be given an “Invoke URL”.

Please note down the “Invoke URL” displayed in the window; it will be used in Postman to contact the API Gateway, as described below.

Now we are done with the deployment and setup of the end-to-end communication path.

Testing the final Deployment from local client

8. Finally, use the Postman app on your laptop to POST the Iris flower test data to the API Gateway and get the prediction result back from the AWS cloud.

Example URL (remember to replace this with the Invoke URL you got when you created the API in the preceding steps, and append the resource name at the end):

For example, if the Invoke URL you got was
https://kmnia554df.execute-api.us-east-1.amazonaws.com/test/

append the resource name (“irisprediction” in our case) to get the full URL to use in Postman:
https://kmnia554df.execute-api.us-east-1.amazonaws.com/test/irisprediction

Use method: POST

In the Body, raw input can be given like:

{"data":"5.099999999999999645e+00,3.299999999999999822e+00,1.699999999999999956e+00,5.000000000000000000e-01"}

You can refer to test_point.csv for sample data. The four numerical values in the data are simply the sepal length, sepal width, petal length and petal width of an Iris flower data point.

When we send the data, we successfully invoke the deployed model endpoint and receive back the flower prediction as “Setosa” in the example above.
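If you prefer a scripted client over Postman, a minimal Python equivalent might look like the following (the URL is the illustrative one from above; substitute your own Invoke URL and resource name):

import requests

# Illustrative URL; replace with your own Invoke URL plus resource name.
url = 'https://kmnia554df.execute-api.us-east-1.amazonaws.com/test/irisprediction'
payload = {'data': '5.1,3.3,1.7,0.5'}  # sepal/petal measurements as CSV

response = requests.post(url, json=payload)
print(response.text)  # expect: "Setosa"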

So, we have successfully deployed a locally trained model on the AWS cloud using SageMaker and seen it working for real-time inference !

Now the last, but most important, step is to clean up the AWS resources we have created and launched, because if you leave any resource running or occupying space, you will be charged for all the time it lingers there…

Important: Clean up, or else you pay the charges!

  1. For SageMaker, remember to delete the notebook instance, endpoint, endpoint configuration and model. The SageMaker documentation describes how, or you can script it, as in the sketch after this list.
  2. Go to the Lambda service and delete the function you created.
  3. Go to the CloudWatch service, select “Log Groups” and delete the log groups you find there.
  4. Go to the IAM service and delete the role and policy you created for this exercise.
  5. Delete the API Gateway we launched.
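For the SageMaker part, a minimal Boto3 cleanup sketch is given below; the resource names are placeholders for the ones you created (verify the exact names in the console first).

import boto3

sm = boto3.client('sagemaker')

# Placeholder names; substitute the resources you actually created.
sm.delete_endpoint(EndpointName='my-iris-endpoint')
sm.delete_endpoint_config(EndpointConfigName='my-iris-endpoint-config')
sm.delete_model(ModelName='my-iris-model')

# Notebook instances must be stopped before they can be deleted.
sm.stop_notebook_instance(NotebookInstanceName='my-notebook-instance')
# Wait until it reports 'Stopped', then:
# sm.delete_notebook_instance(NotebookInstanceName='my-notebook-instance')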

Here ends this model deployment exercise on AWS SageMaker.

Hope you enjoyed it and found it useful… Thanks for following this far!
If you loved what you have just read, would you be able to buy me a cup of coffee? It’s okay if you can’t right now; a clap will also encourage me to write more on similar learnings!

A few references for further reading:

Amazon SageMaker Documentation

SageMaker Example Notebooks

Amazon SageMaker Technical Deep Dive Series
