Deploy your ML Model using AWS Sagemaker (Part 2)

George Bakas
Published in Innovation-res
3 min read · Apr 14, 2022

We continue with the next section, in which we show how to set up the basic outline for serving your application on AWS. Assuming you have already uploaded your image to ECR, let's go!

(If not, go back to Part 1 and come back!)

Baseline & Definitions

Depending on your use-case, AWS Sagemaker offers several options:

  1. Real-time endpoints that make one prediction at a time → Real-time inference
  2. Requests with large payload sizes up to 1GB, long processing time and real-time latency requirements (such as pictures or videos) → Asynchronous Inference
  3. Predictions for an entire dataset → Batch Transform

We will not go into further detail on each of these options. In our case, where we need to make predictions on Scanning Electron Microscope images (large image payloads), we implement Asynchronous Inference.

There are several ways one can implement these:

  • Using the AWS online console
  • Using the AWS SDK for Python → boto3

In our case, we will show how to set up the AWS SageMaker endpoint using the boto3 Python library.

Endpoint Structure

In order for an endpoint to serve a model, three requirements need to be met.

  1. Model: In general, this part represents the model itself. When creating a custom model, this step includes the model name and describes the primary container used for serving. To be more precise, the whole of Part 1 is about creating the model we are going to use.
  2. Endpoint Configuration: In the configuration, you identify one or more models to deploy, as well as the resources you want Amazon SageMaker to provision. The serving mode is also defined in this step (e.g. real-time or asynchronous inference).
  3. Endpoint: Creates an endpoint using the specified endpoint configuration. When created, the endpoint launches the resources (ML compute instances) and deploys the model(s).

Structure

<custom-testing-directory>/
├── create_endpoint.py
├── update_endpoint.py
├── destroy_endpoint.py
├── test_endpoint.py
├── images/
│   ├── image1.jpg
│   └── image2.jpg
└── requirements.txt

The directory contains the following:

  • create_endpoint.py: Python script that creates the model (using the ECR URI of the model container), the endpoint configuration and the endpoint.
  • update_endpoint.py: Updates the endpoint by changing the model and endpoint configuration while the endpoint is still running.
  • test_endpoint.py: Tests the endpoint by invoking it with the images found in the images directory.
  • destroy_endpoint.py: Destroys the endpoint, endpoint configuration and model.
  • images: sample images for testing
  • requirements.txt: the Python dependencies

You can find the complete code provided by AWS here

First of all, build the Python environment using the requirements.txt file. You will at least need the boto3 and json libraries.

$ pip install -r requirements.txt 

The workflow is as follows:

$ python create_endpoint.py
$ python test_endpoint.py

Find the create_endpoint.py below:

create_endpoint.py

The following code tests the endpoint.

test_endpoint.py

In our implementation, we send images to the SageMaker endpoint. Thus, we encode the input images and insert them into a request.json file. The request is uploaded to S3, and the async endpoint is given the URI of the JSON file's location.

When using async inference, the SageMaker client returns a JSON response. In our implementation, it contains the output location of the results (the URI path to S3).

Your implementation is now ready!

You have created a pipeline that creates a model, a model configuration and a Sagemaker Endpoint.

Note: Once the model has been created and everything has been accomplished, it is a good idea to destroy the endpoint, endpoint configuration and model, or else you will keep incurring costs! You can use the code found below:

destroy_endpoint.py


George Bakas
Innovation-res

I am a proficient DevOps Engineer with a PhD in High Energy Physics, interested in creating CI/CD pipelines, infrastructure as code and orchestration.