Deploy your Instance Segmentation Model using AWS Sagemaker (Part 1)

George Bakas · Published in Innovation-res · Apr 4, 2022 · 6 min read

This post will guide you through taking an ML model that you have created on your local machine and using it within AWS for inference. This can be any model of your liking! Whether it is a simple classification model or an instance segmentation model that uses Detectron2 as its backbone (as in our case), AWS SageMaker is THE solution.

In most ML implementations, Docker is used to containerize the application. The container holds all the information and data a specific application needs to run, regardless of the OS on which the container is operating.

For this tutorial, you will need an AWS account. If you don’t already have one, you can create one here.

Concept & Setup

In our case, we have trained a model that uses the Detectron2 implementation as its backbone.

Detectron2 is Facebook AI Research's next-generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications at Facebook.

We have trained the model to classify and segment particles within metal powders used in Metal Additive Manufacturing, using images from Electron Microscopes (we will not get into details regarding this, BUT feel free to check this amazing post on how to optimize training and set hyperparameters by Spyros Dimitriadis here).

Having this model for testing purposes is great… The next step, though, was to take this project to another level. At first we developed a Flask application that serves as a front-end to communicate with the model. We containerized this application using Docker and tried to serve it. The result?

NOT GREAT

Several problems occurred, such as request handling and CPU dependencies. So we decided to create a workflow in order to serve our application successfully.

In this tutorial we will only go into detail about the API which serves the model. Now that you have the whole picture, let's get into the technical details!

The idea behind this is to build a container using the AWS SageMaker template for endpoint serving. The output will be a container that is uploaded to AWS ECR and used to serve our developed model.

The code can be found here

Let's start with our repo. The folder structure should look like this:

<my-custom-model-name>/
├── nginx.conf
├── predictor.py
├── serve
├── weights
│ └── (.pkl, .pth, etc)
└── wsgi.py
Dockerfile
requirements.txt

In our case the folder structure looks a bit different, since we include some post-processing scripts for analysis:

maskrcnn/
├── nginx.conf
├── predictor.py
├── serve
├── static
│ └── latest
│ ├── config.yaml
│ └── model_final.pth
├── tools
│ ├── __init__.py
│ ├── feret_diameter.py
│ ├── metrics.py
│ ├── rotating_calipers.py
│ ├── statistical_analysis.py
│ └── utilities.py
├── utils
│ └── batchpredictor.py
└── wsgi.py
Dockerfile
requirements.txt
  • nginx.conf: the configuration file for the nginx front-end
  • wsgi.py: a small wrapper used to launch the Flask app
  • predictor.py: where the inference of the model happens
  • serve: an executable Python script that starts nginx and gunicorn with the correct configs and waits for gunicorn to exit (can also be found here); a condensed sketch of this script follows the list
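
If you are curious what serve does under the hood, here is a condensed sketch modeled on the serve script from the amazon-sagemaker-examples repo; the worker count and timeout handling are simplified assumptions compared to the original:

#!/usr/bin/env python
# Condensed sketch of a SageMaker-style `serve` script (modeled on the
# amazon-sagemaker-examples pattern); values here are illustrative.
import multiprocessing
import os
import subprocess
import sys

# Number of gunicorn workers and request timeout, configurable via env vars.
cpu_count = multiprocessing.cpu_count()
model_server_timeout = os.environ.get("MODEL_SERVER_TIMEOUT", "60")
model_server_workers = int(os.environ.get("MODEL_SERVER_WORKERS", cpu_count))

def start_server():
    print(f"Starting the inference server with {model_server_workers} workers.")
    # Start nginx (front-end proxy) and gunicorn (WSGI server hosting the Flask app).
    nginx = subprocess.Popen(["nginx", "-c", "/opt/program/nginx.conf"])
    gunicorn = subprocess.Popen([
        "gunicorn",
        "--timeout", model_server_timeout,
        "-k", "sync",
        "-b", "unix:/tmp/gunicorn.sock",
        "-w", str(model_server_workers),
        "wsgi:app",
    ])
    # Wait for gunicorn to exit, then shut down nginx as well.
    gunicorn.wait()
    nginx.terminate()
    print("Inference server exiting")
    sys.exit(0)

if __name__ == "__main__":
    start_server()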

The maskrcnn folder contains all info relevant to the model.

The static folder contains the weights. The tools folder contains some code that is related to post-processing assessment of the images’ instances.

The utils folder includes a file called batchpredictor.py, which handles many images at the same time during inference rather than processing them image by image (a rough sketch follows). For further details, check out the code on the GitHub repo!
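
The real batchpredictor.py lives in the repo; purely for orientation, a batch-capable Detectron2 predictor could look roughly like the sketch below (essentially Detectron2's DefaultPredictor generalized to a list of images). Treat the class body as an assumption, not the repo's exact code:

import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.data import transforms as T
from detectron2.modeling import build_model


class BatchPredictor:
    """Run Detectron2 inference on a list of images in a single forward pass."""

    def __init__(self, cfg):
        self.cfg = cfg.clone()
        self.model = build_model(self.cfg)
        self.model.eval()
        DetectionCheckpointer(self.model).load(cfg.MODEL.WEIGHTS)
        # Same test-time resizing scheme that DefaultPredictor uses.
        self.aug = T.ResizeShortestEdge(
            [cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MIN_SIZE_TEST], cfg.INPUT.MAX_SIZE_TEST
        )
        self.input_format = cfg.INPUT.FORMAT  # "BGR" (default) or "RGB"

    def __call__(self, images):
        """images: list of HxWxC numpy arrays in BGR format; returns one output dict per image."""
        inputs = []
        with torch.no_grad():
            for original_image in images:
                if self.input_format == "RGB":
                    original_image = original_image[:, :, ::-1]
                height, width = original_image.shape[:2]
                image = self.aug.get_transform(original_image).apply_image(original_image)
                image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
                inputs.append({"image": image, "height": height, "width": width})
            # Detectron2 models accept a list of dicts and return a list of outputs.
            return self.model(inputs)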

The most important file within the maskrcnn folder is predictor.py. It includes everything required for serving: it implements a Flask server to do inference, and it is the file you will modify to run inference with your own ML algorithm.

The backbone of the code consists of two application routes:

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    """
    Determine if the container is working and healthy.
    Healthy: the model is loaded successfully.
    """
    ...

# Inference route (also called "scoring", "prediction", or "transformation").
@app.route("/invocations", methods=["GET", "POST"])
def transformation():
    """
    Run inference on a batch of images.
    Return the images and their segments.
    """
    ...

We define as the health check (the ping route) any condition that returns a successful response. For us, that is whether the model loads correctly.

When a user pings or invokes the AWS endpoint, the corresponding route (ping or transformation) is called. For invocations, the input is a JSON file that includes all the required information. Don't worry just yet! We will get into the details of how to define this request JSON file in the next post (check Part 2 here)! A rough sketch of both routes follows.
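
To make the skeleton above a bit more concrete, here is a minimal sketch of how the two routes could be fleshed out. The payload format (a list of base64-encoded images under an "images" key) and the response are purely illustrative assumptions, since the real request JSON is defined in Part 2; ScoringService is the class presented right below.

# Illustrative sketch only; the actual request/response format is covered in Part 2.
import base64
import json
from io import BytesIO

import flask
import numpy as np
from PIL import Image

app = flask.Flask(__name__)  # same app object as in the skeleton above


@app.route("/ping", methods=["GET"])
def ping():
    # Healthy if the model (predictor) can be loaded; ScoringService is shown below.
    health = ScoringService.get_predictor() is not None
    return flask.Response(response="\n", status=200 if health else 404,
                          mimetype="application/json")


@app.route("/invocations", methods=["GET", "POST"])
def transformation():
    # Hypothetical payload: {"images": ["<base64-encoded image>", ...]}
    data = flask.request.get_json(force=True)
    images = []
    for encoded in data["images"]:
        img = Image.open(BytesIO(base64.b64decode(encoded))).convert("RGB")
        images.append(np.asarray(img)[:, :, ::-1])  # RGB -> BGR, Detectron2's default format
    _, outputs = ScoringService.predict(images)
    # Illustrative response: number of detected instances per image.
    result = {"num_instances": [len(o["instances"]) for o in outputs]}
    return flask.Response(response=json.dumps(result), status=200,
                          mimetype="application/json")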

Within predictor.py there exists a class that we use to load our model and predict. The following section shows how to load a model and make predictions using the Detectron2 implementation.

import os

from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import ColorMode, Visualizer

from utils.batchpredictor import BatchPredictor


class ScoringService(object):

    # init metadata
    metadata = MetadataCatalog.get("sem")
    metadata.set(thing_classes=["cls1", "cls2", "cls3", "cls4"])

    @classmethod
    def get_predictor(cls):
        """
        Get the model object for this instance and load it.
        """
        config = get_cfg()
        config.merge_from_file(WEIGHTS_PATH)  # config path, defined elsewhere in predictor.py
        config.MODEL.WEIGHTS = os.path.join(log_file, 'model_final.pth')
        # Run inference on CPU
        config.MODEL.DEVICE = "cpu"
        predictor = BatchPredictor(config)
        return predictor

    @classmethod
    def predict(cls, images):
        """
        For the input(s), do predictions and return them.
        Args:
            images: list of input images
        """
        # let the prediction begin!
        predictor = cls.get_predictor()
        outputs = predictor(images)
        out = []
        for i, img in enumerate(images):
            v = Visualizer(img[:, :, ::-1],
                           metadata=cls.metadata,
                           scale=0.5,
                           instance_mode=ColorMode.SEGMENTATION)
            out.append(v.draw_instance_predictions(outputs[i]["instances"].to("cpu")))
        return out, outputs
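
Before wiring this into the endpoint, the class can be sanity-checked locally; the image file names below are just placeholders:

import cv2

# Hypothetical test images; any BGR numpy arrays work here.
images = [cv2.imread("sample_1.png"), cv2.imread("sample_2.png")]

# Returns the annotated Visualizer outputs and the raw Detectron2 outputs.
visualized, outputs = ScoringService.predict(images)
for i, output in enumerate(outputs):
    print(f"Image {i}: {len(output['instances'])} instances detected")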

We are now ready! The complete predictor.py can be found in the repo.

Our requirements.txt includes the following

psycopg2==2.8.6
numpy
Keras~=2.2.4
scipy
scikit-image
matplotlib
boto3~=1.17.94
Flask~=1.1.2
Pillow~=8.2.0
setuptools~=56.0.0
opencv-python~=4.5.3.56
imgaug~=0.4.0
plotly==5.3.0
pandas
h5py==2.10.0
gunicorn
IPython
markupsafe==2.0.1
s3fs

We use a baseline configuration for the wsgi.py and nginx.conf, as provided by Amazon here.

We have used the wsgi.py and nginx.conf from that baseline: wsgi.py is just a thin wrapper that exposes the Flask app in predictor.py to gunicorn, while nginx.conf configures nginx to forward incoming requests to gunicorn (a minimal wsgi.py is sketched below).
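
For completeness, a wsgi.py in the spirit of that baseline can be as small as the following (assuming the Flask app object inside predictor.py is called app, as in the snippets above):

# wsgi.py -- thin wrapper so gunicorn can find the Flask app defined in predictor.py
import predictor as myapp

# gunicorn is started with "wsgi:app" (see the serve sketch earlier),
# so it looks for this module-level variable.
app = myapp.app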

That is it! The code for running inference on AWS SageMaker is complete. Let us move to the next step: the creation of the Dockerfile. The Dockerfile we created can be found in the repo.

Using the Python image, we install and update everything that Detectron2 needs to operate. We will not get into further details by going line by line through the Dockerfile. The only thing you need to know is that the AWS endpoint expects an image that operates in the /opt/program/ directory, and thus we include this in the ENTRYPOINT.

You are basically at the final steps! The hard part is over!

What we need to do now is build the image, connect to AWS via our command line, and upload the image we created to the Elastic Container Registry (ECR).

Log in to AWS and create a new repository:

$ aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 145639673445.dkr.ecr.eu-central-1.amazonaws.com
$ aws ecr create-repository --repository-name <my-repo-name> --image-scanning-configuration scanOnPush=true --region <my-aws-region>

By running the second command you will get the repositoryUri from the resulting output. Copy it.

Note: I advise you to use an EC2 instance on AWS, since building and uploading the container from your local machine may take a lot of time. Using an AWS EC2 instance will be much faster both for building and for uploading to ECR.

Build the image locally using Docker (make sure Docker is running; more info here and here):

$ docker build -t <my-local-image-name> .

Type the following to tag the image as required and push it to your AWS ECR repo:

$ docker tag <my-local-image-name>:latest <uri-copied-from-ecr-create-repo>
$ docker push <uri-copied-from-ecr-create-repo>

This is it! You have successfully uploaded your image to ECR. Now all you need to do is set up AWS SageMaker and make use of your uploaded image! Feel free to check out Part 2 on how to invoke your uploaded image on AWS SageMaker!!!

References:

[1] Feel free to explore this repo to find more examples that fit your case: https://github.com/aws/amazon-sagemaker-examples
