Run inference with your custom-trained Object Detection Transformer (DETR) model on AWS SageMaker using the PyTorchModel library

George Bakas · Published in Innovation-res · 3 min read · Jul 11, 2022

Image provided by S. Dimitriadis

Hello and welcome to a new tutorial! In this tutorial we deploy a custom-trained DETR model for object detection directly on AWS, using the PyTorch Deep Learning Inference Containers. The object detection model is trained to classify and label particles found in SEM images of metal powder feedstock used for metal Additive Manufacturing (AM).

I will not go into the details of the custom model itself, since the purpose of this tutorial is to showcase how easy it is to deploy your DETR model using the AWS PyTorch Deep Learning Containers.

A great article on how to run inference with your own trained NLP model on AWS SageMaker with PyTorchModel or HuggingFaceModel is provided in this post by Spyros Dimitriadis.

To train your DETR model on a custom dataset, feel free to follow the instructions found in this GitHub repo here by Niels Rogge. The training is based on the repo provided by the Facebook Research group, found here.

The post is split into 4 parts:

  1. Save the model
  2. Custom inference.py script
  3. Create a model.tar.gz file and upload it on S3
  4. Deploy the endpoint with AWS Sagemaker Python SDK

1. Save the model

To save the model you have trained, use the code below:

import os
import torch

# After training, save the model weights (state_dict) to `model_dir`
# (inside a SageMaker training job this is typically `args.model_dir`)
with open(os.path.join(args.model_dir, 'model.pt'), 'wb') as f:
    torch.save(model.state_dict(), f)

2. Custom inference script

The inference.py script adapts model serving to the user's needs. We will override the default handler functions provided by AWS. The functions that we override are the following:

  • model_fn: loads the model and returns the model used in predict_fn
  • input_fn: handles data decoding. In our case the images are encoded and sent in a JSON file, so this function decodes the input request
  • predict_fn: after the inference request has been deserialized by input_fn, the SageMaker PyTorch model server invokes predict_fn on the return value of input_fn
  • output_fn: after invoking predict_fn, the model server invokes output_fn to post-process the data

The following gist includes all the changes needed for the inference.py script. It also includes the definition of the DETR model as well as the DETR inference model.

inference.py
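The gist is not reproduced here, but a minimal sketch of such an inference.py is shown below, assuming the DETR was fine-tuned from the facebook/detr-resnet-50 checkpoint and its state_dict was saved as model.pt; the number of labels, the confidence threshold and the JSON field names are illustrative assumptions, not the original implementation.

# Minimal sketch of inference.py (not the original gist). The four handler
# names are the hooks the SageMaker PyTorch model server looks for; the
# label count, threshold and JSON keys below are assumptions.
import base64
import io
import json
import os

import torch
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")


def model_fn(model_dir):
    """Rebuild the DETR architecture and load the fine-tuned weights (model.pt)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = DetrForObjectDetection.from_pretrained(
        "facebook/detr-resnet-50", num_labels=2, ignore_mismatched_sizes=True
    )
    state_dict = torch.load(os.path.join(model_dir, "model.pt"), map_location=device)
    model.load_state_dict(state_dict)  # keys must match how the state_dict was saved
    return model.to(device).eval()


def input_fn(request_body, content_type="application/json"):
    """Decode the base64-encoded image sent inside the JSON request."""
    payload = json.loads(request_body)
    return Image.open(io.BytesIO(base64.b64decode(payload["image"]))).convert("RGB")


def predict_fn(input_object, model):
    """Run the image through DETR and return the raw model outputs."""
    device = next(model.parameters()).device
    encoding = feature_extractor(images=input_object, return_tensors="pt").to(device)
    with torch.no_grad():
        return model(**encoding)


def output_fn(prediction, accept="application/json"):
    """Keep confident detections and return normalized [xC, yC, width, height] boxes."""
    probas = prediction.logits.softmax(-1)[0, :, :-1]
    keep = probas.max(-1).values > 0.9
    boxes = prediction.pred_boxes[0, keep].tolist()
    return json.dumps({"boxes": boxes})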

3. Create and upload the model.tar.gz file on S3

In order for the inference to run smoothly, the tar.gz archive must have the following internal structure:

model.tar.gz
├── model.pt
└── code/
    ├── inference.py
    └── requirements.txt

Make sure that the structure is correct, or else the inference will not succeed!

The requirements.txt file includes the following libraries:

numpy
pillow
torchvision
pytorch-lightning
transformers
timm

Let’s create the .tar.gz file:

!tar -cvpzf model.tar.gz model.pt ./code

We assume that you are already familiar with AWS services. We are going to use boto3 (the AWS Python SDK) and the sagemaker Python library to upload the artifact directly to S3:

Note: You need to create an S3 bucket before you run the code found below

upload tar.gz model
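The upload cell is embedded as a gist above; a minimal sketch using the sagemaker library, with a hypothetical bucket name and key prefix, could look like this:

import sagemaker

sess = sagemaker.Session()
bucket = "my-detr-bucket"           # assumption: replace with your existing S3 bucket
prefix = "detr-object-detection"    # assumption: key prefix (folder) inside the bucket

# Upload model.tar.gz and keep the returned S3 URI for the deployment step
model_uri = sess.upload_data(path="model.tar.gz", bucket=bucket, key_prefix=prefix)
print(model_uri)                    # e.g. s3://my-detr-bucket/detr-object-detection/model.tar.gz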

4. Deploy model and endpoint with the AWS PyTorchModel

We recommend running all of the cells above and below in a single notebook.

Notes:

  • PyTorchModel: creates a model within AWS SageMaker. You need to specify the model.tar.gz location on S3, the PyTorch version, the Python version, an IAM role with permissions to create an endpoint, and finally which inference code to use (the entry point) and where it is located (code/inference.py).
  • predictor: these are the lines of code that create the endpoint. The user specifies the instance type, a chosen endpoint_name and the (de)serializers. The serializer specifies that the model expects a JSON file and the deserializer specifies that the output will be returned as JSON. A sketch of this deployment cell follows these notes.
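A minimal sketch of the deployment cell described in the notes above; the IAM role ARN, framework and Python versions, instance type and endpoint name are placeholders to adapt:

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

pytorch_model = PyTorchModel(
    model_data=model_uri,                      # s3://.../model.tar.gz from the upload step
    role="arn:aws:iam::<account-id>:role/<sagemaker-role>",  # placeholder IAM role
    framework_version="1.9.0",                 # placeholder PyTorch version
    py_version="py38",                         # placeholder Python version
    entry_point="inference.py",
    source_dir="code",                         # local folder with inference.py and requirements.txt
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",              # placeholder instance type
    endpoint_name="detr-object-detection",     # placeholder endpoint name
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)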

Wait a couple of minutes for the model to be deployed. Once it's finished, we can run inference with a few lines of code:
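A minimal sketch of such a request, assuming the JSON payload format used in the inference.py sketch above and a local test image:

import base64

with open("sem_sample.png", "rb") as f:        # placeholder image path
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

# The JSON (de)serializers configured on the predictor handle encoding/decoding
result = predictor.predict(payload)
print(result)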

The output is a list of lists containing the bounding boxes of the identified objects in the format [xC, yC, width, height].

That is it!

Don’t forget to delete the model and the endpoint, or you will incur extra charges!

predictor.delete_model()
predictor.delete_endpoint()

EXTRA:

If the model has already been deployed and you want to test it, feel free to use the code found below:
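A minimal sketch that re-attaches to an existing endpoint with the generic sagemaker Predictor class; the endpoint name and payload format follow the assumptions made above:

import base64
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name="detr-object-detection",     # placeholder: the name used at deployment
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

with open("sem_sample.png", "rb") as f:        # placeholder image path
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

print(predictor.predict(payload))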

Thanks for reading! Stay tuned for more!
