Deploying An Image Segmentation Model with Detectron2 and BentoML

Sherlock Xu · Published in BentoML · 5 min read · Oct 11, 2023

Note: The content in this blog post may be outdated. To learn more about BentoML, see its latest documentation.

In one of my previous blog posts introducing BentoML, I explained how BentoML works as a flexible platform that simplifies serving and deploying ML models. In this post, I will demonstrate how to use BentoML and Detectron2 for a common ML use case: image segmentation.

In image segmentation, an image is divided into multiple segments, each corresponding to a different object or part of an object. When you send an image to an image segmentation model, it identifies and delineates the objects within it, and returns an image in which each detected object is highlighted and separated from the others.

Detectron2, developed by Facebook AI Research (FAIR), stands out as a state-of-the-art library for object detection and segmentation. It’s built on PyTorch and offers a rich set of features that cater to both research and production needs.

By the end of this article, you’ll have a clear understanding of how to train an image segmentation model using Detectron2, package it with BentoML, and deploy it as a service.

Installing dependencies

To set up the environment for this project, run the following commands to get everything in place:

pip install bentoml --upgrade
pip install torch --upgrade
pip install torchvision --upgrade
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
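
To confirm that the packages are importable, you can run a quick sanity check; this is just a one-liner I find handy, not part of the official instructions:

python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__)"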

I hit a hiccup when trying to install detectron2 and got the following error:

ImportError: /home/sherlock/.local/lib/python3.9/site-packages/detectron2/_C.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7reshapeEN3c108ArrayRefIlEE

To solve this problem, I uninstalled detectron2, upgraded both torch and torchvision, and then reinstalled detectron2. It seems that detectron2 requires specific versions of torch and torchvision to function correctly. For compatibility and installation details, refer to the official instructions.
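
If you run into the same error, the fix described above roughly translates to the following commands (a sketch only; the exact versions you end up with depend on your environment):

# Remove the broken detectron2 build
pip uninstall -y detectron2
# Upgrade torch and torchvision so their ABI matches
pip install --upgrade torch torchvision
# Reinstall detectron2 against the upgraded packages
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'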

Downloading the model

This project uses a model trained for instance segmentation on the COCO dataset. To assist with this, Detectron2 provides a handy model zoo containing a range of pre-trained models. Based on this Detectron2 example, you can create a download_model.py file to download the pre-trained model and save it with BentoML. Here is my file for your reference:

import bentoml

# Import Detectron2-related packages
import detectron2.config as Config
import detectron2.model_zoo as ModelUtilities
import detectron2.modeling as Modeling
from detectron2.engine import DefaultPredictor

MODEL_CONFIG_PATH = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
MODEL_NAME = "coco-masked-rcnn"

# Setting up the configuration
print("Setting up model configuration...")
cfg = Config.get_cfg()
cfg.merge_from_file(ModelUtilities.get_config_file(MODEL_CONFIG_PATH))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # Set the testing threshold
cfg.MODEL.WEIGHTS = ModelUtilities.get_checkpoint_url(MODEL_CONFIG_PATH)
cfg.MODEL.DEVICE = "cpu" # Set device to CPU

# Building the model
print("Building the model...")
model = Modeling.build_model(cfg)
model.eval() # Set the model to evaluation mode

# Saving the model with BentoML
print("Saving the model...")
bentoml.detectron.save_model(MODEL_NAME, model, config=cfg)

# Creating a predictor and saving it
print("Creating and saving the predictor...")
predictor = DefaultPredictor(cfg)
bentoml.detectron.save_model(f"{MODEL_NAME}-predictor", predictor)

print("Completed!")

This file mainly does the following three things:

  • Configuring: Configure the model specifics, such as the model type and its threshold for detection.
  • Building: Build the Detectron2 model using the provided configuration.
  • Saving: Save both the model and a predictor (which is useful for quick inferences) with BentoML.

To download and save the model, simply run the script:

python3 download_model.py

Once done, you can view the downloaded models in the BentoML local Model Store using the following command:

$ bentoml models list

Tag                                           Module             Size        Creation Time
coco-masked-rcnn-predictor:sdbh6xte4wp2qkiy   bentoml.detectron  169.67 MiB  2023-10-07 07:46:04
coco-masked-rcnn:sdbh6xde4wp2qkiy             bentoml.detectron  169.67 MiB  2023-10-07 07:46:03
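
You can also load a saved entry back programmatically to double-check it. Here is a minimal sketch using the same API the Service below relies on; it assumes the Detectron2 config is attached to the saved predictor as a custom object, as service.py expects:

import bentoml

# Load the saved predictor entry from the local Model Store
bento_model = bentoml.detectron.get("coco-masked-rcnn-predictor")
print(bento_model.tag)  # e.g. coco-masked-rcnn-predictor:sdbh6xte4wp2qkiy

# The Detectron2 config travels with the model as a custom object
cfg = bento_model.custom_objects["config"]
print(cfg.MODEL.DEVICE)  # cpu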

Creating a BentoML Service

With the model ready, the next step is to create a BentoML Service. As I explained before, a Service encapsulates the logic for serving the model and defines how input and output data are processed. It also lets you turn the model into a scalable object, known as a Runner, which makes large-scale inference efficient. Let's take a closer look at the service.py file I created for this purpose:

import bentoml
import PIL.Image
import numpy as np

# Import Detectron2-related packages
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import ColorMode, Visualizer
from detectron2.engine import DefaultPredictor

MODEL_NAME = "coco-masked-rcnn-predictor"

# Load the model and configurations
bentomodel = bentoml.detectron.get(MODEL_NAME)
cfg = bentomodel.custom_objects['config']
predictor = DefaultPredictor(cfg)

# Create a BentoML Service and wrap a Runner
svc = bentoml.Service(name="masked-rcnn", runners=[bentomodel.to_runner()])

@svc.api(input=bentoml.io.Image(), output=bentoml.io.Image())
async def predict(im: PIL.Image.Image) -> PIL.Image.Image:
    # Process the input image and return the visualized model predictions

    # Convert the input image to a NumPy array
    tensor = np.array(im)

    # Run the model prediction
    output = predictor(tensor)
    instances = output['instances']

    # Visualize the predictions
    metadata = MetadataCatalog.get(cfg.DATASETS.TEST[0])
    visualizer = Visualizer(
        tensor[:, :, ::-1], metadata, scale=1.0, instance_mode=ColorMode.SEGMENTATION
    )
    visualization = visualizer.draw_instance_predictions(instances)

    # Convert the visualized output back to a PIL image and return it
    return PIL.Image.fromarray(visualization.get_image()[:, :, ::-1])

This file mainly does the following things:

  • Initializing the model: Fetch the model previously saved with BentoML and initialize a predictor from Detectron2.
  • Defining a BentoML Service: Set up a BentoML Service and wrap the model into a Runner for efficient serving.
  • Exposing an API endpoint: Define the predict function as an endpoint of this Service. It processes the input image, runs the model on it, visualizes the model's predictions, and returns the result.

Use bentoml serve to start the server.

$ bentoml serve service:svc --reload

2023-10-07T07:47:13+0000 [INFO] [cli] Environ for worker 0: set CUDA_VISIBLE_DEVICES to 0
2023-10-07T07:47:13+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "service:svc" can be accessed at http://localhost:3000/metrics.
2023-10-07T07:47:14+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)

The server should be accessible at http://0.0.0.0:3000. Through the Swagger UI, you can upload an image to interact with the model. I tested it using this image from the COCO dataset, and here’s what I got:
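
If you prefer to call the endpoint programmatically instead of going through the Swagger UI, here is a minimal sketch using the requests library. The /predict route matches the predict API defined above; example.jpg and output.jpg are placeholder file names you would replace with your own:

import requests

# Read a local image and send it to the predict endpoint as raw image bytes
with open("example.jpg", "rb") as f:
    response = requests.post(
        "http://0.0.0.0:3000/predict",
        data=f.read(),
        headers={"Content-Type": "image/jpeg"},
    )

# The Service returns an image, so write the response bytes to a file
with open("output.jpg", "wb") as out:
    out.write(response.content)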

Building a Bento

After validating the performance of the Service, the next step is to package the project into a deployable artifact. In the BentoML ecosystem, this artifact is referred to as a Bento. The Bento not only encapsulates the model but also provides a streamlined way to containerize it with Docker or deploy it on BentoCloud.

To create a Bento, you need a bentofile.yaml that specifies the Service and its dependencies:

service: 'service:svc'
include:
  - '*.py'
python:
  packages:
    - torch
    - torchvision

Run the following command:

bentoml build

Expected output:

██████╗ ███████╗███╗   ██╗████████╗ ██████╗ ███╗   ███╗██╗
██╔══██╗██╔════╝████╗  ██║╚══██╔══╝██╔═══██╗████╗ ████║██║
██████╔╝█████╗  ██╔██╗ ██║   ██║   ██║   ██║██╔████╔██║██║
██╔══██╗██╔══╝  ██║╚██╗██║   ██║   ██║   ██║██║╚██╔╝██║██║
██████╔╝███████╗██║ ╚████║   ██║   ╚██████╔╝██║ ╚═╝ ██║███████╗
╚═════╝ ╚══════╝╚═╝  ╚═══╝   ╚═╝    ╚═════╝ ╚═╝     ╚═╝╚══════╝

Successfully built Bento(tag="masked-rcnn:skvxidde6gp2qkiy").
Possible next steps:
* Containerize your Bento with `bentoml containerize`:
$ bentoml containerize masked-rcnn:skvxidde6gp2qkiy [or bentoml build --containerize]
* Push to BentoCloud with `bentoml push`:
$ bentoml push masked-rcnn:skvxidde6gp2qkiy [or bentoml build --push]
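
As a rough sketch of the containerization path (substitute the Bento tag printed on your machine; a containerized Bento serves on port 3000 by default):

# Build a Docker image from the Bento
bentoml containerize masked-rcnn:skvxidde6gp2qkiy

# Run the image locally and expose the server on port 3000
docker run --rm -p 3000:3000 masked-rcnn:skvxidde6gp2qkiy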

Conclusion

In this guide, I walked through the process of deploying an image segmentation model using Detectron2 and BentoML. BentoML provides a robust platform that not only facilitates model deployment but also ensures scalability and production-readiness, and it works with a wide range of models, from traditional ML models to large language models and custom models. Stay tuned, and I will share more use cases of deploying ML models with BentoML and its ecosystem tools. Happy deploying and coding!
