SAHI: A vision library for large-scale object detection & instance segmentation

Fatih Cagatay Akyon
Published in Codable · Jan 30, 2021
Visualization of how SAHI inference works.

GitHub: https://github.com/obss/sahi

Object detection and instance segmentation are among the most important application areas in Computer Vision. However, detecting small objects and running inference on large images remain major challenges in practical usage.

Here you can see the inference result of the state-of-the-art instance segmentation model Cascade Mask R-CNN:

Standard inference result with the MMDetection Cascade Mask R-CNN model.

As you can see, the smaller cars toward the top of the image are not detected.

Is there a way to detect these smaller objects without retraining the model and without allocating more GPU memory?

This is where SAHI (Slicing Aided Hyper Inference) comes in, helping developers overcome these real-world problems.

What you will learn from this article:

  • Introduction to object detection and instance segmentation
  • Installation of SAHI
  • Sliced inference with SAHI
  • Image and dataset slicing with SAHI
  • Adding support for a new detection framework with SAHI

Introduction to object detection and instance segmentation

A) Object Detection: Object detection refers to the method of identifying and correctly labeling all the objects present in an image frame.

Object detection visualization.

This broadly consists of two steps:

1. Object Localization: Here, a bounding box or enclosing region is determined in the tightest possible manner in order to locate the exact position of the object in the image.

2. Image Classification: The localized object is then fed to a classifier that labels it.

B) Semantic Segmentation: This refers to the process of linking each pixel in a given image to a particular class label. For example, in the following image the pixels are labelled as car, tree, pedestrian, etc. These segments are then used to identify the interactions/relations between the various objects.

Semantic segmentation visualization.

C) Instance Segmentation: Here, we associate a class label with each pixel as in semantic segmentation, except that multiple objects of the same class are treated as individual objects/separate entities.

Instance segmentation visualization.

Installation of SAHI

GIF summarizing necessary installation steps for SAHI.

You can install the latest version via pip:

pip install -U sahi

Then install your desired versions of PyTorch and torchvision:

pip install torch torchvision

Finally, install your desired detection framework (such as MMDetection):

pip install mmdet mmcv-full

That’s it! Now you can import and use any SAHI function in Python:

from sahi.predict import get_sliced_prediction

Sliced inference with SAHI

Sliced inference with SAHI CLI.

The concept of sliced inference is simple: perform inference over smaller slices of the original image and then merge the sliced predictions back onto the original image. It can be illustrated below:

Sliced inference.
By performing sliced inference instead of standard inference, smaller objects can be detected with improved accuracy.
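
To make the idea concrete, below is a minimal, framework-agnostic sketch of the slice-and-merge logic (this is not SAHI’s actual implementation; detect stands for a hypothetical single-image detector returning (x1, y1, x2, y2, score) tuples):

# Minimal sketch of sliced inference over a numpy image.
def sliced_inference(image, detect, slice_size=256, overlap_ratio=0.2):
    height, width = image.shape[:2]
    # Consecutive windows advance by the slice size minus the overlap.
    stride = int(slice_size * (1 - overlap_ratio))
    merged = []
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            patch = image[y:y + slice_size, x:x + slice_size]
            for x1, y1, x2, y2, score in detect(patch):
                # Shift slice-relative boxes back to original image coordinates.
                merged.append((x1 + x, y1 + y, x2 + x, y2 + y, score))
    # SAHI additionally merges duplicate detections from overlapping slices
    # (e.g. with NMS) before producing the final predictions.
    return merged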

Here, we will show a sliced inference demo over this sample image using SAHI:

Sample image to be used in inference demo.
  • First, import the required functions for the tutorial:
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction
from sahi.utils.cv import read_image_as_pil

AutoDetectionModel is the factory class supporting the popular detection frameworks. It can be used to load any supported model (such as the MMDetection model used here) and to perform sliced or standard inference over it.

get_sliced_prediction is the function for performing sliced inference.

  • Then, we need to create a DetectionModel instance by defining the required parameters:
detection_model = AutoDetectionModel.from_pretrained(
    model_type='mmdet',
    model_path=mmdet_cascade_mask_rcnn_model_path,
    config_path=mmdet_cascade_mask_rcnn_config_path,
    confidence_threshold=0.4,
    device="cuda:0"
)

model_type can be ‘yolov5’, ‘mmdet’, ‘huggingface’, ‘torchvision’, or ‘detectron2’, depending on your weight file.

model_path and config_path are required to successfully load the model (some frameworks, such as YOLOv5, do not need a separate config file).

Predictions with scores lower than confidence_threshold will be ignored in the results.

The device parameter specifies the inference device, which can be set to cuda:0 or cpu.
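
The same factory works for the other frameworks as well. For example, a YOLOv5 model could be loaded as follows (a sketch reusing the yolov5s.pt weight file from the CLI examples below; YOLOv5 models do not need a separate config file):

detection_model = AutoDetectionModel.from_pretrained(
    model_type='yolov5',
    model_path='yolov5s.pt',  # your YOLOv5 weight file
    confidence_threshold=0.4,
    device='cpu',  # or 'cuda:0'
)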

  • Read the image:
image = read_image_as_pil(image_path)
  • Finally, we can perform a sliced prediction. In this example, we will perform prediction over slices of 256x256 pixels with an overlap ratio of 0.2:
result = get_sliced_prediction(
    image,
    detection_model,
    slice_height=256,
    slice_width=256,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2
)
  • Visualize the predicted bounding boxes and masks over the original image:
result.export_visuals(export_dir="result/")

from IPython.display import Image
Image("result/prediction_visual.png")
Sliced prediction result.
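
Besides exporting visuals, the returned result object also exposes the predictions programmatically; for instance, they can be converted into COCO-style annotation dicts (a short sketch; see the SAHI docs for the exact attributes):

# Inspect the individual predictions.
object_predictions = result.object_prediction_list
print(f"{len(object_predictions)} objects detected")

# Convert the predictions to COCO-style annotation dicts,
# each holding bbox, score and category information.
coco_annotations = result.to_coco_annotations()
print(coco_annotations[0])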

You can check the full details in the MMDetection Colab notebook or the YOLOv5 Colab notebook.

Image and dataset slicing with SAHI

You can also use the slicing operations of SAHI independently; a short sketch for inspecting their outputs follows the examples below.

  • For example, you can slice a single image as:
from sahi.slicing import slice_image

slice_image_result, num_total_invalid_segmentation = slice_image(
    image=image_path,
    output_file_name=output_file_name,
    output_dir=output_dir,
    slice_height=256,
    slice_width=256,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
  • Or you can create a sliced COCO dataset from any COCO-formatted dataset as:
from sahi.slicing import slice_coco

coco_dict, coco_path = slice_coco(
    coco_annotation_file_path=coco_annotation_file_path,
    image_dir=image_dir,
    slice_height=256,
    slice_width=256,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
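
Both helpers return plain Python objects. In particular, coco_dict is a standard COCO-format dictionary, so you can inspect or persist the sliced dataset with stock tools (a short usage sketch; the output file name is illustrative):

import json

# The sliced dataset follows the usual COCO layout.
print(f"{len(coco_dict['images'])} sliced images, "
      f"{len(coco_dict['annotations'])} annotations")

# Save the sliced dataset for later training or evaluation.
with open("sliced_dataset.json", "w") as f:
    json.dump(coco_dict, f)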

Command Line Interface with SAHI

The sahi predict CLI command.

✔️ Use your weight path to perform inference for YOLOv5 models:

sahi predict --source image_dir/ --model_type yolov5 --model_path yolov5s.pt --slice_height 512 --slice_width 512

✔️ Use your weight path and config path to perform inference for MMDetection and Detectron2 models:

sahi predict --source image_dir/ --model_type detectron2 --model_path weight.pt --config_path config.yaml --slice_height 512 --slice_width 512
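
An MMDetection model works the same way with the flags shown above (the weight and config file names here are illustrative):

sahi predict --source image_dir/ --model_type mmdet --model_path weight.pth --config_path config.py --slice_height 512 --slice_width 512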

Error Analysis Plots/Metrics

✔️ Create COCO formatted prediction results using COCO formatted dataset:

Gif showing COCO formatted dataset prediction capabilities of SAHI.
sahi predict --source image_dir/ --dataset_json_path dataset.json --model_type yolov5 --model_path weight.pt --no_sliced_prediction

✔️ Create error analysis plots using the created result.json:

Gif showing error analysis capabilities of SAHI.
sahi coco analyse --dataset_json_path dataset.json --result_json_path result.json

🎯 Meaning of the metrics:

C75: Results at 0.75 IOU threshold
C50: Results at 0.50 IOU threshold
Loc: Results after ignoring localization errors
Sim: Results after ignoring supercategory false positives
Oth: Results after ignoring all category confusions
BG: Results after ignoring all false positives
FN: Results after ignoring all false negatives

📈 Possible model improvements:

C75-C50 and C50-Loc = potential gain with more accurate bounding box prediction
Loc-Sim = potential gain after fixing supercategory confusions
Sim-Oth = potential gain after fixing category confusions
Oth-BG = potential gain after fixing all false positives
BG-FN = potential gain after fixing all false negatives

For example, if your C50 score is 0.45 and your Loc score is 0.60, up to 0.15 AP could be gained from more accurate box localization alone.

Interactive Visualization

✔️ Install fiftyone:

pip install -U fiftyone

✔️ Start a fiftyone web app with your prediction results:

Gif showing interactive visualization capabilities of SAHI.
sahi coco fiftyone --dataset_json_path dataset.json --image_dir image_dir/ result.json

Adding support for a new detection framework with SAHI

The SAHI library currently supports YOLOv5 models, all MMDetection models, HuggingFace object detectors, and all Detectron2 models. Moreover, it is easy to add support for a new framework.

All you need to do is create a new .py file under the sahi/models/ folder and define a class in it that inherits from the DetectionModel class. You can take the YOLOv5 wrapper as a reference; a rough skeleton is sketched below.
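
As a rough sketch, such a wrapper boils down to implementing the model-loading and inference hooks of DetectionModel (the method names below follow my reading of the interface and may differ between SAHI versions; the YOLOv5 wrapper remains the authoritative reference):

# sahi/models/mydetector.py -- hypothetical wrapper for a new framework.
from sahi.models.base import DetectionModel

class MyDetectionModel(DetectionModel):
    def load_model(self):
        # Load the framework's model from self.model_path (and
        # self.config_path if needed) and store it on self.model.
        ...

    def perform_inference(self, image):
        # Run the underlying model on a numpy image and keep the raw,
        # framework-specific output on self._original_predictions.
        ...

    def _create_object_prediction_list_from_original_predictions(
        self, shift_amount_list=None, full_shape_list=None
    ):
        # Convert the raw output into sahi ObjectPrediction objects so that
        # predictions from different slices can be merged uniformly.
        ...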
