A Practical Guide to Object Detection using MMDetection with Docker

Javad Rezaie (PhD)
11 min read · Mar 19, 2024


Background Image by Gerd Altmann from Pixabay

1. Introduction

Object detection is a fundamental computer vision task that extends beyond simple image classification. It involves identifying objects within an image and precisely localizing them along with their corresponding categories. This capability underpins a myriad of applications, from autonomous driving to facial recognition systems. Deep neural networks (DNNs), particularly Convolutional Neural Networks (CNNs), have significantly advanced object detection by offering unparalleled accuracy and efficiency. CNNs, a type of DNN tailored for image analysis, excel at feature extraction from images through convolutional layers, enabling them to discern intricate patterns crucial for object detection (see “Object Detection with Deep Neural Networks” for more details).

When it comes to training object detection models, several powerful deep learning frameworks are available. To name a few, PyTorch, TensorFlow, Keras, MXNet, MMDetection, and PaddlePaddle are widely used by researchers and practitioners alike. Each framework offers unique advantages and capabilities, allowing users to develop and deploy object detection models efficiently and effectively.

In the following sections, we will delve into performing object detection using MMDetection, a popular open-source object detection toolbox built on top of PyTorch. We will demonstrate how to set up a development environment using Docker, configure a model, handle data loading, set up evaluation, and optimize the training process using MMDetection.

2. MMDetection

MMDetection stands as a cutting-edge open-source object detection toolbox, rooted in the PyTorch framework and nurtured within the OpenMMLab ecosystem. It offers a comprehensive suite of tools, meticulously crafted to tackle the complexities of object detection tasks with unparalleled efficiency and accuracy. Drawing from a rich pool of pre-trained models and leveraging state-of-the-art methodologies, MMDetection empowers researchers, developers, and practitioners to push the boundaries of object detection in diverse domains.

With its robust architecture and modular design, MMDetection provides a flexible and extensible platform for experimenting with various detection architectures, training strategies, and optimization techniques. From two-stage detectors to single-stage models, users have access to a plethora of cutting-edge algorithms, ensuring versatility in addressing a wide range of object detection challenges. Additionally, MMDetection’s intuitive interface and extensive documentation streamline the process of model deployment and experimentation, making it accessible to both beginners and experts alike.

The key features of MMDetection include:

  • Versatile Architecture: MMDetection offers a versatile architecture that supports a wide range of object detection algorithms, including two-stage detectors, single-stage detectors, and anchor-free detectors.
  • Pre-trained Models: MMDetection provides access to a variety of pre-trained models, allowing users to leverage state-of-the-art architectures for their specific tasks. These models are trained on large-scale datasets and serve as strong starting points for fine-tuning or transfer learning.
  • Extensive Dataset Support: MMDetection supports various standard object detection datasets, such as COCO, Pascal VOC, and Cityscapes, making it suitable for a wide range of applications and research scenarios.
  • Modular Design: MMDetection is built with a modular design, allowing users to easily customize and extend different components of the detection pipeline, such as backbones, necks, and heads.
  • Flexible Training Options: MMDetection provides flexible training options, including support for multi-GPU training, distributed training, and mixed precision training, enabling users to efficiently train large-scale models on diverse hardware setups.
  • Comprehensive Evaluation Tools: MMDetection offers comprehensive evaluation tools for assessing the performance of trained models on validation and test datasets. Users can easily evaluate metrics such as mAP (mean Average Precision) and IoU (Intersection over Union) to measure detection accuracy.
  • Active Community Support: MMDetection benefits from an active community of developers, researchers, and users who contribute to its ongoing development, provide support, and share insights and best practices. This vibrant community fosters collaboration and knowledge exchange within the object detection community.

To utilize MMDetection for object detection tasks, follow these steps:

  1. Set up an environment and install the necessary packages to ensure compatibility with MMDetection.
  2. Download the dataset containing the images for object detection. Ensure that the dataset is properly formatted and organized according to the requirements of MMDetection.
  3. Import the predefined configuration provided by MMDetection. These configurations serve as templates for setting up the object detection model architecture, training parameters, and other essential components.
  4. Customize the imported configuration to specify the classes of objects to detect and adjust other parameters as needed to tailor the model to your specific requirements.
  5. Configure the model architecture, data loaders, validation evaluator, and optimization wrapper according to your customized configuration. This step ensures that the model is properly configured for training and evaluation.
  6. Train or fine-tune the model on the dataset using the configured settings. During training, the model learns to detect objects of interest in the provided images.
  7. Evaluate the trained model’s performance on a separate validation set to assess its accuracy and generalization capabilities. This evaluation step helps ensure that the model provides reliable object detection results on unseen data.
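
As a quick illustration before the hands-on walkthrough, the short sketch below (not part of the original article) runs one of MMDetection's pre-trained detectors on a single image; the model alias and the image path are placeholders, and the weights are downloaded automatically on first use.

# A minimal sketch: inference with a pre-trained MMDetection model.
# 'rtmdet_tiny_8xb32-300e_coco' is one of the built-in config aliases;
# replace 'demo.jpg' with any local image.
from mmdet.apis import DetInferencer

inferencer = DetInferencer(model='rtmdet_tiny_8xb32-300e_coco', device='cpu')
result = inferencer('demo.jpg')

# Each prediction holds bounding boxes, class labels, and confidence scores.
print(result['predictions'][0]['bboxes'][:3])
print(result['predictions'][0]['labels'][:3])
print(result['predictions'][0]['scores'][:3])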

3. Hands-on Implementation

Source code:

  • The original code can be downloaded from GitHub.

Trained Model:

  • The trained model is uploaded to the Hugging Face Model Hub and is available to test and/or download.

3.1. Setting Up the Docker Environment:

Prerequisites:

  • Docker installed on your system (see here).
  • Basic understanding of deep learning, computer vision concepts, and Docker.
  • Familiarity with Python programming.

There are two main approaches to using MMDetection in Docker:

  • Option 1: Use a pre-built MMDetection Docker image.
  • Option 2: Build your own Docker image with MMDetection and your project code.

3.1.1. Option 1: Using a Pre-built Image

Pull the official MMDetection Docker image:

docker pull homai/openmmlab:pytorch2.1.1-cuda12.1-cudnn8-mmdetection3.3.0

Change the image tag:

docker image tag homai/openmmlab:pytorch2.1.1-cuda12.1-cudnn8-mmdetection3.3.0 mmdetection

This retags the pulled image as mmdetection for convenience. You can now use it to start containers and train your model inside them.
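
If you prefer to work interactively inside the image (for example, to explore the bundled mmdetection repository or run notebooks), a minimal sketch of starting a container with a bash shell could look like the following; the mount path is a placeholder for your own data folder.

docker run -it --rm \
    --gpus all \
    --mount type=bind,source=/path/to/your/data,target=/data \
    --shm-size 8g \
    mmdetection \
    bash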

3.1.2. Option 2: Building Your Own Image

Create a Dockerfile specifying the environment setup:

ARG PYTORCH="2.1.1"
ARG CUDA="12.1"
ARG CUDNN="8"

FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel

ARG DEBIAN_FRONTEND=noninteractive
ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6+PTX" \
TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
FORCE_CUDA="1"

# Install the required packages
RUN apt-get update \
&& apt-get install -y ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 curl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Install MMEngine, MMCV and MMDetection
RUN pip install openmim && \
mim install mmengine mmcv mmdet

# Install JupyterLab and ipykernel (to run experiments inside the container) and seaborn (to plot logs)
RUN mim install jupyterlab ipykernel seaborn
RUN git clone https://github.com/open-mmlab/mmdetection.git

Build the Docker image:

DOCKER_BUILDKIT=1 docker build -f docker/mmdetection.Dockerfile -t mmdetection .
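
As an optional sanity check (not part of the original article), you can confirm that the OpenMMLab packages import correctly inside the freshly built image:

docker run --rm mmdetection \
    python -c "import mmengine, mmcv, mmdet; print(mmengine.__version__, mmcv.__version__, mmdet.__version__)"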

3.2. Dataset Preparation

3.2.1. Downloading Dataset:

The Kvasir-Instrument dataset (size 170 MB) contains 590 endoscopic tool images with their ground-truth bounding boxes and masks. It can be downloaded from Simula. Download and extract it.

3.2.2. Convert to COCO format:

Since the dataset is not in a format that MMDetection can consume directly, it must be converted to COCO format (see the Data Preparation Notebook inside the project GitHub repo).
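
For orientation, the sketch below shows the COCO structure the converter has to produce; it is a simplified illustration, not the notebook's actual code, and it assumes the annotations have already been read into a list of per-image dicts (file name, size, and corner-format boxes) with a single 'instrument' class.

import json

def to_coco(samples, out_path):
    # samples: assumed intermediate format, e.g.
    # [{"file_name": "img1.jpg", "width": 640, "height": 480,
    #   "boxes": [(x_min, y_min, x_max, y_max), ...]}, ...]
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 0, "name": "instrument"}],  # assumed single class
    }
    ann_id = 0
    for img_id, sample in enumerate(samples):
        coco["images"].append({
            "id": img_id,
            "file_name": sample["file_name"],
            "width": sample["width"],
            "height": sample["height"],
        })
        for x_min, y_min, x_max, y_max in sample["boxes"]:
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": 0,
                # COCO stores boxes as [x, y, width, height], not corner coordinates
                "bbox": [x_min, y_min, x_max - x_min, y_max - y_min],
                "area": (x_max - x_min) * (y_max - y_min),
                "iscrowd": 0,
            })
            ann_id += 1
    with open(out_path, "w") as f:
        json.dump(coco, f)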

3.3. Configuring Model

MMDetection provides a range of pre-defined configurations for various object detection models. For this tutorial, we’ll use the RTMDet model. You can find its configuration file here. We’ll import it as our base configuration and make modifications to this configuration to suit our specific task and dataset.

The main modifications are shown below; you can see the complete config file on the project GitHub repo here.

3.3.1. Import Base Configuration:

Import a base configuration file that serves as the foundation for your training setup. This base configuration typically includes default settings for model architecture, optimizer, and other parameters.

_base_ = [
    'mmdet::rtmdet/rtmdet_s_8xb32-300e_coco.py'
]

3.3.2. Modify Path to Dataset:

Adjust the paths within the configuration to point to your dataset directory. This ensures that the training pipeline accesses the correct data during the training process.

# dataset settings
dataset_type = 'CocoDataset'
data_root = "/data/"
train_annot = "train_coco.json"
val_annot = "test_coco.json"
test_annot = "test_coco.json"
train_image_folder = "images/"
val_image_folder = "images/"
test_image_folder = "images/"
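
The snippets that follow also reference metainfo and num_classes, which are defined in the full config on the repo. A minimal sketch, assuming the single 'instrument' class used in the COCO conversion:

# Assumed class definition for the Kvasir-Instrument dataset
classes = ('instrument',)
num_classes = len(classes)
metainfo = dict(classes=classes)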

3.3.3. Define Training Hyperparameters:

Specify the hyperparameters for training, such as batch size, number of epochs, learning rate, weight decay, etc. These parameters significantly impact the training process and model performance.

# Training Parameter Settings
base_lr = 0.004
max_epochs = 100
warmup_iters = 200
check_point_interval = 10
val_interval = 1
stage2_num_epochs = 40
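
These variables are consumed elsewhere in the config; as a hedged sketch (the full config on the repo may differ), max_epochs and val_interval typically plug into the epoch-based training loop like this:

# How the epoch budget and validation interval usually enter the training loop
train_cfg = dict(
    type='EpochBasedTrainLoop',
    max_epochs=max_epochs,
    val_interval=val_interval,
)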

3.3.4. Update Model for Dataset Compatibility:

Modify the model configuration to accommodate the specific requirements of your dataset. This might involve adjusting the input/output dimensions, changing the number of output classes, or fine-tuning certain layers to better suit your data (here we only need to update the number of classes).

model = dict(
    bbox_head=dict(
        num_classes=num_classes
    )
)

3.3.5. Update Data Loaders:

Customize the data loaders to preprocess and load your dataset efficiently. This step involves setting up data augmentation techniques, data normalization, and any other preprocessing steps necessary for training.

train_dataloader = dict(
    batch_size=8,
    num_workers=3,
    dataset=dict(
        type=dataset_type,
        metainfo=metainfo,
        data_root=data_root,
        ann_file=train_annot,
        data_prefix=dict(img=train_image_folder)
    )
)
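
The validation and test loaders follow the same pattern, together with a COCO-style evaluator; a hedged sketch (the exact values live in the full config on the repo):

val_dataloader = dict(
    batch_size=8,
    num_workers=3,
    dataset=dict(
        type=dataset_type,
        metainfo=metainfo,
        data_root=data_root,
        ann_file=val_annot,
        data_prefix=dict(img=val_image_folder),
        test_mode=True
    )
)
test_dataloader = val_dataloader

# COCO-style mAP evaluation against the validation annotations
val_evaluator = dict(
    type='CocoMetric',
    ann_file=data_root + val_annot,
    metric='bbox'
)
test_evaluator = val_evaluator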

3.3.6. Update Optimizer and Learning Rate Scheduler:

Configure the optimizer (e.g., SGD, Adam) and learning rate scheduler (e.g., step-based, cosine annealing) based on your training objectives and model architecture. Tuning these components can significantly impact convergence speed and final model performance.

# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))

# learning rate
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1.0e-5,
        by_epoch=False,
        begin=0,
        end=warmup_iters),
    dict(
        # use cosine annealing for the second half of training
        type='CosineAnnealingLR',
        eta_min=base_lr * 0.05,
        begin=max_epochs // 2,
        end=max_epochs,
        T_max=max_epochs // 2,
        by_epoch=True,
        convert_to_iter_based=True),
]
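
The checkpoint interval defined earlier (check_point_interval) is typically consumed by the checkpoint hook; a hedged sketch of how that might look:

default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=check_point_interval,
        max_keep_ckpts=3,      # assumption: keep only the most recent checkpoints
        save_best='auto'       # also keep the best checkpoint by validation mAP
    )
)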

3.4. Configure Main Python Script

Adjust the main Python script that orchestrates the training process to incorporate the updated configurations. This script typically builds a Runner from the configuration file and then starts the training loop.

from mmengine.config import Config
from mmengine.runner import Runner
import argparse

def main(args):
    config = Config.fromfile(args.config_path)
    config.launcher = "pytorch"
    runner = Runner.from_cfg(config)
    runner.train()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Get Config Path.')
    parser.add_argument('config_path', type=str, help='path to the config file')
    args = parser.parse_args()
    main(args)

3.5. Start Training on Docker from BASH Terminal

Execute the training process within a Docker container from the BASH terminal. This involves running the main training script with the updated configurations inside the Docker environment, ensuring reproducibility and isolation of the training process.

To train on Docker from the BASH terminal, it’s essential to define local paths to the configuration folder, data folder, and output folder. Assuming the BASH command is executed from the project’s root folder and the dataset resides at ‘/mnt/SSD2/kvasir-instrument/’ on the local PC, the following variables are defined within the BASH script to facilitate this setup.

#!/bin/bash

DATA_DIR="/mnt/SSD2/kvasir-instrument/"
OUT_DIR="$PWD/out"
CONFIG_DIR="$PWD/codes/"

Since training runs inside Docker with GPU acceleration, and my PC is equipped with 3 GPUs, I add the following commands to the bash script to launch the training process:

GPUS=3

docker run -it --rm \
--gpus all \
--mount type=bind,source=$CONFIG_DIR,target=/configs \
--mount type=bind,source=$DATA_DIR,target=/data \
--mount type=bind,source=$OUT_DIR,target=/out \
--shm-size 8g \
homai/openmmlab:pytorch2.1.1-cuda12.1-cudnn8-mmdetection3.3.0 \
torchrun --nnodes 1 --nproc_per_node=$GPUS /configs/main_train_mmengine.py /configs/rtmdet_s_8xb32-300e_coco.py

3.5.1. Additional Considerations:

  • Mounting Data Volumes: Use the --mount flag with docker run to mount your local dataset directory and any other necessary volumes into the container.
  • Mounting Config Volumes: Use the --mount flag with docker run to mount your local configuration directory and any other necessary volumes into the container.
  • Saving Training Outputs: Mount a volume to persist the trained model weights and logs outside the container for easy access.
  • GPU Support: If you have a GPU available, utilize the nvidia-docker runtime to enable GPU acceleration within the container. Refer to the official Nvidia Docker documentation for details.

3.5.2. Tips

  • --gpus all: allows the container to utilize all GPUs installed on the PC.
  • --mount type=bind,source=FULL/PATH/TO/LOCAL/FOLDER,target=/MOUNTED/PATH/ON/CONTAINER: allows the container to access the folder and its files. For example, "--mount type=bind,source=/mnt/SSD2/kvasir-instrument/,target=/data" mounts the local path "/mnt/SSD2/kvasir-instrument/" to the "/data" folder on the running container, so all content of this local folder is accessible inside the container at "/data".

3.6. Log Analysis

You can analyze the training performance using the ‘analyze_logs.py’ script located in the ‘tools/analysis_tools’ directory. This script generates plots for loss, mAP, learning rate, etc., based on a provided training log file. Remember to specify the path to the training log file as an argument when using the script.

Ensure that the paths are accurate according to your container configuration and log folder path.

For example, if the log folder on your PC is within the ‘20240309_054830’ directory and you’re executing commands within the mmdetection container, you can utilize the following commands:

python /workspace/mmdetection/tools/analysis_tools/analyze_logs.py plot_curve /out/20240309_054830/vis_data/scalars.json --keys bbox_mAP bbox_mAP_50 bbox_mAP_75 --out ./bbox_mAP.jpg --legend mAP mAP_50 mAP_75

Mean Average Precision (mAP)

python /workspace/mmdetection/tools/analysis_tools/analyze_logs.py plot_curve /out/20240309_054830/vis_data/scalars.json --keys lr --out ./lr.jpg --legend lr

Learning Rate (lr)

3.7. Model Deployment

Model deployment is a crucial phase where the trained model transitions from development to practical application. Our aim is to make the object detection model accessible and usable for developers and applications. To achieve this, we streamlined the deployment process by converting the PyTorch model to ONNX format, ensuring interoperability and compatibility. We further optimized the model for deployment by converting it to OpenVINO format, enabling efficient inference performance across diverse hardware platforms.
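
The conversion commands themselves are not covered in this article; as a hedged illustration, one common route is MMDeploy's deploy.py, which exports an MMDetection checkpoint to ONNX (a different deploy config targets OpenVINO). All paths below are placeholders.

# Illustrative only: export a trained checkpoint to ONNX with MMDeploy.
python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py \
    /configs/rtmdet_s_8xb32-300e_coco.py \
    /out/epoch_100.pth \
    demo_image.jpg \
    --work-dir /out/onnx \
    --device cpu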

Next, we uploaded both the original PyTorch model and its OpenVINO version to the Hugging Face Model Hub. This step not only provides easy access to our models but also allows developers to test and fine-tune them on custom datasets. The Hugging Face Model Hub serves as a centralized repository for machine learning models, facilitating seamless integration into applications while supporting testing and fine-tuning processes.

Deploying the model on Hugging Face enables fine-tuning on custom datasets, enhancing its applicability to specific tasks and domains. Leveraging the Hugging Face ecosystem provides access to a vibrant community of developers and researchers, fostering collaboration and continuous improvement of the deployed models.

Considering factors such as scalability, latency, and resource utilization is essential when deploying the model in production environments. Employing optimization techniques like model quantization and pruning can enhance performance and resource efficiency, while monitoring and logging mechanisms ensure model reliability and performance tracking over time.

4. Conclusion

Object detection with deep neural networks represents a powerful field with diverse applications. This tutorial offers a foundational understanding to kickstart your exploration. Given the rapid advancements in this field, staying updated on the latest developments and leveraging the plethora of online resources is essential for continued growth.

In this tutorial, we’ve also covered the basics of performing object detection using MMDetection within a Docker environment. We started by setting up a Docker image with all the necessary dependencies, configuring a model, handling data loading, setting up evaluation, and optimizing the training process. By following these steps and adapting them to your specific task and dataset, you can leverage MMDetection to build powerful object detection systems within a Dockerized environment.

References

Source code: The original code can be downloaded from GitHub.

Trained Model: The trained model is uploaded to Hugging Face and is available to test and/or download.

Object Detection with Deep Neural Networks

Image Classification with Deep Neural Networks

Continue Reading

A Practical Guide to Multi-Class Image Classification using MMPreTrain

Disclaimer

This tutorial is intended for educational purposes only. It does not constitute professional advice, including medical or legal advice. Any application of the techniques discussed in real-world scenarios should be done cautiously and with consultation from relevant experts.
