This article will show you how to efficiently run inference with Detectron2 pre-trained models using a modular computer vision pipeline for video and image processing.
Table of contents:
- What is Detectron2?
- Project setup
- Project structure
- Image processing
- Video processing
- Background separation
What is Detectron2?
It is the second generation of the library: the original Detectron was written in Caffe2 and later reimplemented in PyTorch 1.0 as maskrcnn-benchmark. Detectron2 is a ground-up rewrite and extension of those previous efforts using PyTorch.
FAIR’s team very well states the motivation behind the project:
“We built Detectron2 to meet the research needs of Facebook AI and to provide the foundation for object detection in production use cases at Facebook. We are now using Detectron2 to rapidly design and train the next-generation pose detection models that power Smart Camera, the AI camera system in Facebook’s Portal video-calling devices. By relying on Detectron2 as the unified library for object detection across research and production use cases, we are able to rapidly move research ideas into production models that are deployed at scale.”
Beyond state-of-the-art object detection algorithms, Detectron2 includes numerous other models for instance segmentation, panoptic segmentation, pose estimation, DensePose, and TridentNet.
You can look at the Detectron2 Model Zoo site to find a broad set of baseline results and trained models to start with.
On top of Detectron2, Facebook AI’s computer vision engineers created Detectron2go, an additional layer that will allow easy and optimized model deployment to production (not yet released at the time of writing).
Start with cloning the following repository:
$ git clone https://github.com/jagin/detectron2-pipeline.git
$ cd detectron2-pipeline
$ git checkout 9460e3806c3ef5208ba8e5b4099fcb75ef6f39d1
9460e3806c3ef5208ba8e5b4099fcb75ef6f39d1 indicates the source code compatible with the content of this story.
Create and activate the environment using Conda:
$ conda env create -f environment.yml
$ conda activate detectron2-pipeline
The created environment includes all the requirements we need to set up Detectron2 and to run our project.
Next, we need to clone and install Detectron2 itself.
# We want to clone Detectron2 outside of our detectron2-pipeline repo
$ cd ..
$ git clone https://github.com/facebookresearch/detectron2.git
$ cd detectron2
$ git checkout 3def12bdeaacd35c6f7b3b6c0097b7bc31f31ba4
$ python setup.py build develop
or if you are on macOS:
$ MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop
3def12bdeaacd35c6f7b3b6c0097b7bc31f31ba4 indicates the Detectron2 source code compatible with commit 9460e3806c3ef5208ba8e5b4099fcb75ef6f39d1 of the detectron2-pipeline repository. You can check out the latest version of both repositories, but I cannot guarantee that it will work as described here.
In case of any problems, please refer to the Detectron2 installation guide.
│ ├── annotate_image.py
│ ├── annotate_video.py
│ ├── async_predict.py
│ ├── capture_image.py
│ ├── capture_images.py
│ ├── capture_video.py
│ ├── display_video.py
│ ├── __init__.py
│ ├── libs
│ │ ├── async_predictor.py
│ │ ├── file_video_capture.py
│ │ ├── __init__.py
│ │ └── webcam_video_capture.py
│ ├── pipeline.py
│ ├── predict.py
│ ├── save_image.py
│ ├── save_video.py
│ ├── separate_background.py
│ └── utils
│ ├── colors.py
│ ├── detectron.py
│ ├── fs.py
│ ├── __init__.py
│ ├── text.py
│ └── timeme.py
As you can see, it is not a yet-another-hello-world-example project. I created this project to gather some best practices for efficiently processing videos and images, and to be able to experiment with different models from the Detectron2 zoo.
The structure of the project is based on the modular image processing pipeline described in my previous stories:
Modular image processing pipeline using OpenCV and Python generators
In this blog story, you will learn how to implement a simple and modular pipeline for image processing using OpenCV and…
Video processing pipeline with OpenCV
This story will show you how to extend the modular image processing pipeline using OpenCV and Python generators with a…
It was greatly extended with the following elements:
- faster video reading using a separate thread, so the main thread of the application is not blocked by reading and decoding the frames,
- utilizing Python multiprocessing for faster inference, running model asynchronously in separate processes so we can use GPU(s) or CPU(s) in parallel.
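The threaded-reading idea can be sketched in a few lines. This is only an illustration of the pattern behind `pipeline/libs/file_video_capture.py`, not the project's actual class; the `read_fn` parameter is a hypothetical stand-in for `cv2.VideoCapture.read` so the sketch stays self-contained:

```python
import threading
import queue

class ThreadedCapture:
    """Read frames on a background thread so the main thread never
    blocks on I/O or decoding. `read_fn` must return (ok, frame),
    like cv2.VideoCapture.read."""

    def __init__(self, read_fn, queue_size=128):
        self.read_fn = read_fn
        self.frames = queue.Queue(maxsize=queue_size)
        self.stopped = False
        self.thread = threading.Thread(target=self._reader, daemon=True)

    def start(self):
        self.thread.start()
        return self

    def _reader(self):
        while not self.stopped:
            ok, frame = self.read_fn()
            if not ok:              # end of stream
                self.stopped = True
                break
            self.frames.put(frame)  # blocks when the queue is full

    def read(self):
        # Returns None once the stream has ended and the queue drained.
        while not (self.stopped and self.frames.empty()):
            try:
                return self.frames.get(timeout=0.1)
            except queue.Empty:
                continue
        return None
```

With a real video you would pass `cv2.VideoCapture("file.mp4").read` as `read_fn`; decoding then overlaps with whatever processing the main thread is doing.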
Part of the project content is boilerplate code.
“In computer programming, boilerplate code or just boilerplate are sections of code that have to be included in many places with little or no alteration. When using languages that are considered verbose, the programmer must write a lot of code to accomplish only minor functionality. Such code is called boilerplate.” — Wikipedia
As boilerplate code we can consider:
utils/: common utility scripts,
pipeline/lib/file_video_capture.py: video file capturing helper class utilizing threading and a queue to obtain an FPS speedup,
pipeline/lib/webcam_video_capture.py: helper class for capturing webcam in a separate thread,
pipeline/capture_image.py: pipeline task to capture single image file,
pipeline/capture_images.py: pipeline task to capture images from a directory,
pipeline/capture_video.py: pipeline task to capture video stream from file or webcam using a faster, threaded method for reading video frames.
pipeline/display_video.py: pipeline task to display images as a video,
pipeline/pipeline.py: common pipeline class for all pipeline tasks,
pipeline/save_image.py: pipeline task to save images,
pipeline/save_video.py: pipeline task to save a video.
This is the part of the code that we have to include over and over in our computer vision projects.
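The generator-based pipeline pattern these tasks share can be sketched as follows. This is a minimal illustration of the design, not the code from `pipeline/pipeline.py`; the `CaptureNumbers` and `Square` tasks are hypothetical examples standing in for real tasks like `capture_images` and `predict`:

```python
class Pipeline:
    """Minimal sketch of a generator-based pipeline: each task
    transforms a stream of `data` dicts, and tasks are chained
    with the `|` operator."""

    def __init__(self):
        self.source = None

    def __or__(self, other):
        # `a | b` wires a's output into b and returns b.
        other.source = self
        return other

    def __iter__(self):
        return self.generator()

    def generator(self):
        for data in self.source:
            if self.filter(data):
                yield self.map(data)

    def filter(self, data):
        return True  # override to drop items

    def map(self, data):
        return data  # override to transform items


class CaptureNumbers(Pipeline):
    """Hypothetical source task: emits data dicts."""
    def __init__(self, n):
        super().__init__()
        self.n = n

    def generator(self):
        for i in range(self.n):
            yield {"value": i}


class Square(Pipeline):
    """Hypothetical processing task."""
    def map(self, data):
        data["value"] **= 2
        return data
```

Chaining then reads naturally: `CaptureNumbers(4) | Square()` builds a lazy stream, and iterating over it pulls data through every stage one item at a time.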
The custom code is the code interacting with Detectron2 libraries:
pipeline/annotate_image.py: pipeline task for image annotation,
pipeline/annotate_video.py: pipeline task for video annotation,
pipeline/lib/async_predictor.py: asynchronous predictor utilizing multiprocessing to run the inferences in parallel in separate processes,
pipeline/predict.py: pipeline task to perform a prediction,
pipeline/async_predict.py: pipeline task to perform prediction asynchronously using multiprocessing,
pipeline/separate_background.py: custom pipeline task to separate the background from foreground instances as an example use of the instance segmentation model from Detectron2.
All the project model configurations are stored in the configs directory and can be used with the --config-file option:
│ ├── COCO-Detection
│ │ ├── faster_rcnn_R_50_FPN_3x.yaml
│ │ └── retinanet_R_50_FPN_3x.yaml
│ ├── COCO-InstanceSegmentation
│ │ └── mask_rcnn_R_50_FPN_3x.yaml
│ ├── COCO-Keypoints
│ │ └── keypoint_rcnn_R_50_FPN_3x.yaml
│ └── COCO-PanopticSegmentation
│ └── panoptic_fpn_R_50_3x.yaml
I encourage you to review the code and if you have any questions, don’t hesitate to ask below in the response section of the story.
There are two main scripts we can run:
process_images.py: to process image(s)
process_video.py: to process video
Let’s start with process_images.py and look at the available options:
$ python process_images.py -h
By default, the script will process images from the --input directory, perform instance segmentation, and save the results in the output directory:
$ python process_images.py -i assets/images/others -p
We can also try another model from the configs directory, like keypoint estimation, for this particular image without touching the code, just by changing the configuration:
$ python process_images.py -i assets/images/others/couple.jpg -p --config-file configs/COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml
The visualization is realized in pipeline/annotate_image.py, where we use detectron2.utils.visualizer.Visualizer from Detectron2.
The pipeline will run separate processes for the model execution on GPU (if your machine has one) or CPU, working in parallel with the root process. You can increase the number of GPUs or CPUs with the --gpus and --cpus options, but don’t mix the two together.
For implementation details of the multiprocess, asynchronous prediction, see pipeline/async_predict.py and pipeline/lib/async_predictor.py.
Including more GPUs will always speed up the pipeline execution, but it is not the same with CPUs. A GPU is dedicated to inference, while the CPU is occupied by many other system processes and threads. From my experiments, more than two CPU workers don’t help, but that could depend on the number of available CPU cores.
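The worker pattern behind asynchronous prediction can be sketched as below. The project's `pipeline/libs/async_predictor.py` uses separate *processes* for this; the sketch uses threads only to stay self-contained, and `predict_fn` is a hypothetical stand-in for a real Detectron2 predictor. The queue pattern (indexed tasks in, indexed results out, reordered at the end) is the same either way:

```python
import threading
import queue

class AsyncPredictorSketch:
    """Workers pull (index, frame) tasks from one queue and push
    (index, result) to another, so inference overlaps with capture.
    Results are re-ordered by index because workers finish out of order."""

    _STOP = object()

    def __init__(self, predict_fn, num_workers=2):
        self.tasks = queue.Queue()
        self.results = queue.Queue()
        self.workers = []
        for _ in range(num_workers):
            t = threading.Thread(target=self._worker,
                                 args=(predict_fn,), daemon=True)
            t.start()
            self.workers.append(t)

    def _worker(self, predict_fn):
        while True:
            item = self.tasks.get()
            if item is self._STOP:
                break
            idx, frame = item
            self.results.put((idx, predict_fn(frame)))

    def predict_all(self, frames):
        for idx, frame in enumerate(frames):
            self.tasks.put((idx, frame))
        out = [None] * len(frames)
        for _ in frames:
            idx, result = self.results.get()
            out[idx] = result  # restore original frame order
        return out

    def shutdown(self):
        for _ in self.workers:
            self.tasks.put(self._STOP)
        for t in self.workers:
            t.join()
```

With processes instead of threads, the model is loaded once per worker and each worker can be pinned to its own GPU or CPU, which is exactly why mixing --gpus and --cpus is discouraged.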
Let’s say you have a GPU available but are currently having problems with your CUDA installation and configuration, and you want to see some results immediately. Then, instead of struggling with the setup, you can force the pipeline to use only the CPU by providing the --gpus 0 option on the command line.
You can also run the processing with a single process.
You can try the same with video or a webcam, but be warned that real-time video processing with inference is computationally expensive and time-consuming, depending on GPU availability. Video stream processing could simply feel slow and sluggish.
$ python process_video.py -h
The options are almost the same as for the process_images.py script. The instance segmentation model is also used by default.
If your computer is equipped with a webcam you can test it running this command:
$ python process_video.py -i 0 -d -p
-i 0 indicates that your input is the default camera, -d displays the result window, and -p displays progress info.
Unless you are equipped with a decent GPU card or two, real-time camera processing could lag a lot. In that case, I would suggest saving your camera video and processing it as below.
To run predictions on a video file, you can trigger:
$ python process_video.py -i assets/videos/walk.small.mp4 -p -d -ov walk.avi
The -ov walk.avi option will save the output result as walk.avi. -d will display the window with the result; if you want to process your video faster, just remove this option, as displaying video takes some CPU time. The progress will be visible anyway thanks to the -p option.
Let’s try panoptic segmentation on a video of road traffic:
$ python process_video.py -i assets/videos/traffic.small.mp4 -p -d -ov traffic.avi --config-file configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml
Now, armed with Detectron2’s arsenal of models, we are limited only by our imagination in creating and testing unusual computer vision solutions.
I’ve prepared a simple example of background separation using the Detectron2 instance segmentation model. The background will be separated by blurring it. This gives an effect similar to a photo taken with a large-aperture lens, which creates a shallow depth of field: the subject is in focus and the background is out of focus. Additionally, to get a fancier picture, we can desaturate or grayscale the background.
The algorithm is quite simple (see the source code of pipeline/separate_background.py). First, we perform instance segmentation using Detectron2 to extract object masks from the scene, and then this information is passed to the SeparateBackground pipeline task. There we do some simple math operations on the input image and mask using OpenCV methods.
# Multiply the foreground with the mask
foreground = cv2.multiply(foreground, mask)

# Multiply the background with (1 - mask)
background = cv2.multiply(background, 1.0 - mask)

# Add the masked foreground and background
dst_image = cv2.add(foreground, background)
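The same arithmetic can be sketched with plain NumPy to make the alpha-blending step explicit. This is an illustration, not the project's code: the function name and the assumption that all inputs are float arrays in [0, 1] with an (H, W, 1) mask are mine, whereas the project works through cv2.multiply/cv2.add:

```python
import numpy as np

def blend_background(image, mask, background):
    """Keep `image` where mask == 1 and show the processed
    (e.g. blurred or desaturated) `background` where mask == 0.
    `image` and `background` are (H, W, 3) floats in [0, 1];
    `mask` is (H, W, 1) so it broadcasts over color channels."""
    foreground = image * mask
    background = background * (1.0 - mask)
    return foreground + background
```

A soft (non-binary) mask gives a smooth transition at the object edges, which is why the mask is kept as floats rather than 0/1 integers.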
Run the pipeline with an image:
$ python process_images.py -i assets/images/others/couple.jpg -sb
to get a picture like the one shown earlier, or you can run the same task on video:
$ python process_video.py -i assets/videos/walk.small.mp4 -sb -d