Review: SDS — Simultaneous Detection and Segmentation (Instance Segmentation)

Works on Both Object Detection and Semantic Segmentation, Better Than R-CNN

In this story, SDS (Simultaneous Detection and Segmentation), by University of California and Universidad de los Andes, is briefly reviewed. Combining object detection with semantic segmentation is, in effect, the task of instance segmentation. This is a 2014 ECCV paper with more than 600 citations. (Sik-Ho Tsang @ Medium)


  1. SDS Architecture
  2. Results

1. SDS Architecture

SDS Architecture
  • The network is a modification of R-CNN.
  • First, region proposals are generated by MCG (Multiscale Combinatorial Grouping), a better region proposal method than the Selective Search used in R-CNN.
  • Then each region proposal goes through two pathways.
More Details on Conv and FC Layers for Two Pathways
  • The upper pathway is the bounding box network, which generates one feature vector.
  • The bottom pathway is the so-called “region” (in the paper) or segmentation network, where the background pixels are masked out, which generates another feature vector.
  • After that, the two feature vectors are concatenated.
  • Based on this concatenated feature vector, classification is performed using an SVM.
  • Finally, the “region” or segmentation result is refined by predicting the probability of belonging to the foreground.
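
The two-pathway idea above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: `toy_features` is a hypothetical stand-in for the CNN, the masked-out background is simply filled with zeros, and the SVM weights are random placeholders rather than trained values.

```python
import numpy as np

def crop_box(image, box):
    """Box pathway input: the plain crop of the box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1].astype(np.float32)

def crop_region(image, mask, box, fill_value=0.0):
    """Region pathway input: the same crop with background pixels masked out."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1].astype(np.float32)  # astype returns a copy
    crop[~mask[y0:y1, x0:x1]] = fill_value         # assumption: fill with zeros
    return crop

def toy_features(crop):
    """Hypothetical stand-in for a CNN: per-channel mean and std (6 values)."""
    flat = crop.reshape(-1, crop.shape[-1])
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])

def sds_feature(image, mask, box):
    """Concatenate the box-pathway and region-pathway feature vectors."""
    box_feat = toy_features(crop_box(image, box))
    region_feat = toy_features(crop_region(image, mask, box))
    return np.concatenate([box_feat, region_feat])

def svm_scores(feature, weights, biases):
    """Linear SVM scoring: one score per class."""
    return weights @ feature + biases

# Toy data: a random image, one region proposal (mask) and its bounding box.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
mask = np.zeros((32, 32), dtype=bool)
mask[10:20, 12:22] = True
box = (8, 8, 24, 24)

feature = sds_feature(image, mask, box)
weights = rng.standard_normal((3, feature.size))  # placeholder, untrained
biases = np.zeros(3)
scores = svm_scores(feature, weights, biases)
predicted_class = int(np.argmax(scores))
```

The point of the sketch is only the data flow: one proposal produces two inputs (box crop and masked crop), each input produces a feature vector, and the classifier sees their concatenation.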

2. Results

2.1. Object Detection

  • A, B, and C denote how the concatenated feature vectors are trained. They all belong to SDS. SDS outperforms R-CNN, as well as R-CNN using MCG for generating region proposals.

2.2. Semantic Segmentation

Updated Results from
  • SDS has the best results. (“ref” stands for refinement.)
Some Visualizations
  • This is actually instance segmentation: for example, only one person is segmented even when two people are present, because the segmentation is bounded by the detection box.
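
The observation above can be illustrated with a toy example. This is a hedged sketch, not the paper's code: `instance_mask` is a hypothetical helper, and the foreground probabilities are faked from a hand-made "person" map. It shows that a mask predicted inside one detection box cannot spill onto a second person outside that box.

```python
import numpy as np

def instance_mask(fg_prob, box, image_shape, threshold=0.5):
    """Paste a box-sized foreground-probability map into a full-image mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros(image_shape, dtype=bool)
    mask[y0:y1, x0:x1] = fg_prob >= threshold
    return mask

# A semantic "person" map containing two people side by side.
semantic_person = np.zeros((8, 12), dtype=bool)
semantic_person[2:7, 1:4] = True    # person 1
semantic_person[2:7, 7:10] = True   # person 2

# One detection box (x0, y0, x1, y1) around person 1 only.
box = (0, 1, 5, 8)

# Assumption: pretend the network's foreground probabilities inside the box
# match the ground truth exactly.
fg_prob = semantic_person[1:8, 0:5].astype(float)

inst = instance_mask(fg_prob, box, semantic_person.shape)
```

Here `semantic_person` marks 30 pixels across both people, while `inst` marks only the 15 pixels of person 1, since everything outside the box stays background.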

I chose to read this paper because SDS has been used as a baseline for comparison in object detection, semantic segmentation and instance segmentation. I have not dived into the details of SDS since it is quite an early paper on CNN-based instance segmentation. To me, it is still worth knowing a bit about its architecture.