Photo by Carlos Navas on Unsplash

Improvements:

  • A Region Proposal Network (RPN) was added to detect candidate object regions. The RPN is purely convolutional (a 3x3 convolution followed by two sibling 1x1 convolutions, one for each of the two tasks) and it does two things: it classifies each box as foreground or background, and it predicts region boxes (these proposals are only rough and get refined later). To do this it uses 9 kinds of reference boxes (anchors) per location, with different heights and widths (3 scales × 3 aspect ratios).
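  • The proposals output by the RPN are superimposed on (projected onto) the same convolutional feature maps, so region proposal and detection share features.
  • So, basically, we no longer need selective search: selective search takes roughly 0.2–2 seconds per image, while the RPN takes only about 10 ms.
  • The backbone can be VGG-Net or AlexNet.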
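
To make the "9 kinds of bounding boxes" concrete, here is a minimal sketch of anchor generation, assuming the 3 scales (box areas of 128², 256² and 512² pixels) and 3 aspect ratios (1:2, 1:1, 2:1) from the Faster RCNN paper and a feature-map stride of 16. The function names `generate_anchors` and `anchors_for_feature_map` are hypothetical, chosen for illustration only.

```python
import math

def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) anchors centred at (0, 0) as (x1, y1, x2, y2).

    Each anchor has area (base_size * scale) ** 2 and height / width == ratio.
    """
    anchors = []
    for scale in scales:
        side = base_size * scale            # 16 * 8 = 128 px, 16 * 16 = 256 px, 16 * 32 = 512 px
        area = side * side
        for ratio in ratios:
            w = math.sqrt(area / ratio)     # ratio = h / w  ->  w = sqrt(area / ratio)
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

def anchors_for_feature_map(feat_h, feat_w, stride=16, **kwargs):
    """Slide the base anchors over every feature-map cell (hypothetical helper)."""
    base = generate_anchors(**kwargs)
    all_anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of this feature-map cell, in input-image pixels.
            cx = j * stride + stride / 2
            cy = i * stride + stride / 2
            for x1, y1, x2, y2 in base:
                all_anchors.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return all_anchors
```

For example, `len(generate_anchors())` is 9, and a 2×3 feature map yields 2 × 3 × 9 = 54 anchors. The RPN's two 1x1 heads then score and adjust each of these anchors.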

Architecture:
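
At a high level, the image goes through shared convolutional layers once; the RPN (3x3 conv, then two sibling 1x1 convs) runs on that feature map to produce proposals; the proposals are RoI-pooled from the same feature map and passed to fully connected layers with a softmax classifier and a box regressor. The sketch below only tracks tensor shapes, assuming a backbone with total stride 16 (as with VGG-16's last conv layer) and k = 9 anchors per location. Rounding the feature-map size up is an assumption; real sizes depend on the backbone's padding and pooling. `rpn_output_shapes` is a hypothetical name.

```python
import math

def rpn_output_shapes(img_h, img_w, stride=16, k=9):
    """Shapes (channels, height, width) of the RPN outputs for one image."""
    feat_h = math.ceil(img_h / stride)
    feat_w = math.ceil(img_w / stride)
    return {
        "feature_map": (feat_h, feat_w),
        "cls_scores": (2 * k, feat_h, feat_w),   # foreground vs. background score per anchor
        "bbox_deltas": (4 * k, feat_h, feat_w),  # 4 box offsets per anchor
        "num_anchors": feat_h * feat_w * k,
    }
```

For a 600×1000 image this gives a 38×63 feature map and 21,546 anchors, i.e. on the order of 20,000 candidate boxes scored in a single pass.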


Photo by Djim Loic on Unsplash

Check out RCNN: https://medium.com/@jackneutron786/rcnn-63085ae11d40

Improvements over RCNN:

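  • Uses a multi-task loss, i.e. a single loss function that covers both classification and bounding-box regression (localizing the box).
  • Adds a Region of Interest (RoI) pooling layer, which turns each proposal's region of the feature map into a fixed-size feature map.
  • Faster and more accurate than RCNN; the speedup comes largely from running the CNN once over the whole image instead of once per region proposal.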
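  • Uses smooth L1 loss for bounding-box regression: with x = a - b, the loss is 0.5x² if |x| < 1 and |x| - 0.5 otherwise. It behaves like L2 for small errors and like L1 for large ones, which makes it less sensitive to outliers than a pure L2 loss.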
  • Uses softmax for classification instead of SVMs.
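
The multi-task loss, smooth L1 and softmax pieces fit together as in the Fast RCNN paper: L = L_cls + λ·[u ≥ 1]·L_loc, where u is the true class (0 = background) and the box term is skipped for background RoIs. Below is a minimal pure-Python sketch for a single RoI, assuming λ = 1 by default and that `pred_box` / `target_box` are the 4 regression targets for the true class; the function names are hypothetical.

```python
import math

def smooth_l1(x):
    """Quadratic for |x| < 1, linear otherwise (the two pieces meet at |x| = 1)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5

def softmax(logits):
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multitask_loss(class_logits, true_class, pred_box, target_box, lam=1.0):
    """L = L_cls + lam * [true_class >= 1] * L_loc, with class 0 as background."""
    l_cls = -math.log(softmax(class_logits)[true_class])   # softmax cross-entropy
    if true_class == 0:
        return l_cls                                         # no box loss for background RoIs
    l_loc = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return l_cls + lam * l_loc
```

For instance, `smooth_l1(0.5)` is 0.125 (the L2-like branch) while `smooth_l1(-2.0)` is 1.5 (the L1-like branch), so a large box error grows linearly instead of quadratically.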

Architecture:
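
In Fast RCNN the whole image goes through the conv layers once; each region proposal is projected onto the resulting feature map, RoI pooling turns it into a fixed-size grid (7×7 for VGG-16 in the paper), and fully connected layers feed two sibling outputs: a softmax over the classes plus background, and per-class box regressors. The sketch below shows only the RoI max-pooling step on a single-channel feature map stored as a 2D list, assuming the RoI is already given in feature-map cells (inclusive corners). The floor/ceil bin boundaries are one common choice, not the only one; `roi_max_pool` is a hypothetical name.

```python
import math

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a 2D feature map into a fixed out_h x out_w grid.

    roi: (x1, y1, x2, y2) in feature-map cells, corners inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    out_h, out_w = output_size
    pooled = []
    for i in range(out_h):
        # Floor the bin start and ceil the bin end so every bin is non-empty.
        ys = y1 + math.floor(i * roi_h / out_h)
        ye = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * roi_w / out_w)
            xe = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Whatever the RoI's size, the output is always `output_size`, which is what lets proposals of different shapes share the same fully connected layers.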


Photo by Avi Theret on Unsplash

From a high level:

  • Takes an input image.
  • Applies selective search to it to get region proposals.
  • Passes each proposal through a CNN for feature extraction.
  • Uses SVMs to classify the extracted features.
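
The four steps above can be sketched as a loop with the heavy parts passed in as functions. Everything here is hypothetical: `propose_regions` stands in for selective search, `extract_features` for cropping a box and running the CNN, and `classifiers` for one linear SVM per class. Keeping only positive SVM scores is a simplification; the real pipeline scores every proposal and then applies per-class non-max suppression.

```python
def rcnn_detect(image, propose_regions, extract_features, classifiers, max_proposals=2000):
    """High-level RCNN inference loop (hypothetical interface).

    propose_regions(image) -> list of boxes
    extract_features(image, box) -> feature vector
    classifiers: {class_name: score_fn(features) -> float}
    """
    detections = []
    for box in propose_regions(image)[:max_proposals]:
        feats = extract_features(image, box)      # one CNN forward pass per proposal
        for name, score_fn in classifiers.items():
            score = score_fn(feats)
            if score > 0:                         # positive side of this class's SVM
                detections.append((name, score, box))
    return detections
```

The comment on `extract_features` is the key cost: with about 2,000 proposals, the CNN runs about 2,000 times per image, which is exactly what Fast RCNN removes.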

Details:

  • It uses selective search to get about 2000 region proposals per image.
  • It uses SVMs for classification and non-maximum suppression to remove overlapping detections.
  • It uses linear regression to get…
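
Non-maximum suppression is easiest to see in code. RCNN applies greedy NMS to each class independently: keep the highest-scoring box, discard any remaining box whose IoU with it exceeds a threshold, and repeat. This is a minimal sketch; the default threshold of 0.3 is an assumption (RCNN learns its threshold), and `iou` / `nms` are illustrative names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS for one class; returns indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                       # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For two heavily overlapping boxes and one separate box, only the higher-scoring of the overlapping pair survives, along with the separate box.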
