RCNN

Photo by Avi Theret on Unsplash

At a high level:

  • It takes an input image.
  • Selective search is applied to the image to get region proposals.
  • Each region proposal is passed through a CNN for feature extraction.
  • An SVM classifies each region based on the extracted features.
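
The steps above can be sketched as a small NumPy pipeline. This is only an illustrative skeleton, not the original implementation: `propose_regions` is a hypothetical stand-in that returns random boxes instead of running selective search, `extract_features` is a hypothetical stand-in that returns the mean colour of a patch instead of running a CNN, and the 227x227 patch size is an assumption about the network's input shape.

```python
import numpy as np

def propose_regions(image, n=2000, seed=0):
    """Hypothetical stand-in for selective search: n random boxes (x1, y1, x2, y2)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    x1 = rng.integers(0, w - 1, size=n)
    y1 = rng.integers(0, h - 1, size=n)
    x2 = rng.integers(x1 + 1, w + 1)  # exclusive right edge, at least 1 px wide
    y2 = rng.integers(y1 + 1, h + 1)  # exclusive bottom edge, at least 1 px tall
    return np.stack([x1, y1, x2, y2], axis=1)

def warp(image, box, size=227):
    """Crop a box and resize it to size x size with nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]

def extract_features(patch):
    """Hypothetical stand-in for the CNN: mean colour of the patch (3 values)."""
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)

def rcnn_forward(image, weights, bias, n_proposals=2000):
    """Proposals -> warped patches -> features -> one linear score per class."""
    boxes = propose_regions(image, n_proposals)
    # One feature extraction per proposal: this loop is the expensive part.
    feats = np.stack([extract_features(warp(image, b)) for b in boxes])
    scores = feats @ weights.T + bias  # linear SVM decision values
    return boxes, scores

image = np.zeros((64, 96, 3), dtype=np.uint8)
boxes, scores = rcnn_forward(image, np.ones((5, 3)), np.zeros(5), n_proposals=20)
print(boxes.shape, scores.shape)
```

Note that the feature extractor is called once per proposal, which is the structure the rest of this section refers to.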

Details:

  • It uses selective search to get about 2000 region proposals.
  • It uses SVMs for classification, followed by non-max suppression to remove overlapping detections.
  • It uses linear regression to refine the bounding boxes.
  • The regions are warped to the input shape of AlexNet (used to extract features).
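
Two of the steps above, non-max suppression and bounding-box refinement, are easy to sketch in NumPy. This is a minimal illustration rather than the original code: the function names are hypothetical, the IoU threshold of 0.3 is an arbitrary example value, and `apply_deltas` assumes the regressor predicts a centre shift scaled by the box size plus a log-scale change in width and height.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy non-max suppression; returns indices of kept boxes, best first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

def apply_deltas(proposal, deltas):
    """Refine a proposal (x1, y1, x2, y2) with regression outputs (dx, dy, dw, dh)."""
    w, h = proposal[2] - proposal[0], proposal[3] - proposal[1]
    cx, cy = proposal[0] + 0.5 * w, proposal[1] + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h      # shift the centre
    w, h = w * np.exp(dw), h * np.exp(dh)  # rescale width and height
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))                          # [0, 2]
print(apply_deltas(boxes[0], [0.1, 0.0, 0.0, 0.0]))  # box shifted right by 1
```

In the usage example, the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and is kept.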

Architecture:

High-level architecture

It uses the concepts we talked about above.

Cons:

  • Extracting 2000 region proposals is expensive, and passing each of them through the CNN afterwards creates a bottleneck.
  • Features have to be extracted separately for every region proposal.
  • Prediction takes 40–50 seconds per image.
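
To put these numbers in perspective, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that the whole prediction time is shared evenly across the 2000 proposals; in practice, selective search and the other stages take part of that time too, so this is only an upper bound on the average per-region budget. The function name is hypothetical.

```python
def per_region_budget_ms(total_seconds, n_proposals=2000):
    """Average time available per region if the total is split evenly."""
    return 1000 * total_seconds / n_proposals

for total_seconds in (40, 50):
    budget = per_region_budget_ms(total_seconds)
    print(f"{total_seconds} s / 2000 regions = {budget:.0f} ms per region")
```

Each region on its own is cheap (20 to 25 ms under this assumption); the cost comes from repeating the work 2000 times per image.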
