R-CNN

Syed Rahim Saqib

May 1, 2022


Photo by Avi Theret on Unsplash

From a high level:

It takes an input image.
It applies selective search to the image to get region proposals.
It passes each proposal through a CNN for feature extraction.
It classifies the extracted features with SVMs.
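Here is a minimal sketch of the classification step. It assumes each class has its own trained linear SVM, stored as a weight vector and a bias, and computes the score w_c · x + b_c for every class. The `svm_scores` name and the dictionary layout are hypothetical. In R-CNN the feature vector is the 4096-dimensional output of the CNN, and every class SVM scores every region independently. The toy call below uses 2-dimensional features.

```python
from typing import Dict, Sequence, Tuple

def svm_scores(features: Sequence[float],
               svms: Dict[str, Tuple[Sequence[float], float]]) -> Dict[str, float]:
    """Score one region's feature vector with one linear SVM per class: w_c . x + b_c."""
    return {
        label: sum(w * x for w, x in zip(weights, features)) + bias
        for label, (weights, bias) in svms.items()
    }

# Toy example with hypothetical 2-d weights (real R-CNN features are 4096-d).
print(svm_scores([1.0, 2.0], {"cat": ([0.5, 0.25], -0.5), "dog": ([0.0, 1.0], 0.0)}))
# prints {'cat': 0.5, 'dog': 2.0}
```

Regions that score above a chosen threshold for a class become candidate detections for that class.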

Details:

It uses selective search to generate about 2000 region proposals per image.
It uses linear SVMs (one per class) for classification, followed by non-max suppression to remove overlapping detections.
It uses linear regression to refine the bounding boxes.
Each region is warped to the input shape of AlexNet (227×227), the CNN used to extract features.
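Non-max suppression can be sketched as a greedy procedure. Sort one class's boxes by SVM score and keep the best one. Then drop any remaining box whose intersection-over-union (IoU) with an already kept box exceeds a threshold. The 0.3 default below is an illustrative assumption, not a value from this post.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.3) -> List[int]:
    """Return indices of the boxes kept by greedy non-max suppression (one class)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7]))
# prints [0, 2]: the second box overlaps the first with IoU of about 0.68
```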
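Bounding-box regression can be sketched as follows. This assumes the parameterization from the original R-CNN paper, where a proposal is given by its center (px, py), width pw and height ph. Four linear functions of the CNN features predict a shift of the center, scaled by the box size, and a log-space scaling of the width and height. The `weights` argument stands in for trained regression weights, and both it and the `refine_box` name are hypothetical.

```python
import math
from typing import Sequence, Tuple

def refine_box(proposal: Tuple[float, float, float, float],
               features: Sequence[float],
               weights: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Refine a (center_x, center_y, width, height) proposal with 4 linear regressors."""
    px, py, pw, ph = proposal
    # Each target d_* is a linear function of the features: d = w . features
    dx, dy, dw, dh = (sum(w_i * f_i for w_i, f_i in zip(w, features)) for w in weights)
    gx = pw * dx + px          # shift center x by a fraction of the width
    gy = ph * dy + py          # shift center y by a fraction of the height
    gw = pw * math.exp(dw)     # scale width in log space
    gh = ph * math.exp(dh)     # scale height in log space
    return gx, gy, gw, gh

print(refine_box((50.0, 50.0, 20.0, 10.0), [1.0, 2.0],
                 [[0.1, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]))
# prints (52.0, 50.0, 20.0, 10.0)
```

With all-zero weights the proposal comes back unchanged, so the regressor only has to learn a correction.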
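Warping a region to the CNN input size can be sketched on a grayscale image stored as a list of pixel rows. This version crops the box and resizes it to a square with nearest-neighbor sampling, ignoring the aspect ratio. The sampling method and the `crop_and_warp` name are assumptions. The 227 default matches the 227×227 AlexNet input mentioned above.

```python
from typing import List, Tuple

def crop_and_warp(image: List[List[int]], box: Tuple[int, int, int, int],
                  out_size: int = 227) -> List[List[int]]:
    """Crop box (x1, y1, x2, y2; x2/y2 exclusive) and resize it to out_size x out_size."""
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    # Nearest-neighbor: each output pixel copies the source pixel it maps onto.
    return [
        [image[y1 + (r * crop_h) // out_size][x1 + (c * crop_w) // out_size]
         for c in range(out_size)]
        for r in range(out_size)
    ]

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
print(crop_and_warp(image, (0, 0, 2, 2), out_size=4))
# prints [[0, 0, 1, 1], [0, 0, 1, 1], [4, 4, 5, 5], [4, 4, 5, 5]]
```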

Architecture:

High-level architecture

It combines the components described above.
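Putting the stages together, the data flow can be sketched as a loop over proposals. Each stage is passed in as a function, so `propose`, `crop_and_warp`, `extract_features` and `score_classes` are hypothetical placeholders, not a real library API. Per-class non-max suppression and box regression would then run on the returned detections. Note the one CNN forward pass per proposal inside the loop.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def rcnn_detect(
    image: object,
    propose: Callable[[object], List[Box]],
    crop_and_warp: Callable[[object, Box], object],
    extract_features: Callable[[object], Sequence[float]],
    score_classes: Callable[[Sequence[float]], Dict[str, float]],
    threshold: float = 0.0,
) -> List[Tuple[Box, str, float]]:
    """Run the R-CNN stages on one image and return (box, label, score) triples."""
    detections = []
    for box in propose(image):                   # region proposals, e.g. selective search
        patch = crop_and_warp(image, box)        # warp the region to the CNN input size
        features = extract_features(patch)       # one CNN forward pass per proposal
        for label, score in score_classes(features).items():  # one SVM score per class
            if score > threshold:
                detections.append((box, label, score))
    return detections
```

Because the loop body runs once per proposal, about 2000 proposals mean about 2000 CNN forward passes for a single image.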

Cons:

Extracting 2000 region proposals is slow, and passing each of them through the CNN creates a bottleneck.
Features are extracted separately for every region proposal, so no computation is shared between overlapping regions.
Prediction takes 40–50 seconds per image.
