RCNN
May 1, 2022
At a high level:
- Take an input image.
- Run selective search on it to get region proposals.
- Pass each proposal through a CNN for feature extraction.
- Classify each region's features with an SVM.
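The four steps above can be sketched as a single loop. This is only a minimal sketch: `rcnn_predict` and the three callables it takes are hypothetical names, and the toy stand-ins at the bottom replace selective search, the CNN, and the SVM so that the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def rcnn_predict(
    image,
    propose_regions: Callable[[object], List[Box]],
    extract_features: Callable[[object, Box], Sequence[float]],
    classify: Callable[[Sequence[float]], Tuple[str, float]],
) -> List[Tuple[Box, str, float]]:
    """Run the high-level steps on one image (step 1 is the image itself)."""
    detections = []
    for box in propose_regions(image):           # step 2: region proposals
        features = extract_features(image, box)  # step 3: one feature pass per region
        label, score = classify(features)        # step 4: classifier scores the features
        detections.append((box, label, score))
    return detections


# Toy stand-ins (assumptions for illustration, not the real components).
def toy_proposals(image):
    return [(0, 0, 2, 2), (1, 1, 3, 3)]


def toy_features(image, box):
    # Flatten the pixels inside the box into a feature list.
    x1, y1, x2, y2 = box
    return [float(image[y][x]) for y in range(y1, y2) for x in range(x1, x2)]


def toy_classifier(features):
    # Label a region "object" when its mean pixel value is above 0.5.
    mean = sum(features) / len(features)
    return ("object" if mean > 0.5 else "background", mean)


image = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
results = rcnn_predict(image, toy_proposals, toy_features, toy_classifier)
for box, label, score in results:
    print(box, label, score)
```

Note that the feature extractor is called once per proposal inside the loop, which is where most of the cost comes from when there are thousands of proposals.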
Details:
- It uses selective search to generate about 2000 region proposals.
- It classifies each region with an SVM, then applies non-max suppression to remove overlapping duplicate detections.
- It uses linear regression to refine the bounding boxes.
- The regions are warped to the input shape of AlexNet (227×227), which is used to extract features.
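Two of the details above, non-max suppression and bounding-box regression, are easy to show in a few lines. The following is a rough sketch in plain Python, not the original implementation: the function names are made up for illustration, the IoU threshold of 0.3 is an assumed value, and in practice suppression would be run separately for each class. The box update follows the center/size parameterization described in the R-CNN paper (shift the center by a fraction of the width and height, scale the size by an exponential).

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.3
) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes that overlap a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def apply_box_deltas(proposal: Box, deltas: Sequence[float]) -> Box:
    """Refine a proposal with regression outputs (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + w * dx, cy + h * dy          # shift the center
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)  # rescale the size
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
refined = apply_box_deltas((0, 0, 10, 10), (0.1, 0.0, 0.0, 0.0))
print(kept, refined)
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.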
Architecture:
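The architecture combines the components described above: selective search for proposals, a CNN for feature extraction, SVMs for classification, and a linear regressor for the bounding boxes.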
Cons:
- Generating 2000 region proposals is expensive, and passing each of them through the CNN creates a bottleneck.
- Features are extracted separately for every region proposal, so no computation is shared between overlapping regions.
- Prediction takes 40–50 seconds per image.
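To get a feel for these numbers, here is a back-of-the-envelope calculation using only the figures from these notes (2000 proposals, 40 to 50 seconds). It assumes, as a simplification, that the whole prediction time is spread evenly across the proposals and ignores the time spent on selective search itself; `per_region_budget_ms` is just a helper name for this example.

```python
NUM_PROPOSALS = 2000          # region proposals per image, from the notes above
TOTAL_SECONDS = (40.0, 50.0)  # prediction time range, from the notes above


def per_region_budget_ms(total_seconds: float, num_regions: int) -> float:
    """Average time per region in milliseconds, assuming an even split."""
    return 1000.0 * total_seconds / num_regions


low, high = (per_region_budget_ms(t, NUM_PROPOSALS) for t in TOTAL_SECONDS)
print(f"{low:.0f}-{high:.0f} ms per region")  # prints "20-25 ms per region"
```

So each region only costs a few tens of milliseconds on average. The slowness comes from repeating that work about 2000 times per image.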