Towards a real-time vehicle detection: SSD multibox approach
Vivek Yadav

I have some questions about SSD:

1-During prediction, how is the process actually applied? Is each bounding box with a different aspect ratio allowed to output its own prediction and offset?

2-How exactly does the convolution layer produce 5 outputs (4 offsets and 1 confidence)?

3-How is the scale related to the dimensions of the feature map? Would I apply bounding boxes with different scales on the same feature map?

Thanks a lot

