Object Detection With OpenCV: Step by Step

Zaki Jefferson
Nov 21 · 11 min read

A few months ago I built an image classification model with Keras to detect threats such as firearms. I have now decided to extend it to object detection.

The purpose of a tool like this is to detect objects in real time using a camera system.

Object Detection vs Image Classification

Before we begin, I will assume that you already know the difference between object detection and image classification, but this will serve as a quick recap.

Image Classification is the process of passing an image through your model, which picks up on patterns in the image and assigns it to one of your classes. This results in an output of your class name and the probability score.

Object Detection is the process of passing an image and/or video feed through your model, which locates any objects in it. This can be done with many different object detection methods. This results in an output of bounding boxes, class names, and probability scores.
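
To make the difference concrete, here is a minimal sketch of what each kind of model hands back. The Classification and Detection containers and the example numbers are made up for illustration; they are not part of the project's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Classification:
    """One answer for the whole image."""
    class_name: str
    probability: float


@dataclass
class Detection:
    """One answer per object found, including where it is."""
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    class_name: str
    probability: float


# Image classification: a single label and score for the entire image.
classification = Classification("Handgun", 0.97)

# Object detection: zero or more boxes, each with its own label and score.
detections: List[Detection] = [
    Detection((40, 60, 210, 180), "Handgun", 0.94),
    Detection((300, 90, 520, 200), "Assault Rifle", 0.88),
]

print(classification)
for detection in detections:
    print(detection)
```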

Moving Forward

I will only briefly explain some of the methods used, because object detection is a topic where a single method could fill many blog posts on its own.

I will provide my process step by step, but I will not cover the already built neural network that I use, which was created from scratch. Maybe I will do another blog on the Neural Network that I created for this project.

The main goal of this blog and project is to show a very basic form of object detection using a real world dataset/problem.

Data Being Used

Total Number of Images: 3,000

Number of Classes: 3 : {“Assault Rifle”: 0, “Handgun”: 1, “No Firearm”: 2}
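
The code later in this post looks up class names with classes[np.argmax(...)]. A minimal sketch of how such a lookup could be built from the mapping above follows. The names class_indices and top_class are assumptions for illustration, and the probabilities are made-up values.

```python
# Label -> index mapping as listed above.
class_indices = {"Assault Rifle": 0, "Handgun": 1, "No Firearm": 2}

# Invert it into a list so that classes[i] is the name of class i.
classes = [name for name, _ in sorted(class_indices.items(), key=lambda kv: kv[1])]


def top_class(probabilities):
    """Return (class name, probability) for the highest score."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best], probabilities[best]


# Made-up model output for one image (one probability per class).
name, score = top_class([0.03, 0.91, 0.06])
print(f"{name}: {round(score * 100, 3)}%")
```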

Preexisting Neural Network: Yes

Imports

The majority of the imports that I used are from tensorflow and keras. These libraries load my preexisting Convolutional Neural Network and preprocess the images that are passed through the object detection model.

OpenCV is the library used for object detection.

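# Neural Network
from tensorflow.keras.applications import imagenet_utils
from keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.models import load_model
import keras
import tensorflow as tf
# Utilities used by the functions and detection code below
from os import listdir
from os.path import isfile, join
import time
import numpy as np
import matplotlib.pyplot as plt
import cv2  # cv2.ximgproc ships with the opencv-contrib-python package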

Functions

I will list some fairly self-explanatory functions that are used, or can be used, throughout this project, and I will explain the functions that have a direct link to object detection.

The first function is simply for displaying images using matplotlib:

def display_image(image_num):
    """
    Prints out picture of
    the image that is selected

    After the else statement,
    we try to use the file name
    in order to print out the image,

    while the first image is used to
    print the image directly without a
    filename.
    """
    try:
        fig = plt.figure(figsize=(20, 15))
        plt.grid(False)
        plt.imshow(images[image_num])
    except (RuntimeError, TypeError, NameError):
        print("[INFO] Could not print image")
        print("[INFO] trying something else...")
    else:
        print("[INFO] returning image...")
    # Image path - getting images on file
    image_paths = "Demo-Images/"
    image_select = image_paths + ([f for f in listdir(image_paths) if isfile(join(image_paths, f))][image_num]) # Other way instead of the listdir function
    img = plt.imread(fname=image_select)
    plt.imshow(img)

Note: Your image_paths will depend on what you named your directory that you keep your images in.

The second function is used to predict your input image, giving you an output of your class names (Assault Rifle, Handgun, No Firearm) and the probability score:

# Prediction Function
def predict(model, image_num):
    # Image path - getting images on file
    image_paths = "Demo-Images/"
    image_select = image_paths + ([f for f in listdir(image_paths) if isfile(join(image_paths, f))])[image_num] # Other way instead of the listdir function

    img = load_img(image_select, target_size=(300, 300)) # Loading image
    img = img_to_array(img) # Transforming image to array
    img = img / 255 # Normalizing Image
    img = np.expand_dims(img, axis=0) # Expanding dimensions
    probs = model.predict(img) # Predicting the image with the model passed in
    pred_name = classes[np.argmax(probs)] # Predicting the name
    prediction = str(round(probs.max() * 100, 3))
    display_image(image_num=image_num) # Showing the image that was predicted
    return prediction + '%', pred_name

Note: This uses your preexisting Neural Network to give you the results for your input image. Depending on how you built your CNN from scratch, you will have different values for the target_size parameter.

The third function fundamentally performs the same task as the function above, but with a little twist. The third function predicts region proposals/bounding boxes:

# Region Proposal Prediction Function
def predict_region_of_interest(model, proposals_used):
    """
    predicts region proposals
    """
    probs = model.predict(proposals_used) # Predicting every proposal in one batch
    for proposal in probs:
        pred_name = classes[np.argmax(proposal)] # Predicting the name
        prediction = str(round(proposal.max() * 100, 3)) # Top score of this proposal
        print(pred_name)
    return probs

The fourth function computes your IoU (Intersection over Union), which is essentially a performance measurement for our object detection model. IoU compares a predicted bounding box/region proposal found by your object detection method with another box, such as the ground truth, by dividing the area where the two boxes overlap by the area of their union:

def compute_iou(boxA, boxB):
    """
    IOU is a form of
    performance measurement
    for our object detector.
    """
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # compute the area of intersection rectangle
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)
    # return the intersection over union value
    return iou
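
As a quick sanity check, here is a worked example: a minimal sketch that repeats the same formula in a compact helper. The name iou and the box coordinates are made up for illustration. Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates, which is why 1 is added to each width and height.

```python
def iou(box_a, box_b):
    """Intersection over Union for (x1, y1, x2, y2) boxes, inclusive pixels."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + 1)
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / float(area_a + area_b - inter)


ground_truth = (0, 0, 9, 9)    # 10 x 10 pixels, area 100
predicted = (5, 5, 14, 14)     # 10 x 10 pixels, area 100

# Overlap is the 5 x 5 square from (5, 5) to (9, 9): 25 pixels.
# Union is 100 + 100 - 25 = 175, so IoU = 25 / 175.
print(round(iou(ground_truth, predicted), 3))  # 0.143
print(iou(ground_truth, ground_truth))         # 1.0 (identical boxes)
print(iou(ground_truth, (20, 20, 29, 29)))     # 0.0 (no overlap)
```

A common rule of thumb is to count a detection as correct when its IoU with the ground truth is at least 0.5, although the right cutoff depends on your use case.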

The fifth and final function is called Non Maximum Suppression (NMS), which cleans up overlapping bounding boxes so that, ideally, a single bounding box is returned per object:

#  Felzenszwalb et al.
def non_max_suppression(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []
    # initialize the list of picked indexes
    pick = []
    # grab the coordinates of the bounding boxes
    x1 = boxes[:,0]
    y1 = boxes[:,1]
    x2 = boxes[:,2]
    y2 = boxes[:,3]
    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)
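
The remaining steps of a greedy NMS (pick a box, drop the boxes it overlaps too much, repeat) can be sketched as follows. This is a minimal, plain-Python sketch under assumptions, not necessarily the function used in this project: nms_sketch is a hypothetical name, the overlap is measured as the fraction of the other box that is covered, and the function returns the kept boxes themselves rather than their indexes. It accepts a list of tuples or a NumPy array of (x1, y1, x2, y2) rows.

```python
def nms_sketch(boxes, overlap_thresh):
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns the boxes that are kept."""
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []
    boxes = [tuple(int(v) for v in box) for box in boxes]
    area = [(x2 - x1 + 1) * (y2 - y1 + 1) for (x1, y1, x2, y2) in boxes]
    # process boxes in order of their bottom-right y-coordinate
    idxs = sorted(range(len(boxes)), key=lambda k: boxes[k][3])
    pick = []
    while idxs:
        # keep the last box in the sorted list ...
        i = idxs.pop()
        pick.append(i)
        survivors = []
        for j in idxs:
            # ... and measure how much of each remaining box it covers
            w = max(0, min(boxes[i][2], boxes[j][2]) - max(boxes[i][0], boxes[j][0]) + 1)
            h = max(0, min(boxes[i][3], boxes[j][3]) - max(boxes[i][1], boxes[j][1]) + 1)
            overlap = (w * h) / area[j]
            # boxes covered by more than the threshold are suppressed
            if overlap <= overlap_thresh:
                survivors.append(j)
        idxs = survivors
    return [boxes[i] for i in pick]


# Two near-duplicate boxes around one object plus one separate box.
candidates = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
kept = nms_sketch(candidates, overlap_thresh=0.5)
print(kept)
```

Note that this ordering does not look at the class probabilities at all. If you want the most confident box to win, one option is to sort by the CNN's top probability instead of the y-coordinate.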

Selective Search

Now we can get into what makes your object detection run. The main algorithm that we will use to retrieve region proposals is Selective Search.

Selective Search is an automatic region proposal algorithm. It works by over-segmenting an image using a super-pixel algorithm, specifically known as Felzenszwalb’s Super-pixel algorithm. From there, Selective Search seeks to merge together the super-pixels to find the regions of an image that could contain an object.

# Setting a max amount of region proposals used when running selective search
# for (1) gathering training data and (2) performing inference
max_proposals = 2_000
max_proposals_infer = 100

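For max_proposals_infer, feel free to allow more region proposals in your image to get better results, at the cost of a longer run time, since every extra proposal is one more image for the CNN to classify.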

This next code will load our preexisting Convolutional Neural Network:

# initialize the input dimensions to the network
input_dimensions = (300, 300) # 300 by 300 because that's what the CNN Model was tested on
# define the path to the output model
model_path = "model_3.hdf5"
cnn_model = keras.models.load_model(model_path) # Loading CNN model from keras

The code below creates the Selective Search object from the OpenCV library:

# initialize OpenCV's selective search implementation
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

In the next lines of code I select an image from my chosen directory and set it as the base image so that our selective search algorithm can detect region proposals:

# Input image in selective search
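
A minimal sketch of what this step could look like is shown below. It is an assumption, not the exact code from this project: load_image and prepare_selective_search are hypothetical helper names, and the file path in the usage comment is a placeholder. The calls setBaseImage, switchToSelectiveSearchFast and switchToSelectiveSearchQuality are methods of OpenCV's selective search object, and cv2.imread returns None when a file cannot be read.

```python
def load_image(image_path):
    """Read an image from disk as a BGR array, failing loudly if it is missing."""
    import cv2  # imported here so the helper below can be used without OpenCV installed

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    return image


def prepare_selective_search(ss, image, fast=True):
    """Attach the image to the selective search object and choose a search mode."""
    ss.setBaseImage(image)
    if fast:
        ss.switchToSelectiveSearchFast()      # fewer proposals, quicker
    else:
        ss.switchToSelectiveSearchQuality()   # more proposals, slower
    return ss


# Usage with the ss object created above (placeholder file name):
#   image = load_image("Demo-Images/example.jpg")
#   ss = prepare_selective_search(ss, image, fast=True)
```

The fast mode is usually a reasonable starting point; the quality mode returns more proposals but takes noticeably longer.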

I then use our function from above to display the image to see what we got:


In the code below we will be running the algorithm to get our regions in image 9:

# run selective search on the input image
start = time.time()
rects = ss.process() # Run Selective Search
end = time.time()

The code below collects the region proposals that the algorithm picked up and prepares each one for the CNN:

# initialize the list of region proposals that we'll be classifying
# along with their associated bounding boxes
proposals = []
boxes = []
# loop over the region proposal bounding box coordinates generated by
# running selective search
for (x, y, w, h) in rects[:max_proposals_infer]:
    # extract the region from the input image, convert it from BGR to
    # RGB channel ordering, and then resize it to the required input
    # dimensions of our trained CNN
    roi = image[y:y + h, x:x + w]
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    roi = cv2.resize(roi, input_dimensions,
        interpolation=cv2.INTER_CUBIC)
    # further preprocess the ROI
    roi = img_to_array(roi)
    roi = preprocess_input(roi)
    # update our proposals and bounding boxes lists
    proposals.append(roi)
    boxes.append((x, y, x + w, y + h))
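
One detail worth checking in your own project: the predict function earlier scales pixels with img / 255, while this loop uses preprocess_input from mobilenet_v2, which maps pixels to the range -1 to 1. The sketch below reproduces both per-pixel formulas in plain Python (the function names are made up for illustration). Which one is appropriate is an assumption that depends on how your CNN was trained, so use the same scaling at inference time that you used during training.

```python
def scale_0_1(pixel):
    """Same arithmetic as img / 255: maps 0..255 to 0..1."""
    return pixel / 255.0


def scale_minus1_1(pixel):
    """Per-pixel arithmetic of mobilenet_v2's preprocess_input: maps 0..255 to -1..1."""
    return pixel / 127.5 - 1.0


# Compare the two scalings for a dark, a mid-gray and a bright pixel.
for pixel in (0, 128, 255):
    print(pixel, round(scale_0_1(pixel), 3), round(scale_minus1_1(pixel), 3))
```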

The code below will show us the shape of the proposals array. I also used the predict_region_of_interest function to predict which class each region most likely contains:

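# convert the proposals and bounding boxes into NumPy arrays
proposals = np.array(proposals, dtype="float64")
boxes = np.array(boxes, dtype="int64")
print(f"[INFO] proposal shape: {proposals.shape}")
# classify every proposal; proba is used by the code that follows
proba = predict_region_of_interest(model=cnn_model, proposals_used=proposals)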

These next lines of code attach a class name to every predicted proposal, so that we can filter the predictions/bounding boxes down to the ones we want to see:

# Obtaining the label of the current prediction from the CNN
# Empty list to store proposals
proposal_name_list = []
for proposal_proba in proba:
    # For each predicted proposal
    # attach the class names and
    # append it to a list.
    pred_name = classes[np.argmax(proposal_proba)]
    proposal_name_list.append(pred_name)
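
A filter on top of these names could look like the sketch below. It is a minimal example under assumptions: filter_detections is a hypothetical helper, min_proba is a hypothetical threshold, the probabilities and boxes are made-up values, and dropping the "No Firearm" class is one possible choice rather than this project's confirmed logic.

```python
classes = ["Assault Rifle", "Handgun", "No Firearm"]


def filter_detections(proba, boxes, min_proba=0.9, background="No Firearm"):
    """Keep (box, class name, probability) for confident, non-background proposals."""
    kept = []
    for probs, box in zip(proba, boxes):
        best = max(range(len(probs)), key=lambda i: probs[i])
        name, score = classes[best], probs[best]
        if name != background and score >= min_proba:
            kept.append((box, name, score))
    return kept


# Made-up CNN outputs for three proposals and their (x1, y1, x2, y2) boxes.
proba = [[0.02, 0.95, 0.03], [0.10, 0.15, 0.75], [0.55, 0.40, 0.05]]
boxes = [(30, 40, 120, 110), (0, 0, 300, 300), (35, 42, 118, 105)]

# Only the first proposal is both a firearm class and above the threshold.
for box, name, score in filter_detections(proba, boxes):
    print(box, name, f"{score * 100:.1f}%")
```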

Viewing Our Results

Now the final part is viewing our results. You can also use plt.imshow() to print the image in Jupyter Notebook.

The first few lines of code show what the image looks like after our object detection model runs, without using our non-maximum suppression algorithm:

# clone the original image so that we can draw on it
clone = image.copy()
# loop over the bounding boxes and associated probabilities
for (box, prob) in zip(boxes, proba):
    # draw the bounding box, label, and probability on the image
    (startX, startY, endX, endY) = box
    cv2.rectangle(clone, (startX, startY), (endX, endY),
        (0, 255, 0), 2)
    y = startY - 10 if startY - 10 > 10 else startY + 10
    text = f"{np.round(prob.max() * 100, decimals=3)}%" # top class probability
    cv2.putText(clone, text, (startX, y),
        cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
# show the output *before* running NMS
cv2.imshow("Before NMS", clone)
cv2.waitKey(0)

The next few lines of code show what the object detection algorithm does to the chosen image once the non-maximum suppression function is included, which reduces the overlapping boxes to a single bounding box:

# run non-maxima suppression on the bounding boxes
picked_boxes = non_max_suppression(boxes=boxes, overlapThresh=0.5)
# the label comes from the proposal with the highest probability
best = proba[proba.max(axis=1).argmax()]
text = f"{classes[np.argmax(best)]}: {np.round(best.max() * 100, decimals=1)}%"
# loop over the bounding boxes that survived NMS
for box in picked_boxes:
    # draw the bounding box, label, and probability on the image
    (startX, startY, endX, endY) = box # or boxes[0] will return 1 bb
    cv2.rectangle(image, (startX, startY), (endX, endY),
        (0, 255, 0), 2)
    y = startY - 10 if startY - 10 > 10 else startY + 10
    cv2.putText(image, text, (startX, y),
        cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
# show the output image *after* running NMS
cv2.imshow("After NMS", image)
cv2.waitKey(0)

Analytics Vidhya
