Object Detection With OpenCV: Step by Step

Zaki Jefferson · Published in Analytics Vidhya · Nov 21, 2020 · 11 min read

A few months ago I created an image classification model using Keras to detect threats such as firearms. I have now decided to extend this work to object detection.

The purpose of a tool like this is to detect objects in real time using a camera system.

Object Detection vs Image Classification

Before we begin, I will assume that you already know the difference between object detection and image classification, but this will serve as a quick recap.

Image Classification is the process of passing an image through your model, which detects patterns in the given image in order to assign it to one of your desired classes. The result is an output of a class name and a probability score.

Object Detection is the process of passing an image and/or video feed through your model, which locates any objects present. This can be done with many different object detection methods. The result is an output of bounding boxes, each with a class name and a probability score.
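
To make the distinction concrete, here is a minimal sketch of what the two kinds of output might look like for this project's classes (the boxes and scores below are made up purely for illustration):

# Hypothetical outputs, for illustration only
# Image classification: one class name and one probability score for the whole image
classification_output = ("Handgun", 0.97)

# Object detection: a list of bounding boxes, each with its own class name and score
# Each box is (startX, startY, endX, endY) in pixel coordinates
detection_output = [
    ((34, 50, 180, 210), "Handgun", 0.95),
    ((220, 40, 290, 120), "Assault Rifle", 0.88),
]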

Moving Forward

There will be brief explanations of some of the methods that will be used; I will not go into too much detail on certain methods, because you could write many blogs on just one topic/method when it comes to object detection.

I will walk through my process step by step, but I will not cover the already-built neural network that I use, which I created from scratch. Maybe I will do another blog on the neural network that I created for this project.

The main goal of this blog and project is to show a very basic form of object detection using a real-world dataset/problem.

Data Being Used

Total Number of Images: 3,000

Number of Classes: 3 : {“Assault Rifle”: 0, “Handgun”: 1, “No Firearm”: 2}

Preexisting Neural Network: Yes

Imports

The majority of the imports that I used are from TensorFlow and Keras. These libraries will help load my preexisting Convolutional Neural Network and process the images that will be passed through the object detection model.

OpenCV is the library that will be used for object detection.

# Neural Network
from tensorflow.keras.applications import imagenet_utils
from keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.models import load_model
import keras
import tensorflow as tf
# For measuring the inference time.
import time
import random
# Computer Vision
import cv2
# Graphing
import matplotlib.pyplot as plt
import seaborn as sns
# Math
import numpy as np
# File handling
import pickle
from os import listdir
from os.path import isfile, join
import os

Functions

I will list some fairly self-explanatory functions that are used (or can be used) throughout this project, and I will give explanations for the functions that have a direct link to object detection.

The first function is simply for displaying images using matplotlib:

def display_image(image_num):
    """
    Prints out a picture of
    the image that is selected.

    The try block prints the image
    directly from the images list,
    while the else block uses the
    file name to print out the image.
    """
    try:
        fig = plt.figure(figsize=(20, 15))
        plt.grid(False)
        plt.imshow(images[image_num])
    except (RuntimeError, TypeError, NameError):
        print("[INFO] Could not print image")
        print("[INFO] trying something else...")
    else:
        print("[INFO] returning image...")
        # Image path - getting images on file
        image_paths = "Demo-Images/"
        # Selecting the image_num-th file in the directory
        image_select = image_paths + [f for f in listdir(image_paths) if isfile(join(image_paths, f))][image_num]
        img = plt.imread(fname=image_select)
        plt.imshow(img)

Note: Your image_paths will depend on what you named your directory that you keep your images in.

The second function is used to predict your input image, giving you an output of your class names (Assault Rifle, Handgun, No Firearm) and the probability score:

# Prediction Function
def predict(model, image_num):
    # Image path - getting images on file
    image_paths = "Demo-Images/"
    # Selecting the image_num-th file in the directory
    image_select = image_paths + [f for f in listdir(image_paths) if isfile(join(image_paths, f))][image_num]

    img = load_img(image_select, target_size=(300, 300))  # Loading image
    img = img_to_array(img)  # Transforming image to array
    img = img / 255  # Normalizing image
    img = np.expand_dims(img, axis=0)  # Expanding dimensions
    predict = model.predict(img)  # Predicting the image
    pred_name = classes[np.argmax(predict)]  # Predicting the name
    prediction = str(round(predict.max() * 100, 3))
    display_image(image_num=image_num)  # Showing the image alongside the prediction
    return prediction + '%', pred_name

Note: This uses your preexisting neural network and gives you the results for your input image. Depending on how you built your CNN from scratch, you may have different values for the target_size parameter.
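
As a quick usage sketch (assuming the cnn_model, classes, and Demo-Images/ directory from this project are already in place, and using a made-up image index):

# Hypothetical usage - the index and printed values depend on your own images and model
score, label = predict(model=cnn_model, image_num=9)
print(label, score)  # e.g. "Handgun 97.2%"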

The third function fundamentally performs the same task as the function above, but with a little twist. The third function predicts region proposals/bounding boxes:

# Prediction Function
def predict_region_of_interest(model, proposals_used):
    """
    Predicts region proposals
    """
    predict = model.predict(proposals_used)  # Predicting the proposals
    for proposals in predict:
        pred_name = classes[np.argmax(proposals)]  # Predicting the name
        prediction = str(round(predict.max() * 100, 3))
        print(pred_name)
    return predict

The fourth function computes your IoU (Intersection over Union), which is essentially a performance measurement for our object detection model. IoU measures how much a predicted bounding box/region proposal found by your object detection method overlaps with another box (typically the ground truth):

def compute_iou(boxA, boxB):
    """
    IoU is a form of
    performance measurement
    for our object detector.
    """
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # compute the area of the intersection rectangle
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)
    # return the intersection over union value
    return iou
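
To get a feel for the numbers, here is a small sketch with two made-up boxes in (startX, startY, endX, endY) format:

# Hypothetical boxes for illustration - (startX, startY, endX, endY)
ground_truth = (50, 50, 150, 150)
predicted = (60, 60, 160, 160)
print(compute_iou(ground_truth, predicted))  # roughly 0.68, a reasonable overlap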

The fifth and final function is Non-Maximum Suppression (NMS), which cleans up your bounding boxes by suppressing overlapping boxes and keeping only the strongest ones:

# Felzenszwalb et al.
def non_max_suppression(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []
    # initialize the list of picked indexes
    pick = []
    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)
    # keep looping while some indexes still remain in the indexes list
    while len(idxs) > 0:
        # grab the last index in the indexes list, add the index
        # value to the list of picked indexes, then initialize
        # the suppression list (i.e. indexes that will be deleted)
        # using the last index
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)
        suppress = [last]
        # loop over all indexes in the indexes list
        for pos in range(0, last):
            # grab the current index
            j = idxs[pos]
            # find the largest (x, y) coordinates for the start of
            # the bounding box and the smallest (x, y) coordinates
            # for the end of the bounding box
            xx1 = max(x1[i], x1[j])
            yy1 = max(y1[i], y1[j])
            xx2 = min(x2[i], x2[j])
            yy2 = min(y2[i], y2[j])
            # compute the width and height of the bounding box
            w = max(0, xx2 - xx1 + 1)
            h = max(0, yy2 - yy1 + 1)
            # compute the ratio of overlap between the computed
            # bounding box and the bounding box in the area list
            overlap = float(w * h) / area[j]
            # if there is sufficient overlap, suppress the
            # current bounding box
            if overlap > overlapThresh:
                suppress.append(pos)
        # delete all indexes from the index list that are in the
        # suppression list
        idxs = np.delete(idxs, suppress)
    # return only the bounding boxes that were picked
    return boxes[pick]
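
To see the effect, here is a minimal sketch with three made-up boxes, two of which overlap heavily; only one box per cluster should survive:

# Hypothetical boxes for illustration - (startX, startY, endX, endY)
candidate_boxes = np.array([
    (50, 50, 150, 150),    # overlaps heavily with the next box
    (55, 55, 155, 155),
    (200, 200, 280, 280),  # a separate region
])
kept = non_max_suppression(boxes=candidate_boxes, overlapThresh=0.5)
print(kept)  # two boxes remain, one per cluster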

Selective Search

Now we can get into the topic of what makes your object detection run. The main method that we will be using to retrieve region proposals is Selective Search.

Selective Search is an automatic region proposal algorithm. It works by over-segmenting an image using a super-pixel algorithm, specifically known as Felzenszwalb’s Super-pixel algorithm. From there, Selective Search seeks to merge together the super-pixels to find the regions of an image that could contain an object.

# Setting a max amount of region proposals used when running selective
# search for (1) gathering training data and (2) performing inference
max_proposals = 2_000
max_proposals_infer = 100

For max_proposals_infer, feel free to allow more region proposals in your image to get better results.

This next code will load our preexisting Convolutional Neural Network:

# initialize the input dimensions to the network
input_dimensions = (300, 300) # 300 by 300 because that's what the CNN Model was tested on
# define the path to the output model
model_path = "model_3.hdf5"
cnn_model = keras.models.load_model(model_path) # Loading CNN model from keras
# define the minimum probability required for a positive prediction
# (used to filter out false-positive predictions)
min_probability = 0.90
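
If you are unsure what size your own network expects, one quick check is to print the loaded model's input shape; it should agree with the input_dimensions defined above:

# Sanity check: the model's expected input shape should match input_dimensions
print(cnn_model.input_shape)  # e.g. (None, 300, 300, 3) for 300x300 RGB images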

The code below initializes the Selective Search class from our OpenCV library:

# initialize OpenCV's selective search implementation
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

In the next lines of code I am selecting an image from my chosen directory and setting it as the base image so that our selective search algorithm can detect region proposals:

# Input image for selective search
# Image path - getting images on file
image_num = 232
image_paths = "Demo-Images/"
image_select = image_paths + ([f for f in listdir(image_paths) if isfile(join(image_paths, f))][image_num]) # Other way instead of the listdir function
# Making image compatible with imshow
image = cv2.imread(image_select)
# load the input image (300x300) and preprocess it
image = cv2.resize(image, input_dimensions) # Increasing image means more regions
# Setting base image that will be used
ss.setBaseImage(image)
# Choosing which selective search
ss.switchToSelectiveSearchQuality()

I then use our display function from above to see what the chosen image looks like.

In the code below we will be running the algorithm to get the region proposals for our chosen image:

# run selective search on the input image
start = time.time()
rects = ss.process() # Run Selective Search
end = time.time()
# show how long selective search took to run along with the total
# number of returned region proposals
print(f"[INFO] selective search took {np.round(end - start, decimals=3)} seconds")
print(f"[INFO] {len(rects)} total region proposals")

The code below loops over the region proposals that the algorithm picked up, extracting and preprocessing each one:

# initialize the list of region proposals that we'll be classifying
# along with their associated bounding boxes
proposals = []
boxes = []
# loop over the region proposal bounding box coordinates generated by
# running selective search
for (x, y, w, h) in rects[:max_proposals_infer]:
    # extract the region from the input image, convert it from BGR to
    # RGB channel ordering, and then resize it to the required input
    # dimensions of our trained CNN
    roi = image[y:y + h, x:x + w]
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    roi = cv2.resize(roi, input_dimensions,
                     interpolation=cv2.INTER_CUBIC)
    # further preprocess the ROI
    roi = img_to_array(roi)
    roi = preprocess_input(roi)
    # update our proposals and bounding boxes lists
    proposals.append(roi)
    boxes.append((x, y, x + w, y + h))

The code below will show us the proposals and bounding boxes. I also used the predict_region_of_interest function to predict, for each region, the probability that it contains the object from one of our classes:

# convert the proposals and bounding boxes into NumPy arrays
proposals = np.array(proposals, dtype="float64")
boxes = np.array(boxes, dtype="int64")
print(f"[INFO] proposal shape: {proposals.shape}")
# classify each of the proposal ROIs using the fine-tuned model
print("[INFO] classifying proposals...")
proba = predict_region_of_interest(model=cnn_model, proposals_used=proposals)  # Predicting the proposals for our desired object
# Result: 100 proposals, each a 300 by 300 RGB image
# Probability of each proposal (region proposal)
print(f"[INFO] Probability Scores: {proba}")

These next lines of code filter the predictions/bounding boxes down to the ones we want to see: the boxes whose class matches the overall image and whose probability meets the minimum we set further up:

# Obtaining the label of the current prediction from the CNN
# Empty list to store proposals
proposal_name_list = []
for proposals in proba:
    """
    For each predicted proposal,
    attach the class name and
    append it to a list.
    """
    pred_name = classes[np.argmax(proposals)]
    proposal_name_list.append(pred_name)
# find the index of all predictions that are greater
# than the minimum probability
print("[INFO] applying NMS...")
# Find the indexes where the proposal's class name matches the overall image's label
idxs = [i for i, x in enumerate(proposal_name_list) if x == pred_name]
boxes = boxes[idxs]
proba = proba[idxs]
# further filter indexes by enforcing that a minimum prediction
# probability be met
idxs = np.where(proba >= min_probability)[0]
boxes = boxes[idxs]
proba = proba[idxs]

Viewing Our Results

Now for the final part: viewing our results. You can also use plt.imshow() to display the image in a Jupyter Notebook.

The first few lines of code will show you what the image looks like after our object detection model runs over it, without using our non-maximum suppression algorithm:

# clone the original image so that we can draw on it
clone = image.copy()
# loop over the bounding boxes and associated probabilities
for (box, prob) in zip(boxes, proba):
    # draw the bounding box, label, and probability on the image
    (startX, startY, endX, endY) = box
    cv2.rectangle(clone, (startX, startY), (endX, endY),
                  (0, 255, 0), 2)
    y = startY - 10 if startY - 10 > 10 else startY + 10
    text = f"{np.round(prob.max() * 100, decimals=3)}%"
    cv2.putText(clone, text, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
# show the output *before* running NMS
cv2.imshow("Before NMS", clone)
cv2.waitKey(0)
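
If you are working in a Jupyter Notebook where cv2.imshow is not available, a matplotlib alternative along these lines should work (OpenCV stores images in BGR order, so convert before displaying):

# Notebook-friendly alternative to cv2.imshow - convert BGR to RGB first
plt.figure(figsize=(10, 8))
plt.grid(False)
plt.imshow(cv2.cvtColor(clone, cv2.COLOR_BGR2RGB))
plt.show()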

The next few lines of code will show us what the object detection algorithm does to the chosen image once the non-maximum suppression function is included, leaving our algorithm with a single bounding box:

# run non-maxima suppression on the bounding boxes
nms_boxes = non_max_suppression(boxes=boxes, overlapThresh=0.5)
# loop over the bounding boxes that survived NMS
for box in nms_boxes:
    # draw the bounding box, label, and probability on the image
    (startX, startY, endX, endY) = box  # or boxes[0] will return 1 bb
    cv2.rectangle(image, (startX, startY), (endX, endY),
                  (0, 255, 0), 2)
    y = startY - 10 if startY - 10 > 10 else startY + 10
    text = f"{classes[np.argmax(prob)]}: {np.round(proba.max() * 100, decimals=1)}%"
    cv2.putText(image, text, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
# show the output image *after* running NMS
cv2.imshow("After NMS", image)
cv2.waitKey(0)
