Object Detection Using OpenCV

Hasti Sutaria
7 min read · Jan 31, 2022


Object Detection

  • Object detection is a computer vision technique that allows us to identify and locate objects in an image or video. With this kind of identification and localization, object detection can be used to count objects in a scene and determine and track their precise locations, all while accurately labeling them.
  • Object detection is commonly confused with image recognition. Image recognition assigns a single label to an image: a picture of a dog receives the label “dog”, and a picture of two dogs still receives the single label “dog”. Object detection, on the other hand, draws a box around each dog and labels each box “dog”. The model predicts where each object is and what label should be applied. In that way, object detection provides more information about an image than recognition.
  • Object detection can be broken down into machine learning-based approaches and deep learning-based approaches.
  • In more traditional ML-based approaches, computer vision techniques are used to look at various features of an image, such as the color histogram or edges, to identify groups of pixels that may belong to an object. These features are then fed into a regression model that predicts the location of the object along with its label (a short OpenCV sketch of this kind of feature extraction follows this list).
  • On the other hand, deep learning-based approaches employ convolutional neural networks (CNNs) to perform end-to-end object detection, in which features don’t need to be defined and extracted separately.
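As a rough illustration of the feature-extraction step in the traditional approach, OpenCV can compute a color histogram and an edge map directly (a minimal sketch; the image path is a placeholder, and a real pipeline would feed such features into a trained classifier or regressor):

#classic hand-crafted features with OpenCV
import cv2

img = cv2.imread('sample.jpg')                         #placeholder path

#color histogram over the blue channel (32 bins)
hist = cv2.calcHist([img], [0], None, [32], [0, 256])

#edge map using the Canny detector
edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 100, 200)

print(hist.shape, edges.shape)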

How does object detection work?

  • Deep learning-based object detection models typically have two parts. An encoder takes an image as input and runs it through a series of blocks and layers that learn to extract statistical features used to locate and label objects. Outputs from the encoder are then passed to a decoder, which predicts bounding boxes and labels for each object (a toy sketch of this split follows this list).
  • A number of popular object detection models belong to the R-CNN family. Over the years, they’ve become both more accurate and more computationally efficient. There are also a number of models that belong to the single shot detector family. MobileNet + SSD models feature a MobileNet-based encoder and the YOLO model features its own convolutional architecture.
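To make the encoder/decoder idea concrete, here is a toy sketch in PyTorch (purely illustrative; it is not one of the production architectures named above, and the layer sizes and fixed number of boxes are arbitrary assumptions):

import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    #a toy two-part detector: a conv encoder extracts features,
    #and two small heads decode them into boxes and class scores
    def __init__(self, num_classes=80, num_boxes=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.box_head = nn.Linear(32, num_boxes * 4)            #(x, y, w, h) per box
        self.cls_head = nn.Linear(32, num_boxes * num_classes)  #class scores per box

    def forward(self, x):
        features = self.encoder(x)
        return self.box_head(features), self.cls_head(features)

boxes, scores = TinyDetector()(torch.randn(1, 3, 320, 320))
print(boxes.shape, scores.shape)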

Purpose

  • The main purpose of object detection is to identify and locate one or more targets of interest in still image or video data. It draws on a variety of important techniques, such as image processing, pattern recognition, artificial intelligence and machine learning.
  • It has broad application prospects in areas such as road traffic accident prevention [1], warnings of dangerous goods in factories, military restricted area monitoring and advanced human–computer interaction.
  • Since real-world multi-target detection scenarios are usually complex and variable, balancing accuracy against computational cost is a difficult task.

Outline

  • In this project, we will detect objects in a still image with the help of the OpenCV library in Python. OpenCV is widely used for image processing and object detection and has many real-world applications.
  • After importing the necessary libraries, we read the sample image, load a pre-trained detection model and use the COCO class list for the pre-defined object classes. As an outcome, we detect each object along with its location, confidence score and class index (which helps us identify the object).
  • In this project, objects such as person, car, truck and traffic light are detected accurately from the image, and the average accuracy of the model is greater than 60%, which is fair enough.

Object detection using deep learning with OpenCV and Python

When it comes to object detection, popular detection frameworks are

  • YOLO
  • SSD
  • Faster R-CNN

Dependencies

  • opencv
  • numpy
  • matplotlib

pip install numpy opencv-python matplotlib

YOLO (You Only Look Once)

Provided all the required model files are available in the working directory (or at the paths used below):

Demonstration

Step 1 : Importing all the necessary libraries.

In [ ]:

#importing libraries
import cv2
import matplotlib.pyplot as plt
print("Libraries imported successfully!")
Libraries imported successfully!

Step 2 : We need the model configuration file and the frozen inference graph to load the pre-trained model. An object detection model is trained to detect the presence and location of multiple classes of objects. For example, a model might be trained with images that contain various pieces of fruit, along with a label that specifies the class of fruit they represent (e.g. an apple, a banana, or a strawberry), and data specifying where each object appears in the image.

In [ ]:

#importing and using necessary files
config_file='../input/objectdetection/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
frozen_model='../input/objectdetection/frozen_inference_graph.pb'
#TensorFlow object detection model loaded through OpenCV's DNN module
model = cv2.dnn_DetectionModel(frozen_model,config_file)
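Newer OpenCV builds also expose the same class in namespaced form; the call below is equivalent (assuming OpenCV 4.1.2 or later):

#equivalent call using the namespaced API
model = cv2.dnn.DetectionModel(frozen_model, config_file)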

Step 3 : The COCO label file lists all the object classes the model can detect. In this step we read the file and examine its information, such as the number of classes and their labels.

In [ ]:

#Reading COCO class labels
classLabels=[]
filename='../input/objectdetection/yolov3.txt'
with open(filename,'rt') as fpt:
    classLabels = fpt.read().rstrip('\n').split('\n')
print("Number of Classes")
print(len(classLabels))
print("Class labels")
print(classLabels)
Number of Classes
80
Class labels
['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

Step 4 : Configuring the model's input.

Here, we set parameters such as the input size, scaling factor, mean values and channel order of the input image. These parameters tell the model how to preprocess each image before inference; the values below are the standard ones for this SSD MobileNet V3 model.

In [ ]:

#Setting input preprocessing parameters for the model
model.setInputSize(320,320)              #input size the network expects
model.setInputScale(1.0/127.5)           #scale pixel values to roughly [-1, 1]
model.setInputMean((127.5,127.5,127.5))  #mean subtracted from each channel before scaling
model.setInputSwapRB(True)               #swap R and B channels (OpenCV loads BGR)

Out[ ]:

<dnn_Model 0x7f88eaf93ed0>

Step 5 : Reading the sample image with OpenCV and displaying it with matplotlib.

In [ ]:

#reading image
img = cv2.imread('../input/image3/sample.jpg')
plt.imshow(img)
<matplotlib.image.AxesImage at 0x7f88e8260b50>

Step 6 : Converting the image from BGR to RGB format using OpenCV (OpenCV loads images in BGR order, while matplotlib expects RGB).

In [ ]:

#converting image from BGR to RGB
plt.imshow(cv2.cvtColor(img,cv2.COLOR_BGR2RGB))

Out[ ]:

<matplotlib.image.AxesImage at 0x7f88e810a310>

Step 7 : Finally, we are ready to detect the objects. Using our configured model, we retrieve the class index (object type), confidence (accuracy level) and bbox (location coordinates) for each detection in the image.

In [ ]:

#object detection
ClassIndex, confidence, bbox = model.detect(img, confThreshold=0.5)

Step 8 : Printing the confidence scores (accuracy levels) of the detections.

In [ ]:

#fetching accuracy
print(confidence)
[[0.7711565 ]
[0.72707874]
[0.7237637 ]
[0.7212671 ]
[0.6760945 ]
[0.671836 ]
[0.6363414 ]
[0.60687923]
[0.59556425]
[0.595388 ]
[0.5937964 ]
[0.5856311 ]
[0.57673174]
[0.5689038 ]
[0.56511927]
[0.559943 ]
[0.54572237]
[0.5449435 ]
[0.5358321 ]
[0.52353 ]
[0.52098787]
[0.514513 ]
[0.5125533 ]
[0.50800025]
[0.50565225]
[0.50520414]]
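The “average accuracy greater than 60%” mentioned in the outline can be checked directly from this array (a quick sanity check; confidence is the NumPy array returned by detect):

#average confidence across all detections
print(confidence.flatten().mean())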

Step 9 : Printing the class index of each detected object.

In [ ]:

#fetching object index
print(ClassIndex)
[[ 3]
[ 3]
[ 3]
[ 8]
[ 1]
[ 3]
[ 3]
[ 3]
[ 3]
[ 3]
[10]
[10]
[10]
[ 3]
[ 3]
[ 3]
[10]
[ 3]
[10]
[ 8]
[ 3]
[ 3]
[ 3]
[ 8]
[ 3]
[10]]
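To make these raw indices easier to read, they can be mapped back to the COCO labels loaded in Step 3 (the same 1-based lookup used in the plotting step below):

#translating 1-based class indices into label names
for idx in ClassIndex.flatten():
    print(idx, classLabels[idx - 1])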

Step 10 : Printing the bounding box coordinates so that we can get the location of each object in the image. Each row is in OpenCV's (x, y, width, height) format.

In [ ]:

#fetching coordinates of boxes
print(bbox)
[[ 4 512 238 375]
[1029 528 340 261]
[ 68 511 244 358]
[ 572 296 335 398]
[ 619 318 326 699]
[ 316 520 222 206]
[1270 499 130 114]
[ 971 485 224 211]
[ 181 532 150 279]
[1405 499 144 154]
[1491 81 40 68]
[1039 73 40 96]
[1145 319 29 59]
[ 976 479 181 152]
[ 949 486 127 161]
[ 297 545 146 168]
[ 135 0 1407 589]
[ 478 528 100 151]
[1488 46 39 64]
[ 531 346 449 399]
[ 276 528 196 213]
[1384 502 189 201]
[ 962 526 85 141]
[ 957 439 235 192]
[ 574 157 1013 595]
[1031 44 42 99]]
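Since each box is given as (x, y, width, height), the corner coordinates can be recovered with a small loop if needed (illustrative only, not part of the original walk-through):

#convert (x, y, w, h) boxes to (x1, y1, x2, y2) corners
for (x, y, w, h) in bbox:
    print((x, y), (x + w, y + h))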

Step 11 : Finally! It's time to plot the boxes around the objects and identify each object by printing its name near the box. We set the font, font scale, rectangle color, font color and box thickness, and then display the image with the detected objects.

In [ ]:

#plotting boxes and labels
font_scale = 3
font = cv2.FONT_HERSHEY_PLAIN
for ClassInd, conf, boxes in zip(ClassIndex.flatten(), confidence.flatten(), bbox):
    cv2.rectangle(img, boxes, (0, 255, 0), 3)  #draw the bounding box
    cv2.putText(img, classLabels[ClassInd-1], (boxes[0]+10, boxes[1]+40), font, fontScale=font_scale, color=(0, 0, 255), thickness=3)  #write the class label
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
<matplotlib.image.AxesImage at 0x7f88e809bd90>

As shown above, objects such as person, car, truck and traffic light are detected accurately from the image. Also, the average accuracy of the model is greater than 60%, which is fair enough.

Thus, the object detection model built with the OpenCV library in Python is applied successfully.
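For reference, the whole pipeline above fits into one short script (a consolidated sketch of the same steps, assuming the same file paths and pre-trained model files):

import cv2
import matplotlib.pyplot as plt

#paths to the pre-trained SSD MobileNet V3 files and the COCO label list (same as above)
config_file = '../input/objectdetection/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
frozen_model = '../input/objectdetection/frozen_inference_graph.pb'
label_file = '../input/objectdetection/yolov3.txt'

#load class labels
with open(label_file, 'rt') as fpt:
    classLabels = fpt.read().rstrip('\n').split('\n')

#load the pre-trained model and set its input preprocessing parameters
model = cv2.dnn_DetectionModel(frozen_model, config_file)
model.setInputSize(320, 320)
model.setInputScale(1.0 / 127.5)
model.setInputMean((127.5, 127.5, 127.5))
model.setInputSwapRB(True)

#run detection and draw the results
img = cv2.imread('../input/image3/sample.jpg')
ClassIndex, confidence, bbox = model.detect(img, confThreshold=0.5)
for ClassInd, conf, box in zip(ClassIndex.flatten(), confidence.flatten(), bbox):
    cv2.rectangle(img, box, (0, 255, 0), 3)
    cv2.putText(img, classLabels[ClassInd - 1], (box[0] + 10, box[1] + 40),
                cv2.FONT_HERSHEY_PLAIN, fontScale=3, color=(0, 0, 255), thickness=3)

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()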

Before : original sample image

After : image with the detected objects, boxes and labels

Advantages

  • Locates and identifies objects, drawing a box around each one
  • Real time detection
  • Simple architecture
  • High detection precision
  • Low misdetection rate

Disadvantages

  • Complex algorithms
  • Difficult for small and dense objects
  • Poor accuracy for identical objects

Applications

  • Video surveillance
  • Crowd counting
  • Person detection
  • Anomaly detection (e.g. in industries like agriculture and health care)
  • Self-driving cars
  • Security applications
  • Object tracking

Conclusion

So to summarize, object detection can impact our lives more positively than ever before. I hope the above overview was helpful in understanding the basics of object detection and how it can be used in the real world.
