Project: Poor man’s rekognition
Mentor: Johannes Lochter
Organization: CCExtractor Development
Org admin: Carlos Fernandez
May 27 to June 1: complete use-cases 1, 2 and 3 (completed)
June 3 to June 15: complete use-case 4 (completed)
Within this three-week timeframe, I have completed four use-cases and written three articles. My previous articles already explained the first three use-cases, so this article covers only the fourth one: object detection.
Object Detection
Object detection is the process of finding instances of real-world objects, such as cars, bikes, TVs, flowers, and humans, in still images or videos. It allows multiple objects within an image to be recognized and localized, which gives a much better understanding of the image as a whole. It is commonly used in applications such as image retrieval, security, surveillance, and advanced driver assistance systems (ADAS).
I applied YOLOv2 to both an image and a video file. You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. On a Titan X, YOLOv2 processes images at 40–90 FPS and reaches an mAP of 78.6% on VOC 2007 and 48.1% on COCO test-dev. One can find all the details about YOLOv2 here:
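To make the mAP figures a little more concrete: on PASCAL VOC, a predicted box only counts as a correct detection when it overlaps a ground-truth box of the same class with an intersection over union (IoU) of at least 0.5, and mAP is the mean of the per-class average precisions computed from those matches. The sketch below computes IoU for two boxes written in the same topleft/bottomright dictionary shape that darkflow returns; the helper names iou and box are my own and are not part of darkflow.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in darkflow's
    {'topleft': {'x', 'y'}, 'bottomright': {'x', 'y'}} shape."""
    ax1, ay1 = box_a['topleft']['x'], box_a['topleft']['y']
    ax2, ay2 = box_a['bottomright']['x'], box_a['bottomright']['y']
    bx1, by1 = box_b['topleft']['x'], box_b['topleft']['y']
    bx2, by2 = box_b['bottomright']['x'], box_b['bottomright']['y']

    # Width and height of the overlapping rectangle (0 if the boxes are disjoint)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def box(x1, y1, x2, y2):
    """Build a box dict in the same shape as a darkflow prediction (hypothetical helper)."""
    return {'topleft': {'x': x1, 'y': y1}, 'bottomright': {'x': x2, 'y': y2}}


print(iou(box(0, 0, 10, 10), box(5, 5, 15, 15)))  # 25 / 175, about 0.143
```

Two 10x10 boxes shifted by half their size overlap by only about 0.14, so under the 0.5 rule such a prediction would not count as a match.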
Requirements:
1. Python 3.5 or 3.6
2. TensorFlow 1.x (GPU)
3. OpenCV
4. Darkflow repository (https://github.com/thtrieu/darkflow)
5. Build the library (Darkflow's Cython extensions) in place, from inside the darkflow-master folder:
python setup.py build_ext --inplace
6. Download the weights file (https://pjreddie.com/darknet/yolov2/)
NOTE: I downloaded the YOLOv2 608x608 weights, but one can download any version and try it, as long as the matching .cfg file is used.
7. Create a bin folder within the darkflow-master folder.
8. Put the weights file in the bin folder.
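Because the code in this article loads the network through relative paths, it has to be run from inside the darkflow-master folder. A quick way to catch a misplaced weights file before TensorFlow starts loading is a small check like the one below. It is only a sketch: the helper name missing_darkflow_files is hypothetical, and its default paths are assumptions that mirror the 'model' and 'load' options used later.

```python
from pathlib import Path


def missing_darkflow_files(root='.', cfg='cfg/yolo.cfg', weights='bin/yolov2.weights'):
    """Return the required files that do not exist under the darkflow root.

    The default paths are assumptions matching the 'model' and 'load'
    options used in this article; change them for another YOLO version.
    """
    root = Path(root)
    return [rel for rel in (cfg, weights) if not (root / rel).is_file()]


if __name__ == '__main__':
    missing = missing_darkflow_files()
    if missing:
        print('Missing:', ', '.join(missing))
    else:
        print('darkflow-master looks ready')
```

Running it from the darkflow-master folder should print nothing missing once steps 6 to 8 are done.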
Now jumping to the code:
For an image file
```python
# import libraries
import cv2
import matplotlib.pyplot as plt
from darkflow.net.build import TFNet

# Jupyter only: render inline plots as SVG (remove this line in a plain .py script)
%config InlineBackend.figure_format = 'svg'

# define the model options and run
options = {
    'model': 'cfg/yolo.cfg',
    'load': 'bin/yolov2.weights',
    'threshold': 0.3,
    'gpu': 1.0
}
tfnet = TFNet(options)

# read the color image; OpenCV loads it in BGR order, which darkflow expects
img = cv2.imread('cat.jpg', cv2.IMREAD_COLOR)

# use YOLO to predict the image
result = tfnet.return_predict(img)

# convert to RGB only for display with matplotlib
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# add a box and label for every detection (also safe when nothing is detected)
for detection in result:
    tl = (detection['topleft']['x'], detection['topleft']['y'])
    br = (detection['bottomright']['x'], detection['bottomright']['y'])
    label = detection['label']
    img = cv2.rectangle(img, tl, br, (0, 255, 0), 7)
    img = cv2.putText(img, label, tl, cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 0), 2)

plt.imshow(img)
plt.show()
```
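return_predict gives back a list of dictionaries, one per detected object, each holding a label, a confidence score and the topleft/bottomright corners in pixels. Before drawing boxes it can help to inspect that list. The sketch below counts detections per label at or above a confidence cut-off; count_labels is a hypothetical helper, and sample_results is made-up data in the same shape, not real output for the cat image. Note that darkflow already drops boxes below the 'threshold' option, so min_confidence here is only useful for being stricter than that.

```python
from collections import Counter


def count_labels(results, min_confidence=0.3):
    """Count detections per label, keeping only those at or above min_confidence."""
    return Counter(r['label'] for r in results if r['confidence'] >= min_confidence)


# Hypothetical predictions in the same shape darkflow's return_predict uses
sample_results = [
    {'label': 'cat', 'confidence': 0.91,
     'topleft': {'x': 40, 'y': 30}, 'bottomright': {'x': 310, 'y': 280}},
    {'label': 'cat', 'confidence': 0.22,
     'topleft': {'x': 300, 'y': 20}, 'bottomright': {'x': 420, 'y': 150}},
    {'label': 'sofa', 'confidence': 0.47,
     'topleft': {'x': 0, 'y': 150}, 'bottomright': {'x': 500, 'y': 375}},
]

print(count_labels(sample_results))  # Counter({'cat': 1, 'sofa': 1})
```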
Before jumping into the video part, I created a downsampled version of the video.
The script below removes frames from a video file, keeping only one frame in every three, so the resulting file looks faster when played back at normal speed. The idea is to create a video that YOLO can process and that still looks like normal speed afterwards.
Video sample before downsampling.
```python
import cv2

capture = cv2.VideoCapture('sample2.mp4')
size = (
    int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
    int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
)
codec = cv2.VideoWriter_fourcc(*'DIVX')
output = cv2.VideoWriter('sample3.avi', codec, 60.0, size)

i = 0
frame_rate_divider = 3  # keep one frame out of every three
while capture.isOpened():
    ret, frame = capture.read()
    if not ret:
        break  # end of the video
    if i % frame_rate_divider == 0:
        # frame = cv2.resize(frame, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_CUBIC)
        # (if you enable the resize, halve `size` as well: VideoWriter expects frames of exactly `size`)
        output.write(frame)
        cv2.imshow('frame', frame)
    i += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

capture.release()
output.release()
cv2.destroyAllWindows()
```
Video file after downsampling
https://vimeo.com/user99893962/review/343182483/afc61b07c9
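How much faster the downsampled clip plays is simple arithmetic: one source second holds source_fps frames, only source_fps / divider of them are kept, and the writer plays them back at output_fps. The sketch below spells this out. The source frame rate of sample2.mp4 is not stated in this article, so the 30 FPS used here is an assumption; in practice it can be read with capture.get(cv2.CAP_PROP_FPS). Both helper names are hypothetical.

```python
def kept_frame_indices(total_frames, divider):
    """Indices of the frames the downsampling loop writes (every divider-th frame)."""
    return [i for i in range(total_frames) if i % divider == 0]


def playback_speedup(source_fps, divider, output_fps):
    """How much faster the written clip plays than the original.

    One original second holds source_fps frames; only source_fps / divider
    of them are kept, and the writer plays output_fps frames per second.
    """
    kept_per_source_second = source_fps / divider
    return output_fps / kept_per_source_second


print(kept_frame_indices(10, 3))        # [0, 3, 6, 9]
print(playback_speedup(30.0, 3, 60.0))  # 6.0 for sample3.avi, assuming a 30 FPS source
print(playback_speedup(30.0, 3, 20.0))  # 2.0 for output.avi written at 20 FPS, same assumption
```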
Now for the most interesting part of the article: running YOLO on the downsampled video.
```python
# import libraries
import itertools
import time

import cv2
import numpy as np
from darkflow.net.build import TFNet

# define the model options and run
option = {
    'model': 'cfg/yolo.cfg',
    'load': 'bin/yolov2.weights',
    'threshold': 0.15,
    'gpu': 1.0
}
tfnet = TFNet(option)

capture = cv2.VideoCapture('sample3.avi')
# five random BGR colors, converted to plain ints for OpenCV
colors = [tuple(int(c) for c in 255 * np.random.rand(3)) for _ in range(5)]
size = (
    int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
    int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, 20.0, size)

while capture.isOpened():
    stime = time.time()
    ret, frame = capture.read()
    if not ret:
        break  # end of the video
    results = tfnet.return_predict(frame)
    # cycle through the colors so every detection is drawn, not just the first five
    for color, result in zip(itertools.cycle(colors), results):
        tl = (result['topleft']['x'], result['topleft']['y'])
        br = (result['bottomright']['x'], result['bottomright']['y'])
        label = result['label']
        frame = cv2.rectangle(frame, tl, br, color, 7)
        frame = cv2.putText(frame, label, tl, cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 0), 2)
    out.write(frame)
    cv2.imshow('frame', frame)
    print('FPS {:.1f}'.format(1 / (time.time() - stime)))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release on both normal exit and 'q', so output.avi is always finalized
capture.release()
out.release()
cv2.destroyAllWindows()
```
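In the video loop, colors holds only five entries and they are handed out by position in the results list, so the same object class can change color from one frame to the next. One alternative, sketched below under my own naming (color_for_label is hypothetical), derives a fixed color from the label text with zlib.crc32, so every "person" box keeps the same color across frames. Python's built-in hash() is avoided because string hashing is randomized between runs.

```python
import zlib


def color_for_label(label):
    """Map a class label to a fixed color tuple of three ints in 0..255."""
    h = zlib.crc32(label.encode('utf-8'))
    # Use three different bytes of the checksum as the B, G and R channels
    return (h & 0xFF, (h >> 8) & 0xFF, (h >> 16) & 0xFF)


print(color_for_label('person') == color_for_label('person'))  # True
```

Inside the drawing loop, the color would then come from color_for_label(label) instead of the colors list.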
Result
https://vimeo.com/343182873?ref=em-share
Writing this piece of beauty was the hardest part, since Python is very strict about indentation. Several times I was stuck rethinking the logic and parameters when the problem was just an extra tab.