REPORT 1: 4th Week.

Faiz Khan
Jun 19, 2019


Project- Poor Man’s Rekognition

Proposal

Github

Mentor- Johannes Lochter

Organization- CCExtractor Development

Org admin- Carlos Fernandez

May 27 — June 1 = complete use-cases 1, 2, and 3 (completed)

June 3 — June 15 = complete use-case 4 (completed)

Within this three-week timeframe, I have completed four use-cases along with three articles. I have already explained the first three use-cases in my previous articles, so in this article I will only cover the fourth use-case: object detection.

Object Detection

Object detection is the process of finding real-world object instances such as cars, bikes, TVs, flowers, and humans in still images or videos. It allows for the recognition, localization, and detection of multiple objects within an image, which gives us a much better understanding of the image as a whole. It is commonly used in applications such as image retrieval, security, surveillance, and advanced driver assistance systems (ADAS).

I have performed this using YOLOv2 on an image and a video file. You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. On a Titan X, it processes images at 40–90 FPS and has an mAP of 78.6% on VOC 2007 and an mAP of 48.1% on COCO test-dev. One can find all the details about YOLOv2 at https://pjreddie.com/darknet/yolov2/.

Requirements:

  1. Python 3.5 or 3.6
  2. TensorFlow (GPU)
  3. OpenCV
  4. Darkflow repository (https://github.com/thtrieu/darkflow)
  5. Build the library:
     python setup.py build_ext --inplace
  6. Download the weights file (https://pjreddie.com/darknet/yolov2/)
     NOTE- I have downloaded YOLOv2 608x608, but one can download any version and try.
  7. Create a bin folder within the darkflow-master folder.
  8. Put the weights file in the bin folder.
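Before moving on, it is worth confirming that darkflow can actually find the config and the weights. A minimal sanity check, assuming the cfg/yolo.cfg and bin/yolov2.weights paths from the steps above:

# sanity check: building TFNet prints the layer summary if the
# cfg and weights paths from the setup above are found
from darkflow.net.build import TFNet

options = {
    'model': 'cfg/yolo.cfg',
    'load': 'bin/yolov2.weights',
    'threshold': 0.3
}
tfnet = TFNet(options)
print('darkflow loaded successfully')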

Now jumping to the code:

For an image file

# import libraries
import cv2
from darkflow.net.build import TFNet
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'

# define the model options and run
options = {
    'model': 'cfg/yolo.cfg',
    'load': 'bin/yolov2.weights',
    'threshold': 0.3,
    'gpu': 1.0
}
tfnet = TFNet(options)

# read the color image and convert to RGB
img = cv2.imread('cat.jpg', cv2.IMREAD_COLOR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# use YOLO to predict the image
result = tfnet.return_predict(img)
img.shape

# pull out some info from the first result
tl = (result[0]['topleft']['x'], result[0]['topleft']['y'])
br = (result[0]['bottomright']['x'], result[0]['bottomright']['y'])
label = result[0]['label']

# add the box and label and display it
img = cv2.rectangle(img, tl, br, (0, 255, 0), 7)
img = cv2.putText(img, label, tl, cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 0), 2)
plt.imshow(img)
plt.show()
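
The snippet above only draws the first detection (result[0]). As a rough sketch, the same idea extends to every object YOLO finds by looping over the full result list; the confidence value returned by darkflow can also be added to the label (same cat.jpg input assumed):

# sketch: draw every detection returned by return_predict, not just the first
import cv2
from darkflow.net.build import TFNet
import matplotlib.pyplot as plt

options = {
    'model': 'cfg/yolo.cfg',
    'load': 'bin/yolov2.weights',
    'threshold': 0.3,
    'gpu': 1.0
}
tfnet = TFNet(options)

img = cv2.cvtColor(cv2.imread('cat.jpg', cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
for result in tfnet.return_predict(img):
    tl = (result['topleft']['x'], result['topleft']['y'])
    br = (result['bottomright']['x'], result['bottomright']['y'])
    label = '{} {:.2f}'.format(result['label'], result['confidence'])
    img = cv2.rectangle(img, tl, br, (0, 255, 0), 3)
    img = cv2.putText(img, label, tl, cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 0), 2)

plt.imshow(img)
plt.show()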

Before jumping into the video part, I created a downsampled version of the video.

The script below removes frames from the video file, so the resulting file looks sped up when played back at normal speed. The idea is to create a video that can be processed by YOLO and still look close to normal speed in the annotated output.

Video sample before downsampling.

https://vimeo.com/343181494

import cv2

capture = cv2.VideoCapture('sample2.mp4')
size = (
    int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
    int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
)
codec = cv2.VideoWriter_fourcc(*'DIVX')
output = cv2.VideoWriter('sample3.avi', codec, 60.0, size)

i = 0
frame_rate_divider = 3  # keep only every 3rd frame

while capture.isOpened():
    ret, frame = capture.read()
    if ret:
        # write and show only every frame_rate_divider-th frame
        if i % frame_rate_divider == 0:
            # frame = cv2.resize(frame, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_CUBIC)
            output.write(frame)
            cv2.imshow('frame', frame)
        i += 1
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break

capture.release()
output.release()
cv2.destroyAllWindows()
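
If the result looks too fast or too slow, the source frame rate and length can guide the choice of frame_rate_divider and the writer FPS. A small sketch for inspecting the input (same sample2.mp4 assumed):

# sketch: inspect the source video before picking frame_rate_divider
import cv2

capture = cv2.VideoCapture('sample2.mp4')
fps = capture.get(cv2.CAP_PROP_FPS)
frames = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
print('source fps: {:.1f}, frames: {}, duration: {:.1f}s'.format(fps, frames, frames / fps))
capture.release()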

Video file after downsampling

https://vimeo.com/user99893962/review/343182483/afc61b07c9

Jumping into the most interesting part of the article.

# import libraries
import cv2
from darkflow.net.build import TFNet
import numpy as np
import time

# define the model options and run
options = {
    'model': 'cfg/yolo.cfg',
    'load': 'bin/yolov2.weights',
    'threshold': 0.15,
    'gpu': 1.0
}
tfnet = TFNet(options)

capture = cv2.VideoCapture('sample3.avi')
colors = [tuple(255 * np.random.rand(3)) for i in range(5)]
size = (
    int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)),
    int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
)
codec = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', codec, 20.0, size)

while capture.isOpened():
    stime = time.time()
    ret, frame = capture.read()
    if ret:
        results = tfnet.return_predict(frame)
        for color, result in zip(colors, results):
            tl = (result['topleft']['x'], result['topleft']['y'])
            br = (result['bottomright']['x'], result['bottomright']['y'])
            label = result['label']
            frame = cv2.rectangle(frame, tl, br, color, 7)
            frame = cv2.putText(frame, label, tl, cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 0), 2)
        out.write(frame)
        cv2.imshow('frame', frame)
        print('FPS {:.1f}'.format(1 / (time.time() - stime)))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        capture.release()
        out.release()
        cv2.destroyAllWindows()
        break
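
One thing to note: because colors has only five entries, zip(colors, results) draws at most five boxes per frame. As a sketch, a hypothetical helper (not part of the original script) could give every detection its own color instead:

# sketch: hypothetical helper that draws all detections, one random color each
import cv2
import numpy as np

def draw_detections(frame, results):
    for result in results:
        color = tuple(int(c) for c in np.random.randint(0, 256, size=3))
        tl = (result['topleft']['x'], result['topleft']['y'])
        br = (result['bottomright']['x'], result['bottomright']['y'])
        frame = cv2.rectangle(frame, tl, br, color, 7)
        frame = cv2.putText(frame, result['label'], tl, cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 0), 2)
    return frame

Inside the loop above, frame = draw_detections(frame, results) would then replace the zip-based for loop.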

Result

https://vimeo.com/343182873?ref=em-share

Writing this piece of code was the hardest part, Python being an indentation-sensitive language. Multiple times I was stuck rethinking the logic and the parameters, when the problem was just an extra tab.
