Real-Time Object detection on a CPU using one of the smartest Convolutional Neural Networks — The YOLO

Sep 23, 2019

Hi, I’m Emmanuel, and I have something very exciting for you today.

Ever heard of computer vision? It’s a branch of artificial intelligence in which a computer or device is programmed, using machine learning, deep learning and data science techniques, to understand visual data from cameras and similar sources. By understanding this data, it can mimic the human behaviour of responding to visual stimuli intelligently.

This tutorial requires some Python programming knowledge and an understanding of concepts like machine learning and convolutional and deep neural networks.

In this tutorial, we’re going to learn how to detect objects in real time by running YOLO on a CPU.

If you’re a complete beginner with YOLO, I highly suggest checking out other tutorials about YOLO object detection on images before proceeding with real-time detection, as I’m going to use similar code here.

Why did I specify that we’re going to perform the detection using the CPU?

I specified this because deep learning frameworks let you run the detection on either the CPU or the GPU.

YOLO on CPU vs YOLO on GPU?

I’m going to quickly compare YOLO on a CPU with YOLO on a GPU, explaining the advantages and constraints of each.

YOLO on CPU

The big advantage of running YOLO on the CPU is that it’s really easy to set up: it works right away with OpenCV, without any further installations. You only need OpenCV 3.4.2 or greater.

The constraint is that YOLO, like any deep neural network, runs really slowly on a CPU, so we will be able to process only a few frames per second.
That’s not really good enough for real-time detection.
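
If you’re unsure whether your installation meets the version requirement above, a quick check along these lines may help. This is a minimal sketch: version_at_least is a hypothetical helper name, not part of OpenCV, and it assumes the version string begins with dot-separated numbers (as cv2.__version__ normally does).

```python
import re

def version_at_least(version, minimum=(3, 4, 2)):
    """Return True if a dotted version string is >= the minimum tuple."""
    # Keep only the leading numeric parts, so "4.5.4-dev" becomes (4, 5, 4)
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as "4.0" with zeros before comparing
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum

print(version_at_least("3.4.2"))
print(version_at_least("3.4.1"))
```

With OpenCV installed, you would call it as version_at_least(cv2.__version__).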

YOLO on GPU

By contrast, YOLO on a GPU is really fast, and with a good GPU you can process 45 or more frames per second.
So we’re not talking about a small speed difference, but a huge one: the GPU can outperform the CPU by a factor of 20 or more.

The constraint is that, for a beginner, setting up a deep neural network on a GPU can be a really harsh process.
Also, it doesn’t work with all GPUs, but only with NVIDIA GPUs that are compatible with CUDA.
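
To put those figures in perspective, here is a small worked calculation. It only uses the numbers quoted above (45 FPS on a good GPU, and a CPU roughly 20 times slower); they are illustrative, not measurements, and the helper name frame_time_ms is made up for this sketch.

```python
def frame_time_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps

gpu_fps = 45.0    # figure quoted in the text for a good GPU
slowdown = 20.0   # the CPU is described as 20 or more times slower
cpu_fps = gpu_fps / slowdown

print(f"GPU: {gpu_fps:.2f} FPS, {frame_time_ms(gpu_fps):.1f} ms per frame")
print(f"CPU: {cpu_fps:.2f} FPS, {frame_time_ms(cpu_fps):.1f} ms per frame")
```

In other words, at roughly 2 FPS each frame takes almost half a second to process.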

Now let’s get down to business.

We import the libraries and load the network.

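import cv2
import numpy as np
import time

# Load YOLO (tiny YOLOv3 weights and config)
net = cv2.dnn.readNet("weights/yolov3-tiny.weights", "cfg/yolov3-tiny.cfg")

# Read the class names, one per line
with open("coco.names", "r") as f:
  classes = [line.strip() for line in f.readlines()]

# Names of the output layers; flatten() copes with OpenCV versions that
# return either nested or flat indices from getUnconnectedOutLayers()
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

# One random colour per class for drawing the boxes
colors = np.random.uniform(0, 255, size=(len(classes), 3))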
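
One detail worth knowing when picking the output layers: getUnconnectedOutLayers() returns 1-based indices, and depending on the OpenCV version each entry is either a one-element array or a plain integer. The sketch below shows one way to handle both shapes without OpenCV installed. resolve_output_layers is a hypothetical helper, and the layer names in the example are made up for illustration.

```python
def resolve_output_layers(layer_names, unconnected):
    """Map 1-based layer indices to names, for nested or flat index lists."""
    names = []
    for idx in unconnected:
        try:
            idx = idx[0]   # older style: each entry wraps a single index
        except (TypeError, IndexError):
            pass           # newer style: the entry is already a plain integer
        names.append(layer_names[int(idx) - 1])
    return names

# Made-up layer names, for illustration only
layer_names = ["conv_0", "yolo_1", "conv_2", "yolo_3"]
print(resolve_output_layers(layer_names, [[2], [4]]))  # nested indices
print(resolve_output_layers(layer_names, [2, 4]))      # flat indices
```

With a loaded network, the call would be resolve_output_layers(net.getLayerNames(), net.getUnconnectedOutLayers()).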

We then load the camera.
We record the starting time and a frame counter so that we can later calculate how many frames per second (FPS) we are processing.

# Loading camera
cap = cv2.VideoCapture(0)
font = cv2.FONT_HERSHEY_PLAIN
starting_time = time.time()
frame_id = 0

We run the while loop and we extract the frame from the camera.

while True:
  ret, frame = cap.read()
  if not ret:
    break  # Stop if the camera returned no frame
  frame_id += 1
  height, width, channels = frame.shape

We perform the detection.

  # (still inside the while loop)
  # Detecting objects: scale pixels by 1/255 (0.00392), resize to 416x416, swap BGR to RGB
  blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
  net.setInput(blob)
  outs = net.forward(output_layers)

  # Collecting the detections above the confidence threshold
  class_ids = []
  confidences = []
  boxes = []
  for out in outs:
    for detection in out:
      scores = detection[5:]
      class_id = np.argmax(scores)
      confidence = scores[class_id]
      if confidence > 0.2:
        # Object detected: coordinates are relative to the frame size
        center_x = int(detection[0] * width)
        center_y = int(detection[1] * height)
        w = int(detection[2] * width)
        h = int(detection[3] * height)
        # Rectangle coordinates (top-left corner)
        x = int(center_x - w / 2)
        y = int(center_y - h / 2)
        boxes.append([x, y, w, h])
        confidences.append(float(confidence))
        class_ids.append(class_id)

  # Non-maximum suppression, run once per frame after all boxes are collected
  indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.4, 0.3)

  # Showing information on the screen
  for i in range(len(boxes)):
    if i in indexes:
      x, y, w, h = boxes[i]
      label = str(classes[class_ids[i]])
      confidence = confidences[i]
      color = colors[class_ids[i]]
      cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
      cv2.rectangle(frame, (x, y), (x + w, y + 30), color, -1)
      cv2.putText(frame, label + " " + str(round(confidence, 2)), (x, y + 30), font, 3, (255, 255, 255), 3)
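
To make the post-processing step more concrete, here is a pure-Python sketch of its two ideas: turning one YOLO output row into a pixel box, and suppressing overlapping boxes. It assumes each row is laid out as [center_x, center_y, width, height, objectness, class scores...] with coordinates relative to the frame, as the indexing in the code above implies. The nms function is a simple greedy version written for illustration; it is not the implementation behind cv2.dnn.NMSBoxes and may differ from it in details. All function names and sample values here are hypothetical.

```python
def decode_detection(detection, width, height):
    """Convert one output row to ([x, y, w, h], confidence, class_id)."""
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda k: scores[k])
    # Scale the relative centre and size up to pixel units
    center_x = int(detection[0] * width)
    center_y = int(detection[1] * height)
    w = int(detection[2] * width)
    h = int(detection[3] * height)
    # Move from the centre to the top-left corner
    box = [int(center_x - w / 2), int(center_y - h / 2), w, h]
    return box, float(scores[class_id]), class_id

def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, confidences, score_threshold, nms_threshold):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(
        (i for i, c in enumerate(confidences) if c > score_threshold),
        key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

# A made-up detection row on a 640x480 frame, where class 1 scores 0.9
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.9]
box, confidence, class_id = decode_detection(row, 640, 480)
print(box, confidence, class_id)

# Two heavily overlapping boxes and one separate box
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
print(nms(boxes, [0.9, 0.8, 0.7], 0.4, 0.3))
```

In the second example, the first two boxes overlap with an IoU of about 0.68, which is above the 0.3 threshold, so the lower-confidence one is dropped and indices 0 and 2 are kept.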

We then calculate the FPS by dividing the number of frames by the elapsed time, and we show everything on the screen.

  # (still inside the while loop)
  elapsed_time = time.time() - starting_time
  fps = frame_id / elapsed_time
  cv2.putText(frame, "FPS: " + str(round(fps, 2)), (10, 50), font, 3, (0, 0, 0), 3)
  cv2.imshow("Image", frame)
  key = cv2.waitKey(1)
  if key == 27:  # Esc key stops the loop
    break

# After the loop: release the camera and close the window
cap.release()
cv2.destroyAllWindows()
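
The FPS figure above is an average over the whole run: frames processed so far divided by seconds elapsed. The same calculation can be wrapped in a small helper, sketched below. FpsCounter is a hypothetical name, and the clock parameter is an assumption added so the class can be exercised without a camera.

```python
import time

class FpsCounter:
    """Average frames per second since the counter was created."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._start = clock()
        self.frames = 0

    def tick(self):
        """Count one processed frame and return the average FPS so far."""
        self.frames += 1
        elapsed = self._clock() - self._start
        # Guard against a zero interval on a very fast first frame
        return self.frames / elapsed if elapsed > 0 else 0.0

# Simulated clock: each reading is 0.5 s later than the previous one
readings = iter([0.0, 0.5, 1.0, 1.5])
counter = FpsCounter(clock=lambda: next(readings))
for _ in range(3):
    print(round(counter.tick(), 2))
```

In the loop above, a call such as fps = counter.tick() would stand in for the frame_id and elapsed_time bookkeeping.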

Here is an example of what it looks like:

To use this code, clone the project from my GitHub repo, install the necessary dependencies and run the script from your command line.

I hope you find this piece very enlightening.

Stay tuned for more AI/computer vision content.

Also check out my YouTube channel for more tutorials.

Have a Nice Day

Written by Emmytheo 24/7
