You Only Live Once or You Only Look Once?

Harika Naishadham
IEEE Women In Engineering , VIT
6 min read · Sep 28, 2021

As part of the generation that owns the internet era, I’m sure you have used the acronym ‘YOLO’ at least once. In keeping with the high spirits of that acronym, the object detection realm saw a breakthrough of its own with the introduction of the ‘YOLO’ algorithm. Working with the ‘YOLO’ algorithm is as fun as using the internet acronym ‘YOLO’.

YOLO (You Only Look Once) is an Object Detection algorithm, and Object Detection is one of the several branches of Data Science and Computer Vision. Before we get to YOLO itself, let’s look at a few examples of Object Detection. When we use a camera, an algorithm runs to detect objects and keep the lens focused on them. Self-driving cars are another example: they are programmed to locate obstacles such as other vehicles and speed breakers, which helps them navigate more precisely.

Well, these are just a few applications of Object Detection.

Object Detection goes hand-in-hand with Image Classification and Object Localization.

  • Image Classification determines what the picture shows: the image captured by a device is assigned a label such as car, dog, cat, or human.
  • Object Localization is performed next and finds where the object is located in the image.
  • Object Detection is the final process: it draws bounding boxes around the located objects and labels each one, and thus detects the objects.

What is YOLO?

YOLO is a convolutional neural network (CNN) for performing object detection in real time. A single neural network is applied to the entire image; it divides the image into regions and predicts bounding boxes and probabilities for each region. YOLO rose to popularity because it achieves high accuracy while still running in real time. As the name ‘You Only Look Once’ suggests, the algorithm “only looks once” at the image, i.e., it requires only a single forward pass through the neural network to make predictions. Duplicate predictions are then filtered out so that each object is detected “only once”, and the recognized objects are output together with their bounding boxes.

To understand the YOLO algorithm better, let us look at the different types of object detection algorithms. They fall into two groups:

1. Algorithms based on Classification

Such algorithms work in two stages. First, candidate regions of the image are selected; then each region is classified by running a convolutional neural network on it. Ex.: Region-based CNN (R-CNN), Fast R-CNN, Faster R-CNN, etc.

2. Algorithms based on Regression

Such algorithms work in a single step: the image is scanned once, and objects are located with bounding boxes and classified in the same run. Ex.: YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector).

Hence, YOLO is a regression-based algorithm for real-time detection, where speed and accuracy are the primary concerns.

The working of the YOLO algorithm:

YOLO relies on three main techniques:

  • Residual Blocks
  • Bounding Box Regression
  • Intersection Over Union (IOU)

Let’s look into each one in detail.

1. Residual Blocks

First, the input image is divided into a grid of SxS cells of equal dimensions. Each grid cell detects the objects that appear within it. For instance, if the center of an object falls within a particular grid cell, that cell is responsible for detecting the object.
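The grid assignment can be sketched in a few lines of Python. This is only an illustration, not YOLO's actual code: the function name `responsible_cell`, the pixel-based inputs, and the default S = 7 (the grid size used in the original YOLO paper) are assumptions made for this example.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the point (cx, cy).

    cx, cy       : object center in pixels (hypothetical input format)
    img_w, img_h : image size in pixels
    S            : number of cells per side (7 is an assumed default)
    """
    # Scale the center into grid units, then truncate to a cell index.
    col = int(cx / img_w * S)
    row = int(cy / img_h * S)
    # A center lying exactly on the right or bottom edge maps to the last cell.
    return min(row, S - 1), min(col, S - 1)


# An object centered in a 640x480 image lands in the middle cell of a 7x7 grid.
print(responsible_cell(320, 240, 640, 480))  # (3, 3)
```

Only the cell returned here would be asked to predict the box for that object; every other cell ignores it.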

2. Bounding Box Regression

For starters, a bounding box is simply an outline that highlights the object in an image.

The following attributes are included in every bounding box of the image:

  • Width (bw)
  • Height (bh)
  • Class (such as car, person, animal, etc.), represented by the letter c
  • Bounding box center (bx,by)

YOLO uses a single bounding box regression to predict the height, width, center, and class of objects.

From the image above, we can see that each bounding box is described by the vector y = (pc, bx, by, bh, bw, c), where pc is the probability that an object appears in the bounding box.
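As a rough sketch, such a vector could be built as follows. The class list, the function name `make_target`, and the choice to encode the class c as a one-hot list (one entry per class) are assumptions for illustration; the article itself only specifies the order (pc, bx, by, bh, bw, c).

```python
CLASSES = ["dog", "bicycle", "car"]  # hypothetical class list


def make_target(box=None, label=None):
    """Build y = [pc, bx, by, bh, bw, c1, ..., cn] for one grid cell.

    box   : (bx, by, bh, bw), or None if the cell contains no object
    label : class name from CLASSES, required when box is given
    """
    one_hot = [0.0] * len(CLASSES)
    if box is None:
        # No object: pc = 0 and the remaining entries carry no information.
        return [0.0, 0.0, 0.0, 0.0, 0.0] + one_hot
    one_hot[CLASSES.index(label)] = 1.0
    bx, by, bh, bw = box
    return [1.0, bx, by, bh, bw] + one_hot


print(make_target((0.5, 0.5, 0.4, 0.3), "car"))
# [1.0, 0.5, 0.5, 0.4, 0.3, 0.0, 0.0, 1.0]
```

A cell without an object gets pc = 0, which is what lets the network learn where nothing needs to be detected.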

3. Intersection Over Union(IOU)

IOU is a metric in object detection that describes how much two boxes overlap: it is the area of their intersection divided by the area of their union. Each grid cell is in charge of predicting bounding boxes and their confidence scores. If the predicted bounding box is the same as the real box, then IOU is equal to 1; if the two boxes do not overlap at all, it is 0. IOU is used to discard predicted bounding boxes that do not match the real box closely, so that the output box surrounds the object precisely.

From the image above, we see that there are two bounding boxes, one in blue and the other in green. The blue box is the predicted box whereas the green box is the real box. YOLO aims to make the predicted box match the real box as closely as possible.
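The overlap between a predicted box and a real box can be computed with a short function. This is a minimal sketch that assumes boxes are given as corner coordinates (x1, y1, x2, y2); that format is chosen here for simplicity and differs from the center, width, and height attributes listed earlier.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 (identical boxes)
print(iou((0, 0, 2, 2), (5, 5, 6, 6)))  # 0.0 (no overlap)
```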

By combining the above three processes, we get the final detection results. The image below is an example of how the output is shown.

To summarise the process, the image is first divided into a grid of SxS cells. Each cell predicts bounding boxes together with their confidence scores, as well as class probabilities, which are used to assign a class to each detected object.
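To make the summary concrete, here is a small sketch of how the size of the network output and the per-class scores might be computed. The defaults S = 7, B = 2 boxes per cell, and C = 20 classes come from the original YOLO paper rather than from this article, and the function names are hypothetical.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: each of the S x S cells predicts
    B boxes (4 coordinates + 1 confidence each) and C class probabilities."""
    return (S, S, B * 5 + C)


def class_scores(box_confidence, class_probs):
    """Class-specific score of one box = box confidence * class probability."""
    return [box_confidence * p for p in class_probs]


def best_class(box_confidence, class_probs, class_names):
    """Pick the class with the highest class-specific score for one box."""
    scores = class_scores(box_confidence, class_probs)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best], scores[best]


print(output_shape())  # (7, 7, 30)
name, score = best_class(0.8, [0.1, 0.2, 0.7], ["dog", "bicycle", "car"])
print(name)  # car
```

Multiplying the box confidence by the class probability means a box only gets a high score when the network is confident both that an object is there and about what it is.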

In the above image, we see that there are three classes of objects, i.e., a dog, a bicycle, and a car. A single convolutional neural network makes all the predictions simultaneously.

Using IOU, we eliminate the unnecessary bounding boxes: those that merely duplicate a better box for the same object or that do not match the object’s height and width. The final detection is made such that the bounding boxes fit each object accurately.
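One common way to carry out this elimination step is non-maximum suppression: keep the highest-scoring box, drop every other box that overlaps it too much, and repeat. The sketch below assumes that technique. The (score, box) input format, the corner coordinates (x1, y1, x2, y2), and the threshold of 0.5 are illustrative choices, not values taken from this article.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Filter a list of (score, box) pairs belonging to one class.

    Boxes that overlap an already kept, higher-scoring box by at least
    iou_threshold are treated as duplicates and removed.
    """
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest score still in the list
        kept.append(best)
        remaining = [d for d in remaining if iou(best[1], d[1]) < iou_threshold]
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # overlaps the first box heavily
    (0.7, (20, 20, 30, 30)),  # a separate object
]
print(non_max_suppression(detections))
# [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```

In practice this filtering is usually run separately for each class, so that a dog box never suppresses a bicycle box that happens to overlap it.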

As seen in the image above, the dog has a blue bounding box, the car has a pink bounding box, and the bicycle has a yellow bounding box. This shows that the YOLO algorithm has detected and located all three objects.

Think about how easy our life would be if we simply used a preexisting framework, executed it, and got the desired result. Minimum effort, maximum reward. Isn’t that what we all strive for? Throughout this article, we have seen that YOLO has its benefits over other algorithms, as it is fast while remaining accurate. The real-life applications of YOLO range from autonomous driving to detection of wildlife species to security and much more. YOLO is also able to learn generalized representations of objects. YOLO is thus truly a state-of-the-art algorithm.

References:

https://www.latentview.com/blog/real-time-object-detection-with-yolo/

https://heartbeat.fritz.ai/introduction-to-basic-object-detection-algorithms-b77295a95a63

https://www.geeksforgeeks.org/yolo-you-only-look-once-real-time-object-detection/

https://pjreddie.com/darknet/yolo/
