YAT - An open-source data annotation tool for YOLO

Vinay
6 min read · Jul 12, 2019


[Update: Check out the latest python-based YOLO annotation tool — PyYAT here: https://github.com/2vin/PyYAT]

If you are familiar with object detection and deep learning, then you already know how important YOLO is in this field. It has enabled researchers to train and test object detectors quickly and efficiently in their respective works. However, I found that some beginners still hesitate to train YOLO on their own datasets because of limited knowledge of the required annotation format.

So I have open-sourced my data annotation toolbox for YOLO so that researchers and students can use it to build innovative projects without any limitations. This toolbox, named Yolo Annotation Tool (YAT), can be used to annotate data directly into the format required by YOLO. All you need to do is create a label file containing all the class names to be trained.

Table of Contents

  1. What is YOLO and Why is it Useful in Object Detection?
  2. How does YOLO work?
  3. How to annotate Bounding Boxes?
  4. How does our toolbox YAT work?
  5. Screenshot of YAT

What is YOLO and Why is it Useful in Object Detection?

The YOLO framework (You Only Look Once) is a deep learning framework for object detection. It processes the entire image in a single pass and predicts bounding box rectangles and class confidences for those boxes. YOLOv3 is extremely fast and accurate. In mAP measured at .5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. This is the link to the original paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf.

It is one of the best algorithms for object detection, showing accuracy comparable to other state-of-the-art detectors while running considerably faster.

Source: https://pjreddie.com/darknet/yolo/

The YOLO model has several advantages over classifier-based systems. It looks at the whole image at test time, so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation, unlike systems like R-CNN which require thousands of evaluations for a single image. This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.

How does YOLO work?

(Source: https://machinethink.net/blog/object-detection-with-yolo/)

Unlike sliding-window and region-proposal approaches, YOLO performs object detection on the full image with a single neural network pass.

YOLO divides up the image into a grid of 13 by 13 cells:

Each of these cells is responsible for predicting 5 bounding boxes. A bounding box describes the rectangle that encloses an object.

YOLO also outputs a confidence score that tells us how certain it is that the predicted bounding box actually encloses some object. This score doesn’t say anything about what kind of object is in the box, just if the shape of the box is any good.

The predicted bounding boxes may look something like the following (the higher the confidence score, the fatter the box is drawn):

For each bounding box, the cell also predicts a class. This works just like a classifier: it gives a probability distribution over all the possible classes. The confidence score for the bounding box and the class prediction are combined into one final score that tells us the probability that this bounding box contains a specific type of object. For example, the big fat yellow box on the left is 85% sure it contains the object “dog”:

Since there are 13×13 = 169 grid cells and each cell predicts 5 bounding boxes, we end up with 845 bounding boxes in total. It turns out that most of these boxes will have very low confidence scores, so we only keep the boxes whose final score is 30% or more (you can change this threshold depending on how accurate you want the detector to be).
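To make the scoring and filtering step concrete, here is a rough Python/NumPy sketch (not YOLO's actual code; the random tensors and the 20-class count are just placeholders): the final score for each box is its confidence multiplied by its best class probability, and only boxes above the 30% threshold are kept.

    import numpy as np

    # Placeholder prediction tensors for a 13x13 grid with 5 boxes per cell.
    GRID, BOXES, CLASSES = 13, 5, 20
    box_confidence = np.random.rand(GRID, GRID, BOXES)        # how sure each box contains an object
    class_probs = np.random.rand(GRID, GRID, BOXES, CLASSES)  # per-box class distribution
    class_probs /= class_probs.sum(axis=-1, keepdims=True)    # normalise to probabilities

    # Final score = box confidence * class probability, as described above.
    scores = box_confidence[..., np.newaxis] * class_probs    # shape (13, 13, 5, 20)
    best_class = scores.argmax(axis=-1)                       # most likely class for each box
    best_score = scores.max(axis=-1)                          # its final score

    # Keep only boxes whose final score is 30% or more.
    keep = best_score >= 0.30
    print(f"{keep.sum()} of {GRID * GRID * BOXES} boxes survive the threshold")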

The final prediction is then:

From the 845 total bounding boxes we only kept these three because they gave the best results. But note that even though there were 845 separate predictions, they were all made at the same time — the neural network just ran once. And that’s why YOLO is so powerful and fast.

How to annotate Bounding Boxes?

(Source: https://www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/)

Now, let's see how to prepare a dataset for YOLO training.

  • YOLO first takes an input image:
  • The framework then divides the input image into grids (say a 3 X 3 grid):
  • Image classification and localization are applied on each grid. YOLO then predicts the bounding boxes and their corresponding class probabilities for objects (if any are found, of course).

As mentioned before, YOLO takes training input in a specific format only. For instance, check the following image:

In this image, let's say we need to annotate a car (class id 1); then the annotation would be done as:

<class id> <Xo/X> <Yo/Y> <W/X> <H/Y>

where,

class id = label index of the class to be annotated

Xo = X coordinate of the bounding box’s center

Yo = Y coordinate of the bounding box’s center

W = Width of the bounding box

H = Height of the bounding box

X = Width of the image

Y = Height of the image

For multiple objects in the same image, this annotation is saved line-by-line for each object.
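To make the format concrete, here is a small Python sketch that converts a pixel-space box (top-left corner plus width and height) into one YOLO annotation line. The function name and the example numbers are mine, purely for illustration:

    def to_yolo_line(class_id, x_min, y_min, box_w, box_h, img_w, img_h):
        # Box centre in pixels.
        xo = x_min + box_w / 2.0
        yo = y_min + box_h / 2.0
        # Normalise by the image size, giving <class id> <Xo/X> <Yo/Y> <W/X> <H/Y>.
        return (f"{class_id} {xo / img_w:.6f} {yo / img_h:.6f} "
                f"{box_w / img_w:.6f} {box_h / img_h:.6f}")

    # Example: a car (class id 1) at (120, 200) with size 300x180 in a 1280x720 image.
    print(to_yolo_line(1, 120, 200, 300, 180, 1280, 720))
    # -> "1 0.210938 0.402778 0.234375 0.250000"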

How does our toolbox YAT work?

YAT is an open-source toolbox for performing the above-mentioned annotation on video data, frame by frame.

Pre-requisites: OpenCV 3.0+, g++

  1. Download the tool from this repository: https://github.com/2vin/yolo_annotation_tool
  2. Edit ‘labels.txt’ and put your desired list of classes in this file.
  3. Compile the program using the instructions in the file ‘compile.sh’.
  4. Once compiled, reset the counter in ‘/data/index.txt’ to ‘0’ to start with.
  5. Run the executable as mentioned in the file ‘compile.sh’.
  6. The annotation window will open:
    Press ‘d’ key to go to the next frame.
    Press ‘w’ key to jump ahead 30 frames.
    Press ‘x’ key to reset the annotations in the frame.
    Press ‘s’ key to save the annotations to the output folder.
    Press ‘Esc’ key to quit the program.
    Use the trackbar to toggle between different classes and verify the class label in the top-left corner.
  7. Repeat the process for new video files, and keep track of the counter in ‘/data/index.txt’.
  8. The output files will be saved in the ‘/data’ folder automatically.

Screenshot of YAT

This is how YAT looks:

Screenshot of the toolbox

Here, the yellow marker keeps track of the current mouse position so that objects can be marked precisely. Once all the objects in an image are marked, the user just needs to press ‘s’ to save all the annotations to the output folder.
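If you want to sanity-check what YAT wrote to disk, a quick Python + OpenCV sketch like the one below can redraw the saved annotations on top of the source image. The file names are placeholders (YAT itself is a C++ tool; this script is not part of it), so adjust the paths to your own image/annotation pair:

    import cv2

    # Placeholder paths; point these at an image and its saved annotation file.
    image_path = "data/frame_000000.jpg"
    annot_path = "data/frame_000000.txt"
    labels = open("labels.txt").read().split()   # same class list used while annotating

    img = cv2.imread(image_path)
    h, w = img.shape[:2]

    with open(annot_path) as f:
        for line in f:                           # one object per line
            class_id, xc, yc, bw, bh = line.split()
            xc, yc = float(xc) * w, float(yc) * h
            bw, bh = float(bw) * w, float(bh) * h
            x1, y1 = int(xc - bw / 2), int(yc - bh / 2)
            x2, y2 = int(xc + bw / 2), int(yc + bh / 2)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, labels[int(class_id)], (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    cv2.imshow("annotations", img)
    cv2.waitKey(0)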

Code available at: https://github.com/2vin/yolo_annotation_tool

Train your own YOLO models using YAT and let me know your feedback in the comments. For more details, contact me at www.connect.vin
