Roboflow: Converting Annotations for Object Detection

Published in Analytics Vidhya · 4 min read · Sep 3, 2021

SOURCE: Person-of-Interest (TV Series: Season 3)

Object detection is a growing area of machine learning that has received a lot of attention in recent years. It is a computer vision technique used to locate instances of objects (such as human faces, license plates, crops, etc.) in images or videos. Although it has been around for quite some time, it has changed rapidly, moving away from traditional methods such as:

Viola–Jones Object Detection Framework
Scale-Invariant Feature Transform (SIFT)
Histogram of Oriented Gradients (HOG) features

It has since evolved toward deep learning techniques such as:

Region Proposals (R-CNN, Fast R-CNN, Faster R-CNN, etc.)
You Only Look Once (YOLO)
Single Shot MultiBox Detector (SSD)
RetinaNet

Deep learning techniques have caught the attention of researchers and innovators because of the impressive results they achieve in object detection. However, to train such a model we need a labeled dataset, and labeled data comes as images paired with annotation files (JSON, TXT, XML).

What are Annotations?

Image or video annotation is the process of attaching labels (predetermined classes such as human, dog, car, etc.) to an image or video frame so that objects can be recognized, counted, tracked, or segmented along their boundaries, as the case may be. The annotations can take any of the following forms:

Bounding boxes
3D Cuboids
Polygons
Lines & Splines
Semantic segmentation

For more information about the types of annotations, visit TELUS.

These annotations are usually saved in one of several standard formats and can be fed to a deep learning computer vision algorithm for model training. The annotation files contain object coordinates (information about where each object is located in the given image). When working with deep learning models, it helps to be familiar with the popular annotation formats and to know how to convert between them, so that you are free to try different object detection algorithms.
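To make "object coordinates" concrete, here is a minimal sketch of one bounding box written in the three coordinate conventions that the formats discussed in this article rely on. The `Box` class, its method names, and the 640x480 image are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box stored as pixel corners (xmin, ymin, xmax, ymax)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def to_xywh(self):
        """Top-left corner plus width and height, in pixels."""
        return (self.xmin, self.ymin, self.xmax - self.xmin, self.ymax - self.ymin)

    def to_normalized_center(self, img_w, img_h):
        """Box center plus width and height, each divided by the image size."""
        w = self.xmax - self.xmin
        h = self.ymax - self.ymin
        return (
            (self.xmin + w / 2) / img_w,
            (self.ymin + h / 2) / img_h,
            w / img_w,
            h / img_h,
        )


# Hypothetical 640x480 image containing a single object.
box = Box(xmin=160, ymin=120, xmax=480, ymax=360)
corners = (box.xmin, box.ymin, box.xmax, box.ymax)           # corner form
xywh = box.to_xywh()                                         # (160, 120, 320, 240)
normalized = box.to_normalized_center(img_w=640, img_h=480)  # (0.5, 0.5, 0.5, 0.5)
```

The box is the same in all three cases; only the way it is written down changes, which is why converting between formats is mostly arithmetic plus file handling.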

Annotation Formats

Although this is not an exhaustive list, here are some of the popular formats used for training ML models:

1. YOLO

The ‘You Only Look Once’ (YOLO) algorithm made it practical to recognize objects in images and videos in real time. That success popularized its annotation format, and as new variants of the algorithm were developed, their respective annotation formats gained popularity as well:

YOLO Darknet
YOLOv5 PyTorch
Scaled-YOLOv4
YOLO Keras
YOLOv4 PyTorch
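As an illustration, the YOLO Darknet TXT format stores one line per object in a text file that sits next to the image: a class index followed by the box center, width, and height, all normalized by the image size. The sketch below writes and reads such a line; the function names and the sample numbers are my own, and the other YOLO variants listed above may add files (such as a class list or a YAML config) that are not covered here.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Format one pixel-space box as a YOLO Darknet label line."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Parse a label line back into (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    x_center = float(x_center) * img_w
    y_center = float(y_center) * img_h
    width = float(width) * img_w
    height = float(height) * img_h
    return (
        int(class_id),
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )


# Hypothetical 400x500 image with one object of class 0.
line = to_yolo_line(0, 100, 50, 300, 250, img_w=400, img_h=500)
# line == "0 0.500000 0.300000 0.500000 0.400000"
restored = from_yolo_line(line, img_w=400, img_h=500)
```

Because the values are normalized, the same label file stays valid if the image is resized without cropping.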

2. COCO

This JSON format gained popularity with the MS COCO dataset, which Microsoft first released in 2014. It has become a common format for object detection models such as R-CNN and Fast R-CNN.
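For a feel of what a COCO file looks like, here is a minimal sketch of the detection-related part of the JSON: a list of images, a list of categories, and a list of annotations that link the two by id. Boxes are stored as [x, y, width, height] in pixels, measured from the top-left corner. The file name, ids, and the `boxes_for_image` helper are made up for illustration, and real COCO files usually carry extra fields (such as `info`, `licenses`, and `segmentation`) that are left out here.

```python
import json

# A minimal, hypothetical COCO-style annotation file for one image.
coco = {
    "images": [
        {"id": 1, "file_name": "street.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "person"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [160, 120, 320, 240],  # [x, y, width, height] in pixels
            "area": 320 * 240,
            "iscrowd": 0,
        },
    ],
}


def boxes_for_image(coco_dict, image_id):
    """Return (category name, [x, y, w, h]) for every annotation on one image."""
    names = {cat["id"]: cat["name"] for cat in coco_dict["categories"]}
    return [
        (names[ann["category_id"]], ann["bbox"])
        for ann in coco_dict["annotations"]
        if ann["image_id"] == image_id
    ]


# Round-trip through JSON text, as if the file had been written and read back.
coco_json = json.dumps(coco, indent=2)
labels = boxes_for_image(json.loads(coco_json), image_id=1)
```

Unlike YOLO, a single COCO file typically describes a whole dataset split, so the annotations have to be grouped by `image_id` before use.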

3. Pascal VOC

Although no known model works directly with the VOC XML labels, this is still considered a popular annotation format. It was created for the Pascal Visual Object Classes (VOC) challenge and has become a common interchange format for object detection labels.
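A Pascal VOC annotation is one XML file per image, with each object stored under an `object` element whose `bndbox` holds the pixel corners xmin, ymin, xmax, and ymax. The sketch below reads such a file with Python's standard library. The sample XML is a trimmed-down, hypothetical file (real ones usually include more fields, such as `pose`, `truncated`, and `difficult`), and `parse_voc` is a name I made up.

```python
import xml.etree.ElementTree as ET

# A trimmed-down, hypothetical VOC annotation for one image.
VOC_XML = """<annotation>
  <filename>street.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Extract the image size and every labeled box from VOC XML text."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Some tools write coordinates as floats, so go through float() first.
        box = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        record["objects"].append({"name": obj.findtext("name"), "box": box})
    return record


voc_record = parse_voc(VOC_XML)
```

Once the boxes are in a plain Python structure like this, writing them out in another format is a matter of re-expressing the same corners.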

For more information about other annotation formats and the Roboflow universal conversion tool, visit https://roboflow.com/formats .

ROBOFLOW: An overview

Roboflow is a computer vision platform that helps users build computer vision models faster and more accurately by providing better tools for data collection, preprocessing, and model training. Users can upload custom datasets, draw annotations, change image orientation, resize images, adjust image contrast, and perform data augmentation. It can also be used to train models.

As mentioned above, Roboflow also has a universal annotation conversion tool that lets users upload annotations and convert them from one format to another without having to write conversion scripts for custom object detection datasets.
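For comparison, this is roughly what one such hand-written conversion script looks like: a sketch that turns COCO-style annotations into YOLO label lines. It is not how Roboflow implements its converter; `coco_to_yolo` and the sample data are hypothetical, and a real script would also need to read and write files and handle edge cases such as images without annotations.

```python
from collections import defaultdict


def coco_to_yolo(coco_dict):
    """Map each image file name to its list of YOLO label lines."""
    images = {img["id"]: img for img in coco_dict["images"]}
    # YOLO class indices are zero-based and contiguous; COCO category ids need not be.
    sorted_ids = sorted(cat["id"] for cat in coco_dict["categories"])
    class_index = {cat_id: i for i, cat_id in enumerate(sorted_ids)}

    lines = defaultdict(list)
    for ann in coco_dict["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO: top-left corner plus size, in pixels
        class_id = class_index[ann["category_id"]]
        x_center = (x + w / 2) / img["width"]
        y_center = (y + h / 2) / img["height"]
        width = w / img["width"]
        height = h / img["height"]
        lines[img["file_name"]].append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return dict(lines)


# Hypothetical dataset: one 640x480 image, two categories, one labeled car.
sample = {
    "images": [{"id": 7, "file_name": "street.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 3, "bbox": [160, 120, 320, 240]},
    ],
}
yolo_labels = coco_to_yolo(sample)
# yolo_labels == {"street.jpg": ["1 0.500000 0.500000 0.500000 0.500000"]}
```

Every pair of formats needs its own version of this script, which is exactly the chore a universal converter removes.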

Getting Started

First, you need to sign up at https://roboflow.com/. Please note that selecting the “Public Workspace” option while signing up for a free-tier account lets you upload image datasets of up to 10,000 source images.

Create New Project

After creating a project, the next step is to upload a dataset containing both images and existing annotation files (in JSON, TXT, or XML format), or to draw the annotations from scratch. For a comprehensive walkthrough of Roboflow, watch this YouTube video.

Exporting Datasets

The final step after preparing your dataset is to export it in your preferred format.
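Besides the web interface, an export can also be pulled from code. The sketch below assumes the `roboflow` Python package (installed with `pip install roboflow`) and its `Roboflow(...).workspace(...).project(...).version(...).download(...)` chain as I understand it at the time of writing; the package may have changed since, so check the Roboflow documentation before relying on it. The wrapper function and all argument values are placeholders of my own.

```python
def download_roboflow_dataset(api_key, workspace, project, version, export_format="yolov5"):
    """Download one dataset version from Roboflow and return its local folder.

    Assumes the `roboflow` package is installed; the import is done lazily so
    that this module still loads on machines where it is not.
    """
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    dataset = (
        rf.workspace(workspace)
        .project(project)
        .version(version)
        .download(export_format)  # format identifier, e.g. "yolov5"
    )
    return dataset.location


# Example call with placeholder values (requires a real API key and network access):
# folder = download_roboflow_dataset("YOUR_API_KEY", "your-workspace", "your-project", 1)
```

The export format is chosen by the string passed to `download`, so switching formats is a one-word change rather than a new conversion script.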

Conclusion

If you are looking to get your hands dirty with code, I recommend you check out this GitHub repository:

GitHub - nikhilgunti/Annotation-Converters: This Repo covers all formats of annotations for Object Detection and can easily convert from one form to another using… (github.com)

Have a question? Want to fix something in the article?

Reach me:

LinkedIn: https://www.linkedin.com/in/samuelnnitiwetheophilus/
Instagram: https://www.instagram.com/nnitiwe/
Personal Website: https://nnitiwe-dev.github.io/