Roboflow: Converting Annotations for Object Detection

Samuel Theophilus
Published in Analytics Vidhya
4 min read · Sep 3, 2021

SOURCE: Person-of-Interest (TV Series: Season 3)

Object detection is a growing area of machine learning that has received considerable attention in recent years. It is a computer vision technique used to locate instances of objects (such as human faces, license plates, crops, etc.) in images or videos. Although it has been around for quite some time, the field has changed rapidly, moving away from traditional methods such as:

  1. Viola–Jones Object Detection Framework
  2. Scale-Invariant Feature Transform (SIFT)
  3. Histogram of Oriented Gradients (HOG) features

It has evolved toward deep learning techniques such as:

  1. Region Proposals (R-CNN, Fast R-CNN, Faster R-CNN, etc.)
  2. You Only Look Once (YOLO)
  3. Single Shot MultiBox Detector (SSD)
  4. RetinaNet

Deep learning techniques have attracted the attention of researchers and innovators because of the impressive results they have achieved in object detection. However, training such a model requires a labeled dataset, which consists of images together with their annotations (typically stored as JSON, TXT, or XML files).

What are Annotations?

Image or video annotation is the process of attaching labels (predetermined classes such as human, dog, car, etc.) to an image or video frame so that objects can be recognized, counted, tracked, or segmented along their boundaries, as the case may be. Annotations can take any of the following forms:

  1. Bounding boxes
  2. 3D Cuboids
  3. Polygons
  4. Lines & Splines
  5. Semantic segmentation

For more information about the types of annotations, visit TELUS.

These annotations are usually saved in one of several standard formats and can be fed to a deep learning computer vision algorithm for model training. The annotation files contain object coordinates (information about where each object is located in the given image). When working with deep learning models, it is important to be familiar with the popular annotation formats and to know how to convert between them, so that you have the flexibility to use different object detection algorithms.

Annotation Formats

Although this is not an exhaustive list, here are some of the popular formats used for training ML models:

1. YOLO

The ‘You Only Look Once’ (YOLO) algorithm was successful at recognizing objects in images and videos in real time. This success made its annotation format popular, and as new variants of the algorithm were developed, their respective YOLO annotation formats also gained popularity, including:

  • YOLO Darknet
  • YOLOv5 PyTorch
  • Scaled-YOLOv4
  • YOLO Keras
  • YOLOv4 PyTorch
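
To make the format concrete, here is a minimal sketch of reading a label in the YOLO Darknet flavor. It assumes the commonly used Darknet layout, in which each image has a matching .txt file with one line per object: a class index followed by the box center x, center y, width, and height, all normalized to the range 0 to 1 by the image size. The function name and the sample values are illustrative, and other variants in the list above may lay out their files differently.

```python
def yolo_line_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO Darknet label line to (class_id, xmin, ymin, xmax, ymax) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    # The four remaining fields are normalized: x_center, y_center, width, height.
    xc, yc, w, h = (float(v) for v in parts[1:5])
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return class_id, xmin, ymin, xmax, ymax


# Illustrative label line for a 640x480 image (values are made up).
print(yolo_line_to_pixel_box("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480))
# (0, 240.0, 120.0, 400.0, 360.0)
```

Because the coordinates are normalized, the same label file remains valid if the image is resized without changing its aspect ratio.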

2. COCO

This JSON format gained popularity with the MS COCO dataset, which Microsoft released in 2015. It has become a common format for object detection models such as R-CNN and Fast R-CNN.
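
As a rough illustration, a COCO detection file is a single JSON document whose main keys are "images", "categories", and "annotations", and each annotation stores its box as [x, y, width, height] in pixels measured from the top-left corner. The sketch below uses a tiny made-up file (the file name, ids, and boxes are hypothetical) and a hypothetical helper, boxes_by_file, to group boxes per image.

```python
import json
from collections import defaultdict

# A tiny, made-up COCO-style document; real files carry more fields.
coco_text = """
{
  "images": [{"id": 1, "file_name": "street.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "car"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [240, 120, 160, 240]},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [10, 300, 200, 100]}
  ]
}
"""


def boxes_by_file(coco):
    """Group annotations per image as (class_name, xmin, ymin, xmax, ymax) tuples."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are top-left corner plus size
        grouped[files[ann["image_id"]]].append(
            (names[ann["category_id"]], x, y, x + w, y + h)
        )
    return dict(grouped)


result = boxes_by_file(json.loads(coco_text))
print(result)
```

Unlike formats that keep one label file per image, everything here lives in one file, so images and annotations are linked through their ids.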

3. Pascal VOC

Although few, if any, models work directly with VOC XML labels, this remains a popular annotation format. It was created for the PASCAL Visual Object Classes (VOC) Challenge and has become a common interchange format for object detection labels.
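
For illustration, a VOC annotation is one XML file per image, with an object element for each labeled instance and a bndbox holding pixel corner coordinates (xmin, ymin, xmax, ymax). The sketch below parses a small made-up example with Python's standard xml.etree.ElementTree module; the sample file contents and the parse_voc helper are hypothetical, and real VOC files contain additional fields.

```python
import xml.etree.ElementTree as ET

# A trimmed, made-up VOC-style annotation.
voc_xml = """<annotation>
  <filename>street.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>240</xmin><ymin>120</ymin><xmax>400</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, (width, height), [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Some tools write coordinates as floats, so go through float() first.
        coords = tuple(
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"),) + coords)
    return root.findtext("filename"), (width, height), objects


print(parse_voc(voc_xml))
```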

SOURCE: Roboflow

For more information about other annotation formats and the Roboflow universal conversion tool, visit https://roboflow.com/formats.

ROBOFLOW: An overview

Roboflow is a computer vision platform that helps users build computer vision models faster and more accurately by providing better data collection, preprocessing, and model training techniques. Roboflow allows users to upload custom datasets, draw annotations, modify image orientations, resize images, modify image contrast, and perform data augmentation. It can also be used to train models.

As mentioned above, Roboflow also has a universal annotation conversion tool that allows users to upload annotations and convert them from one format to another without having to write conversion scripts for custom object detection datasets.
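
To give a sense of what such a hand-written conversion script involves, here is a minimal sketch that turns a single pixel-corner box (as stored in Pascal VOC) into a YOLO Darknet label line. It is not Roboflow's implementation; the function name, the class list, and the sample box are assumptions made for illustration, and a real script would also need to read and write files and handle every image in the dataset.

```python
def voc_box_to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel corner box to a 'class xc yc w h' line with normalized values."""
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical class list: YOLO stores an index, VOC stores the class name.
class_names = ["person", "car"]

line = voc_box_to_yolo_line(
    class_names.index("person"), 240, 120, 400, 360, img_w=640, img_h=480
)
print(line)
# 0 0.500000 0.500000 0.250000 0.500000
```

Even this small case has to get the coordinate convention, the normalization, and the class-name-to-index mapping right, which is the bookkeeping a conversion tool takes off your hands.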

Getting Started

First, sign up at https://roboflow.com/. Note that selecting the “Public Workspace” option when signing up for a free-tier account lets you upload image datasets of up to 10,000 source images.

Create New Project

After creating a project, the next step is either to upload a dataset containing both images and existing annotation files (in JSON, TXT, or XML format) or to draw the annotations from scratch. For a comprehensive walkthrough of Roboflow, watch this YouTube video.

Exporting Datasets

The final step after preparing your dataset is to export it in your preferred format.

Samuel Theophilus
Analytics Vidhya

Machine Learning Engineer || Technical Writer || Data Engineer • Passionate about Computer Vision, NLP & Business Intelligence.