Getting data annotation format right for object detection tasks
If you are looking for a way to annotate your images and pass them to your object detection algorithm, here is a short read on how to get it right.
I will be using the YoloV3 annotation format as an example to illustrate the process. I won't get into the details of how Yolo works, but you can read about it here
If you want to train YoloV3 on custom data to detect objects in an image, you will need to annotate (label, or draw bounding boxes around the objects of interest in) your custom data (images) first, and then pass these annotations to the model for training.
The format of the annotation is
<class> <x> <y> <width> <height>
ex: 0 0.25 0.44 0.5 0.8
class is the object class ID, (x, y) are the centre coordinates of the bounding box, and width and height are the dimensions of the bounding box. All four coordinate values are normalised by the image size to the range [0, 1].
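As a quick sanity check, here is a minimal sketch (with made-up image and box sizes) of how a pixel-space bounding box maps to this format:

```python
# Hypothetical 500x400 image with a box from (50, 100) to (250, 300), in pixels
img_w, img_h = 500, 400
xmin, ymin, xmax, ymax = 50, 100, 250, 300

x = ((xmin + xmax) / 2.0) / img_w  # normalised centre x
y = ((ymin + ymax) / 2.0) / img_h  # normalised centre y
w = (xmax - xmin) / img_w          # normalised width
h = (ymax - ymin) / img_h          # normalised height

print(0, x, y, w, h)  # class 0, followed by the four normalised values
```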
There are plenty of tools that let you annotate images and export to the right format. I have used RectLabel and LabelMe, and my personal choice is Cloud Annotations (because you can link it with IBM Cloud for storage and let your team collaborate on the task). Here is a post I wrote on how to use Cloud Annotations to annotate your images.
I also found this guide useful in selecting a tool from the myriad of options available.
Most of these tools let you annotate an image and export directly in the YoloV3 format mentioned above; however, some tools export an XML file instead, and you have to convert it to the acceptable format.
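The exact XML layout depends on the tool, but many export the Pascal VOC style, which the parsing code below assumes. A typical file looks roughly like this (file name, class name and values are illustrative):

```xml
<annotation>
    <folder>images</folder>
    <filename>example.jpg</filename>
    <size>
        <width>500</width>
        <height>400</height>
        <depth>3</depth>
    </size>
    <object>
        <name>dog</name>
        <bndbox>
            <xmin>50</xmin>
            <ymin>100</ymin>
            <xmax>250</xmax>
            <ymax>300</ymax>
        </bndbox>
    </object>
</annotation>
```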
I found the following piece of Python code useful for parsing the XML files and converting them to the YoloV3 annotation format. Python has a handy module called xml.etree.ElementTree for parsing and creating XML content. Our objective is to read the width, height, xmin, xmax, ymin and ymax values and then convert them to normalised x, y, width and height.
import xml.etree.ElementTree as ET
from pathlib import Path

path = 'your/path/to/the/xmlFiles'

def parsethefile(listOfFiles):
    for myFile in listOfFiles.iterdir():
        if myFile.suffix != '.xml':
            continue
        parser = ET.XMLParser(encoding="utf-8")
        targetTree = ET.parse(myFile, parser=parser)
        rootTag = targetTree.getroot()
        # Look tags up by name rather than by child index,
        # so extra elements in the file don't break the parsing
        size = rootTag.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)
        bndbox = rootTag.find('object').find('bndbox')
        xmin = int(bndbox.find('xmin').text)
        ymin = int(bndbox.find('ymin').text)
        xmax = int(bndbox.find('xmax').text)
        ymax = int(bndbox.find('ymax').text)
        category = 0  # Replace this with the class label of your object
        print(convertLabels(xmin, ymin, xmax, ymax, height, width, category))

# Call this after convertLabels (defined below) is in scope
parsethefile(Path(path))
Converting to normalised values
It is good practice to normalise the coordinates to the range [0, 1] relative to the image size: the annotations then become independent of image resolution, and keeping the network's inputs in a small, consistent range makes training easier.
def convertLabels(x1, y1, x2, y2, height, width, cat):
    def sorting(v1, v2):
        # Return the pair as (max, min), regardless of input order
        if v1 > v2:
            return v1, v2
        return v2, v1

    xmax, xmin = sorting(x1, x2)
    ymax, ymin = sorting(y1, y2)
    dw = 1. / width    # scale factor for x values
    dh = 1. / height   # scale factor for y values
    x = (xmin + xmax) / 2.0   # box centre, in pixels
    y = (ymin + ymax) / 2.0
    w = xmax - xmin           # box size, in pixels
    h = ymax - ymin
    return cat, x * dw, y * dh, w * dw, h * dh

This will return the required annotation format.
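To feed these values to YoloV3 you typically write one line per object into a .txt file with the same name as the image. A minimal sketch, with hypothetical values and file name:

```python
# Hypothetical values, as returned by convertLabels
cat, x, y, w, h = 0, 0.3, 0.5, 0.4, 0.5

# One "class x y w h" line per object in the image,
# written to a .txt file named after the image (e.g. example.jpg -> example.txt)
with open('example.txt', 'w') as f:
    f.write(f"{cat} {x} {y} {w} {h}\n")
```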
I hope this helps you get your image annotations right and train your object detection algorithms.
Leave a response and I will be happy to reply.