From image annotation dataset to a format that a deep learning model can understand

Edgar
4 min read · Mar 13, 2024


Two main tasks are involved in image segmentation: getting the dataset into the right shape for a machine learning model, and building the right model. Most of the attention goes to the second task, building and training the model. However, having access to good quality data and knowing how to organize it is just as crucial. In this post we review the first task with examples.

Image annotations and polygons

Most of the state of the art image segmentation models use a supervised learning approach. This means that the dataset has to contain examples of the “segmented” objects or regions that we want the model to “learn” to segment (Figure 1).

Figure 1. Image segmentation task. In order to extract the object “tree” from the image (left) using machine learning, we need to provide examples. The most common way is to “segment” or crop the region manually using a computer program (middle). This process consists of creating a polygon by clicking with a mouse over the contour of the tree, which is denoted here with red dots. With these polygons we can create a mask that provides the computer with an example of the cropped tree (right).

To get images that can be used to train the model, a person usually has to “draw” the perimeter of the objects or “box” them using a computer program. The “drawing” task consists of clicking with the mouse on points along the border of the object to be segmented (Figure 1). This set of points forms a polygon, or a set of polygons if the object or region is complex. The region or object delineated this way is called a mask (Figure 1, right).

Tools and formats

The computer program used to segment objects or regions organizes the polygons in different ways. Some of the most common labeling tools are LabelBox, LabelStudio, V7, and Roboflow. JSON files are a common format to store the annotations, where each polygon is a set of points. For example, a JSON file could look like this:

{
  "img1_metadata": {
    "filename": "img1.jpg",
    "size": 34500,
    "imsize": [1000, 1000],
    "regions": [
      {
        "type": "rect",
        "cx": 530,
        "cy": 525,
        "width": 200,
        "height": 200,
        "class_id": 99
      },
      {
        "type": "polygon",
        "points_x": [120, 122, 343],
        "points_y": [222, 434, 367],
        "class_id": 34
      }
    ]
  }
}

You can see that the regions are stored in two ways: boxes are described by the coordinates of their center together with their width and height, while polygons are stored as lists of point coordinates.
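As a minimal sketch of how to read such a file (the file name annotations.json and the field names are taken from the example above, so adapt them to whatever your labeling tool actually exports):

import json

# Load the annotation file (assuming it follows the structure shown above)
with open("annotations.json") as f:
    annotations = json.load(f)

for image_key, metadata in annotations.items():
    img_w, img_h = metadata["imsize"]
    for region in metadata["regions"]:
        if region["type"] == "rect":
            # Boxes: center coordinates plus width and height, in pixels
            print(region["class_id"], region["cx"], region["cy"],
                  region["width"], region["height"])
        elif region["type"] == "polygon":
            # Polygons: parallel lists of x and y coordinates, in pixels
            points = list(zip(region["points_x"], region["points_y"]))
            print(region["class_id"], points)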

What format to use to train your model?

As you can see, the tool used to label or segment the dataset determines the format in which the images and annotations are saved. However, we also need to pay attention to the format that the model we want to train (or use) expects. If the model cannot read the annotation format directly, we need to map the format produced by the annotation tool into the format that the model expects.

The YOLO dataset format

One of the most popular models for object detection is YOLO (You Only Look Once). YOLOv8 assumes that each bounding box in your data is described by a line like this:

class_id center_x center_y width height

So for each image there is a .txt file, and each bounding box is one line in that file containing the class number, the coordinates of the center of the bounding box, and its width and height (like in the line above). Note that in the YOLO format these values are normalized by the image width and height, so they lie between 0 and 1.
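For example, the rectangle from the JSON example above (center at (530, 525), 200×200 pixels, in a 1000×1000 image) would become the following line in the label file for img1.jpg:

99 0.53 0.525 0.2 0.2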

I modified this Python function, which I got from the Ultralytics GitHub repo, to convert LabelBox data into the YOLOv8 format. You might need to adapt it to your needs.

Figure 2. Format conversion from LabelBox to YOLO format. Based on the Ultralytics conversion tool.
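Since Figure 2 is shown as an image, here is a simplified sketch of the same idea. It is not the Ultralytics converter itself: it assumes the annotation structure from the JSON example above (including that imsize is [width, height]), so adapt the field names to your tool's export.

import json
from pathlib import Path

def annotations_to_yolo(json_path, output_dir):
    # Simplified sketch: converts box annotations in the example format
    # above into YOLO-style .txt label files (one file per image).
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with open(json_path) as f:
        annotations = json.load(f)
    for metadata in annotations.values():
        img_w, img_h = metadata["imsize"]  # assumed to be [width, height]
        lines = []
        for region in metadata["regions"]:
            if region["type"] != "rect":
                continue  # boxes only; polygons are handled separately below
            # Normalize the center, width and height by the image size
            cx = region["cx"] / img_w
            cy = region["cy"] / img_h
            w = region["width"] / img_w
            h = region["height"] / img_h
            lines.append(f'{region["class_id"]} {cx} {cy} {w} {h}')
        label_file = output_dir / (Path(metadata["filename"]).stem + ".txt")
        label_file.write_text("\n".join(lines))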

Mapping your annotated dataset to the YOLO format

Once you have mapped the data from the annotation tool to a format that your model understands, it is very important to double-check that the mapped annotations are correct before starting the training. One way to do that is to display the annotations in the new format and make sure they make sense, for example by drawing an image together with the bounding boxes read from the converted files.
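A minimal sketch of such a check, assuming the YOLO-style .txt files produced above and that matplotlib and Pillow are installed (the file names img1.jpg and labels/img1.txt are just placeholders):

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def show_yolo_boxes(image_path, label_path):
    # Draw an image and the bounding boxes stored in its YOLO label file
    image = Image.open(image_path)
    img_w, img_h = image.size
    fig, ax = plt.subplots()
    ax.imshow(image)
    with open(label_path) as f:
        for line in f:
            class_id, cx, cy, w, h = line.split()
            cx, cy, w, h = float(cx), float(cy), float(w), float(h)
            # Convert normalized center/size back to pixel corner coordinates
            x0 = (cx - w / 2) * img_w
            y0 = (cy - h / 2) * img_h
            rect = patches.Rectangle((x0, y0), w * img_w, h * img_h,
                                     fill=False, edgecolor="red")
            ax.add_patch(rect)
            ax.text(x0, y0, class_id, color="red")
    plt.show()

show_yolo_boxes("img1.jpg", "labels/img1.txt")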

What about polygon-based masks?

The principle we used for bounding boxes also applies to polygon-based masks (segmentation masks). Depending on the program used, you want to double-check how the masks are organized. Usually, a mask is stored as a sequence (list) of the coordinates of the points that delineate it (Figure 1).
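For YOLOv8 segmentation, each mask becomes one line with the class id followed by the normalized x y coordinates of the polygon points. As a minimal sketch, here is the polygon from the JSON example above converted to such a line (field names again come from that example):

def polygon_to_yolo_line(region, img_w, img_h):
    # Interleave and normalize the x/y coordinates of the polygon
    coords = []
    for x, y in zip(region["points_x"], region["points_y"]):
        coords.append(x / img_w)
        coords.append(y / img_h)
    return " ".join([str(region["class_id"])] + [f"{c:.6f}" for c in coords])

# Polygon from the example above, in a 1000x1000 image
example = {"type": "polygon", "points_x": [120, 122, 343],
           "points_y": [222, 434, 367], "class_id": 34}
print(polygon_to_yolo_line(example, 1000, 1000))
# -> 34 0.120000 0.222000 0.122000 0.434000 0.343000 0.367000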

What next?

Finally, you are set to train your YOLO model or test it directly. I am preparing a separate post for that :)


Written by Edgar

PhD in Computer Science and AI. I write about AI in healthcare and Computer Science in general. The opinions expressed in my stories are my own :)