Converting polygon bounding boxes in the Dataturks JSON format to mask images

This blog post is about converting Dataturks JSON annotations for polygon bounding boxes into PNG-encoded binary mask images. Such masks are often needed as training data for image segmentation models such as Mask R-CNN. The code also downloads the original images from the URL given as the content value of each record.
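The download step can be sketched with Python's standard library. This is a minimal sketch, assuming each image can be fetched directly from its content URL and saved under the last segment of the URL path; the helper names local_filename and download_image are hypothetical, not taken from the original script.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_filename(url):
    """Derive a local file name from the last path segment of an image URL."""
    # urlparse drops any query string, so "img.jpg?v=1" becomes "img.jpg".
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return name


def download_image(url, out_dir):
    """Download the image at `url` into `out_dir` and return the saved path.

    Skips the download if a file with the same name already exists, so the
    conversion can be re-run without fetching every image again.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, local_filename(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

Keeping the remote file name makes it easy to match each downloaded image with the mask files generated from its annotation.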

A typical training sample in the JSON looks like this:

{"content": "http://com.dataturks.a96-i23.open.s3.amazonaws.com/2c9fafb06477f4cb01647e811720002e/95ee5456-88ca-4a6e-8cae-42a344875338___g03uQe4g_400x400.jpg","annotation":[{"label":"Face","points":[[0.325,0.005],[0.845,0.0025],[0.8775,0.0775],[0.895,0.13],[0.89,0.1825],[0.8625,0.26],[0.84,0.3475],[0.8375,0.3925],[0.8375,0.4375],[0.7975,0.505],[0.7475,0.5525],[0.665,0.61],[0.58,0.67],[0.5025,0.6775],[0.4575,0.665],[0.4275,0.6225],[0.4025,0.545],[0.3825,0.51],[0.3225,0.3725],[0.3125,0.3075],[0.3075,0.155],[0.3125,0],[0.325,0.005]],"imageWidth":400,"imageHeight":400}],"extras":null}

In this format, content is the URL of the original image in the annotated dataset. Each entry in annotation holds the label, the points of the polygon bounding the region of interest, and the image dimensions (imageWidth and imageHeight).
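Reading one record can be sketched as follows. This assumes one JSON object per line, as in the sample above, and that each point is an [x, y] pair normalized by imageWidth and imageHeight, which the values between 0 and 1 in the sample suggest. parse_record is a hypothetical name.

```python
import json


def parse_record(line):
    """Parse one Dataturks JSON record into (image_url, objects).

    Each object is (label, [(x_px, y_px), ...]): the normalized points are
    scaled by the imageWidth and imageHeight stored in the annotation.
    """
    record = json.loads(line)
    objects = []
    # Assumption: "annotation" may be null for images nobody annotated.
    for ann in record.get("annotation") or []:
        width, height = ann["imageWidth"], ann["imageHeight"]
        # Assumption: points are [x, y] fractions of the image size.
        pixels = [(x * width, y * height) for x, y in ann["points"]]
        objects.append((ann["label"], pixels))
    return record["content"], objects
```

For the sample record, the first point [0.325, 0.005] maps to pixel (130, 2) in the 400×400 image.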

The conversion is done by a Python script, Dataturks_to_mask_images.py, which turns any Dataturks polygon annotation file into a dataset containing the ground-truth images along with their mask images.
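The core of such a conversion is rasterizing each polygon into a pixel mask. A minimal sketch using scikit-image's skimage.draw.polygon, assuming the points are already in pixel coordinates as (x, y) pairs; polygon_to_mask is a hypothetical helper, and the original script may fill the polygon differently.

```python
import numpy as np
from skimage.draw import polygon


def polygon_to_mask(points_px, height, width):
    """Rasterize one polygon, given as (x, y) pixel pairs, into a binary mask.

    Returns a uint8 array of shape (height, width) with 255 inside the
    polygon and 0 elsewhere, which saves as a black-and-white PNG.
    """
    xs = np.array([p[0] for p in points_px], dtype=float)
    ys = np.array([p[1] for p in points_px], dtype=float)
    mask = np.zeros((height, width), dtype=np.uint8)
    # skimage.draw.polygon takes rows (y) first, then columns (x); passing
    # `shape` clips any filled pixels that would fall outside the image.
    rr, cc = polygon(ys, xs, shape=mask.shape)
    mask[rr, cc] = 255
    return mask
```

The row-before-column order of skimage.draw.polygon is the easiest place to transpose a mask by accident, so it is worth checking on a non-square image.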

The script has a couple of dependencies:

pip install scikit-image
pip install numpy

The script is run as follows:

python Dataturks_to_mask_images.py <path to Dataturks JSON format> <path to folder to store the downloaded groundtruth images> <path to folder to store mask images>
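The command takes three positional paths. A sketch of how they might be read with argparse; the argument names json_path, image_dir and mask_dir are hypothetical, and only their order comes from the command above.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the three positional paths used by the conversion script."""
    parser = argparse.ArgumentParser(
        description="Convert Dataturks polygon annotations to PNG mask images."
    )
    parser.add_argument("json_path", help="Dataturks JSON export")
    parser.add_argument("image_dir", help="folder for the downloaded ground-truth images")
    parser.add_argument("mask_dir", help="folder for the generated mask images")
    return parser.parse_args(argv)


def prepare_output_dirs(args):
    """Create both output folders if they do not exist yet."""
    for folder in (args.image_dir, args.mask_dir):
        os.makedirs(folder, exist_ok=True)
```

Using argparse instead of indexing sys.argv directly gives a usage message for free when an argument is missing.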

If an image's annotation contains multiple objects or classes, a separate mask image is created for each object, so every mask contains exactly one object of one class. This layout is easy to convert to the MS-COCO data format.
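Writing one PNG per object can be sketched as below. The naming scheme <stem>_<label>_<k>.png is a hypothetical choice, not necessarily the one the original script uses, and the masks are assumed to be uint8 arrays with values 0 and 255.

```python
import os
from collections import defaultdict

from skimage.io import imsave


def mask_filenames(image_name, labels):
    """Build one PNG file name per object: <stem>_<label>_<k>.png.

    k counts objects of the same label within one image, so two faces in a
    picture become ..._Face_0.png and ..._Face_1.png (hypothetical scheme).
    """
    stem = os.path.splitext(image_name)[0]
    counts = defaultdict(int)
    names = []
    for label in labels:
        names.append(f"{stem}_{label}_{counts[label]}.png")
        counts[label] += 1
    return names


def save_object_masks(image_name, labelled_masks, mask_dir):
    """Save each (label, mask) pair as its own binary PNG in `mask_dir`.

    Assumes each mask is a uint8 array containing only 0 and 255.
    """
    os.makedirs(mask_dir, exist_ok=True)
    names = mask_filenames(image_name, [label for label, _ in labelled_masks])
    paths = []
    for name, (_, mask) in zip(names, labelled_masks):
        path = os.path.join(mask_dir, name)
        # Masks are mostly black, so skip skimage's low-contrast warning.
        imsave(path, mask, check_contrast=False)
        paths.append(path)
    return paths
```

Encoding the label and a per-label index in the file name keeps the class of each mask recoverable when the masks are later grouped into COCO-style instance annotations.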

A sample of the created masks is shown below:

[Figure: Ground Truth]
[Figure: Binary Mask Images]

If you have any queries or suggestions, I would love to hear them. Please write to me at abhishek.narayanan@dataturks.com.

Shameless plug: We are a data annotation platform that makes it super easy for you to build ML datasets. Just upload data, invite your team and build datasets super quickly. Check us out!