The Effect of Improving Annotation Quality on Object Detection Datasets (CVPR 2022 Workshop)

Jiaxin Ma
OMRON SINIC X
Jun 27, 2022

We are glad to announce that one of our papers will be presented at the 1st Workshop on Vision Datasets Understanding [link], in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022. Our talk is scheduled for the afternoon short oral session (12:40–13:40 Central Time on June 27).

Jiaxin Ma, Yoshitaka Ushiku, and Miori Sagara, “The Effect of Improving Annotation Quality on Object Detection Datasets: A Preliminary Study”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4850–4859 [cvf]

This work is joint research by OMRON SINIC X and Baobab Inc. (https://baobab-trees.com/en).

Model-centric vs. Data-centric?

Nowadays, while most published work in AI research is about improving models and methods (i.e., model-centric), there is another approach that fixes the model and improves the data instead (i.e., data-centric). Specifically, this means a) increasing the number of training samples through data augmentation or collection, and b) reducing noise by fixing incorrect or ambiguous labels. Dr. Andrew Ng and his team found that this approach greatly increased their efficiency in improving machine learning results.

This discovery reminds us to pay more attention to the quality of the data (including its annotations) when doing machine learning research and projects. Is the data annotated correctly and consistently? Is improving its annotation quality a shortcut to better machine learning results? Our preliminary study explores and verifies these questions.

A related but not identical approach to data-centric AI is to improve dataset quality by re-selecting the images in the dataset and improving the accuracy of the annotations. In the case of image classification, such annotation was redone for the well-known ImageNet dataset, and the resulting ImageNetV2 dataset [link] was reported to be more challenging than the original for various image classification models.

This trend of reviewing dataset quality is spreading across image classification datasets, but not so much for object detection datasets. At the same time, it is well known that object detection requires a higher level of labeling, in which multiple objects in an image must each be enclosed with a bounding box. If we improve the annotation quality of such an object detection dataset, how do state-of-the-art object detection models behave? To investigate this, we collaborated with Baobab, a group of annotation professionals.

Data annotation improvement and methods

In our paper, we used object detection tasks to evaluate the actual effects of improving data annotation quality. 80k images from the Microsoft Common Objects in Context (MS COCO) dataset and 5k images from the Google Open Images dataset were reannotated by specialists at Baobab Inc. Both original datasets contain incorrect or inconsistent labels (see the following example). More details of the reannotation process can be found in our paper.

In the original Google Open Images dataset, this cute little animal is annotated as a dog (obviously, it is a cat).
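
For readers who want to inspect such label differences themselves, here is a minimal sketch that compares per-category box counts between an original and a reannotated COCO-format annotation file. The file names are placeholders, not the actual names of the released files.

```python
import json
from collections import Counter

# Placeholder file names; substitute the original and reannotated annotation files.
ORIG = "instances_val_original.json"
NEW = "instances_val_reannotated.json"

def category_counts(path):
    """Count bounding boxes per category name in a COCO-format annotation file."""
    with open(path) as f:
        data = json.load(f)
    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in data["annotations"])

orig, new = category_counts(ORIG), category_counts(NEW)
# Show the categories whose box counts changed the most after reannotation.
diff = {name: new[name] - orig[name] for name in set(orig) | set(new)}
for name, delta in sorted(diff.items(), key=lambda kv: -abs(kv[1]))[:10]:
    print(f"{name:20s} {orig[name]:6d} -> {new[name]:6d} ({delta:+d})")
```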

We evaluated the original datasets and the new (reannotated) datasets using five commonly used object detection models (Faster R-CNN, SSD, YOLO, EfficientDet, and DETR) and compared the results.
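
As a concrete illustration of this evaluation step, the sketch below scores a model's detections with pycocotools against either annotation version. It is a minimal example under assumed placeholder file names, not the exact evaluation code used in the paper.

```python
# Minimal mAP evaluation sketch using pycocotools (file names are placeholders).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

def evaluate_bbox_map(gt_json, det_json):
    """Compute COCO-style mAP@[.50:.95] for bbox detections against a ground-truth file."""
    coco_gt = COCO(gt_json)                # ground truth (original or reannotated)
    coco_dt = coco_gt.loadRes(det_json)    # model detections in COCO results format
    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    ev.evaluate()
    ev.accumulate()
    ev.summarize()                         # prints the standard 12 COCO metrics
    return ev.stats[0]                     # stats[0] is AP@[.50:.95]

# Score the same model's detections against both annotation versions.
for gt in ["instances_val_original.json", "instances_val_reannotated.json"]:
    print(gt, evaluate_bbox_map(gt, "model_detections.json"))
```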

Results

The results are two-fold. On the Open Images dataset, the effect of reannotation is quite positive: the reannotated data yield higher detection precision (mAP) across all models. On the MS COCO dataset, on the other hand, the effect of reannotation is negative.

The experimental results on COCO (old/old means trained and tested with the original dataset; new/new means trained and tested with the reannotated dataset; the same applies below)
The experimental results on Open Images

When we analyzed the reason behind these results, we found that, on the MS COCO dataset, our specialists made great efforts to annotate as many target objects as possible (even if they were small or ambiguous, in other words, difficult samples), while on the Open Images dataset the annotation guideline was relatively conservative. It is possible that increasing the number of difficult samples led to the degradation of the final performance.
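
One simple way to probe this hypothesis on the released annotation files is to count how many boxes fall under COCO's "small object" threshold (area below 32x32 pixels) before and after reannotation. A minimal sketch, again assuming placeholder file names:

```python
import json

SMALL_AREA = 32 ** 2  # COCO's "small object" threshold: area < 32*32 pixels

def small_object_share(path):
    """Return (small_boxes, total_boxes) for a COCO-format annotation file."""
    with open(path) as f:
        anns = json.load(f)["annotations"]
    # Approximate area from the bbox [x, y, w, h]; good enough for a rough comparison.
    small = sum(1 for a in anns if a["bbox"][2] * a["bbox"][3] < SMALL_AREA)
    return small, len(anns)

for label, path in [("original", "instances_val_original.json"),
                    ("reannotated", "instances_val_reannotated.json")]:
    small, total = small_object_share(path)
    print(f"{label}: {small}/{total} boxes ({100 * small / total:.1f}%) are small objects")
```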

Our preliminary results indicate that, on object detection tasks, improving data annotation quality may not always benefit model performance, because it can add more difficult samples to the data. This finding is worth keeping in mind when applying a data-centric approach to machine learning projects.

What’s next

Our reannotated datasets used in this research are now publicly available [link]. It will be interesting to keep exploring what we can do to improve annotation quality and machine learning model performance at the same time. If you have any suggestions, please contact us at contact@sinicx.com.

Call for Interns

This project was carried out by the Interaction Group at OMRON SINIC X. We and our other groups (the Robotics Group and the Perception Group) are looking for talented students for our internships.

Check below if you are interested in our internship opportunities.
