Making your data labeling workflow 7x faster with model-assisted and human labeling

Hasib Zunair
Decathlon Digital
5 min read · Sep 21, 2023

Learn how we at Décathlon Canada use machine learning models alongside human expertise to streamline and significantly speed up our data labeling workflows.

Source: Image by thefredyjacob at Unsplash.

Motivation

When building custom machine learning (ML) applications, we often need to build our own internal datasets, which makes data collection and labeling a crucial step in the ML lifecycle. Typically, to build an image dataset, a labeler goes through each image and either adds tags (image recognition) or draws boxes around objects (object detection). This is notoriously costly, time-consuming and, due to human fatigue, error-prone. Previously, we discussed in this article how we used semi-supervised learning (SSL) [1] in STAR [2] to improve the performance of our existing computer vision models by leveraging unlabeled data.

In this article, we’ll share how we are using SSL to significantly speed up our data labeling workflows and build high quality image datasets.

This article is organized as follows:

  1. Basketball Sport Dataset 🏀
  2. Training a model
  3. Model Assisted + Human Labeling
  4. Results

Let’s get started!

Basketball Sport Dataset 🏀

Now that we have explained the “why”, let’s go over the “how” and “what”.

We consider a use-case (obviously sport!) where the goal is to build a model to identify the rim, basketball and person in an image of a basketball game. In computer vision, this is known as object detection, where the model predicts what the object is and where it is in the image.

We collected video footage of basketball games and extracted the frames to obtain a collection of images. Then, we labeled each image using the labelImg tool. In other words, we drew a box along with the label for each instance of the target classes (rim, basketball and person). For starters, we labeled around 800 images.
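We haven't shared our extraction script, but the frame-sampling step can be sketched as follows. In practice a library such as OpenCV (`cv2.VideoCapture`) would do the actual video decoding; the helper below only decides which frame indices become dataset images, assuming a hypothetical sampling rate of one frame per second.

```python
def frames_to_keep(total_frames: int, fps: float, every_sec: float = 1.0) -> list[int]:
    """Return the indices of frames to extract, one every `every_sec` seconds.

    Decoding would be done with e.g. cv2.VideoCapture; this helper only
    decides which frames are saved as images for labeling.
    """
    step = max(1, round(fps * every_sec))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps, sampled once per second -> 10 candidate frames.
indices = frames_to_keep(total_frames=300, fps=30.0, every_sec=1.0)
```

Sampling at an interval (rather than keeping every frame) keeps the dataset small and avoids near-duplicate consecutive frames.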

Training a model

Now that we have our small labeled dataset, we train a model on it using the YOLOv6 architecture. An important note about training: we deliberately overtrain (i.e. overfit) our model for this particular task. The idea is that an annotation-specific model does not need to generalize broadly; it only needs to be proficient at this one task, since the images it will label come from the same distribution as the training set, for example consecutive frames of the same videos.
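As a rough illustration of the setup, YOLOv5/YOLOv6-style trainers typically read a small dataset config listing the image directories and class names. The snippet below writes such a config for our three classes; the field names follow the convention used by those repos, but check your YOLOv6 version's documentation, as the exact schema and paths here are assumptions.

```python
from pathlib import Path

# A minimal YOLOv5/YOLOv6-style dataset config for our three classes.
# Directory layout and field names are illustrative, not our exact setup.
config = """\
train: ../basketball/images/train
val: ../basketball/images/val

nc: 3  # number of classes
names: ["rim", "basketball", "person"]
"""

Path("basketball.yaml").write_text(config)
```

The training script is then pointed at this file, and we simply let it run long enough to fit the small dataset closely.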

Below are some examples of the prediction (class labels and bounding boxes) of the model after training on the small labeled dataset.

Source: Photos by author. Model predictions for object detection in basketball game. Given an image, our model predicts where the rim, basketball and person are in the image. Raw images given by Tarmak team.

Model Assisted + Human Labeling

Now that we have a model trained on a small dataset for our task, let’s go over what our data-annotation workflow looks like for a large collection of images. It works in two simple steps:

  1. Model makes predictions
  2. Human labeler fixes incorrect labels

For a large collection of images, we first make predictions using the trained YOLOv6 model. This step is vanilla SSL, where the model generates pseudo labels (i.e. artificial labels defined by a model).
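labelImg reads annotations in Pascal VOC XML by default, so the model’s pseudo labels have to be written in that format before a human can review them. Below is a minimal sketch, assuming detections arrive as `(class_name, xmin, ymin, xmax, ymax)` tuples in pixel coordinates; the conversion from YOLOv6’s raw output to these tuples is omitted.

```python
import xml.etree.ElementTree as ET

def to_voc_xml(filename, width, height, detections):
    """Write pseudo labels as a Pascal VOC XML annotation that labelImg can open.

    `detections` is a list of (class_name, xmin, ymin, xmax, ymax) tuples
    in pixel coordinates, assumed already converted from the model output.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for name, xmin, ymin, xmax, ymax in detections:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bbox = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bbox, tag).text = str(int(val))
    return ET.tostring(root, encoding="unicode")

# One XML file per image, saved next to it so labelImg picks it up.
annotation = to_voc_xml("frame_0001.jpg", 1280, 720,
                        [("basketball", 610, 200, 660, 250),
                         ("rim", 580, 120, 700, 190)])
```

The human labeler then opens the images in labelImg and sees the model’s boxes pre-drawn, ready to be corrected rather than drawn from scratch.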

The second and final step is for the human labeler to fix the pseudo labels where they are incorrect. Below, we show some examples of incorrect predictions made by the trained model that need to be fixed. On the left, the two basketballs have not been detected by the model, and non-target regions have been identified as target objects. In the right image, the arm of a player has been detected as a basketball!

Source: Photo by author. Visualizations of predictions made by the trained model. Red circles show mistakes by the model. Images are displayed in the labelImg tool. Raw images given by Tarmak team.

This process is more accurate than simply using SSL to generate pseudo labels, since we manually fix the incorrect labels made by the model. If the pseudo labels were used directly without fixing, these errors would remain, and if such mistakes appeared in many images, the dataset would be of low quality. Moreover, it is much faster than labeling the target objects from scratch: a symbiosis between AI and humans!

Results

To test the efficiency of this labeling process, we compared it against fully manual labeling, where a human annotator labels the images for this task. The results in the table below show that, for the same number of images, model-assisted labeling combined with human review completes the task seven times faster than human labeling alone.

Source: Table by author. Comparison of manual and model assisted labeling. Model assisted labeling provides a 7x speed up.

Conclusion

In this article, we shared how we combine the strengths of semi-supervised learning and human expertise to streamline and improve our data labeling workflow. Specifically, we first collect a small dataset, train a model on it, make predictions on a large collection of images, and manually go over the predictions to fix and adjust them. For a specific use-case, we showed that this model-assisted and human labeling speeds up the process seven-fold compared to a human labeler alone. The approach is accurate, efficient, cost-effective and scalable, making it an essential tool for building high quality datasets and the machine learning models trained on them.

Let us know if you have any comment or suggestion about the topic of this article and don’t hesitate to share it with your network if you liked it :) If you have any idea to further improve performance, do reach out!

About the author

I am a Ph.D. candidate at Concordia University in Montreal, Canada, working on computer vision research. I am also an Applied ML Scientist at Décathlon Canada, where I help build new ML systems that transform sports images and videos into actionable intelligence. If you’re interested to learn more about me, please visit my webpage here.

A special thanks to the members of the AI team at Décathlon Canada for the comments and review, in particular Yan Gobeil and Mitul Patel.

References

[1] Yang, Xiangli, et al. “A survey on deep semi-supervised learning.” IEEE Transactions on Knowledge and Data Engineering (2022)

[2] Zunair, Hasib, et al. “STAR: noisy semi-supervised transfer learning for visual classification.” Proceedings of the 4th International Workshop on Multimedia Content Analysis in Sports. 2021.
