Image labeling

We explore platform.ai for image labeling and clustering, and eventually create a model within the tool to tag the full dataset.

Gabriel Naya

Published in The Startup · Jan 1, 2020

Background

In the context of an OCR application use case, we received a set of images belonging to two different classes. To narrow down the search for the region of interest (ROI), we explored some tools to quickly tag and sort the images.

Using platform.ai¹ for image labeling

Once we start working on the platform, we can create datasets (Collections), several labeling projects, and prediction models.

The user interface for creating collections, uploading images, and creating projects is intuitive and straightforward, so it needs no further explanation.

How to use platform.ai

After starting a project, we find the following view: a set of 2D projections that lay out the images according to the distance between them. These projections can be browsed in the lower area to find the one that best suits our objectives.
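platform.ai does not explain how it computes these views. A common way to obtain a similar layout is to represent each image as a feature vector (for example, activations from a pretrained CNN) and project those vectors onto two dimensions, so that nearby points correspond to similar images. The sketch below assumes the feature vectors are already available and uses PCA via power iteration in plain Python; the function name `pca_2d` is our own, and this is not necessarily the technique platform.ai uses.

```python
import math
import random


def pca_2d(vectors, iterations=200, seed=0):
    """Project feature vectors onto their top two principal components.

    Assumes `vectors` are precomputed per-image features (hypothetical input).
    Uses power iteration on the covariance matrix, orthogonalizing against
    components already found. Plain Python, fine for small datasets.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # d x d covariance matrix of the centered features
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        vec = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            # Remove any overlap with components that were already extracted
            for comp in components:
                dot = sum(a * b for a, b in zip(vec, comp))
                vec = [a - dot * b for a, b in zip(vec, comp)]
            # One power-iteration step: multiply by the covariance and normalize
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt)) or 1.0  # zero-variance guard
            vec = [x / norm for x in nxt]
        components.append(vec)

    # 2D coordinates: dot product of each centered vector with both components
    return [tuple(sum(a * b for a, b in zip(c, comp)) for comp in components)
            for c in centered]
```

Each returned `(x, y)` pair can be plotted as one point per image. Using different features or a different projection method yields a different layout, which is one reason a tool may offer several views to choose from.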

In the upper left area, the main button takes us back to the projects and collections view.

Selecting a set of images with the mouse activates an icon to view them as a gallery. In this view, we can choose a subset and assign it to one of the categories we create in the text box at the upper right.

Once some images are correctly labeled, we can train a model to tag the rest of the pictures. To do this, we use the "create model and train it" button on the top left taskbar.

In our case, the model reached an accuracy of 74%, because the two classes in our dataset are easily confused. With well-differentiated classes, the accuracy is significantly higher.
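We do not know which model platform.ai trains internally. To illustrate the idea of learning from a few labeled images and tagging the rest, the sketch below uses a nearest-centroid classifier over precomputed feature vectors, a deliberately simple stand-in, and computes accuracy as the fraction of labeled examples predicted correctly. The feature vectors and the helper names are hypothetical; only the class names `Errors` and `Other` come from our labels.

```python
import math


def train_centroids(labeled):
    """labeled: list of (feature_vector, label) pairs. Returns {label: mean vector}."""
    sums, counts = {}, {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, x in enumerate(vec):
            acc[j] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def accuracy(centroids, labeled):
    """Fraction of (vector, label) pairs whose prediction matches the label."""
    hits = sum(predict(centroids, vec) == label for vec, label in labeled)
    return hits / len(labeled)


# Made-up 2D features: a few hand-labeled images, plus unlabeled ones to tag.
labeled = [([0.0, 0.0], "Errors"), ([0.0, 1.0], "Errors"),
           ([10.0, 10.0], "Other"), ([10.0, 11.0], "Other")]
unlabeled = [[1.0, 1.0], [9.0, 9.5]]

centroids = train_centroids(labeled)
print([predict(centroids, v) for v in unlabeled])  # ['Errors', 'Other']
```

Accuracy measured on the same images used for training is optimistic; a more honest figure comes from images held out of training.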

After training, we can download a CSV file with the URL of each image and its prediction.

In the cases marked in red in the CSV, the model's output looks wrong. In these cases, however, the problem comes from an error in our own labeling in the image gallery.

Manual labeling error: labeled as Other, but it should be Errors

In many other cases, we found that the model correctly labeled the images (marked in green in the CSV).
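The red/green review can also be scripted. The sketch below, assuming the export (or a copy we annotate) holds both our manual label and the model's prediction, splits the rows into agreements and disagreements. The disagreements are the rows worth inspecting, since some of them turn out to be labeling mistakes rather than model mistakes. The column names `url`, `label`, and `prediction` and the file name in the comment are assumptions; check the header of the file actually exported.

```python
import csv
import io


def split_by_agreement(lines, label_col="label", pred_col="prediction"):
    """Split CSV rows into (agree, disagree) by comparing label and prediction.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Column names are assumptions about the export format.
    """
    agree, disagree = [], []
    for row in csv.DictReader(lines):
        target = agree if row[label_col] == row[pred_col] else disagree
        target.append(row)
    return agree, disagree


# Tiny made-up export; in practice: open("predictions.csv", newline="")
sample = io.StringIO(
    "url,label,prediction\n"
    "https://example.com/a.png,Errors,Errors\n"
    "https://example.com/b.png,Other,Errors\n"
    "https://example.com/c.png,Other,Other\n"
)
agree, disagree = split_by_agreement(sample)
for row in disagree:  # the "red" rows: check the label before blaming the model
    print(row["url"], row["label"], "->", row["prediction"])
```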

Summary

In a nutshell, we found a tool that let us label and classify our dataset quickly, then apply some manual corrections to complete the labeling.
Starting from this clustering of our dataset, we were able to build other classification models with the tools we know best.
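Before moving the labeled images to another framework, it helps to hold out a validation set that keeps the class proportions. A minimal sketch of a stratified split over labeled rows (for example, rows read from the exported CSV); the `label` key, the 20% fraction, the seed, and the file names are illustrative choices, not values from our experiment.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", val_fraction=0.2, seed=42):
    """Split rows into (train, val), keeping each class's share roughly equal."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        # At least one validation example per class when the class has 2+ items
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Illustrative rows with hypothetical file names
rows = ([{"url": f"err_{i}.png", "label": "Errors"} for i in range(10)]
        + [{"url": f"oth_{i}.png", "label": "Other"} for i in range(5)])
train, val = stratified_split(rows)
print(len(train), len(val))  # 12 3
```

Splitting per class, rather than shuffling the whole list once, keeps a small class from ending up entirely in training or entirely in validation.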
We now expect the OCR application to be much more efficient, since we have identified the type of each image. Knowing the type, we can quickly locate the region of interest to process, and even the language, since in our case study some messages are in Spanish and others in English.
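Once each image has a predicted type, the OCR step can look up where to crop and which language to request. A minimal sketch of such a routing table: the class names come from our labels, but the ROI fractions and the class-to-language assignment are placeholders invented for illustration, and `ocr_plan` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrRoute:
    roi: tuple  # (left, top, right, bottom) as fractions of width and height
    lang: str   # Tesseract-style language code


# Hypothetical routing table: the ROI fractions and languages are placeholders.
ROUTES = {
    "Errors": OcrRoute(roi=(0.0, 0.0, 1.0, 0.25), lang="spa"),
    "Other": OcrRoute(roi=(0.0, 0.5, 1.0, 1.0), lang="eng"),
}


def ocr_plan(image_type, width, height):
    """Return the pixel box to crop and the OCR language for an image type."""
    route = ROUTES[image_type]
    left, top, right, bottom = route.roi
    box = (round(left * width), round(top * height),
           round(right * width), round(bottom * height))
    return box, route.lang


print(ocr_plan("Errors", 800, 600))  # ((0, 0, 800, 150), 'spa')
```

The pixel box follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, and the language code could be passed to an OCR engine such as Tesseract, for example through pytesseract's `image_to_string(..., lang=...)`.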

References

[1] www.platform.ai is not an open-source tool. The work here was developed with the free version, which seems quite limited. Besides, the product's current state is far from production quality; however, for labeling a small dataset such as ours, it fully serves its purpose.