What is Image Annotation?

Jiayin Low
Published in Supa Blog
4 min read · Jan 10, 2020

The performance of Artificial Intelligence is heavily reliant on the accuracy of its training data.

Image annotation is a key technique used to create training data for computer vision. In order for machines to perceive objects in their surroundings, annotated images are needed to train Machine Learning algorithms to learn to see the world as we do.

Annotation in Machine Learning is essentially the process of labelling data in the various mediums of images, text or video. The labels are usually predetermined by a machine learning engineer or computer vision scientist and are chosen to provide the computer vision model information on objects depicted in an image.

The algorithm then uses the annotated data to learn patterns and recognise similar ones when presented with fresh, new data.

Depending on the nature of the project, different industries need different forms of annotation.

Types of image annotation

Bounding box

The simplest and most commonly used type of image annotation is the bounding box. Labellers draw a box as close as possible to the edges of each key object in the image. 2D bounding boxes are often used for object classification, localization and detection in industries such as retail, e-commerce and healthcare.

Example of bounding boxes used in retail AI technology to monitor the state of the shelves
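To make "as close as possible to the edges" concrete, here is a minimal sketch of how a bounding box could be stored and how its tightness could be checked against a reference box using intersection over union (IoU). The `BoundingBox` class, the field names and the `cereal_box` label are illustrative assumptions, not the format of any particular annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical format)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for no overlap."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    intersection = inter_w * inter_h
    union = a.area() + b.area() - intersection
    return intersection / union


# A reference box and a labeller's box that is 20 pixels too tall.
reference = BoundingBox("cereal_box", 10, 10, 110, 210)
drawn = BoundingBox("cereal_box", 10, 10, 110, 230)
far_away = BoundingBox("cereal_box", 500, 500, 600, 600)

tightness = iou(reference, drawn)
no_overlap = iou(reference, far_away)
print(f"IoU of drawn box vs reference: {tightness:.3f}")
```

A looser box lowers the score, so a threshold on IoU is one possible way to flag boxes that need to be redrawn.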

Polygon annotation

Polygon annotation is important because not every object fits neatly in a bounding box. It is usually used to annotate irregularly shaped items more precisely, for example, asymmetrical objects in aerial images such as fruits, trees, landmarks or houses. Polygon annotation usually requires a high level of precision from the labeller.
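As a rough illustration of why a polygon can be more precise than a box, the sketch below stores a polygon as an ordered list of vertices, computes its area with the shoelace formula and compares it with the area of the smallest box that encloses it. The L-shaped `roof` outline and the function names are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosing_box(vertices: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical L-shaped roof outline traced by a labeller.
roof = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

roof_area = polygon_area(roof)
x_min, y_min, x_max, y_max = enclosing_box(roof)
box_area = (x_max - x_min) * (y_max - y_min)
coverage = roof_area / box_area
print(f"Polygon covers {coverage:.0%} of its enclosing box")
```

Here a quarter of the enclosing box would be background, which is the extra area a polygon avoids labelling as part of the object.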

Line annotation

Line annotation, as the name suggests, mainly involves annotating lines and splines, which are used to draw boundaries in a region of an image. It is primarily used when the section that needs to be delineated is too small or thin to be captured well by a bounding box. Unlike a bounding box, it avoids white space and additional noise. Line annotation is commonly used to label data for autonomous vehicles.

The lines are used to train vehicle perception models for lane detection.
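A line annotation can be thought of as an ordered list of points joined by straight segments. The sketch below is one possible representation, with a helper that measures the length of the traced line; the `lane` coordinates and the function name are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polyline_length(points: List[Point]) -> float:
    """Sum of the straight-line distances between consecutive points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total


# Hypothetical lane marking traced with three clicks, in pixel coordinates.
lane = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

lane_length = polyline_length(lane)
print(f"Lane marking length: {lane_length} px")
```

Because only the points along the line are stored, none of the surrounding road surface is included in the label.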

Point annotation

Point annotation involves accurately plotting key points at specified locations on an image. This form of annotation is most commonly used for facial recognition and sentiment analysis. By identifying landmark points on a face and following their movement as the expression changes, machine learning algorithms can learn to detect emotions.

Point annotation is used to help machines in detecting and identifying facial expressions and emotions in sentiment analysis.
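The idea of following landmark points can be sketched as a mapping from landmark names to coordinates, annotated on two frames of the same face. The landmark names and coordinates below are invented for the example, and measuring displacement is only one simple signal that a model might use.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def landmark_displacement(
    before: Dict[str, Point], after: Dict[str, Point]
) -> Dict[str, float]:
    """Distance each landmark moved between two annotated frames."""
    moved = {}
    for name, (x1, y1) in before.items():
        if name in after:  # skip landmarks missing from the second frame
            x2, y2 = after[name]
            moved[name] = math.hypot(x2 - x1, y2 - y1)
    return moved


# Hypothetical mouth-corner annotations on a neutral and a smiling frame.
neutral = {"left_mouth_corner": (40.0, 80.0), "right_mouth_corner": (60.0, 80.0)}
smiling = {"left_mouth_corner": (37.0, 76.0), "right_mouth_corner": (63.0, 76.0)}

movement = landmark_displacement(neutral, smiling)
print(movement)
```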

Semantic Segmentation

Semantic Segmentation is the task of separating an image into multiple segments and assigning every pixel in each segment the class label of what it represents (e.g., pedestrian, car, lamp post). This gives machines a comprehensive understanding of every pixel of a scene in an image.
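In practice, a semantic segmentation label is often stored as a mask with the same dimensions as the image, where each cell holds a class ID. The tiny 3 x 4 mask and the class IDs below are assumptions chosen to keep the sketch readable.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from class ID to class name.
CLASS_NAMES = {0: "road", 1: "car", 2: "pedestrian"}

# One class ID per pixel; a real mask would match the image resolution.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 0, 0],
]


def pixel_counts(mask: List[List[int]], class_names: Dict[int, str]) -> Dict[str, int]:
    """Count how many pixels were assigned to each class."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {class_names[class_id]: n for class_id, n in counts.items()}


counts = pixel_counts(mask, CLASS_NAMES)
print(counts)
```

Every pixel belongs to exactly one class, so the counts always add up to the number of pixels in the image.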

Semantic Segmentation is commonly used for the detection and localisation of specific objects. Applications of such a granular understanding of images can be found in a variety of industries. It is especially popular in the Autonomous Vehicle industry, as self-driving cars require a deep understanding of their surroundings, while in Agritech it is used to analyse crop fields and detect diseases and abnormal growth.

Autonomous vehicles are able to detect the edges of nearby objects with semantic segmentation

As the computer vision industry advances year upon year, the way training data is prepared for each use case will keep evolving as well. Image annotation is one of the most crucial tasks in computer vision.

While having the right annotation tool is important, computer vision models also rely heavily on high-quality annotation work, as it ultimately determines how accurately a model can distinguish one object from another.

Having an external party produce highly accurate training data in large volumes requires a partner who is able to break complex instructions down into clear and concise steps.

Start a test project for free today and discover new ways of improving your labeled data quality. First $50 on us.
