How to Curate Data for Computer Vision Models

This guide introduces the steps to curate data for a new computer vision model. It is suitable for those who are new to artificial intelligence.

Data curation is important for building useful AI models

Here at GovTech’s Data Science and Artificial Intelligence Division, we collaborate with Singapore government agencies on Artificial Intelligence projects. Often, we work with public officers who see value in adopting Artificial Intelligence but are new to the technology. We have learnt that for those new to Artificial Intelligence, starting a project can be difficult, and questions usually centre on data. In this article, we will talk about the initial steps in building a Computer Vision model, focusing on data curation.

A Computer Vision project starts with the intent (what you want to gain) and the data (what images or videos you have). Data curation is a critical part of model development as Computer Vision models are derived by learning from the data they see. We define data curation as the process of selecting, preparing and organising a collection of data such that the value of the data can be maintained over time. Any experienced AI Engineer will tell you that it is common to spend 80% of computer vision model development time in data acquisition, curation and annotation.

Process of Model Development

The focus of data curation is on having good data and not just getting more data.

Good data is:

  • Consistent — labelled and organised the same way throughout, leading to the outcome you expect
  • Relevant — focused on the important aspects of your project

Proper data curation will ensure good data is used for model training, which in turn will help optimise the time required for model training and development.

People who are familiar with Computer Vision appear to be able to intuitively determine the nature and volume of data required. The truth is that for any given problem, there is no fixed formula for data curation. It requires experimentation and constant refinement to achieve better model training results. Fortunately, it is a skill that can be honed over time, given a willingness to experiment and access to data. Here are some tips from us to get started.

Determine the suitable model type

In this article, we assume that the intent or problem has been defined and that it can be addressed using Computer Vision techniques. The first step is to determine the suitable model type for the problem at hand.

What is the task that you want to automate? Consider the benefit you wish to gain from using computer vision.

Here, we will walk through the data curation process using two common computer vision problem types — Object Detection and Image Classification.

Object Detection

Object Detection models are typically used to count, locate, track or identify objects in the same scene. They address the question “Where is the [object of interest] in the scene?” and are suitable when there are multiple objects of interest to be identified in an image.

Image of artemia
(Image courtesy of Singapore Food Agency for illustration purposes)

One example is the identification and counting of rotifers or artemia as shown above. For this image, we are interested in the total count of artemia and shells. Intuitively, you can see that manual counting is tedious and time-consuming. Over an extended period of time, manual counting can also become error-prone and inconsistent. Object Detection models can automate this manual process at scale with consistent counting accuracy.

Image Classification

Image Classification models are used to sort or identify images. They address the question “Is there an [object of interest] in the scene?”.

Images of different manhole covers

For example, we use image classification to differentiate images of different manhole cover types, as above, without needing to locate exactly where the manhole cover is within the image. It can also be used to identify whether an image is a picture of a bench or a city.

Does either of these problem types fit yours?

Define an Image Annotation Guide

After determining the model type, it is useful to establish a consistent image annotation process. It is ideal to have an annotation guide to:

  • Set guidelines to standardise annotation across all data
  • Lay out the requirements, focusing on the desired outcome(s) from the Computer Vision model

Whether you are an individual or part of a team, the guide will help to build consistent and relevant data to train and test a usable Computer Vision model.

What should be included in an annotation guide?

Below is a sample of what can go into the guide. At a minimum, it should contain these elements:

  1. Number of scene types
  2. File naming convention
  3. Guidelines on how annotation should be done

Number of Scene Types

Look through your data and group your images into distinctive scenes.

Distinctive scenes usually arise when there are different camera sources. You can think of different scenes as images with different backgrounds, for example, scenes capturing roads, buildings or grass patches. Each scene can also vary in environment (lighting conditions, weather) or viewpoint (pose, field of view).

Tally the amount of data for each scene and aim for parity across scenes, so that the data is distributed fairly.
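If your images are already organised into one folder per scene, a short script can do the tally. Below is a minimal sketch assuming that folder layout; the data/ path and file extensions are illustrative:

```python
from pathlib import Path

# Assumed layout: data/<scene_name>/*.jpg, one folder per distinctive scene.
data_dir = Path("data")

counts = {
    scene.name: sum(1 for f in scene.iterdir() if f.suffix.lower() in {".jpg", ".png"})
    for scene in data_dir.iterdir()
    if scene.is_dir()
}

for scene, n in sorted(counts.items()):
    print(f"{scene}: {n} images")

# Rough parity check: flag scenes with far fewer images than the largest scene.
if counts:
    largest = max(counts.values())
    for scene, n in counts.items():
        if n < 0.5 * largest:
            print(f"Warning: '{scene}' has less than half the images of the largest scene")
```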

Does your data have the same scenes as the ones for operation or production?

Prioritise data that are the same or similar to the scene used for operation or production. For example, if the problem is to identify the type of manhole covers on roads, prioritise having more data with scenes of manhole covers on roads rather than on grass patches.

Is it necessary to obtain more than one scene?

  • If the scene is fixed, for example, viewpoint is fixed and the environment is always constant, you can use the data generated from the same source for training.
  • If the scene is not fixed, you will need to identify the most relevant scenes. Variety in scenes, environments and viewpoints can help to generalise the model.

File naming conventions

Name the images with an intuitive convention. For example, an image name can follow a convention such as location-date-img-number (“ubin-210101-img-0001.jpg”) or class-scene-number (“sewage-road-0001.png”); a sketch for validating such names follows the list below. This will help to:

  • make it easy to search for an image or a batch of images, and identify which batch an image belongs to;
  • make it easy to identify duplication;
  • make it easy to add or remove a batch of images to or from a project; and
  • make it easy to ensure a fair distribution of scenes and classes.
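A naming convention is also easy to enforce mechanically. The following is a minimal sketch assuming the class-scene-number convention from the example above; the regular expression and folder path are illustrative and should be adapted to your own convention:

```python
import re
from pathlib import Path
from typing import Optional

# Assumed convention: <class>-<scene>-<number>.<ext>, e.g. "sewage-road-0001.png".
NAME_PATTERN = re.compile(r"^(?P<cls>[a-z]+)-(?P<scene>[a-z]+)-(?P<num>\d{4})\.(?:jpg|png)$")

def parse_name(filename: str) -> Optional[dict]:
    """Return the class, scene and number encoded in a filename, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None

# Flag any file that does not follow the agreed convention.
for path in Path("data").rglob("*.*"):
    if parse_name(path.name) is None:
        print(f"Non-conforming name: {path.name}")
```

Once every file parses cleanly, grouping by class or scene, spotting duplicates and checking distribution all become simple operations over the parsed fields.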

Guidelines on how annotation should be done

Sample images should be included as they help the team visualise how the classes should be labelled.

I have divided this section into Annotation Guidelines for Object Detection models followed by Annotation Guidelines for Image Classification models for ease of reference.

Annotation Guidelines for Object Detection

Before you start annotating, you need to define object classes. When the images are being annotated, the objects of interest are labelled according to the classes defined.

It is advisable to use a descriptive name for each class for ease of identification. Rather than naming your classes A, B and so on, use names such as artemia or SewageCover.

As most models are case sensitive, it is important to fix the class naming convention, whether all lowercase, all uppercase or mixed case, at the start to avoid any unexpected errors.
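Since a single stray capital letter can silently create a new class, it can be worth validating labels against the agreed class list before training. Here is a minimal sketch, assuming the labels have been collected into a list; the class names are illustrative:

```python
# Agreed class list from the annotation guide. A case-sensitive model would
# treat "Shell" and "shell" as two different classes.
ALLOWED_CLASSES = {"artemia", "shell"}

def check_labels(labels):
    """Report any label that is not an exact, case-sensitive match for an agreed class."""
    for label in labels:
        if label not in ALLOWED_CLASSES:
            hint = " (case mismatch?)" if label.lower() in ALLOWED_CLASSES else ""
            print(f"Unknown class '{label}'{hint}")

check_labels(["artemia", "Shell", "shell"])  # flags "Shell" as a likely case mismatch
```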

To annotate images for Object Detection models, we draw a bounding box over each object of interest. We then give that part of the image a pre-determined class name. In the following diagram, we annotated two classes of objects, naming each object either a ‘Shell’ or an ‘Artemia’.

Object Detection classes of Artemia and Shell
(Image courtesy of Singapore Food Agency for illustration purposes)
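Annotation tools store their output in different formats, but the underlying record is usually the same: an image reference, a class name and box coordinates. As an illustration only (not the format of any particular tool), annotated objects could be represented like this:

```python
import json

# Illustrative records: one bounding box per object, in pixel coordinates.
# bbox = [x_min, y_min, width, height], with (x_min, y_min) at the top-left corner.
annotations = [
    {"image": "artemia-tank-0001.jpg", "class": "Artemia", "bbox": [120, 45, 32, 18]},
    {"image": "artemia-tank-0001.jpg", "class": "Shell", "bbox": [210, 88, 25, 25]},
]

with open("annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```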

When drawing the bounding boxes, we recommend the following guidelines:

  • Each bounding box should cover the entire area of the object of interest where possible, while staying as concise (tight) as possible.
Defining concise bounding boxes
(Image courtesy of Singapore Food Agency for illustration purposes)
  • All objects of interest with a significant and recognisable portion exposed within a scene must be annotated.
Determining the minimum size
(Image courtesy of NParks for illustration purposes)
Defining what a boar looks like
(Image courtesy of NParks for illustration purposes)

Should I annotate a busy scene as a cluster or annotate the objects individually?

Sometimes, it is time-consuming to annotate a busy scene of many objects. It is therefore tempting to draw a box over a cluster as one object. This is acceptable if the intent is to detect if the scene has the object or not. But if the intent is to count individual objects, then you should not annotate a cluster as one object — that is, you have to painstakingly annotate every one of them.

Annotation Guidelines for Image Classification

For Image Classification models, there is no need to draw bounding boxes on the training images. You only need to label the entire image as its corresponding class. For some applications like VideoIO, the process of annotation is as simple as tagging images with the same class.
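Outside such tools, a common way to tag whole images is to keep one folder per class, a layout many training libraries can read directly. Below is a minimal sketch of gathering (image, label) pairs under that assumed layout; the dataset/ path is illustrative:

```python
from pathlib import Path

# Assumed layout: dataset/<class_name>/*.jpg, one folder per class.
dataset_dir = Path("dataset")

labelled = [
    (img, class_dir.name)  # the folder name doubles as the class label
    for class_dir in dataset_dir.iterdir() if class_dir.is_dir()
    for img in class_dir.glob("*.jpg")
]

classes = {label for _, label in labelled}
print(f"{len(labelled)} labelled images across {len(classes)} classes")
```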

We also recommend including these guidelines:

  • The image should be a clear representation of the class: a significant portion of the image should be covered by the object of interest, and a recognisable portion of the object should be captured within the image.
Image Classification classes of Sewage and Telecom
Poor image samples for classification

Establish a review system for annotation

It is important to have a review process, especially if more than one person is annotating. A review process minimises annotation errors and helps the team to systematically produce good annotated data. The following flow can be considered:

  1. Starting with one class, class A, one person annotates all objects of this class.
  2. The second person reviews that the objects of class A have been annotated according to the annotation guide.
  3. Differences are discussed and logged.
  4. Repeat the above steps for the next class until all classes have been annotated.
Visualisation of a review process
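Step 2 of the review can also be partly mechanised. The sketch below compares two people's per-image class labels and prints disagreements for discussion; the dictionary structure and file names are illustrative assumptions:

```python
def log_differences(annotator, reviewer):
    """Compare per-image class labels from two people and print disagreements for discussion."""
    for image in sorted(set(annotator) | set(reviewer)):
        a = sorted(annotator.get(image, []))
        b = sorted(reviewer.get(image, []))
        if a != b:
            print(f"{image}: annotator={a} reviewer={b}")

log_differences(
    {"artemia-tank-0001.jpg": ["Artemia", "Artemia", "Shell"]},
    {"artemia-tank-0001.jpg": ["Artemia", "Shell"]},
)  # prints both label lists for the image, revealing one missed "Artemia"
```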

Build dataset, train and review progressively

Now with a structure in place, the next question to answer is: how much data is required?

Typically, the volume of training data required to train a good model is in the tens of thousands or more. But with techniques like transfer learning and data augmentation, the quantity of training data required can be greatly reduced. That said, it is still necessary to build your own dataset progressively and incrementally so that you can observe the increase in model performance with each addition.
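As a concrete example of data augmentation, libraries such as torchvision can generate varied versions of each training image on the fly. This is a minimal sketch; the specific transforms and parameters are illustrative choices, not recommendations:

```python
from torchvision import transforms

# Each pass through this pipeline yields a slightly different image,
# multiplying the effective variety the model sees during training.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

# Usage: augmented = augment(image), where image is a PIL Image.
```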

Example of building a model by adding data progressively and observing the performance using VideoIO

You need to split the data into non-overlapping training, validation and test sets. It is important to note that each dataset is to be used solely for one purpose, either training, validation or testing. Training data must not be used for validation or testing. Likewise, test data must not be used for training or validation. We suggest dividing your data into 70% (training set)-15% (validation set)-15% (test set) for each class. Other ratios such as 60%-20%-20% or 70%-20%-10% work as well.
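A per-class split takes only a few lines. Below is a minimal sketch using a fixed random seed so the split is reproducible; the ratios and file names are illustrative:

```python
import random

def split_per_class(images, train=0.70, val=0.15, seed=42):
    """Shuffle one class's images and cut them into non-overlapping train/val/test sets."""
    images = list(images)
    random.Random(seed).shuffle(images)
    n_train = int(train * len(images))
    n_val = int(val * len(images))
    return (images[:n_train],                 # training set
            images[n_train:n_train + n_val],  # validation set
            images[n_train + n_val:])         # remainder (~15%) becomes the test set

train_set, val_set, test_set = split_per_class([f"sewage-road-{i:04d}.png" for i in range(100)])
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Running the split separately for each class keeps every class at the same ratio across the three sets.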

The three sets serve the following purposes:

Training

This dataset is used to train (also known as fit) the model. The model will see and learn from this dataset during the training process.

Validation

This dataset is used to provide an unbiased evaluation of a model fit on the training dataset. It is used to ‘validate’ the model’s accuracy as training progresses, and is usually used to fine-tune the model hyperparameters.

Test

This dataset is used to provide an unbiased evaluation of a final model fit on the training dataset. It is not for fine-tuning of model hyperparameters but to understand how well the trained model performs. A robust set of test data can help identify where the model performs well and where it does not.
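For a classification model, a per-class breakdown of test-set predictions is one way to see exactly this. Here is a minimal sketch using scikit-learn; the label values are illustrative:

```python
from sklearn.metrics import classification_report, confusion_matrix

# True test-set labels vs. the trained model's predictions (illustrative values).
y_true = ["sewage", "sewage", "telecom", "telecom", "sewage"]
y_pred = ["sewage", "telecom", "telecom", "telecom", "sewage"]

print(confusion_matrix(y_true, y_pred, labels=["sewage", "telecom"]))
print(classification_report(y_true, y_pred))  # per-class precision, recall and F1
```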

How to build a dataset

  1. Aim to start with 100 samples if data augmentation is applied. If data augmentation is not applied, start with at least 300 samples. Always begin with high-quality, highly recognisable samples.
  2. Add a new batch of 100 samples at a time and train. The new batch could contain images from a different scene or of lower resolution; this helps improve the robustness of the trained model.
  3. Review the model performance using the test set.
  4. Observe the change in model performance. Compare the prediction results and note where performance improves and where it degrades. Use this assessment to guide further data collection and repeat from Step 2 (a sketch of this loop follows the list). Note that the test set also needs to grow incrementally as the training data grows.
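The loop in steps 2 to 4 can be sketched as follows. Here train_model and evaluate are hypothetical placeholders for whatever training and evaluation routine you use, not real APIs:

```python
def build_progressively(batches, test_set, train_model, evaluate):
    """Add one batch of samples at a time, retrain, and track test performance."""
    training_data, history = [], []
    for i, batch in enumerate(batches, start=1):
        training_data.extend(batch)         # step 2: add a new batch and train
        model = train_model(training_data)  # hypothetical training routine
        score = evaluate(model, test_set)   # step 3: review on the test set
        history.append(score)
        print(f"After batch {i}: {len(training_data)} samples, score={score:.3f}")
        # step 4: compare scores across batches to decide what data to collect next
    return history
```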

It is important to note that a large quantity of similar image samples does not help the model training process and may in fact bias the trained model towards those samples.

Concluding Remarks

Training a good Computer Vision model takes time. Merely adding samples does not always lead to a better trained model. With careful data curation, you can save time and effort while maximising model performance. The steps outlined above are by no means a magic bullet — you will still need experimentation and learning. However, this guide should help you approach your model training journey in a calibrated way.

If you are a Singapore public officer and have a computer vision or video analytics problem, contact us. The Data Science and Artificial Intelligence Division has developed VideoIO, an application that supports the prototyping and development of new computer vision models for policy making, work optimisation and service delivery. We will be happy to assist you with your use cases. Hope to hear from you!


Ning Sung Lee
AI Practice and Data Engineering Practice, GovTech

Product Manager at DSAID supporting public officers in developing their own video analytics models