Deep Learning Using Synthetic Data in Computer Vision

Min Shi
Cognite
4 min read · Sep 8, 2020

Deep learning has achieved great success in computer vision since AlexNet was proposed in 2012. This success is mainly related to two factors: a well-designed deep learning model, and a large-scale annotated data set to train the model.

Nowadays, deep learning has become a go-to method on computer vision projects. Solving a supervised learning problem in computer vision such as classification, detection, and segmentation commonly takes two steps:

  1. choosing and downloading a pretrained model which is suitable for the problem
  2. retraining the model using customized annotated data by applying transfer learning

Many pretrained models are available to download from the internet. The second step — retraining the model using the customized annotated dataset — is therefore the main issue.

Annotating images is a time-consuming task. People normally start from a small dataset and then apply image augmentation to increase its size. Image augmentation is widely used in deep learning for computer vision. It applies traditional image processing operations, such as blurring, adding noise, and shifting color channels, to generate new images from existing ones. Shorten and Khoshgoftaar give a good overview of image augmentation in their survey paper. However, image augmentation requires existing annotated images to start from; without them it has nothing to work with.
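As a rough sketch of the transforms mentioned above, here is what such an augmentation step could look like with Pillow (the `augment` function and its parameter ranges are illustrative choices, not from the original article):

```python
import random
from PIL import Image, ImageFilter, ImageEnhance

def augment(image):
    """Return a randomly perturbed copy of an annotated image.

    The annotation is unchanged, so each call yields a new training
    sample from the same labeled image.
    """
    out = image.copy()
    # Random Gaussian blur
    out = out.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2)))
    # Random color (saturation) shift
    out = ImageEnhance.Color(out).enhance(random.uniform(0.5, 1.5))
    # Random brightness change
    out = ImageEnhance.Brightness(out).enhance(random.uniform(0.7, 1.3))
    return out

# Stand-in for a real annotated photo
base = Image.new("RGB", (64, 64), color=(120, 180, 90))
variants = [augment(base) for _ in range(5)]
```

All three operations preserve the image size, so any bounding boxes attached to the original remain valid for the augmented copies.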

This post will introduce another technique that generates annotated images without needing any existing annotated images to begin with: image synthesis, or more specifically, the green screen idea.

Green screen is not a new technology. It has been widely used in film production (as well as in news and weather broadcasts) for many years. Green screen is a visual effects technique in which two images or video streams are composited together; it essentially drops an object onto whatever background image you want behind it. Figure 1 shows an example of a movie scene before and after green screen effects have been applied.

Figure 1: A movie scene before and after green screen effects (https://digitalsynopsis.com/design/movies-before-after-green-screen-cgi/).

We can borrow the same idea from green screen for image annotation. First, define the object to be detected and prepare an image of it with a transparent background. Then paste the object image onto a background image. Figures 2–4 show an example of generating an image for windmill detection.

Figure 2: A windmill with a transparent background (Note: It would be better to have the object image from different viewpoints).
Figure 3: An image of a field.
Figure 4: A fake image that merges Figure 2 and Figure 3.

Before we merge the object image and the background image, we can predefine the size of the object and its location in the background. In other words, we know the bounding box in advance, so the image is annotated automatically. Here is the code:

```python
import random
from PIL import Image

def create_annotated_image(background_file, object_file):
    bg_image = Image.open(background_file)
    # The object image must have a transparent (RGBA) background so it
    # can serve as its own paste mask below.
    obj_image = Image.open(object_file).convert("RGBA")
    # You need to implement the random_resize function to resize the
    # object according to its real size in your application scenario.
    obj_image = random_resize(obj_image)
    width, height = bg_image.size
    obj_w, obj_h = obj_image.size
    x1 = random.randint(0, width - obj_w)
    y1 = random.randint(0, height - obj_h)
    x2 = x1 + obj_w
    y2 = y1 + obj_h
    bbox = [x1, y1, x2, y2]
    # Passing obj_image as the third argument uses its alpha channel as
    # the paste mask, so the background stays visible around the object.
    bg_image.paste(obj_image, (x1, y1), obj_image)
    # You could also apply image augmentation here before returning.
    return bg_image, bbox
```
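The `random_resize` function is left to the reader. One possible implementation (a sketch; the fraction range and the assumed background width are placeholders you would tune to your scenes) scales the object to a random fraction of the background width while preserving its aspect ratio:

```python
import random
from PIL import Image

def random_resize(obj_image, min_frac=0.1, max_frac=0.3, bg_width=640):
    # Hypothetical implementation: make the object's width a random
    # fraction of an assumed background width. The fraction range
    # should reflect the object's plausible on-screen size in your
    # application.
    target_w = max(1, int(bg_width * random.uniform(min_frac, max_frac)))
    w, h = obj_image.size
    target_h = max(1, round(h * target_w / w))
    return obj_image.resize((target_w, target_h))

# Quick check on a stand-in object image
obj = Image.new("RGBA", (100, 150))
resized = random_resize(obj)
```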

The choice of background images depends on the real environment in which the objects will be detected. Use images that resemble that environment. If you have no idea what the real environment looks like, feel free to use a public image dataset as the background source, such as SUN, COCO, or ImageNet.

The basic idea behind synthetic image data is simple but practical. The advantage is that we can generate effectively unlimited annotated data for training models. We can also plug the synthesis step into a data generator so that samples are created on the fly; the model is then trained on data it has never seen before. This also helps us avoid overfitting.
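Such an on-the-fly generator could be sketched as follows (the `synthetic_batches` function and the in-memory stand-in images are illustrative; in practice the backgrounds and objects would be loaded from files):

```python
import random
from PIL import Image

def synthetic_batches(backgrounds, objects, batch_size=8):
    """Endless generator of (image, bbox) training samples composited
    on the fly, so the model rarely sees the same image twice."""
    while True:
        batch = []
        for _ in range(batch_size):
            bg = random.choice(backgrounds).copy()
            obj = random.choice(objects)
            x1 = random.randint(0, bg.width - obj.width)
            y1 = random.randint(0, bg.height - obj.height)
            # Use the object's alpha channel as the paste mask
            bg.paste(obj, (x1, y1), obj)
            batch.append((bg, [x1, y1, x1 + obj.width, y1 + obj.height]))
        yield batch

# Tiny in-memory stand-ins for real background and object images
bgs = [Image.new("RGB", (128, 96), color=(80, 150, 80))]
objs = [Image.new("RGBA", (20, 30), color=(200, 40, 40, 255))]
gen = synthetic_batches(bgs, objs, batch_size=4)
images_and_boxes = next(gen)
```

Each `next(gen)` call yields a fresh batch, which is the shape most training loops expect from a data generator.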
