CGI offers promising solution to lack of data for AI

Raymond Chang
Writing for the Future: AI
Aug 2, 2018

Here are over 40 toilet designs modelled by The Pixelary, an animation studio. They will be placed into full scenes with other restroom utilities to generate artificial photos as training data.

Mouse clicks and keyboard presses echoed throughout the room. Gradually, the pixels on the screen turned into the shape of a bathtub. On another monitor, textures were applied to the bathtub. Then an entire bathroom appeared.

In May this year, an animation studio, The Pixelary, created a photorealistic scene of a bathroom.

However, contrary to what you may be inclined to think, the scene is not being used for the latest high-production movie. In fact, it’s being used to generate images to train an autonomous bathroom cleaning robot being developed by the company Greppy.

Artificial intelligence today depends almost entirely on big data: large amounts of data are fed into machine learning (ML) algorithms called neural networks, in a process known as deep learning.

These neural networks drive the facial recognition software in social media platforms such as Facebook and Instagram. The companies behind these platforms get their data from users who constantly tag one another in photos, and the sheer number of labelled photos has made facial recognition one of the most accurate AI technologies available today.

However, companies like Greppy do not have this luxury because unfortunately, social media users don’t tag toilet seats.

Greppy needed a way to train its robot to identify toilet seats. So it is using machine learning: feeding the computer a large number of photos of toilets, each labelled with where in the photo the toilet is. As Greppy inputs more photos, the AI learns to recognize toilets more reliably.

Greppy has the algorithms, but the company did not have enough labelled photos of toilet seats. So instead, it asked The Pixelary to artificially generate these images using the 3D computer graphics (CG) program, Blender.

Obtaining sufficient, high-quality data is an ongoing challenge for all AI companies. Greppy decided that, instead of finding humans to label real photographs, it would use the photorealistic technology of the computer graphics industry.

“So with these goals in mind, we set out to create a 3D washroom in Blender,” said The Pixelary staff on its blog. “Our virtual robotics training scene consists of 40+ real toilet models and layouts. We even included sinks and tubs to make sure that the ML algorithm doesn’t overfit the data and think everything that’s shiny and can hold water is a toilet bowl.”

The main advantage of using computer generated images (CGI) is that the images no longer need to be labelled by hand. The computer can pre-label the images since the image itself is being produced artificially, meaning the location of the desired object, in this case toilet seats, is already known by the program.
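To make the idea of pre-labelling concrete, here is a minimal sketch of how a label could be derived from an object mask that the renderer already knows. The mask, the `bounding_box` helper, and the label format are hypothetical illustrations, not The Pixelary's actual pipeline.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels in a 2D mask.

    Returns None when the mask contains no object pixels.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


# Hypothetical 5x6 mask such as a renderer might emit: 1 marks toilet-seat pixels.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# The label comes straight from the scene data, so no human has to draw the box.
label = {"class": "toilet_seat", "bbox": bounding_box(mask)}
print(label)  # {'class': 'toilet_seat', 'bbox': (1, 1, 4, 3)}
```

Because the mask is produced by the same program that renders the image, the label is exact by construction, which is the point the paragraph above makes.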

CG can generate a large number of these pre-labelled images in only a few days.

“With all the assets ready, we created a script that can iterate through all the combinations of these assets,” said The Pixelary staff on its blog. “We also randomly position the camera so that we cover all the possible angles at which the robot can perceive the subject. Once you combine all the variations, the number of images it can produce is staggering.”
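The kind of script described in the quote might look roughly like the sketch below. It only builds scene descriptions; the asset names, the camera bounds, and the `scene_configs` helper are assumptions made for illustration, and the actual rendering step in Blender is left out.

```python
import itertools
import random

# Hypothetical asset names; the lists are kept short for the example.
toilets = ["toilet_01", "toilet_02"]
walls = ["tile", "paint"]
lights = ["daylight", "fluorescent"]


def random_camera(rng):
    """Pick a camera position (metres) and yaw (degrees) within assumed bounds."""
    return {
        "x": rng.uniform(-1.0, 1.0),
        "y": rng.uniform(-1.0, 1.0),
        "z": rng.uniform(0.2, 1.5),
        "yaw_deg": rng.uniform(0.0, 360.0),
    }


def scene_configs(seed=0):
    """Yield one scene description per combination of assets."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    for toilet, wall, light in itertools.product(toilets, walls, lights):
        yield {
            "toilet": toilet,
            "wall": wall,
            "light": light,
            "camera": random_camera(rng),
        }


configs = list(scene_configs())
print(len(configs))  # 2 * 2 * 2 = 8 combinations
```

Each dictionary would then be handed to the renderer, which places the assets, positions the camera, and writes out the image together with its label.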

With 40 toilets, three bins, three walls, eight lighting setups, and ten cameras, over 86,000 images can be created.
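The growth is multiplicative, which a few lines of arithmetic make visible. Note that the five counts listed above multiply to 28,800 on their own; the quoted figure of over 86,000 would be consistent with one further three-way variation that is not in the list (28,800 × 3 = 86,400), but that extra factor is an inference, not something stated here.

```python
import math

# Counts exactly as listed in the text above.
listed = {"toilets": 40, "bins": 3, "walls": 3, "lighting setups": 8, "cameras": 10}

total = math.prod(listed.values())
print(total)      # 28800

# Assumption: one additional, unlisted three-way choice.
print(total * 3)  # 86400
```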

The controlled setting of generating CGI gives two key advantages that wouldn’t be possible with ordinary photographs found in image databases such as ImageNet.

First, The Pixelary generated perfect masks and depth data to go along with each image. These additional renders will be used by stereo cameras to aid in object recognition.

Second, the rendered image quality can be matched to the type of camera that Greppy’s robots would use. The team lowered the color fidelity in order to emulate the robot’s low-quality camera.
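One simple way to lower color fidelity is to reduce the number of bits per color channel. The sketch below shows that idea on a tiny hypothetical image; it is an assumed illustration of the general technique, not a description of how The Pixelary degraded its renders.

```python
def reduce_color_depth(pixel, bits=4):
    """Quantize each 8-bit channel of an (R, G, B) pixel down to `bits` bits."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits
    # Dropping the low bits collapses nearby shades onto the same value.
    return tuple((channel >> shift) << shift for channel in pixel)


# Hypothetical 2x2 image of 8-bit RGB pixels.
image = [
    [(200, 123, 37), (201, 120, 40)],
    [(15, 255, 128), (0, 64, 250)],
]

degraded = [[reduce_color_depth(p, bits=3) for p in row] for row in image]
print(degraded[0])  # [(192, 96, 32), (192, 96, 32)]
```

The two slightly different pixels in the first row become identical after quantization, which mimics the loss of subtle shading a cheap sensor would produce.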

The Pixelary’s blog stated that “Greppy was able to successfully integrate these images into their machine learning pipeline and is already seeing great results in producing a neural network that accurately and quickly identifies toilet seats of all shapes and sizes.”

While still unconventional, this collaboration demonstrates that using CGI as training data for deep learning algorithms is viable and could potentially alleviate some of the problems of AI’s dependence on big data.

One way this problem manifests is in adversarial examples. Trained on incomplete data sets, AI algorithms may fail to recognize images that have been slightly modified yet remain obviously identifiable to humans. This deficiency would be incredibly dangerous in technology such as self-driving cars: a car may fail to identify a stop sign with minimal graffiti or damage.

One method to defend against adversarial examples is adversarial training. Lots of adversarial examples are fed into the algorithm, explicitly training the AI to not be fooled by them.
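As a rough illustration of adversarial training, the sketch below trains a tiny logistic-regression classifier on both clean points and perturbed copies of them. The perturbation used is the fast gradient sign method (FGSM), chosen here only because it is compact; the article does not name a specific method, and the toy data, `eps`, and learning rate are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that x belongs to class 1 under a linear model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Cross-entropy loss, clamped to avoid log(0)."""
    p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Shift each feature by eps in the direction that increases the loss."""
    err = predict(w, b, x) - y  # gradient of the loss w.r.t. the pre-activation
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]


def sgd_step(w, b, x, y, lr):
    err = predict(w, b, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)], b - lr * err


def adversarial_train(data, eps=0.2, lr=0.1, epochs=50, seed=0):
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            w, b = sgd_step(w, b, x, y, lr)      # clean example
            x_adv = fgsm(w, b, x, y, eps)        # perturbed copy, same label
            w, b = sgd_step(w, b, x_adv, y, lr)
    return w, b


# Hypothetical two-feature toy data: class 1 near (1, 1), class 0 near (-1, -1).
data = [([1.0, 1.2], 1), ([0.9, 0.8], 1), ([1.1, 1.0], 1),
        ([-1.0, -0.9], 0), ([-1.2, -1.1], 0), ([-0.8, -1.0], 0)]

w, b = adversarial_train(data)
for x, y in data:
    x_adv = fgsm(w, b, x, y, eps=0.2)
    print(y, round(predict(w, b, x), 3), round(predict(w, b, x_adv), 3))
```

The key design point is that the perturbed input keeps its original label, so the model is pushed to give the same answer for an input and its slightly altered version.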

However, it can be difficult to find photos of specific examples, which is where CGI can be used. By artificially generating adversarial examples, we can train our algorithms to avoid them in the future.

With the bathroom example, perhaps the team can add photos of toilet paper dangling over the seat or other irregularities that may occur. While there are still concerns about the realism of the artificial photos, especially for more complex scenes, CGI offers a potential solution to the lack of big data and to adversarial examples for AI.
