Synthetic Data Generation Using Omniverse
A new technique to improve your model's accuracy with minimal human intervention!
A healthy body needs a diet that meets its nutritional needs. Similarly, a machine learning model needs a dataset that is large and varied enough to support good detection or classification. A dataset usually contains images related to the use case, taken from real-world events. If the data is insufficient, the model is easily confused by unfamiliar test images.
Consider a few general scenarios: collecting a dataset of forklifts, creating an environment dataset to train self-driving cars, or creating a dataset to train robots. It is often difficult to get enough real-world images for such use cases, even when augmentation is used to stretch the images we have. Annotating the forklifts also takes a lot of time and human effort. To reduce this manual work, an advanced technique called synthetic data generation comes to the rescue.
What is Synthetic Data?
“Data that is created artificially using algorithms is called synthetic data. It is neither taken from real-world events nor created from images taken from real-world events.”
Omniverse is one such platform that makes it easy to generate synthetic data: users create images from virtual simulations, and these differ only slightly, at the pixel level, from images taken from real-world events.
Where can synthetic data be used?
Most applications and robots need machine learning models tailored to their use cases. These models must be trained on data that is sometimes difficult or costly to collect in the real world. A better approach is to recreate the same real-world scenario in a virtual world and capture data from there. Synthetic data is used to build a self-sufficient dataset that represents the test data well and improves the performance and accuracy of the model.
Difference between Data Augmentation and Synthetic Data Generation
Synthetic data generation is highly useful for developers and serves the same purpose as data augmentation, yet there is a big difference between the two techniques. Data augmentation creates multiple images from a single image taken from a real-life event, turning a smaller dataset into a sufficiently large one. Synthetic data generation, by contrast, does not need any source image: one creates a scene or environment related to the use case and then captures as many images as needed from the frames that the scene generates.
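The contrast can be sketched in a few lines of plain Python. This is a toy illustration, not Omniverse code: images are small grids of integers, and `augment` and `render_scene` are hypothetical helpers invented for this example. Augmentation starts from one existing image, while synthetic generation starts from scene parameters only.

```python
import random

def augment(image):
    """Data augmentation: derive new images from ONE existing real image."""
    flipped_h = [row[::-1] for row in image]           # horizontal flip
    flipped_v = [row[:] for row in image[::-1]]        # vertical flip
    rotated = [list(r) for r in zip(*image[::-1])]     # 90-degree clockwise rotation
    return [flipped_h, flipped_v, rotated]

def render_scene(width, height, rng):
    """Synthetic generation: build an image from scene parameters, no source image."""
    # Hypothetical scene: one rectangular object at a random position and size.
    obj_w = rng.randint(1, width // 2)
    obj_h = rng.randint(1, height // 2)
    x0 = rng.randint(0, width - obj_w)
    y0 = rng.randint(0, height - obj_h)
    image = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + obj_h):
        for x in range(x0, x0 + obj_w):
            image[y][x] = 1
    return image

real_image = [[1, 0, 0],
              [1, 1, 0],
              [0, 0, 0]]

# Augmentation: the output is limited to transforms of the one real image.
augmented = augment(real_image)

# Synthetic generation: we can ask the "scene" for as many frames as we want.
rng = random.Random(0)
synthetic = [render_scene(8, 8, rng) for _ in range(5)]
```

In a real pipeline the renderer would be a full 3D simulation rather than a rectangle on a grid, but the shape of the workflow is the same: parameters in, images out.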
So let’s dig deeper into how synthetic data is generated from Omniverse.
Synthetic Data Generation Using Omniverse
One of the applications in the Omniverse platform's toolkit is Isaac Sim, which is mainly used for synthetic data generation and for creating physically accurate simulations.
Isaac Sim makes it easy for developers to optimise AI applications. It also provides 3D models of robots that can be trained for a specific application, along with some built-in environments and scenes for generating randomized data.
So let’s look at the structure of Isaac Sim.
Several properties of the objects, such as color, lighting, texture, material, and transformation, can be changed to generate data synthetically and create different images. To create a diverse dataset, components like transformation, color, movement, rotation, and scale randomize the objects, while components like visibility, mesh, material, and light randomize the scene or environment.
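The idea of randomizing object-level and scene-level properties can be sketched as follows. This is a minimal, library-free illustration and does not use the Isaac Sim API: the class names, value ranges, and material names are all assumptions made for this example.

```python
import random
from dataclasses import dataclass

@dataclass
class ObjectParams:
    # Object-level components from the text: transformation, rotation, scale, color.
    position: tuple
    rotation_deg: float
    scale: float
    color_rgb: tuple

@dataclass
class SceneParams:
    # Scene-level components from the text: visibility, material, light.
    visible: bool
    material: str
    light_intensity: float

MATERIALS = ["metal", "wood", "plastic"]  # hypothetical material names

def sample_object(rng):
    """Draw one random set of object properties (ranges are illustrative)."""
    return ObjectParams(
        position=(rng.uniform(-5, 5), rng.uniform(-5, 5), 0.0),
        rotation_deg=rng.uniform(0, 360),
        scale=rng.uniform(0.5, 2.0),
        color_rgb=(rng.random(), rng.random(), rng.random()),
    )

def sample_scene(rng):
    """Draw one random set of scene properties (ranges are illustrative)."""
    return SceneParams(
        visible=rng.random() < 0.9,
        material=rng.choice(MATERIALS),
        light_intensity=rng.uniform(200.0, 2000.0),
    )

# Each frame gets a fresh draw, so no two frames share the same setup.
rng = random.Random(42)
frames = [(sample_object(rng), sample_scene(rng)) for _ in range(3)]
for obj, scene in frames:
    print(obj, scene)
```

Seeding the generator makes a run reproducible, which helps when a particular randomized frame needs to be regenerated later.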
For example, consider a trailer yard. The environment can be randomized by changing the way objects are placed in the racks and by adding random obstacles, such as a person or a box. Here is a view of the trailer yard created in Omniverse, which looks similar to a real-world image.
Let’s look at the specific settings for randomizing the scene, the environment, and the objects.
→ You can add the objects whose color you want to randomize to the primPaths.
→ Similarly, if you want to randomize the movement, rotation, or scale of an object, change the settings of the Randomized Controller accordingly.
If all the Domain Randomization components are added together, and various target objects are added to the primPaths, the result looks something like the GIF below.
Synthetic Data Generated
Each new frame, created with a difference in the environment or the object, gives:
RGB Format Image
Depth Image
Segmented Image
Instance Segmented Image
2D and 3D Bounding Box
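These outputs are related to one another. As a toy illustration, a 2D bounding box can be derived from an instance segmentation mask by taking the extent of each instance's pixels. The sketch below is not how Isaac Sim computes its annotations; the function name and the convention that 0 means background are assumptions for this example.

```python
def bounding_boxes_2d(instance_mask):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for each non-zero id."""
    boxes = {}
    for y, row in enumerate(instance_mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:  # 0 is treated as background (assumption)
                continue
            if instance_id not in boxes:
                boxes[instance_id] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[instance_id]
                boxes[instance_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes

# A tiny instance-segmented frame: instance 1 is a 2x2 block, instance 2 a 1x2 strip.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
]
boxes = bounding_boxes_2d(mask)
print(boxes)  # {1: (1, 0, 2, 1), 2: (4, 1, 4, 2)}
```

Because the labels come from the simulation itself, every frame arrives already annotated, which is what removes the manual annotation effort described earlier.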
If multiple viewports are created at different angles for a single frame of the scene, images of that frame from each angle can be generated, along with other variations.
The image below shows the viewport settings.
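One simple way to think about multiple viewports is as cameras placed around the scene. The sketch below computes evenly spaced camera positions on a circle around a center point. It is a geometric illustration only: `orbit_camera_positions` is a hypothetical helper, and the radius and height are assumed values, not Isaac Sim viewport settings.

```python
import math

def orbit_camera_positions(center, radius, height, num_views):
    """Evenly spaced camera positions on a circle around `center`, at one height."""
    cx, cy, cz = center
    positions = []
    for i in range(num_views):
        angle = 2 * math.pi * i / num_views  # angle of this view, in radians
        positions.append((
            cx + radius * math.cos(angle),
            cy + radius * math.sin(angle),
            cz + height,
        ))
    return positions

# Four viewpoints around the scene origin, 10 units away and 3 units up.
views = orbit_camera_positions(center=(0.0, 0.0, 0.0), radius=10.0, height=3.0, num_views=4)
for position in views:
    print(position)
```

Each position would correspond to one viewport, so a single randomized frame yields several images taken from different angles.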
Can you imagine kids playing with robots and robotics applications? It may soon be a reality: Omniverse has made synthetic data generation faster and simpler for training the machine learning models used in robots, which should make robots cheaper and more accessible to ordinary users. Omniverse is setting a high standard for other platforms that offer use cases like synthetic data generation, which speeds up steps such as data collection and preprocessing.
NVIDIA Articles and Videos Related to Synthetic Data Generation
Isaac Sim
What Is Synthetic Data?
NVIDIA Omniverse Replicator Generates Synthetic Training Data for Robots
Credits:
Wholehearted thanks to our team, and especially to Shailja, for contributing to this article.