Can synthetic data (alone) train a robust object detection algorithm?
The RarePlanes study showed that synthetic data alone can train a robust object detection algorithm, as benchmarked against real-world data.
The authors focused on the value of synthetic data in helping computer vision algorithms automatically detect aircraft and their attributes in satellite imagery.
After extensive experiments evaluating the real and synthetic datasets and comparing performance, the study found that synthetic data is effective both alone and in combination with a few real data samples. For us, the most interesting takeaway is that when a small subset of real data was added to fine-tune a model trained on the synthetic dataset, they observed a significant gain in mAP, leading to performance on par with the model trained on the real dataset only.
Our benchmark experiments using the dataset found that blends of 90% synthetic data to 10% real can deliver performance nearly equivalent to 100% real data for the task of aircraft identification.
Taking the model trained only on synthetic data and fine-tuning it with 10% of the observed dataset achieved roughly the same results as training on 100% of the observed dataset. This method would bypass 90% of the manual labeling and collection effort. That is a considerable cost cut, since getting real images of planes from a satellite perspective is neither easy nor cheap. And beyond the cost, think about the time you are going to save too.
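To make the 10% figure concrete, here is a minimal Python sketch of how you might pick the share of real images to annotate for fine-tuning. It is only an illustration: the function name, the image ids, and the use of uniform random sampling are our assumptions, not the selection procedure used in the study.

```python
import random


def pick_finetune_subset(real_image_ids, fraction=0.10, seed=0):
    """Randomly choose the share of real images to annotate for fine-tuning.

    Hypothetical helper: uniform random sampling is an assumption,
    not the selection method described in the study.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(real_image_ids)
    if not ids:
        return [], []
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    k = max(1, round(len(ids) * fraction))
    chosen = set(rng.sample(ids, k))
    to_label = [i for i in ids if i in chosen]     # annotate these
    skipped = [i for i in ids if i not in chosen]  # no labeling needed
    return to_label, skipped


# Hypothetical pool of 1,000 real satellite images.
real_ids = [f"real_{n:04d}" for n in range(1000)]
to_label, skipped = pick_finetune_subset(real_ids)
labeling_saved = len(skipped) / len(real_ids)
```

With a pool of 1,000 images and the default fraction, 100 images go to annotation and 900 are skipped, which is the 90% labeling saving mentioned above.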
Synthetic data helps to build a prior model for aircraft detection and eases transfer learning, thus greatly reducing the need for annotated real data.
Not only is gathering real-world data time-consuming, but annotating that data is also a huge effort, typically manual and error-prone once the amounts of data reach the levels needed to train machine learning models. Synthetic data, on the other hand, is cheap and easy to generate, and it comes automatically annotated with different kinds of ground truth data to fit different use cases.
Can synthetic data perform as well as real data for object detection?
This study showed that synthetic data can adequately reduce reliance on real data, which is slow, expensive, and often difficult to procure. This opens opportunities for far more rapid and prolific adoption of computer vision technologies across industries.
Another point to consider is that the variability of real-world data is often limited to what you can gather directly. That can make your model biased, since you cannot provide enough diversity of conditions and scenarios at training time. With synthetic data, you control the data you generate, so you can make sure you cover all your use case needs and help avoid bias in your model.
We strongly believe that the results of these experiments, which combine synthetic data with small amounts of real data, can apply to other use cases. What do you think?
Rest assured that as we find more evidence backing this assumption, we will let you know.
Source: RarePlanes: Synthetic Data Takes Flight on arXiv.org
About Anyverse™
Anyverse™ helps you continuously improve your deep learning perception models and reduce your system’s time to market by applying new software 2.0 processes. Our synthetic data production platform allows us to provide high-fidelity, accurate, and balanced datasets. Along with a data-driven iterative process, we can help you reach the required model performance.
With Anyverse™ you can accurately simulate any camera sensor and decide which one will perform better with your perception system. No more complex and expensive experiments with real devices, thanks to our state-of-the-art photometric pipeline.
Need to know more?
Come visit our website anyverse.ai anytime and follow us on social media.