Synthetic Data Generation
Synthetic data is a powerful tool for machine learning engineers looking to improve computer vision models. It is computer-generated data that mimics real data while preserving privacy, reducing cost, addressing data scarcity, and guarding against possible biases. Because it removes the need to wait for physical data collection, synthetic data accelerates technology development and can lead to more accurate machine learning models.
Synthetic renders help improve computer vision models because you are able to train and test a model without having to go out and collect the data, which makes iteration a lot faster. Removing the dependence on physical data collection also reduces the biases that collection methods can introduce, resulting in a more accurate machine learning model.
Pursuing research in synthetic data is beneficial because it can be applied in a variety of fields, including but not limited to medicine, retail, manufacturing, robotics, and generative design. Today, synthetic data is used to help robots learn different tasks without explicitly programming them. For example, NVIDIA uses synthetic data to train vehicles' perception systems by generating datasets through its DRIVE Sim tool.
Our synthetic data initiative at Accenture aims to improve the accuracy of artificial intelligence models. Specifically, our team seeks to build a pipeline that turns 3D objects into a perfectly annotated and segmented 2D image dataset.
BlenderProc allows us to create rendering pipelines by piecing together modules, each with a different function: some randomize the virtual camera placement, while others apply textures to objects in the scene. These modules shape the final renders, which BlenderProc produces by driving Blender internally according to the pipeline the modules define.
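Newer BlenderProc releases expose these modules through a Python API. Below is a minimal sketch of a pipeline like the one described here, with randomized camera placement and segmentation output; the scene file, camera parameters, and output paths are placeholders, not our actual configuration.

```python
import blenderproc as bproc
import numpy as np

bproc.init()

# Load a scene; "scene.blend" is a placeholder path.
objs = bproc.loader.load_blend("scene.blend")

# Randomize the virtual camera placement around the scene,
# analogous to a camera-sampling module in the pipeline.
for _ in range(5):
    location = bproc.sampler.shell(
        center=[0, 0, 1], radius_min=2.0, radius_max=4.0,
        elevation_min=10, elevation_max=60)
    rotation = bproc.camera.rotation_from_forward_vec(
        np.array([0, 0, 1]) - location)
    bproc.camera.add_camera_pose(
        bproc.math.build_transformation_mat(location, rotation))

# Render RGB images plus segmentation maps for annotation.
bproc.renderer.enable_segmentation_output(map_by=["category_id", "instance", "name"])
data = bproc.renderer.render()

# Write COCO-style annotations alongside the rendered images.
bproc.writer.write_coco_annotations(
    "output/coco",
    instance_segmaps=data["instance_segmaps"],
    instance_attribute_maps=data["instance_attribute_maps"],
    colors=data["colors"])
```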
The trained AI model is then able to detect the different items in a given scene or image with a certain accuracy, which is measured by mean average precision (mAP) scores. The higher the score, the better the model is at detecting the classes it's trained on.
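With COCO-style annotations like the ones BlenderProc writes, mAP can be computed using the standard pycocotools library; the sketch below assumes placeholder file paths for the ground truth and the model's detections.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations and model detections; both paths are placeholders.
coco_gt = COCO("output/coco/coco_annotations.json")
coco_dt = coco_gt.loadRes("detections.json")

# Evaluate bounding-box detections; use iouType="segm" for segmentation masks.
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints mAP averaged over IoU thresholds 0.50:0.95
```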
To demonstrate how synthetic data generation works, we made a web app using React. We began by prototyping a low-fidelity mock-up in Figma to plan how the user would navigate through our web app using interactive prototyping, what components we would need to achieve our end goal, and how they should be formatted.
We developed a proof of concept that allows users to change the settings of different modules on the web app. We chose useState() for state management, so changes from the modules are written into the JSON file that contains all the modules for the specified rendering pipeline. The JSON file acts as a messenger between frontend and backend: when the user changes a module, the file tells the backend what to render. Once the file is written, we can render a new image with the latest changes. These module changes can then be applied to several different scenes and configurations, each with its own adjustable modules.
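On the backend side, the handoff could look something like the sketch below, assuming a small Flask server; the endpoint name, file paths, and the render_scene.py script are hypothetical stand-ins for our actual service, not its real implementation.

```python
from flask import Flask, request, jsonify
import json
import subprocess

app = Flask(__name__)

CONFIG_PATH = "pipeline_config.json"  # placeholder path for the module JSON

@app.route("/render", methods=["POST"])
def render():
    # The frontend POSTs the JSON describing every module in the pipeline.
    modules = request.get_json()
    with open(CONFIG_PATH, "w") as f:
        json.dump(modules, f)

    # Hand the config to the rendering pipeline; "render_scene.py" is a
    # hypothetical script that reads the JSON and drives BlenderProc.
    subprocess.run(
        ["blenderproc", "run", "render_scene.py", CONFIG_PATH], check=True)

    # Tell the frontend where the fresh render landed.
    return jsonify({"image": "output/latest_render.png"})
```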
Lastly, the implemented CSS keeps the app accessible to the user while still being effective. We made it disability-friendly by adding pictures for the different textures and ensuring the layout adjusts to any zoom level, and we added subtle cues like underlining the room the user is currently in.
In summary, developing this web app helps visualize how synthetic data generation works. In a matter of seconds, a new image is created with the user's changes, and the trained machine learning model is able to detect all the items of interest. Though we focused on furniture and different room configurations in this project, synthetic data generation holds a lot of potential and can be applied in many other fields of study.