A (simple) Tool for Creating Better Synthetic Images for ML Training
Tool 1 of 3: Dither the components inside an object in a USD file to create realistic departures from the ‘perfect’ CAD model. Adding thousands of these new training objects to your training set yields a more robustly trained ML model. https://github.com/pgaston/ditherusd
Training an ML vision model is generally straightforward. Choose your model architecture, e.g., YOLO, a keypoint model, or another, and then train the model with lots (and lots) of training images.
But getting real training data can be difficult. It’s not just the images themselves, but also the annotation: for keypoints, where exactly are they? For YOLO, where is the bounding box around the object(s) of interest?
Enter synthetic data: made-up images that are already annotated. They are useful for getting started, as they’re quick and simple to create, and they stay useful over time, as they can cover difficult-to-stage physical scenes, even dangerous ones. Think of a child chasing a ball into a street.
Is this all you need? No. There is an entire research area called Sim2Real that works on the challenges: Does synthetic data help? Can it replace real data? What’s the best approach? For now, I’ve personally internalized some working assumptions:
- Start with synthetic data. It’s easy to obtain in quantity.
- You’ll always fine-tune your model at the end with real images. The more real images, the better.
- The more overall data, the better. But I’ve never had luck with more than 90% of it being synthetic.
- The better the quality of the image, the more useful it is in training. Cartoon-esque images just don’t cut it. Hence my use of USD and NVidia Isaac Sim.
- Lots of variation across your synthetic training images is critical. That’s the idea behind this tool! The easy things, such as lighting, size, and positioning, are, well, easy. Very important, and the point of this article, are the real-world differences that exist in physical objects: a simple CAD-based object is too perfect and stops helping training past a certain point.
Using a platform such as NVidia Isaac Sim, along with a tool NVidia provides, Replicator/Composer (Replicator seems to be the NVidia name; composer.py is the actual code to run), I can create thousands and thousands of images quite easily, and at the cutting edge of photo-realism. Creating many variations of the core object using this tool makes my resulting model much more robust.
The Tool: an Example
Imagine that you’re creating the vision system for an automated forklift that must find the location (pose) of a pallet as accurately as it can.
The normal image of a (synthetic) pallet looks like this. This image comes from NVidia Isaac Sim; the data object behind it came from a CAD tool and was exported as a USD file.
Note: USD files, short for Universal Scene Description, are a standard that comes from Pixar and is being adopted as a future standard by companies such as Apple, Adobe, and of course NVidia.
BTW, this is a “GMA” type of pallet, common in North America. The ones in the image above are known as “Euro” type pallets. There are many variants.
So let’s dither some of the key things that will vary in the real world: boards will be off-center, rotated, of different sizes, and more. This is what this tool does!
Here are some examples. The color of the affected boards is accentuated so the changes are easy to see; in practice, you may or may not change the color.
And more: you can create as many variants as you’d like. A minimal code sketch of the core idea follows.
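To make the idea concrete, here is a minimal sketch using the pxr Python API that ships with USD (and with Isaac Sim). The file names, the /Pallet prim path, the Board naming convention, and the jitter ranges are all hypothetical; this is not the actual ditherusd implementation, just the flavor of it.

```python
import random
from pxr import Usd, UsdGeom, Gf

def jitter_prim(prim, max_shift=0.01, max_rot_deg=2.0, scale_jitter=0.05):
    """Apply a small random translate/rotate/scale to one prim.

    Note: XformCommonAPI sets the prim's transform ops directly; a real
    tool would compose the jitter with the existing transform instead.
    """
    xform = UsdGeom.XformCommonAPI(prim)
    # Shift the board slightly off-center in the local XY plane.
    xform.SetTranslate(Gf.Vec3d(
        random.uniform(-max_shift, max_shift),
        random.uniform(-max_shift, max_shift),
        0.0))
    # Rotate it slightly about a single axis.
    xform.SetRotate(Gf.Vec3f(0.0, 0.0, random.uniform(-max_rot_deg, max_rot_deg)))
    # Scale it a few percent up or down.
    s = 1.0 + random.uniform(-scale_jitter, scale_jitter)
    xform.SetScale(Gf.Vec3f(s, s, s))

stage = Usd.Stage.Open("pallet.usd")        # hypothetical input file
pallet = stage.GetPrimAtPath("/Pallet")     # hypothetical prim path
for prim in pallet.GetChildren():
    if "Board" in prim.GetName():           # hypothetical naming convention
        jitter_prim(prim)
stage.GetRootLayer().Export("pallet_dithered_000.usd")
```

Run something like this once per variant (with a new output name each time) to build up a library of dithered pallets.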
The full synthetic creation pipeline
The full pipeline:
- This tool creates a myriad of interesting variants of one or more objects of high interest.
- NVidia Replicator/Composer then builds many interesting scenes in which this object is typically the primary subject. Common scene variations include object position, placement of other objects (including ‘distractors’), lighting, wall and floor textures, and more.
- Follow-on processing converts the Replicator/Composer output, i.e., image and annotation data, into input for your ML training. One common target is the COCO format (see the sketch after this list).
- Train! (Review/think, repeat…)
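To illustrate the conversion step, here is a minimal sketch that writes a COCO-style JSON skeleton from per-image bounding boxes. The annotations list and its field names are invented for illustration; parsing the actual Replicator/Composer output is up to your pipeline.

```python
import json

# Invented stand-in for parsed Replicator/Composer output:
# one record per image, with pixel-space (x, y, w, h) boxes.
annotations = [
    {"file": "rgb_0000.png", "width": 1280, "height": 720,
     "boxes": [(412, 310, 220, 140)]},
]

coco = {
    "images": [],
    "annotations": [],
    "categories": [{"id": 1, "name": "pallet"}],
}
ann_id = 1
for img_id, rec in enumerate(annotations, start=1):
    coco["images"].append({"id": img_id, "file_name": rec["file"],
                           "width": rec["width"], "height": rec["height"]})
    for (x, y, w, h) in rec["boxes"]:
        coco["annotations"].append({
            "id": ann_id, "image_id": img_id, "category_id": 1,
            "bbox": [x, y, w, h],   # COCO bbox convention: [x, y, width, height]
            "area": w * h, "iscrowd": 0,
        })
        ann_id += 1

with open("train_coco.json", "w") as f:
    json.dump(coco, f, indent=2)
```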
Here is an example of the output from Replicator/Composer: a created, synthetic image. Alongside it, Replicator/Composer emits metadata about the image, e.g., pose, bounding boxes, and more.
dither.py: the tool
Get started:
- Clone the repository: https://github.com/pgaston/ditherusd
- Follow the installation instructions from there: identify what you plan to dither inside your USD file, create the YAML file that this tool uses to perform the dithering, and then integrate the result into your NVidia Replicator/Composer pipeline. A hypothetical sketch of such a YAML config follows.
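To give a feel for that YAML step, here is a hypothetical config and a loader for it. The keys shown (usd_file, prim_glob, and so on) are made up for illustration; the actual ditherusd schema is documented in the repository.

```python
import yaml  # pip install pyyaml

# Hypothetical config text; the real ditherusd YAML schema may differ.
CONFIG = """
usd_file: pallet.usd
output_dir: dithered/
num_variants: 1000
targets:
  - prim_glob: /Pallet/Board_*    # which prims to perturb
    max_shift_m: 0.01             # translation jitter, in meters
    max_rotation_deg: 2.0         # rotation jitter, in degrees
    scale_jitter: 0.05            # +/- 5% size variation
"""

cfg = yaml.safe_load(CONFIG)
print(f"Dithering {cfg['usd_file']} into {cfg['num_variants']} variants")
for target in cfg["targets"]:
    print("  perturbing prims matching", target["prim_glob"])
```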
Good luck!
BTW, this is tool 1 of 3 aimed at improving synthetic data built with the NVidia Replicator/Composer tool, essentially filling in some needed capabilities:
- Modifying an existing USD object’s real-world appearance (this tool).
- Adding defects by laying images onto a USD object, say of cracks. Tools for this already exist; I plan on incorporating one or more approaches into my tools for ease of use.
- Adding additional objects onto an existing USD object. For example, a pallet is usually carrying boxes, which may or may not have plastic shrink wrap over them. I’d like to train my pallet detector on this, yet Replicator/Composer cannot keep multiple objects together.