3D humans from a photo: the data needed to train Volograms AI
A huge collection of high-quality human 3D models, and more!
This post was first shared with Volograms Newsletter subscribers; make sure you subscribe here if you want to be among the first to receive our updates.
You might wonder how Volu manages to create amazing 3D volumetric holograms using a single video captured with your smartphone. You might have seen some of our Vologram Messages and discovered they were created using a single video recorded on any digital camera. But how does this work?
The Volograms AI platform generates a 3D model of a person from a single photo using multiple neural networks: our pipeline includes semantic segmentation, photometric normal estimation, monocular volumetric reconstruction, back texture generation, and more! None of this would be possible without data to train these networks.
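If you are curious how such a chain of networks fits together, here is a minimal Python sketch of how the stages could be wired up. Every function here is a made-up placeholder standing in for a trained network (this is not our actual code), but it shows the order in which the pieces feed into each other.

```python
# Illustrative sketch of a single-photo-to-3D pipeline. All stage functions are
# hypothetical placeholders standing in for trained networks.
import numpy as np

def segment_person(image):
    """Semantic segmentation: placeholder returning a full-frame mask."""
    return np.ones(image.shape[:2], dtype=bool)

def estimate_normals(image, mask):
    """Photometric normal estimation: placeholder returning camera-facing normals."""
    normals = np.zeros(image.shape[:2] + (3,), dtype=np.float32)
    normals[..., 2] = 1.0
    return normals

def reconstruct_volume(image, mask, normals):
    """Monocular volumetric reconstruction: placeholder returning one triangle."""
    vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=np.float32)
    faces = np.array([[0, 1, 2]], dtype=np.int32)
    return vertices, faces

def generate_back_texture(image, mask):
    """Back texture generation: placeholder that simply mirrors the input photo."""
    return image[:, ::-1]

def reconstruct_from_photo(image):
    mask = segment_person(image)
    normals = estimate_normals(image, mask)
    vertices, faces = reconstruct_volume(image, mask, normals)
    back_texture = generate_back_texture(image, mask)
    return {"vertices": vertices, "faces": faces,
            "front_texture": image, "back_texture": back_texture}

if __name__ == "__main__":
    photo = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a smartphone frame
    vologram = reconstruct_from_photo(photo)
    print(vologram["vertices"].shape, vologram["faces"].shape)
```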
60,000+ 3D models, 1 million+ training images
Before working on our monocular 3D reconstruction tech, we at Volograms started capturing people in 3D using our multi-camera volumetric capture technology. We captured people in our studio in Dublin, but also in other studios around the world that used our 3D reconstruction technology. This allowed us to collect a huge dataset of high-quality 3D models covering a wide variety of movements, genders, body types, ethnicities and clothing items. More than 60,000 3D human models!
We render these high-quality 3D models from many different viewpoints, with different lighting combinations and at different resolutions. We also generate additional data to train the different AI models in our pipeline: for example, depth images to train our depth estimation algorithms, front and back image pairs to train our back texture generator, and normal maps to train our photometric normal estimator. In total, we use more than 1 million training images!
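To give a rough sense of how quickly those numbers add up, here is a small, hypothetical Python sketch of how such a render sweep could be enumerated: one job per combination of model, viewpoint, lighting setup and resolution. The specific values and the job format are purely illustrative, not our actual rendering setup.

```python
# Enumerate hypothetical render jobs for a multi-view training set.
import itertools

CAMERA_YAWS_DEG = range(0, 360, 30)          # 12 viewpoints around the subject
LIGHTING_SETUPS = ["studio", "outdoor", "dim"]
RESOLUTIONS = [(512, 512), (1024, 1024)]
OUTPUTS = ["color", "depth", "normal_map"]   # plus front/back pairs for texture training

def build_render_jobs(model_ids):
    jobs = []
    for model_id, yaw, lighting, res in itertools.product(
            model_ids, CAMERA_YAWS_DEG, LIGHTING_SETUPS, RESOLUTIONS):
        jobs.append({
            "model": model_id,
            "camera_yaw_deg": yaw,
            "lighting": lighting,
            "resolution": res,
            "outputs": OUTPUTS,
        })
    return jobs

if __name__ == "__main__":
    jobs = build_render_jobs(model_ids=[f"vologram_{i:05d}" for i in range(10)])
    print(len(jobs), "render jobs for just 10 models")  # 10 * 12 * 3 * 2 = 720
```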
Semantic labels
But my favourite feature is that these 3D models are semantically labelled. We categorise each vertex of the mesh into one of 20 different classes, so we can identify body parts, clothing items and more! More importantly, this is all done automatically. Check out this example with one of my 3D models 👇
There are plenty of additional uses for the labels, such as defining a different material for each label, which is very useful for 3D creators. These semantic labels have also significantly improved our reconstruction results, but we will talk about those quality improvements in another newsletter edition 😀.
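To make the idea concrete, here is a tiny Python sketch of what per-vertex semantic labels could look like in practice: each vertex carries a class index, so picking out all "shirt" vertices (for example, to give them their own material) is a simple mask. The class names and the random labels below are made up for illustration; the real taxonomy has 20 classes.

```python
# Hypothetical per-vertex semantic labels: class names and values are illustrative.
import numpy as np

CLASSES = ["background", "face", "hair", "torso_skin", "arms", "hands",
           "legs", "feet", "shirt", "trousers", "shoes"]  # ...and so on, up to 20 classes

num_vertices = 10_000
labels = np.random.randint(0, len(CLASSES), size=num_vertices)  # fake labels for the demo

# Select every vertex labelled as "shirt", e.g. to assign a fabric material.
shirt_id = CLASSES.index("shirt")
shirt_vertices = np.flatnonzero(labels == shirt_id)
print(f"{len(shirt_vertices)} vertices labelled as 'shirt'")

# Per-class vertex counts, e.g. to drive a different material per label.
counts = np.bincount(labels, minlength=len(CLASSES))
for name, count in zip(CLASSES, counts):
    print(f"{name:12s} {count}")
```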
What’s coming next?
Some of you mentioned that you like these technical posts, so very soon we will tell you a bit more about how we are able to guess what humans look like from the back, just by looking at them from the front. Our current system was designed in collaboration with Trinity College Dublin (you can read the paper here), but we are working on a new version based on diffusion networks, similar to Stable Diffusion and other image-generation models. We are looking forward to showing you more!