From Scan to Avatar, the Unsuspected Hurdles

Using the Shadow app, you can create a lifelike 3D counterpart and make it the star of GIFs or movie-like animated scenes. The raw material for your avatar is the 3D model of your head, which you can now obtain with a mainstream device.

A major challenge is to take this model and turn it into a full-body character that is (1) natural-looking and (2) animatable.

In this post, I want to give details on how we achieved Step 1.

Right now, you are probably thinking something along the lines of, “Wait, isn’t it just about sticking a head on a body?” Well, not exactly mon ami.

I could write pages and pages on the computer vision techniques or the geometry processing involved. However, my goal here is to make you understand, with a few visual examples, why it is much trickier than it seems.

Understanding The Raw Material

First things first: Consider a 3D scan as an empty shell, with a set of points and triangles that define the surface, called the mesh, which is covered by an image called the texture. See below:
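To make this concrete, here is a minimal sketch in plain Python of what a textured scan boils down to: vertices in 3D, triangles indexing them, and per-vertex UV coordinates mapping each point onto the texture image. The data here is made up for illustration; a real scan has tens of thousands of vertices.

```python
# A textured 3D scan, reduced to its essentials (illustrative data only).
# - vertices: 3D points defining the shell
# - faces: triangles, each referencing three vertex indices
# - uvs: per-vertex 2D coordinates into the texture image (0..1 range)

vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
uvs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

def texel_for_vertex(vertex_index, tex_width, tex_height):
    """Map a vertex's UV coordinate to a pixel in the texture image."""
    u, v = uvs[vertex_index]
    return (int(u * (tex_width - 1)), int(v * (tex_height - 1)))

print(texel_for_vertex(1, 512, 512))  # -> (511, 0)
```

This UV lookup is the bridge between the two halves of a scan: anything we compute on the texture can be carried back onto the mesh, and vice versa.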

Left: 3D model with the texture. Right: the mesh-only model.

We blend this bust with a preset full-body character, like this one:

In the 3D world, genitals do not exist.

Oh, by the way, identifying the correct skin tone for the body seems obvious on the scan above. How about for this one?

Which one should we use? If any.

The lighting conditions significantly impact texture colors, consistency and quality. This is why, in this type of case, picking the right skin tone is difficult.
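One simple approach (a sketch, not necessarily what Shadow does) is to sample a texture patch that is known to be skin, say the cheeks, and take a robust average such as the per-channel median, which is less thrown off by highlights and shadows than a plain mean. The patch values below are made up for illustration.

```python
# Estimate a skin tone from a cheek patch using the per-channel median.
# In practice the patch would be located via face landmarks or a
# segmentation of the texture; here it is hard-coded sample data.

def median(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def estimate_skin_tone(pixels):
    """pixels: list of (r, g, b) tuples sampled from a skin region."""
    return tuple(median([p[c] for p in pixels]) for c in range(3))

cheek_patch = [(210, 160, 140), (205, 158, 138),
               (90, 60, 50),  # a shadowed outlier pixel
               (212, 162, 141), (208, 159, 139)]
print(estimate_skin_tone(cheek_patch))  # -> (208, 159, 139)
```

Note how the shadowed pixel barely affects the result; with a mean it would visibly darken the estimated tone.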

Let’s Play Dr. Frankenstein

Here comes the fun part. After a — very French — decapitation of our preset character, we aim to fuse our scan to it.

As you can see above, the head is not exactly straight. Surprise! It turns out that when doing a scan, the subject rarely has a perfect head/bust alignment (hint: it’s never the case). Here we have to: 
(A) Straighten the head.
(B) Identify the correct zone of the neck to deform.
(C) Perform a smooth merge with the bust.
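The smooth merge in step (C) can be pictured as a weighted blend: near the cut the vertices follow the bust, further up they follow the scan, with a gradual falloff in between. Here is a minimal one-dimensional sketch, where the blend zone bounds are hypothetical parameters (the real pipeline works on a full 3D neck region):

```python
# Blend a scan vertex toward the bust surface inside a transition zone
# along the vertical (y) axis. Below y_low the bust wins entirely;
# above y_high the scan wins; in between we interpolate smoothly.

def smoothstep(t):
    """Cubic ease: 0 at t=0, 1 at t=1, with zero slope at both ends."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def blend_vertex(scan_pos, bust_pos, y, y_low, y_high):
    w = smoothstep((y - y_low) / (y_high - y_low))  # weight of the scan
    return tuple(w * s + (1.0 - w) * b for s, b in zip(scan_pos, bust_pos))

# At the bottom of the zone the vertex snaps to the bust...
print(blend_vertex((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0, 0.0, 1.0))
# ...and halfway up it is an equal mix of scan and bust.
print(blend_vertex((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.5, 0.0, 1.0))
```

The smoothstep falloff matters: a linear blend leaves a visible crease where the zone starts and ends, exactly the kind of artifact a human eye catches instantly.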

Hurrah, job done! For this unique, specific case, maybe. Except, since your head and neck can move in all three dimensions, the number of different combinations to handle is enormous. See this next example below:

FYI, this is me, undoubtedly the ideal model :)
So, my head is tilted upwards and positioned forward. Here the solution consists of:
(A) Pushing the head backward.
(B) Adjusting the neck.
(C) And finally, tweaking the head rotation to make the gaze line horizontal.
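Geometrically, steps (A) and (C) amount to composing a rotation about the side-to-side axis (to level the gaze) with a translation along the front-to-back axis. A sketch in plain Python, assuming a y-up, z-toward-the-viewer convention and treating the pitch angle as a given; in practice it would be estimated from face landmarks:

```python
import math

def rotate_about_x(vertices, degrees):
    """Rotate 3D points about the x (side-to-side) axis."""
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x, y * cos_a - z * sin_a, y * sin_a + z * cos_a)
            for x, y, z in vertices]

def push_back(vertices, offset):
    """Translate points backward along the z (front-to-back) axis."""
    return [(x, y, z - offset) for x, y, z in vertices]

# A gaze direction pitched 30 degrees upward...
gaze = [(0.0, math.sin(math.radians(30)), math.cos(math.radians(30)))]
# ...is leveled by a 30-degree counter-rotation, landing back on (0, 0, 1),
leveled = rotate_about_x(gaze, 30.0)
# and the whole head is then pushed back to sit over the bust.
print(push_back(leveled, 0.1))
```

The order matters: rotating after translating would swing the head around the wrong pivot and leave it floating off the neck.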

Phew! If you have not been paying attention, here is a quick reminder: all this must be done automatically, and for countless different cases and head shapes.

Also note that, for the sake of simplicity, we have not discussed the size of the head relative to the body. If you believe that there is some magical “golden” ratio that combines face features and bust size to find the adequate dimensions, you are barking up the wrong tree.

Semantic Segmentation To The Rescue

Previously, I mentioned that one of our most important tasks is to deform the neck. That means identifying exactly where the neck is, both on the texture and on the 3D mesh. This task may feel childishly simple, but if you know even a little about computer vision, you will know that the simplest things for a human eye are often the hardest for a computer to reproduce.

Look at the pictures below. While a three-year-old can do this with her eyes closed, it took us decades to detect, with reasonable accuracy, which animals are in the picture, and another ten years to properly trace the contours around them.

Finally matching a toddler’s skills. Yay! (Credit: Christoph Korner https://goo.gl/Qv3S8H)

Computers can now properly analyze this type of picture thanks to recent advances in deep learning. This is why we used deep learning to gain a better understanding of the 3D scans. The neck, of course, is just one of many labels that we must attribute to a model.
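Once a network has labeled every texture pixel (neck, face, hair, and so on), those labels still have to reach the 3D mesh. A straightforward sketch of that transfer, using the per-vertex UV coordinates: each vertex simply takes the label of the texture pixel its UV points to. The tiny 2x2 label map and the UVs below are illustrative data, not real network output.

```python
# Transfer per-pixel labels from the texture to mesh vertices via UVs.
# A real texture would be e.g. 1024x1024, with labels predicted by a
# segmentation network; here we hard-code a 2x2 toy label map.

NECK, FACE = 0, 1
label_map = [[FACE, FACE],   # row 0 (top of the texture)
             [NECK, FACE]]   # row 1 (bottom of the texture)

uvs = [(0.0, 1.0),  # vertex 0 maps to the bottom-left pixel
       (0.9, 0.1)]  # vertex 1 maps to the top-right pixel

def vertex_labels(uvs, label_map):
    height, width = len(label_map), len(label_map[0])
    labels = []
    for u, v in uvs:
        col = min(int(u * width), width - 1)
        row = min(int(v * height), height - 1)
        labels.append(label_map[row][col])
    return labels

print(vertex_labels(uvs, label_map))  # -> [0, 1]: vertex 0 is neck
```

With per-vertex labels in hand, the deformation code knows exactly which part of the mesh it is allowed to bend.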

On the left, the conventional face landmark detection (more or less what is used to create the funny face filters in many apps). On the right, the semantic segmentation using neural networks.

This topic is fascinating, and we intend to give more details on it in the future. If you are intrigued in the meantime, you can watch a quick intro in this short video.

Wrapping up

With these few examples, you can understand why creating a natural-looking avatar from just a head scan is a technical challenge. Believe me, I merely discussed the tip of the iceberg in this post. Besides, we have to keep in mind that our well-trained human eyes are very good at spotting unnatural shapes on humanoid characters. Therefore, everything we do (deformations, merges, etc.) must be handled with great care.

Most of the tech challenges we address are new. Until now, there has been no mass production of 3D scans, since the technology was restricted to big studios or die-hard enthusiasts. The models were all processed manually by 3D artists, and there was no compelling reason to do things differently.

With scanning finally available on mainstream devices, there will be millions of scans to take care of, and we are proud to have put together a unique, automated pipeline to do so!

I’d like to take this opportunity to tip my hat to Nicolas, our CTO, and Vincent, our Lead Scientist. I cannot count the times when a discussion on a thorny topic started with, “What? Automating this task?? Loïc you are borderline delusional!” Fortunately, it later became, “Hey, we might have an idea on how to pull it off.” And then we would finally see the rise of a functional solution a few weeks later. You guys are the best :)