AI Generates Real Faces From Sketches! DeepFaceDrawing Overview | Image-to-image translation in 2020
You can now generate high-quality face images from rough or even incomplete sketches, with zero drawing skills, using this new image-to-image translation technique! If your drawing skills are as bad as mine, you can even adjust how much the eyes, mouth, and nose affect the final image! Let’s see if it really works and how they did it.
Recent deep image-to-image translation techniques allow fast generation of face images from freehand sketches. However, existing solutions tend to overfit to their training sketches, and thus require professionally consistent sketches as inputs. Shu-Yu Chen et al. just shared a paper called “DeepFaceDrawing: Deep Generation of Face Images from Sketches” to address this issue. Their key idea is to implicitly model the shape space of plausible face images and synthesize a face image in this space to approximate an input sketch. Because the input strokes serve more like soft constraints that guide image synthesis, the method faithfully respects user intentions while allowing users with little or no training in drawing to produce high-quality face images from rough or even incomplete freehand sketches!
Most deep learning-based solutions for sketch-to-image translation take the input sketch as almost fixed and attempt to infer the missing texture or shading information between the strokes. To some extent, their problems are formulated more like reconstruction problems, with the input sketches as hard constraints. Since these networks are often trained on pairs of real images and their corresponding edge maps, their data-driven nature means they require test sketches of quality similar to the edge maps of real images in order to synthesize realistic face images. Such sketches are difficult to make, especially for users with little training in drawing.
To address this issue, the key idea is to implicitly learn a space of plausible face sketches from real face sketch images and find the closest point in this space to approximate an input sketch. This way, sketches can be used more like soft constraints to guide image synthesis. And as you can see, the results are amazing.
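To make the “find the closest point in this space” idea concrete, here is a minimal, hypothetical NumPy sketch (not the paper’s actual code): a rough sketch’s feature vector is replaced by an inverse-distance-weighted average of its K nearest neighbours in a bank of plausible sketch features. The paper uses a more sophisticated locally linear projection per face component; the bank, dimensions, and weighting here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "learned space": feature vectors of plausible sketches (made-up data).
bank = rng.normal(size=(100, 8))

def project_to_manifold(f, bank, K=5):
    """Replace a rough sketch's feature f by an inverse-distance-weighted
    average of its K nearest neighbours in the bank of plausible features.
    (A simplified stand-in for the paper's locally linear projection.)"""
    d = np.linalg.norm(bank - f, axis=1)
    idx = np.argsort(d)[:K]          # K nearest plausible sketches
    w = 1.0 / (d[idx] + 1e-8)        # closer neighbours weigh more
    w /= w.sum()
    return w @ bank[idx]

f_rough = rng.normal(size=8) * 4.0   # an exaggerated, implausible input
f_proj = project_to_manifold(f_rough, bank)
```

After projection, the feature lies inside the span of plausible sketches, so the downstream synthesis never sees an implausible input directly.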
As illustrated, the deep learning framework takes a sketch image as input and generates a high-quality facial image. The network’s architecture consists of two sub-networks:
The first sub-network is the Component Embedding (CE) module, which learns feature embeddings of individual face components using separate auto-encoder networks. This step turns component sketches into semantically meaningful feature vectors, with the auto-encoders separately learning five feature descriptors from the face sketch data: “left-eye”, “right-eye”, “nose”, “mouth”, and “remainder”. The “remainder” image for the “remainder” component is the same as the original sketch but with the eyes, nose, and mouth removed.
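The decomposition into the five components can be sketched as simple window crops. This toy NumPy example uses a made-up 64×64 sketch and hand-picked window coordinates (the real system works on larger sketches and centres fixed-size crops on facial landmarks), but it shows how each component gets its own crop while the “remainder” keeps everything else:

```python
import numpy as np

# Toy 64x64 binary "sketch"; the strokes and windows below are illustrative.
sketch = np.zeros((64, 64), dtype=np.uint8)
sketch[20:24, 14:26] = 1   # left-eye strokes
sketch[20:24, 38:50] = 1   # right-eye strokes
sketch[30:40, 28:36] = 1   # nose strokes
sketch[46:52, 22:42] = 1   # mouth strokes
sketch[5, :] = 1           # hair line (belongs to the "remainder")

# Assumed component windows as (top, bottom, left, right).
windows = {
    "left-eye":  (16, 28, 10, 30),
    "right-eye": (16, 28, 34, 54),
    "nose":      (26, 44, 24, 40),
    "mouth":     (42, 56, 18, 46),
}

components = {}
remainder = sketch.copy()
for name, (t, b, l, r) in windows.items():
    components[name] = sketch[t:b, l:r].copy()   # crop for that auto-encoder
    remainder[t:b, l:r] = 0                      # erase it from the remainder
components["remainder"] = remainder              # sketch minus eyes/nose/mouth
```

Each of the five crops would then be fed to its own auto-encoder to produce the component’s feature vector.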
The second sub-network consists of two sub-modules: Feature Mapping (FM) and Image Synthesis (IS). Although FM looks similar to the decoding part of CE, its decoding models convert the feature vectors into spatial feature maps, which improves the information flow and provides more flexibility to fuse the individual face components for higher-quality synthesis results. The feature maps of the individual face components are then combined according to the face structure and passed to IS, which converts them into a realistic face image using a conditional GAN architecture: the feature maps are fed to a generator, with the generation guided by a discriminator. If you are not familiar with the GAN architecture, I suggest you watch the video I made introducing them.
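The fusion step, combining per-component feature maps according to the face structure, can be illustrated as pasting each component’s spatial map at its position in a full-face map. Everything here (channel count, map sizes, positions) is a made-up toy stand-in for what the FM module would actually decode:

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 4, 64, 64   # toy channel count and face-map size

# Toy per-component spatial feature maps with their (top, left) positions,
# as the FM module would decode them; sizes/positions are illustrative only.
maps = {
    "left-eye":  (rng.normal(size=(C, 12, 20)), (16, 10)),
    "right-eye": (rng.normal(size=(C, 12, 20)), (16, 34)),
    "nose":      (rng.normal(size=(C, 14, 16)), (30, 24)),
    "mouth":     (rng.normal(size=(C, 10, 28)), (46, 18)),
}

# Start from the "remainder" map and paste each component at its position
# in the face layout -- the fused map is then handed to image synthesis.
face_map = rng.normal(size=(C, H, W))  # stand-in for the remainder's map
for feat, (top, left) in maps.values():
    c, h, w = feat.shape
    face_map[:, top:top + h, left:left + w] = feat
```

The resulting `face_map` is what the IS module’s conditional generator would receive as input.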
With this complex architecture, they adopted a two-stage training strategy. In Stage 1, only the CE module is trained, using the component sketches to train the five individual auto-encoders for feature embedding. This training is done in a self-supervised manner, which I covered in a previous video linked below. In Stage 2, they fix the parameters of the trained component encoders and train the rest of the network, with the unknown parameters in the FM and IS modules, in an end-to-end manner.
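The two-stage strategy can be sketched with a tiny linear model in NumPy, purely as an illustration of the training schedule (the real system trains convolutional auto-encoders and a GAN, not linear layers): Stage 1 trains an auto-encoder self-supervised to reconstruct its input; Stage 2 freezes the encoder and trains only the downstream mapping, a stand-in for FM + IS, against targets.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # toy stand-in for component sketches
d, k, lr = 16, 4, 0.01

# --- Stage 1: train one (linear) auto-encoder to reconstruct its own input.
E = rng.normal(scale=0.1, size=(d, k))   # encoder weights
D = rng.normal(scale=0.1, size=(k, d))   # decoder weights

def recon_err(E, D):
    return float(np.mean((X @ E @ D - X) ** 2))

err_before = recon_err(E, D)
for _ in range(500):
    Z = X @ E                            # encode
    G = 2 * (Z @ D - X) / len(X)         # MSE gradient w.r.t. reconstruction
    gD, gE = Z.T @ G, X.T @ (G @ D.T)
    D -= lr * gD
    E -= lr * gE
err_after = recon_err(E, D)

# --- Stage 2: freeze the encoder, train only the downstream mapping M.
Y = X @ rng.normal(size=(d, d))          # toy "photo" targets
M = rng.normal(scale=0.1, size=(k, d))   # trainable mapping (FM+IS stand-in)

def map_err(M):
    return float(np.mean((X @ E @ M - Y) ** 2))

m_before = map_err(M)
for _ in range(500):
    Z = X @ E                            # frozen encoder: E is never updated
    G = 2 * (Z @ M - Y) / len(X)
    M -= lr * Z.T @ G
m_after = map_err(M)
```

Freezing the encoder in Stage 2 is what keeps the learned component feature space stable while the rest of the network learns to synthesize from it.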
To assist users, especially those with little training in drawing, the system provides shadow-guided sketching. Given the current sketch, it finds the 10 most similar sketch component images. These component images are then blended into shadows placed at the corresponding component positions to guide sketching, as you can see on the left. Initially, when the canvas is empty, the shadow is blurrier; it is updated instantly with every new input stroke. The synthesized image is displayed in the window on the right, and users may choose to update it instantly or trigger a “Convert” command. Of course, users with good drawing skills tend to trust their own drawings more than those with little training do. So the system also provides a slider for each component type to control the blending weight between a sketched component and its refined version, letting you control the degree of interpolation between the sketch you made and the final version for the eyes, nose, or mouth!
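The per-component slider boils down to a simple linear interpolation between the user’s own sketch feature and its refined (projected) version. A minimal sketch of this, with made-up three-dimensional feature vectors standing in for the real component embeddings:

```python
import numpy as np

def blend(sketch_feat, refined_feat, w):
    """Per-component slider: w=0 fully trusts the user's sketch,
    w=1 fully trusts the model's refined version."""
    return (1.0 - w) * sketch_feat + w * refined_feat

# Toy "left-eye" feature vectors (illustrative values only).
sketch_eye = np.array([1.0, 0.0, 0.0])
refined_eye = np.array([0.0, 1.0, 0.0])
halfway = blend(sketch_eye, refined_eye, 0.5)   # equal mix of the two
```

A skilled artist would slide `w` toward 0 to keep their strokes; a beginner would slide it toward 1 to lean on the model.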
Both qualitative and quantitative evaluations show the superior generation ability of their system compared to existing and alternative solutions. Just take a moment to look at these amazing results in comparison with the alternatives! Creating realistic human face images from scratch benefits various applications, including face morphing, face copy-paste, criminal investigation, character design, educational training, and more. Due to their simplicity, conciseness, and ease of use, sketches are often used to depict desired faces, which makes this new paper extremely relevant.
Of course, this was just a simple overview of the new image-to-image translation technique that allows fast generation of face images from freehand sketches. I strongly recommend reading their paper and checking out their video demo; both are linked below. If you enjoyed this read, check out the video and subscribe to the channel!
The paper, code, and video of DeepFaceDrawing are available on their page: