One-Click Text To 3D Face Generation With AI

Combining the papers “Text-to-Face Generation via Attribute Disentanglement” by Wang et al. and “Realistically Renderable 3D Facial Reconstruction in-the-wild” by Lattas et al.

Published in deepgamingai · 4 min read · Jun 27, 2020

Wouldn’t it be awesome if we had a software tool where we could write a short description of an imaginary person, click a button, and generate a 3D face object that matches our text description? This would make it much easier to design fictional characters for games, with very little effort, while still keeping some control over the design process.

I realized this could be possible when I came across the following two CVPR papers: combining them should allow us to build a tool that converts text to 3D faces.

Text-to-Face Generation

The first paper, titled “Text-to-Face Generation via Attribute Disentanglement”, introduces the TTF-HD framework for producing high-resolution 1024x1024 images from a text description of the facial attributes of an imaginary person.

The authors show that their method can produce diverse faces even from a single-sentence input, because the model fills in the unspecified facial features on its own. If you want more control over the output image, you can specify more details in your input description.

The examples in the paper show that the method works really well in both cases, producing highly believable faces of people who do not exist.
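To make the idea of "filling in the missing features" concrete, here is a toy sketch in Python. It is not the TTF-HD implementation: the attribute list, the `complete_attributes` helper, and the coin-flip sampling are all assumptions I made up for illustration. The point is only that attributes named in the description stay fixed, while everything else is free to vary between samples.

```python
import random

# Hypothetical attribute vocabulary, chosen for illustration only.
ATTRIBUTES = ["smiling", "eyeglasses", "beard", "wavy_hair", "young"]


def complete_attributes(specified, rng):
    """Keep the attributes named in the description; sample the rest at random."""
    completed = {}
    for name in ATTRIBUTES:
        if name in specified:
            completed[name] = specified[name]      # fixed by the text
        else:
            completed[name] = rng.random() < 0.5   # assumed: unbiased coin flip
    return completed


def sample_faces(specified, count, seed=0):
    """Return `count` attribute sets that all agree with `specified`."""
    rng = random.Random(seed)
    return [complete_attributes(specified, rng) for _ in range(count)]


if __name__ == "__main__":
    # "A smiling man without a beard" pins two attributes, the rest vary.
    for face in sample_faces({"smiling": True, "beard": False}, count=3):
        print(face)
```

The more attributes the description pins down, the fewer are left to chance, which matches the trade-off between diversity and control described above.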

Fine-Tuning the Generated Face

A possible extension of this work would be to allow latent space exploration of the generative model, so that we can adjust individual facial features with easy-to-use sliders until the face closely matches what we have in mind.
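A minimal sketch of what such sliders could do under the hood, assuming we already have one direction vector per attribute in the generator's latent space. Shifting a latent code along a direction is a common editing technique for generative models, but whether it applies to this particular model is my assumption, and the `apply_sliders` function and the example directions are hypothetical.

```python
def apply_sliders(latent, directions, sliders):
    """Shift a latent vector along attribute directions.

    latent:     list of floats, the latent code of the generated face
    directions: dict mapping attribute name -> direction vector (assumed known)
    sliders:    dict mapping attribute name -> slider value (0 means no change)
    """
    edited = list(latent)  # copy so the original latent code is left untouched
    for name, amount in sliders.items():
        direction = directions[name]
        if len(direction) != len(edited):
            raise ValueError(f"direction '{name}' has the wrong dimensionality")
        edited = [z + amount * d for z, d in zip(edited, direction)]
    return edited


if __name__ == "__main__":
    latent = [0.0, 1.0, -1.0]
    directions = {"smile": [1.0, 0.0, 0.0], "age": [0.0, 0.5, 0.5]}
    print(apply_sliders(latent, directions, {"smile": 2.0, "age": -1.0}))
```

The edited latent code would then be passed back through the generator to produce the adjusted face image.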

Image-to-3D Face Generation

The second paper, titled “Realistically Renderable 3D Facial Reconstruction in-the-wild”, presents a method to recover shape, texture and depth information from a single portrait image and render a 3D face from it.

Text-to-3D Face Generation

Now, by combining all these methods into a single framework, we could build a tool that first takes a text description as input and produces a random face fitting that description. Then, we can fine-tune this generated image with a few attribute sliders until it matches what we have in mind, and finally construct a 3D face from it. This would make the entire process highly automated, extremely simple, and fun to experiment with.
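The three stages could be wired together roughly like this. Since neither model is available to me, each stage is just a callable that you would plug in later; the `TextTo3DPipeline` class and its field names are hypothetical, and the stand-in lambdas in the usage example only show the data flow, not real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class TextTo3DPipeline:
    # Each stage is a placeholder for a real model that is not available here.
    text_to_face: Callable[[str], Any]                   # description -> face image
    adjust_face: Callable[[Any, Dict[str, float]], Any]  # image + sliders -> image
    face_to_3d: Callable[[Any], Any]                     # image -> 3D face

    def run(self, description: str,
            sliders: Optional[Dict[str, float]] = None) -> Any:
        image = self.text_to_face(description)
        if sliders:
            # Fine-tuning is optional; skip it when no sliders were moved.
            image = self.adjust_face(image, sliders)
        return self.face_to_3d(image)


if __name__ == "__main__":
    # Stand-in stages that only record what happened to the data.
    pipeline = TextTo3DPipeline(
        text_to_face=lambda text: {"description": text, "edits": {}},
        adjust_face=lambda image, sliders: {**image, "edits": dict(sliders)},
        face_to_3d=lambda image: {"mesh_from": image},
    )
    print(pipeline.run("a young woman with wavy hair", {"smile": 0.5}))
```

Keeping the stages as separate callables means either model can be swapped for a newer one without touching the rest of the tool.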

It seems the code and pre-trained models for both these papers have not been released publicly as of June 2020, but once they are, I’ll try putting them together into a single framework.

Useful Links

Text-to-Face Paper Full-Text (PDF)
Image-to-3D Face Paper Full-Text (PDF)
Image-to-3D Face Authors’ Video

Thank you for reading. If you liked this article, you may follow more of my work on Medium, GitHub, or subscribe to my YouTube channel.