Realistic Facial Animations of 3D Avatars Driven by Audio and Gaze

Overview of the paper “Audio- and Gaze-driven Facial Animation of Codec Avatars” by A. Richard et al.

Chintan Trivedi
deepgamingai

--

Codec Avatars. [source]

Facebook Research has been working on creating photo-realistic 3D avatars of its users, with the idea that its social network platform will one day let people talk to their connections virtually, for example in VR. Facebook calls these virtual faces “Codec Avatars.” This kind of tech has tremendous applications in gaming, especially for letting gamers customize their playable avatar to look exactly like them.

So today, I want to share a paper that proposes a method to animate these Codec Avatars. The paper, from Facebook Reality Labs, is titled “Audio- and Gaze-driven Facial Animation of Codec Avatars”.

[source]

The key idea is to generate complete facial animation: lip movement synchronized with speech as well as expressions like happiness or sadness. The audio and gaze inputs are modeled together with a fusion architecture, and the end result looks very realistic. To fuse the two inputs, a mixture of Variational AutoEncoders (VAEs) is used: the encodings obtained from each input signal are combined in a shared latent space before being decoded into the facial animation output.
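To make the fusion idea concrete, here is a minimal PyTorch sketch of this kind of architecture. It is not the paper’s actual model: all module names, layer sizes, and the choice of 80-dimensional audio features and 2-dimensional gaze direction are my own illustrative assumptions. Each modality gets a VAE-style encoder, the sampled latent codes are concatenated, and a small decoder maps the fused code to a vector of facial animation parameters.

```python
# Illustrative sketch only, not the paper's architecture. Dimensions and
# module names are assumptions made up for this example.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """VAE-style encoder: maps an input feature vector to mu and log-variance."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

def reparameterize(mu, logvar):
    # Standard VAE reparameterization trick: z = mu + sigma * eps
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

class FusionFaceDecoder(nn.Module):
    """Fuses audio and gaze latent codes, then decodes facial animation parameters."""
    def __init__(self, latent_dim: int = 32, face_dim: int = 256):
        super().__init__()
        self.audio_enc = ModalityEncoder(in_dim=80, latent_dim=latent_dim)  # e.g. mel features
        self.gaze_enc = ModalityEncoder(in_dim=2, latent_dim=latent_dim)    # e.g. gaze direction
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim, 128), nn.ReLU(),
            nn.Linear(128, face_dim),  # face animation / expression code
        )

    def forward(self, audio_feat, gaze_feat):
        z_audio = reparameterize(*self.audio_enc(audio_feat))
        z_gaze = reparameterize(*self.gaze_enc(gaze_feat))
        # Fusion: combine the per-modality latents before decoding.
        return self.decoder(torch.cat([z_audio, z_gaze], dim=-1))

# Usage: a batch of per-frame features -> per-frame face codes.
model = FusionFaceDecoder()
face_codes = model(torch.randn(4, 80), torch.randn(4, 2))
print(face_codes.shape)  # torch.Size([4, 256])
```

In the real system the decoder would drive the Codec Avatar’s expression and geometry codes rather than a generic vector, but the fusion step, combining per-modality latents before decoding, is the same idea.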

The system is trained on 5 hours of data collected from a small number of participants conversing with one another while their audio and facial expressions were recorded.

Results

Check out the amazing results obtained by this method in the video embedded below.

Thank you for reading. If you liked this article, you can follow more of my work on Medium and GitHub, or subscribe to my YouTube channel.
