Automate your Lip-Sync Animations With This AI (LipGAN)

Overview of the paper “Towards Automatic Face-to-Face Translation” by Prajwal K R et al.

Chintan Trivedi
deepgamingai

--

In a previous article in this series, we saw how easy it can be to build a motion-capture pipeline that transfers head and eye movements from a real-life video to a virtual character using deep learning. Today I want to share another research paper that takes this work to the next level. Titled “Towards Automatic Face-to-Face Translation”, it was published earlier this year by researchers at IIT Kanpur and IIIT Hyderabad.

Their technique, called LipGAN, allows us to alter the lip movements of a person in a video to match a given target audio clip. The framework is a typical Generative Adversarial Network (GAN) architecture, so it consists of a Generator module and a Discriminator module.
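To make the generator/discriminator wiring concrete, here is a minimal toy sketch of one training step. All function bodies are hypothetical stand-ins (simple arithmetic on lists), not the paper's actual networks; only the overall structure — generate a frame conditioned on audio, then score its sync — reflects the setup described above.

```python
# Toy sketch of a LipGAN-style training step. The "networks" below are
# hypothetical placeholders, not the paper's CNNs.

def generate(face_frame, audio):
    """Generator stand-in: pretend to resynthesize the frame conditioned on audio."""
    return [f + 0.1 * a for f, a in zip(face_frame, audio)]

def sync_score(frame, audio):
    """Discriminator stand-in: lower score = better audio/lip sync."""
    return sum(abs(f - a) for f, a in zip(frame, audio))

def training_step(face_frame, audio):
    fake_frame = generate(face_frame, audio)   # generator forward pass
    loss = sync_score(fake_frame, audio)       # discriminator judges the result
    # In training, the generator is updated to drive this loss down,
    # while the discriminator learns to score sync accurately.
    return fake_frame, loss

frame, loss = training_step([0.5, 0.5], [0.2, 0.8])
```

The point of the sketch is only the data flow: the generator never sees a "correct" output frame directly; it is steered by the discriminator's sync judgment.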

First, the input video frames and the target audio are encoded by separate encoders, and the two encodings are concatenated. This joint encoding is then decoded to reconstruct the same face, but with the lip position in sync with the input audio features. Skip connections are used during decoding to carry over lower-level information that is useful for accurately reconstructing the facial features.
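The skip-connection idea can be illustrated with a tiny pure-Python sketch. The "downsampling" and "upsampling" here are assumed toy operations on lists of numbers (the paper's network uses convolutional layers over real images and audio); what matters is that each decoder level mixes in the matching encoder level, so fine detail skips past the bottleneck.

```python
# Minimal encoder/decoder with skip connections (toy ops, hypothetical design).

def encoder(x):
    """Produce progressively coarser features, keeping each level for skips."""
    levels = []
    feat = x
    while len(feat) > 1:
        levels.append(feat)
        # toy downsampling: average adjacent pairs
        feat = [(feat[i] + feat[i + 1]) / 2 for i in range(0, len(feat) - 1, 2)]
    return feat, levels

def decoder(feat, levels):
    """Upsample and mix in the matching skip connection at each level."""
    for skip in reversed(levels):
        upsampled = [v for v in feat for _ in range(2)][:len(skip)]
        # skip connection: blend low-level detail back into the upsampled features
        feat = [(u + s) / 2 for u, s in zip(upsampled, skip)]
    return feat

bottleneck, skips = encoder([1.0, 2.0, 3.0, 4.0])
out = decoder(bottleneck, skips)  # same length as the input, detail restored
```

Without the `skip` term in the decoder, the output would have to be rebuilt from the single bottleneck value alone — which is exactly why skip connections help preserve facial detail.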

Now, in order to guide this training process toward realistic outputs, the discriminator acts as an adversary to the generator. The discriminator loss is computed in the encoding space using a contrastive loss, which checks whether the generated frames are in sync with the target audio. After training this framework on the LRS2 audio-visual dataset, the results obtained by this method are pretty mind-blowing.
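The standard contrastive loss the discriminator relies on is easy to write down: for a (face, audio) embedding pair, in-sync pairs (label 1) are pulled together and out-of-sync pairs (label 0) are pushed at least a margin apart. This is a generic sketch of that formula, not the paper's exact implementation:

```python
import math

def contrastive_loss(face_emb, audio_emb, y, margin=1.0):
    """y = 1 for an in-sync pair, 0 for out-of-sync."""
    d = math.sqrt(sum((f - a) ** 2 for f, a in zip(face_emb, audio_emb)))
    # in-sync: penalize distance; out-of-sync: penalize being closer than margin
    return y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2

# Identical embeddings labeled in-sync incur no loss...
sync_loss = contrastive_loss([0.1, 0.2], [0.1, 0.2], y=1)
# ...but the same pair labeled out-of-sync is penalized by the full margin.
desync_loss = contrastive_loss([0.1, 0.2], [0.1, 0.2], y=0)
```

Minimizing this over many positive and negative pairs teaches the discriminator an embedding space where distance directly measures audio/lip sync.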

This tech has tremendous applications in game development, as it reduces the time required to create lip-sync animations, which are tedious for artists to produce by hand. As this research work is improved upon, we can use it to create more realistic dialogue scenes in games, thereby improving the gameplay experience for players.

Thank you for reading. If you liked this article, you may follow more of my work on Medium, GitHub, or subscribe to my YouTube channel.



AI, ML for Digital Games Researcher. Founder at DG AI Research Lab, India. Visit our publication homepage medium.com/deepgamingai for weekly AI & Games content!