Speech-Controlled Body Animations With Deep Learning

Overview of the paper “Style-Controllable Speech-Driven Gesture Synthesis Using Normalizing Flows” by S Alexanderson et al.

Chintan Trivedi
deepgamingai

--

AI techniques like LipGAN can generate lip-movement animations on a face from just a speech audio file. This is great for automatically generating talking animations in games. If, in addition to this, we can also synthesize realistic hand and body movements in coordination with the same speech audio, we may soon be able to create entire animations of a virtual game character talking and interacting, without having to design any of it manually.


That is why, in this article, I want to share the paper titled “Style-Controllable Speech-Driven Gesture Synthesis Using Normalizing Flows” by researchers in Sweden. The technique can generate plausible gestures given only speech audio as input. Thanks to its probabilistic generative modeling approach, it is also capable of generating multiple distinct gestures for the same speech.

It uses an autoregressive model (an LSTM) trained to learn motion as a time-series distribution of poses: the previously generated poses form part of the input used to predict the next pose, in combination with some other inputs. The first of these inputs is a set of acoustic features extracted from the input speech using a sliding-window mechanism, as in the sketch below.
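Here is a minimal sketch (not the authors’ code) of how that per-step conditioning input could be assembled: a sliding window of acoustic features centred on the current frame, concatenated with the most recently generated poses as the autoregressive context. All window sizes, feature dimensions, and names are illustrative assumptions.

```python
import numpy as np

def build_conditioning(acoustic_feats, prev_poses, t, audio_window=10):
    """
    acoustic_feats: (T + 2*audio_window, F) per-frame speech features (e.g. MFCCs),
                    zero-padded at both ends by `audio_window` frames beforehand.
    prev_poses:     (K, P) the K most recently generated skeleton poses.
    Returns a flat conditioning vector for frame t.
    """
    # indices shift by +audio_window because of the padding
    window = acoustic_feats[t : t + 2 * audio_window + 1]
    return np.concatenate([window.ravel(), prev_poses.ravel()])


# Example: 20-dim acoustic features, 45-dim poses, 3 frames of pose history
T, F, P, K = 100, 20, 45, 3
feats = np.random.randn(T, F)
feats_padded = np.pad(feats, ((10, 10), (0, 0)))   # pad for the sliding window
poses_so_far = np.zeros((K, P))                    # start from a neutral/rest pose
cond = build_conditioning(feats_padded, poses_so_far, t=0)
print(cond.shape)                                  # (2*10+1)*20 + 3*45 = (555,)
```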

Next, there is an optional style parameter that we can feed to the model to control the type of animation it generates. All of these inputs are passed to a normalizing-flow model, which samples a plausible pose for the current time step, and this process is repeated over the rest of the sequence, for as long as the input audio lasts. A rough sketch of this generation loop follows.
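The sketch below assumes a trained conditional normalizing flow exposing a `sample(condition)` method that maps latent noise to one pose; the `flow` object, `style_vec`, and all dimensions are illustrative assumptions, not the authors’ actual code or API.

```python
import torch

@torch.no_grad()
def generate_motion(flow, speech_windows, style_vec, pose_dim=45, history=3):
    """
    speech_windows: (T, W) tensor, one pre-computed sliding-window acoustic
                    feature vector per output frame.
    style_vec:      control vector (e.g. gesture height or speed).
    Returns a (T, pose_dim) tensor of sampled poses.
    """
    prev_poses = torch.zeros(history, pose_dim)          # neutral starting pose
    motion = []
    for t in range(speech_windows.shape[0]):
        # condition = speech window + pose history + style control
        cond = torch.cat([speech_windows[t], prev_poses.flatten(), style_vec])
        pose_t = flow.sample(cond)                       # draw one plausible pose
        motion.append(pose_t)
        # slide the pose history forward by one frame
        prev_poses = torch.cat([prev_poses[1:], pose_t.unsqueeze(0)])
    return torch.stack(motion)
```

Because the flow draws a fresh latent sample at every step, running this loop twice on the same speech yields two different but equally plausible gesture sequences, which is exactly the multiple-gestures-per-utterance behaviour mentioned earlier.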


As I mentioned before, we can also control the style of the output animation, which makes this technique very practical to use. Suppose we are using this model in game development: we can define different styles for different game characters and pass the style as an additional input to give each character a specific personality (see the snippet below). Very cool indeed, and very useful for animating a varied cast of game characters! This work pushes the boundaries of automation in content generation while requiring less and less manual artistic effort. I can’t wait to see the future papers in this line of work!
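To make the character-personality idea concrete, here is a hypothetical usage snippet reusing the `generate_motion` sketch from above; the style values are made-up examples of the kind of control signal (e.g. gesture height or speed) the paper exposes, not numbers from the paper.

```python
import torch

# Hypothetical per-character styles (values are illustrative only)
calm_npc_style    = torch.tensor([0.2, 0.1])   # low, slow gestures
excited_npc_style = torch.tensor([0.9, 0.8])   # high, fast gestures

# `flow` and `speech_windows` are assumed from the earlier sketch
calm_motion    = generate_motion(flow, speech_windows, calm_npc_style)
excited_motion = generate_motion(flow, speech_windows, excited_npc_style)
```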

Thank you for reading. If you liked this article, you can follow more of my work on Medium or GitHub, or subscribe to my YouTube channel.
