Bridging the Academia Gap: An Implementation of PRNet Training

Project done by Weston Ungemach, deep learning intern at BinaryVR

An example of sparse and dense feature tracking, extracted from a fitted 3-dimensional mesh generated by PRNet. Both were generated in real time from a single frame of a video recorded on a mobile device.

Outline

Project Description

TOP: A sample input jpg to PRNet, along with its target position map. BOTTOM: From left to right, sparse feature point prediction, dense feature point prediction, and the full 3-dimensional mesh modeling the input image, with colors from the input image.
A short video rotating the 3-dimensional mesh in the previous image.

Meshes and Position Maps

LEFT/CENTER: A face being modeled by the 300W-LP mesh, shown with the full mesh and wireframe. RIGHT: Top: A close-up of the flattened 300W-LP wireframe. Bottom: The full 300W-LP wireframe, flattened.
LEFT: A face modeled by the 300W-LP mesh; vertices are colored by their (x,y,z) values. This mesh “flattens out” to the color gradient at the bottom right. RIGHT: Top: The flattened 300W-LP mesh. Middle: The flattened 300W-LP mesh with vertices colored by their (x,y,z) values. Bottom: The middle image, with interpolated color values on faces.
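To make the mesh-to-position-map correspondence concrete, here is a minimal NumPy sketch of the idea: a position map is just an H × W × 3 image whose channels store the (x, y, z) coordinates of the mesh point that flattens to each UV pixel, so the dense mesh is a reshape and sparse feature points are fixed UV lookups. The map contents and landmark indices below are placeholders, not the real 300W-LP UV layout.

```python
import numpy as np

# A position map is an H x W x 3 image: the "color" at each UV pixel stores the
# (x, y, z) coordinate of the mesh point that flattens to that pixel. PRNet uses
# 256 x 256 maps; the values below are random placeholders.
H, W = 256, 256
position_map = np.random.rand(H, W, 3).astype(np.float32)

# Reading back the dense mesh is a reshape: every UV pixel is a vertex.
dense_vertices = position_map.reshape(-1, 3)                 # (65536, 3)

# Sparse feature points are fixed UV lookups. These (row, col) indices are
# illustrative only; the real ones come from the 300W-LP/BFM UV layout.
uv_landmark_indices = np.array([[120, 64], [120, 192], [200, 128]])
sparse_landmarks = position_map[uv_landmark_indices[:, 0],
                                uv_landmark_indices[:, 1]]   # one (x, y, z) per landmark

print(dense_vertices.shape, sparse_landmarks.shape)          # (65536, 3) (3, 3)
```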
Some sample images from the 300W-LP dataset, described below.

The 300W-LP Dataset

A sample image from the 300W-LP dataset, along with some of its synthesized rotations.
An example from the 300W-LP dataset with its fitted face model, together with projections of the facial mask and of feature points onto the image, which lies in the xy-plane.
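As a rough sketch of how a fitted 300W-LP mesh could be turned into a ground-truth position map, the snippet below interpolates per-vertex (x, y, z) values over a regular 256 × 256 UV grid. The vertex and UV arrays are random placeholders, and the linear interpolator stands in for the triangle rasterizer used in practice, so treat this as illustrative rather than the data-generation code used in the project.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Placeholders standing in for a 300W-LP fit: ~53k BFM vertices in image-aligned
# coordinates, plus the fixed per-vertex UV coordinates of the flattened mesh.
H, W, N = 256, 256, 53215
vertices = np.random.rand(N, 3).astype(np.float32)       # (x, y, z) per vertex
uv_coords = np.random.rand(N, 2) * [W - 1, H - 1]         # (u, v) per vertex

# Interpolate per-vertex positions over the regular UV grid. A real pipeline
# rasterizes mesh triangles into UV space instead of interpolating scattered points.
interpolator = LinearNDInterpolator(uv_coords, vertices, fill_value=0.0)
grid_u, grid_v = np.meshgrid(np.arange(W), np.arange(H))
position_map = interpolator(grid_u, grid_v).astype(np.float32)

print(position_map.shape)  # (256, 256, 3) ground-truth position map
```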

Network Architecture

A diagram of the PRNet architecture. Each green rectangle represents a residual block of convolutions with a skip connection, and each blue rectangle represents a single transpose convolution.
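A minimal Keras sketch of an encoder-decoder in this style is shown below: one initial convolution, ten residual blocks that contract a 256 × 256 × 3 image to an 8 × 8 × 512 bottleneck, and seventeen transpose convolutions that expand it back to a 256 × 256 × 3 position map. The filter counts and strides follow the layer listing in the PRNet reference code, but kernel sizes and other details are assumptions and may differ from the network trained here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Residual block of convolutions with a skip connection (a green rectangle)."""
    shortcut = x
    if stride != 1 or x.shape[-1] != filters:
        # Project the skip path when the spatial size or channel count changes.
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding='same')(x)
    y = layers.Conv2D(filters // 2, 1, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters // 2, 4, strides=stride, padding='same', activation='relu')(y)
    y = layers.Conv2D(filters, 1, padding='same')(y)
    return layers.Activation('relu')(layers.Add()([y, shortcut]))

def build_prnet_sketch():
    """256x256x3 image in, 256x256x3 position map out."""
    inputs = tf.keras.Input(shape=(256, 256, 3))
    x = layers.Conv2D(16, 4, padding='same', activation='relu')(inputs)
    # Encoder: ten residual blocks; each pair halves the resolution as the channel
    # count grows, ending at an 8x8x512 bottleneck.
    for filters in (32, 64, 128, 256, 512):
        x = residual_block(x, filters, stride=2)
        x = residual_block(x, filters, stride=1)
    # Decoder: seventeen transpose convolutions (the blue rectangles).
    decoder_spec = [(512, 1),
                    (256, 2), (256, 1), (256, 1),
                    (128, 2), (128, 1), (128, 1),
                    (64, 2), (64, 1), (64, 1),
                    (32, 2), (32, 1),
                    (16, 2), (16, 1),
                    (3, 1), (3, 1)]
    for filters, stride in decoder_spec:
        x = layers.Conv2DTranspose(filters, 4, strides=stride, padding='same',
                                   activation='relu')(x)
    # The final layer maps to the 3-channel position map with a sigmoid activation.
    outputs = layers.Conv2DTranspose(3, 4, padding='same', activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)

model = build_prnet_sketch()
model.summary()
```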
The loss function for PRNet.
The weight mask for the PRNet loss function.
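In code, the loss is a per-pixel squared error between the predicted and ground-truth position maps, scaled by the fixed weight mask so that pixels covering important facial regions count more. The sketch below assumes a TensorFlow training setup and uses a placeholder mask; the PRNet paper weights the 68 feature points, the eye/nose/mouth region, the rest of the face, and the neck/background in a 16 : 4 : 3 : 0 ratio.

```python
import tensorflow as tf

# The weight mask assigns each UV pixel an importance. The all-ones mask here is
# only a placeholder; the real mask image encodes the 16 : 4 : 3 : 0 region weights.
weight_mask = tf.ones([256, 256, 1], dtype=tf.float32)

def position_map_loss(y_true, y_pred):
    """Weight-masked squared error between ground-truth and predicted position maps."""
    squared_error = tf.reduce_sum(tf.square(y_true - y_pred), axis=-1, keepdims=True)
    return tf.reduce_mean(squared_error * weight_mask)

# Usage with a Keras model (e.g. the architecture sketch above):
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=position_map_loss)
```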

Training the Network

Results

Each [input, ground truth position map, prediction] triple here was taken from the validation set. Triples like these were logged to TensorBoard during training for debugging.
This selection of images comes from the validation set inside 300W-LP. Note that the results hold up well across various poses and minor occlusions, except when an occluding object happens to match the skin color.
Each [sparse feature point, ground truth mask, predicted mask] triple here was taken from the validation set.
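For the TensorBoard debugging mentioned above, a small image-summary helper is enough to log [input, ground truth position map, prediction] triples on each validation pass. The sketch below assumes a TensorFlow 2 setup; the tag names and log directory are placeholders.

```python
import tensorflow as tf

writer = tf.summary.create_file_writer('logs/validation')  # placeholder log directory

def log_validation_triples(step, images, gt_maps, pred_maps):
    """Write a few [input, ground truth, prediction] triples as image summaries."""
    with writer.as_default():
        tf.summary.image('input', images, step=step, max_outputs=3)
        tf.summary.image('position_map/ground_truth', gt_maps, step=step, max_outputs=3)
        tf.summary.image('position_map/prediction', pred_maps, step=step, max_outputs=3)
```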
A sample of real-time sparse feature tracking.
A sample of real-time dense feature tracking.

Future Directions

Final Thoughts

Acknowledgments

References
