Handwaving through face pose estimation

Gonçalo Palaio
Jun 12, 2018


The current, totally broken, software-rendered visualizer.

My interest in this topic started when I ran into this particular Kaggle challenge: https://www.kaggle.com/c/facial-keypoints-detection

I started exploring the dataset and built a few generic models using scikit-learn and TensorFlow.

Since I was exploring Rust at the time, I also built a really simple visualizer with rust_minifb, just to view the landmarks on the corresponding picture.
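To give an idea of what such a viewer has to do, here is a minimal sketch in Python (not the original Rust code). It assumes the Kaggle CSV layout, where each picture is a 96x96 grayscale image stored as a string of space-separated pixel values and each keypoint is an (x, y) pixel coordinate that may be missing (NaN). A minifb window is fed a flat buffer of u32 pixels in 0RGB layout, so the sketch packs the gray values that way and paints each landmark red. The function names are hypothetical.

```python
import math

# Hypothetical helpers; the 96x96 size and the space-separated pixel string
# follow the Kaggle facial-keypoints CSV format.
WIDTH = HEIGHT = 96
RED = 0xFF0000  # 0RGB: upper byte unused, then 8 bits each for R, G, B


def parse_image(pixels: str) -> list[int]:
    """Turn the CSV 'Image' field into a flat list of 0..255 gray values."""
    values = [int(v) for v in pixels.split()]
    if len(values) != WIDTH * HEIGHT:
        raise ValueError(f"expected {WIDTH * HEIGHT} pixels, got {len(values)}")
    return values


def to_framebuffer(gray: list[int], landmarks: list[tuple[float, float]]) -> list[int]:
    """Pack gray pixels as 0RGB integers and paint each landmark red."""
    buf = [(g << 16) | (g << 8) | g for g in gray]
    for x, y in landmarks:
        if math.isnan(x) or math.isnan(y):
            continue  # many rows in the dataset have missing keypoints
        col, row = round(x), round(y)
        if 0 <= col < WIDTH and 0 <= row < HEIGHT:  # skip out-of-bounds points
            buf[row * WIDTH + col] = RED
    return buf
```

For example, a uniform gray image with one landmark at (10.2, 20.7) ends up with the pixel at column 10, row 21 painted red, and every other pixel stored as 0x808080.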

Unfortunately, it turns out I lost interest in the challenge. Its main objective just wasn’t that interesting to me, at least not for a side project.

While doing all this, I ran into this dlib blog post about face alignment from 2014.

I’d stumbled upon dlib a few times before. It’s a portable machine learning toolkit written in C++, with a really permissive license.

Then I had the idea of recreating those results on a mobile phone.

There are a few forks of the library that make it work within an Android project without any fuss, for example https://github.com/tzutalin/dlib-android.

I looked at a few dlib demos running on Android and found that most of them perform face alignment in almost “real time”.

Even so, I started by looking into how dlib implements the technique described in the blog post I mentioned earlier.

One of the ways dlib implements face alignment is with the technique described in “One Millisecond Face Alignment with an Ensemble of Regression Trees” by Vahid Kazemi and Josephine Sullivan.

dlib’s code is really well documented, and its website is easy to browse. The part where the regression trees are actually built is easy to find. However, it relies somewhat heavily on C++ templates in a few places, so it might not be the most readable code for people who aren’t comfortable with that feature (me, for example).
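The core of the paper is a cascade of regressors: starting from an initial estimate (usually the mean face shape), each stage r_t looks at the image I and the current shape estimate S(t) and predicts a correction, so that S(t+1) = S(t) + r_t(I, S(t)). The Python sketch below shows only that update loop. In the paper each stage is itself a gradient-boosted ensemble of regression trees; here a stage is just any callable, and the `toward` stage in the example is a hypothetical toy that cheats by already knowing the target shape.

```python
from typing import Callable, Sequence

Shape = list[tuple[float, float]]
# A stage takes the image and the current shape and returns a per-landmark (dx, dy).
Stage = Callable[[object, Shape], Shape]


def run_cascade(image: object, initial_shape: Shape, stages: Sequence[Stage]) -> Shape:
    """Apply S(t+1) = S(t) + r_t(I, S(t)) for every stage, in order."""
    shape = list(initial_shape)
    for stage in stages:
        delta = stage(image, shape)
        shape = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, delta)]
    return shape


def toward(target: Shape) -> Stage:
    """Hypothetical toy stage: move each landmark halfway toward a known target."""
    def stage(image: object, shape: Shape) -> Shape:
        return [(0.5 * (tx - x), 0.5 * (ty - y)) for (x, y), (tx, ty) in zip(shape, target)]
    return stage


# One landmark starting at the origin, three stages: 0 -> 5 -> 7.5 -> 8.75
result = run_cascade(None, [(0.0, 0.0)], [toward([(10.0, 10.0)])] * 3)
```

Each stage only has to fix part of the remaining error, which is why a sequence of fairly weak regressors can still converge on a good shape.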

There are a few alternative implementations out there that might be clearer. Here are a few links:

Of course there are other techniques, but I decided to stick with this one for now. I had never messed around with regression trees, so this seems like a good challenge, and they seem to work fairly well for this problem while being fast enough.
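For intuition on why these trees are fast: in the paper, each split node only compares the difference between two pixel intensities against a threshold, and each leaf stores an increment for the whole shape. The sketch below assumes the pixel intensities have already been sampled (in the paper, at locations defined relative to the current landmark estimates) and stores a complete binary tree in flat lists. The `Split`/`Tree` layout and all names are hypothetical, shapes are flat lists `[dx0, dy0, dx1, dy1, ...]`, and any learning-rate scaling is assumed to be folded into the leaf values.

```python
from dataclasses import dataclass


@dataclass
class Split:
    idx1: int      # index of the first sampled pixel intensity
    idx2: int      # index of the second sampled pixel intensity
    thresh: float  # threshold on intensities[idx1] - intensities[idx2]


@dataclass
class Tree:
    splits: list[Split]             # complete binary tree: node i has children 2i+1, 2i+2
    leaf_values: list[list[float]]  # one shape increment per leaf (len(splits) + 1 leaves)


def predict(tree: Tree, intensities: list[float]) -> list[float]:
    """Walk the tree using pixel-difference tests and return the leaf's increment."""
    i = 0
    while i < len(tree.splits):
        s = tree.splits[i]
        # Go left when the intensity difference exceeds the threshold, else right.
        i = 2 * i + 1 if intensities[s.idx1] - intensities[s.idx2] > s.thresh else 2 * i + 2
    return tree.leaf_values[i - len(tree.splits)]


def predict_ensemble(trees: list[Tree], intensities: list[float]) -> list[float]:
    """One cascade stage: sum the increments predicted by every tree."""
    total = [0.0] * len(trees[0].leaf_values[0])
    for tree in trees:
        for k, v in enumerate(predict(tree, intensities)):
            total[k] += v
    return total
```

Walking a tree of depth d costs only d subtractions and comparisons, which goes a long way toward explaining the speed claimed in the paper’s title.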

I’ll spare you the info dump of all the stuff I found while browsing.

At this point, I’m still messing around with the dlib examples and getting a sense of how this particular technique works.

I’m reading about regression trees, and hopefully in the follow-up to this post I will have my own implementation.

Oh, and of course, I got sidetracked and wrote another visualizer, this time in C, for this particular case.

From the measurements I made, what really takes a lot of time is not the face alignment step but the face detection. More details in the future.

For now, that’s all I’ve got. Looking forward to making more progress.
