“man showing photo of him” by Kyle Glenn on Unsplash

How I reverse-engineered Facebook’s face tagging


The other day, after obviously watching too much Black Mirror, I was wondering how hard it would be to point your phone camera at someone and identify that person by their Facebook profile.

Turned out: super easy. As a matter of fact, Facebook already has this system in place: it’s called photo tagging. Every time you upload a photo to Facebook, it tags the photo with the profiles it was able to recognize using its machine learning algorithms, which are shockingly accurate.

So I figured, if only there was a way for me to tap into those photo tagging APIs, I would be able to fulfill my master plan.


The first thing I thought of was uploading images to Facebook, potentially to a private album, or in some way that would keep them unpublished, like the photo preview you get before publishing a status update.

I started to look into the Facebook Graph API, but it turned out that Facebook had recently removed the publish_actions permission, which used to allow apps to upload photos to Facebook. Great, now I have to reverse-engineer it!

Reverse-engineering photo uploading

At this point I considered the following vectors of attack: the native mobile apps and the mobile web apps (https://m.facebook.com and https://mbasic.facebook.com). I was leaning towards the web apps, as I didn’t want to bother with SSL pinning. I also wanted to avoid dealing with JavaScript, so I ended up with mbasic.facebook.com, which is just static web pages.

So I pulled out Chrome Network Monitor and started to analyze the HTTP requests occurring when uploading a photo.

It turned out that the whole process is just a bunch of HTML forms with hidden inputs, including the login.

The most important things to keep in mind were that I had to use the same HTTP session throughout to preserve the cookies, load the pages in their natural order to make sure the cookies were properly set, and scrape the hidden inputs from each page to be able to submit its form.

I used the classic Python requests library, as I usually do for those basic scraping jobs.

First, I needed my script to log in using my Facebook credentials. This is essentially a POST to https://mbasic.facebook.com/login.php

If the login is successful, the session will have the c_user cookie set, which contains the Facebook user id.
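The login step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script from this post: it assumes `session` is a `requests.Session` (or anything with the same `post` method and `cookies` mapping), and the form field names `email` and `pass`, as well as the `hidden_fields` argument, are assumptions made for illustration.

```python
LOGIN_URL = "https://mbasic.facebook.com/login.php"


def log_in(session, email, password, hidden_fields=None):
    """POST the login form through `session` and return the user id.

    `session` is expected to behave like requests.Session: it keeps cookies
    between calls, so later requests reuse the same login.
    """
    # Start from the hidden inputs scraped from the login page, if any.
    payload = dict(hidden_fields or {})
    # Assumed field names; check the actual form markup before relying on them.
    payload.update({"email": email, "pass": password})

    session.post(LOGIN_URL, data=payload)

    # A successful login leaves a c_user cookie holding the Facebook user id.
    user_id = session.cookies.get("c_user")
    if user_id is None:
        raise RuntimeError("login failed: c_user cookie is not set")
    return user_id
```

Because the cookie lives on the session object, every later request has to go through that same session.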

After that, you need to submit the form with the id mbasic_inline_feed_composer, including the following hidden inputs: fb_dtsg, privacyx, and jazoest.
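Scraping the hidden inputs of a form could look something like this. It is a sketch using only the standard library's `html.parser`; the helper names `HiddenInputParser` and `hidden_inputs` are made up for this example, and the sample markup and its values are invented, not copied from Facebook's pages.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of the hidden inputs inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.inside_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.inside_form = True
        elif tag == "input" and self.inside_form:
            if attrs.get("type") == "hidden" and attrs.get("name"):
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside_form = False


def hidden_inputs(html, form_id):
    """Return a dict of the hidden inputs found in the form with this id."""
    parser = HiddenInputParser(form_id)
    parser.feed(html)
    return parser.fields


# Invented sample markup, only to show the shape of the data.
SAMPLE_HTML = """
<form id="mbasic_inline_feed_composer" method="post">
  <input type="hidden" name="fb_dtsg" value="token-a">
  <input type="hidden" name="jazoest" value="123">
  <input type="hidden" name="privacyx" value="456">
  <input type="text" name="message">
</form>
<form id="other"><input type="hidden" name="ignored" value="1"></form>
"""

fields = hidden_inputs(SAMPLE_HTML, "mbasic_inline_feed_composer")
```

The resulting dict can be merged with your own fields and passed as the `data` argument of the POST that submits the form.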

feed_composer

After submitting this form you’ll get to the preview page shown in the screenshot above, where you can upload up to 3 images.

This part was a bit tricky, since it’s a POST with multipart/form-data, where the parts of the MIME structure are separated by a boundary, which is basically a random string prefixed by ----WebKitFormBoundary.

After a bit of going back and forth, and stumbling upon this awesome library for multipart data encoding, I was able to get to the final step of the upload process.
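To make the boundary structure concrete, here is a hand-rolled sketch of what a multipart encoder produces. It is not the library mentioned above and not what my script uses; the function name `encode_multipart` and the `----SketchFormBoundary` prefix are made up for illustration, and in practice a tested library is the better choice.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body by hand.

    fields: dict of name -> str value
    files:  dict of name -> (filename, content bytes, content type)
    Returns (content_type_header, body_bytes).
    """
    # The boundary is an arbitrary string that must not occur in the data.
    boundary = "----SketchFormBoundary" + uuid.uuid4().hex[:16]
    delimiter = b"--" + boundary.encode()

    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for name, (filename, content, content_type) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            content,
        ]
    # The closing delimiter carries two extra dashes at the end.
    lines += [delimiter + b"--", b""]

    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body
```

With requests, the returned pair would be sent as `session.post(url, data=body, headers={"Content-Type": content_type})`.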

Now, at this point I have a photo_id which uniquely identifies a photo uploaded to Facebook, even though it hasn’t been published yet.

photo_id

Tapping into Facebook photo tagging API

It turned out that to query the tagging API through mbasic.facebook.com you have to go all the way to the end and actually submit a post with the photo, which will then show up on your Facebook wall. So make sure you have Share with: Only Me in your post settings.

Once the sharing scope is set, it stays the same for all subsequent posts.

After yet another form submission you get redirected to the photo tagging page, https://mbasic.facebook.com/photos/xtag_faces. At this point Facebook has already applied its face recognition and, hopefully, your photo is tagged.

At this point you need to grab the id of the published status, so that you can remove the Facebook story containing the photos to cover your tracks.

The final result is here: https://gist.github.com/nderkach/45d37827e25d38f606c99865c6491d0f

Caveats

Since I originally intended to leverage photo tagging in a mobile app, I wanted to deploy the script and access it through an API endpoint: POST an image and get back the Facebook id of the user in the image.

Once I had built a simple API with Flask and deployed it to Heroku, I tried to query my API and… got my Facebook account locked. Turned out Facebook doesn’t like it when you randomly access the site from different IPs.

I considered my options for getting a static IP on Heroku and couldn’t find a single add-on which would allow me to have a _single_ static IP. Potential business opportunity I hear?

So I went ahead and deployed my own proxy on an AWS EC2 instance, and voilà! As long as you access your Facebook account from the same IP you should be fine.
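Pointing the scraping session at such a proxy is a small change. A sketch, assuming `session` is a `requests.Session`, whose `proxies` attribute is a plain dict; the helper name `route_through_proxy` and the proxy address in the comment are placeholders, not my actual setup.

```python
def route_through_proxy(session, proxy_url):
    """Send all HTTP and HTTPS traffic of `session` through one proxy."""
    # requests picks the proxy by URL scheme, so set both entries.
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


# Hypothetical usage (placeholder address from the documentation IP range):
# session = route_through_proxy(requests.Session(), "http://203.0.113.10:3128")
```

Every request made through that session then leaves from the proxy's IP, no matter where the script itself runs.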

Moving forward

I was able to achieve a response time of about 7 seconds for my API calls: from uploading a photo to getting the Facebook id of the user in the photo. That includes a random 1 second sleep for who knows what reason (trust me, it was needed).

Moving forward, I could optimize the script so that it doesn’t have to log in and create a new HTTP session every time I want a photo tagged. This would shave off a couple of seconds.
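One way to do that is to keep a logged-in session around and only rebuild it once it gets old. The sketch below is not implemented in my script: the class name `SessionCache`, the `create_session` callback (which would do the login and return the session) and the one-hour default lifetime are all assumptions for illustration.

```python
import time


class SessionCache:
    """Reuse one logged-in session until it is older than max_age seconds."""

    def __init__(self, create_session, max_age=3600.0, clock=time.monotonic):
        self._create_session = create_session  # callable that logs in
        self._max_age = max_age
        self._clock = clock
        self._session = None
        self._created_at = None

    def get(self):
        """Return the cached session, logging in again only when needed."""
        now = self._clock()
        expired = (
            self._created_at is None
            or now - self._created_at > self._max_age
        )
        if expired:
            self._session = self._create_session()
            self._created_at = now
        return self._session

    def invalidate(self):
        """Drop the cached session, e.g. after a request was rejected."""
        self._session = None
        self._created_at = None
```

Each API call would then start with `cache.get()` instead of a fresh login, and call `cache.invalidate()` if Facebook stops accepting the cookies.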

I could also research the more complicated alternatives: m.facebook.com and the mobile app APIs. The mobile app APIs are particularly promising, as I would likely get better API stability: mobile app interfaces change less frequently than web pages because of the difference in deployment cycles.