LiveMash: 22 hrs @ the Facebook Global Hackathon Finals

Vignesh Vishwanathan
6 min read · Apr 2, 2017

8 months after winning Facebook’s award @ HackIllinois for our Sorting Hat (à la Harry Potter), my friends Yuriy, Neel, Chamila, and I were pumped to compete in the Global Hackathon Finals at Facebook HQ.

Our hack placed in the Top 6, giving us a chance to present live to a panel of judges, including early Facebook engineers and founding team members from WhatsApp and Oculus VR.

With more and more people using Facebook Live to explore videos from around the world, we wanted to enhance the social experience with two main features:

  1. Allowing multiple video sources to stream to the same live video (smartphones, GoPros, etc.)
  2. Facial recognition and smart tagging within the video

LiveMash consolidates these features into a simple end-user experience — just start streaming from your IP-connected camera. Our augmented streaming platform aims to create a Live experience that can show you more of what’s happening in the world around you.

Setup

  • IP Cameras: iPhone / GoPro
  • Netgear Router
  • Facebook API

Stream Design

As Facebook Live has a video resolution cap of 1280x720, we aimed to get 720p video coming from all the devices.

All iPhones used the IP Camera app to capture 720p live video from the front/rear cameras and stream it over the local network.

IP Camera browser and app view

We leveraged the open-source goprowifihack to configure and stream 720p live video from the GoPro.

All video streams are collected by a Node.js server, and each buffer goes through scaling and bitrate/frame-rate adjustments before being sent to post-processing.
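
Our pipeline did this in Node.js; as a rough illustration of the ingest step, here’s a minimal Python/OpenCV sketch. The stream URLs are placeholders standing in for whatever the IP Camera app and goprowifihack exposed on our local network:

```python
import cv2

# Placeholder URLs -- the real addresses came from the IP Camera app
# and goprowifihack on the local network.
SOURCES = [
    "http://192.168.1.10:8080/video",  # iPhone running the IP Camera app
    "udp://10.5.5.9:8554",             # GoPro live stream
]

TARGET_W, TARGET_H = 1280, 720  # Facebook Live's resolution cap

captures = [cv2.VideoCapture(url) for url in SOURCES]

def read_normalized_frames():
    """Grab one frame per source, scaled to a uniform 720p."""
    frames = []
    for cap in captures:
        ok, frame = cap.read()
        if not ok:
            continue  # dropped frame; skip this source for now
        frames.append(cv2.resize(frame, (TARGET_W, TARGET_H)))
    return frames
```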

During the post-processing step, the individuals in each video stream are recognized via Facebook’s photo tagging endpoint, and floating name tags are rendered onto the buffer using OpenCV.
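
Rendering a floating name tag onto a frame takes only a few OpenCV calls. A minimal sketch, assuming the face’s bounding box is already known:

```python
import cv2

def draw_nametag(frame, name, box):
    """Draw a floating name tag just above a face bounding box (x, y, w, h)."""
    x, y, w, h = box
    label_y = max(y - 10, 20)  # keep the tag inside the frame
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, name, (x, label_y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
    return frame
```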

The Node.js server produces the output buffer by stitching the video streams together; the result is then sent as an RTMP stream to the Facebook Live endpoint.
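
Again, our implementation lived in the Node.js server, but the stitch-and-push step can be sketched by tiling frames with NumPy and piping raw video into ffmpeg. The RTMP URL and stream key below are placeholders:

```python
import subprocess
import numpy as np

RTMP_URL = "rtmp://rtmp-api.facebook.com:80/rtmp/STREAM_KEY"  # placeholder key
W, H, FPS = 1280, 720, 30

# ffmpeg reads raw BGR frames on stdin, encodes H.264, and pushes FLV over RTMP.
ffmpeg = subprocess.Popen([
    "ffmpeg", "-y",
    "-f", "rawvideo", "-pix_fmt", "bgr24",
    "-s", f"{2 * W}x{H}", "-r", str(FPS),
    "-i", "-",
    "-vf", "scale=1280:-2",  # squeeze the stitched canvas back under the cap
    "-c:v", "libx264", "-preset", "veryfast", "-b:v", "2500k",
    "-pix_fmt", "yuv420p",
    "-f", "flv", RTMP_URL,
], stdin=subprocess.PIPE)

def push_stitched(frame_a, frame_b):
    """Stitch two 720p frames side by side and hand them to ffmpeg."""
    stitched = np.hstack([frame_a, frame_b])  # shape (H, 2*W, 3)
    ffmpeg.stdin.write(stitched.tobytes())
```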

Facial Recognition

For the face recognition in our videos, we tried to leverage the Facebook platform but ran into a lot of limitations. We learnt that the Facebook public API does not allow tagging in videos, and is limited to tagging a couple of hundred static photos at a time. It also doesn’t reply to the API call with the names of the people; instead, it adds the tags to the metadata of the photo that was uploaded.

We did not want to give up on the Facebook API, as the Facebook platform has one of the best facial recognition systems around, simply due to the volume of photos and tags that flow through it.

We got around this by uploading photos to a private album and reading back the tag metadata. The photo frames uploaded to the album were picked via face detection algorithms in OpenCV. This allowed us to detect all the people in the video without having to constantly upload every frame of the video as a photo.
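
A sketch of that round trip against the Graph API of the era. The token, album ID, and API version are placeholders, and whether the tag metadata is readable depends on the app’s permissions; treat this as the shape of the calls, not a definitive implementation:

```python
import requests

GRAPH = "https://graph.facebook.com/v2.8"  # API version of the era
TOKEN = "ACCESS_TOKEN"   # placeholder: needs publish + photo permissions
ALBUM_ID = "ALBUM_ID"    # placeholder: the private album's ID

def upload_frame(jpeg_bytes):
    """Upload one detected-face frame to the private album; returns photo ID."""
    r = requests.post(f"{GRAPH}/{ALBUM_ID}/photos",
                      params={"access_token": TOKEN, "no_story": "true"},
                      files={"source": ("frame.jpg", jpeg_bytes, "image/jpeg")})
    r.raise_for_status()
    return r.json()["id"]

def fetch_tags(photo_id):
    """Read the tag metadata Facebook attached to the uploaded photo."""
    r = requests.get(f"{GRAPH}/{photo_id}",
                     params={"access_token": TOKEN,
                             "fields": "tags{name,x,y}"})
    r.raise_for_status()
    return r.json().get("tags", {}).get("data", [])
```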

Photo frames that were uploaded to the private album

However, we still got blocked by Facebook for sending too many requests within a short time period. To counter that, we generated several new Facebook app IDs at once. Then, in our script for fetching the tag metadata, every time we were blocked, we simply switched to a new app ID. (We got blocked about 3 times before figuring that out!)
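
A minimal sketch of that rotation trick, assuming each app’s token sits in a list and that the rate limit surfaces as Graph API error code 4 (the exact error code here is an assumption):

```python
import requests

APP_TOKENS = ["APP_TOKEN_1", "APP_TOKEN_2", "APP_TOKEN_3"]  # placeholders
_current = 0

def graph_get(url, params):
    """Issue a Graph API GET, rotating to the next app ID when rate-limited."""
    global _current
    for _ in range(len(APP_TOKENS)):
        params["access_token"] = APP_TOKENS[_current]
        body = requests.get(url, params=params).json()
        if body.get("error", {}).get("code") == 4:  # assumed rate-limit code
            _current = (_current + 1) % len(APP_TOKENS)
            continue
        return body
    raise RuntimeError("all app IDs are currently rate-limited")
```
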
We also tagged people in the comments of the video when they were visible in the live stream. We did this based on the person’s info returned by Facebook’s facial recognition. This is helpful because a person might not even know they are in a live video, and we thought we should highlight that.
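
The heads-up comment is one more Graph API call against the live video’s ID, sketched here with a plain-text mention (the endpoint shape is the standard comments edge; the token is a placeholder):

```python
import requests

GRAPH = "https://graph.facebook.com/v2.8"
TOKEN = "ACCESS_TOKEN"  # placeholder

def comment_on_video(video_id, name):
    """Post a comment on the live video noting who just appeared on camera."""
    requests.post(f"{GRAPH}/{video_id}/comments",
                  params={"access_token": TOKEN,
                          "message": f"{name} just appeared in this live video!"})
```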

Dynamic Tagging

After retrieving the tags from the still frames, we placed floating nametags over each person in the video. Using OpenCV, we implemented a cascade classifier to identify faces, determine the bounding boxes, and have the nametags follow the movement of each individual.
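
The detection step itself is standard OpenCV; a sketch using the frontal-face Haar cascade that ships with the library:

```python
import cv2

# Haar cascade bundled with OpenCV for frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return a list of (x, y, w, h) bounding boxes for faces in the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                    minSize=(40, 40))
```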

This presented an interesting challenge, as we could not assume that someone’s face would be turned toward the camera for the entire duration of the live video. We had to come up with a way to represent faces that had already been detected, and use that information to restore the tag when the face reappears.

Cascade classifiers in OpenCV are a quick way to identify faces in frames, which was a necessary optimization when we had several frames per second to process. The tradeoff was a lot of false positives in each frame. To keep a tag on the correct face for its entire presence in the video, we needed to come up with an algorithm to track the face throughout its duration.

Demo of Name Tag

The details of this algorithm actually got quite theory-heavy. At the expense of some generality (and cool-looking mathematical symbols), we can describe it as follows (a rough code sketch comes after the list):

  1. We come up with a hash function that takes all of the pixels in a bounding box (and the RGB color each pixel maps to) and returns an n-bit value. We construct the hash function such that the distance between the hashes of two boxes is smallest when the boxes ‘look’ closest to each other. There are multiple heuristics one can use to judge similarity between two boxes, such as dimensions and color gradients; these heuristics are what the hash function is built from.
  2. From this point onward, each frame has four types of bounding boxes that are found by the cascade classifier:
  • Boxes that have already been tagged in some previous frame. In this case, the distance between this box’s hash and the tagged box’s hash from the previous frame is the smallest, which lets us keep tracking it.
  • Boxes that have not been tagged in a previous frame, and have a high confidence of being a face. The confidence itself is a numerical value determined by the cascade classifier. If the confidence is past a certain threshold, we make a request to the Facebook API to tag the face, and keep tracking it in subsequent frames as pending a tag.
  • Boxes that have not been tagged in a previous frame, but are within the similarity threshold of a pending box from a previous frame. We asynchronously keep waiting on these boxes to receive a tag via the Facebook API, and add the tag when the callback is reached.
  • Boxes that have not been tagged in a previous frame, and are below the confidence bar for making the API call. We ignore these.
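
Here is that rough code sketch of the four cases. It uses a downscaled color grid as a toy stand-in for our hash heuristic, and detectMultiScale3’s level weights as the confidence value; both thresholds are made-up numbers, and request_tag_async is a hypothetical wrapper around the album-upload flow above:

```python
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

CONFIDENCE_BAR = 5.0   # assumed: minimum level weight to trust a new face
SIMILARITY_BAR = 25.0  # assumed: maximum hash distance to match a known box

def box_hash(frame, box):
    """Toy stand-in for our heuristic hash: the box shrunk to an 8x8 color grid."""
    x, y, w, h = box
    patch = cv2.resize(frame[y:y + h, x:x + w], (8, 8))
    return patch.astype(np.float32).ravel()

def hash_distance(a, b):
    return float(np.linalg.norm(a - b))

tracked = []  # each entry: {"hash": vector, "name": str, or None while pending}

def process_frame(frame, draw_nametag, request_tag_async):
    """One pass of the four-case rule set. draw_nametag is the earlier sketch;
    request_tag_async is a hypothetical async call to the Facebook API."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes, _, weights = cascade.detectMultiScale3(gray, outputRejectLevels=True)
    for box, weight in zip(boxes, np.ravel(weights)):
        h = box_hash(frame, box)
        match = min(tracked, key=lambda t: hash_distance(t["hash"], h),
                    default=None)
        if match and hash_distance(match["hash"], h) < SIMILARITY_BAR:
            match["hash"] = h               # cases 1 and 3: keep tracking
            if match["name"]:
                draw_nametag(frame, match["name"], box)
        elif weight > CONFIDENCE_BAR:       # case 2: confident new face
            tracked.append({"hash": h, "name": None})  # pending a tag
            request_tag_async(frame, box)
        # case 4: unmatched, low-confidence boxes are ignored
```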

Since the tracking portion of the algorithm was run on every frame, if a face ever left the picture, it would simply stop generating a bounding box and would no longer be tracked. Subsequent entries of the same person into the frame would require the same process to be restarted.

The results were pretty good, and real-time! From our testing, we were able to show that faces that had momentarily turned slightly away from the camera had their nametags restored almost immediately when they turned to face the camera again. We cached the hash values over several frames to cover possible misses from the cascade classifier.

That being said, there are several optimizations one could make:

  1. Improve on the hash function with better heuristics for face similarity.
  2. Train models to learn faces that have been tagged over several frames, as a form of persistence for the person in the video. This is easier said than done.
  3. Predict the movement speed of each face with inputs from the recording device. Think something like an accelerometer coupled with some assumptions about bodily dimensions.
  4. Use a more costly facial recognition algorithm with a faster computer. This is a naive optimization, but worth noting. We ran this entirely on a MacBook Pro.

Demo

We set up a live stream from a smartphone at our demo booth, and folks who passed by were able to see our face detection & tagging work in real time.

As Facebook automatically tags only those you are friends with (and who have photo tagging enabled), we were able to demo ourselves getting tagged in the video. Some friends of ours who stopped by the booth got to see themselves tagged as well!
