Building a real-time smile detection app with deeplearn.js and the web Shape Detection API — Part 1: Face Detection

Jozef Maxted 👻
Published in The Unit
Jan 25, 2018 · 6 min read

For three weeks before Christmas at The Unit I had been experimenting with the unreleased web Shape Detection API and Google’s hardware accelerated browser based machine learning library deeplearn.js to build a (tongue-in-cheek) ‘office happiness’ dashboard. We used face detection to pick out faces from a webcam feed in real-time and then used deeplearn.js to determine whether they were smiling or not.

It amazes me that this kind of task can be performed directly in the browser! Doing the work in the browser has a lot of advantages over using a cloud service such as the AWS Rekognition API or Google’s Cloud Vision API: mainly it works out a lot cheaper, but it also means you get results much faster, without waiting for an API response.

In this series of posts I will guide you through the steps I took to get the smile detection working using deeplearn.js.

Disclaimer: buckle in, this is going to be a long set of posts. Please be aware that I am by no means a machine learning expert, but I’ve learnt a lot by hacking together this demo. Finally, this isn’t a tutorial on how to architect a good Angular app. The focus is intended to be on deeplearn.js.

We had a lot of fun combining the face detection and smile detection with canvas animations.

Setting up the project

For this demo we’ll be making use of Angular, as it works nicely with deeplearn.js: both use TypeScript. Angular will provide the shell for our application, but you could just as easily create this demo without a framework.

If you don’t have the Angular CLI already, install it globally with:
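```bash
# standard global install of the Angular CLI via npm
npm install -g @angular/cli
```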

Then create a new app and start the development server:
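Assuming we call the project smiley, to match the repository linked in the next section:

```bash
ng new smiley
cd smiley
ng serve
```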

You should now be able to visit the newly created Angular app here: http://localhost:4200/

Real-time face detection

The finished code for this section can be found here: Smiley repository — face_detection branch

The first step in our journey to detecting smiles is to find faces

The first small step in our deep learning journey is to detect faces from a webcam feed. This is made laughably easy using the proposed Shape Detection API. Unfortunately this API isn’t shipped in browsers yet, and to play around with it you’ll need to enable the experimental web features in Google Chrome.

Paste the following into your Chrome address bar and click ‘enable’ on the section titled ‘Experimental Web Platform features’.
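At the time of writing the flag lives at the address below (the exact anchor may move between Chrome versions):

```
chrome://flags/#enable-experimental-web-platform-features
```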

After enabling these features you should be able to try this demo: https://realtime-face-detection.netlify.com/

Creating the HTML elements

For the time being we are going to place everything in the root component the Angular cli has created for us. We will refactor later when we need more functionality. Let’s start with the HTML. Add this to ‘src/app/app.component.html’:
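A minimal version of the template, assuming the #video and #canvas reference names used later in the component, looks something like this:

```html
<!-- Hidden source of webcam frames; #video lets the component grab a reference to it -->
<video #video></video>

<!-- The webcam frames and face rectangles get drawn here -->
<canvas #canvas></canvas>
```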

Here we add a video element so we can attach a webcam stream to it, and a canvas element. This will be used later so we can draw the video onto the canvas and highlight the detected faces. If you are confused about the # syntax on the elements, this is simply Angular-specific syntax that will allow us to grab a reference to this element from within the Angular component.

Styling the HTML

We can now add some very simple css to ‘src/app/app.component.css’, deleting the HTML the Angular CLI generated for us:
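Something along these lines does the job, assuming the selectors match the two elements above:

```css
/* The raw video is hidden; its frames are drawn onto the canvas instead */
video {
  display: none;
}

/* Centre the canvas and stretch it to at least fill the viewport */
canvas {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%);
  min-width: 100%;
  min-height: 100%;
}
```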

We want the video to fill the entire browser vertically and horizontally, so we need to centre our canvas element in the window and set its min-height and min-width to 100%, to stretch it to whatever aspect ratio the browser might have.

We can actually hide the video as we will be drawing the frames from the video directly to the canvas.

Face detection with Angular

Now we have our HTML and CSS setup we can move on to the actual code in ‘src/app/app.component.ts’:
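As a rough sketch (the exact property names here are my choices, not gospel), the component starts out something like this:

```typescript
import { Component, ViewChild, ElementRef, NgZone, AfterViewInit } from '@angular/core';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent implements AfterViewInit {
  // References to the #video and #canvas elements in the template
  @ViewChild('video') videoRef: ElementRef;
  @ViewChild('canvas') canvasRef: ElementRef;

  // The underlying native elements and the canvas rendering context
  video: HTMLVideoElement;
  canvas: HTMLCanvasElement;
  ctx: CanvasRenderingContext2D;

  // The experimental FaceDetector has no built-in type definition yet
  faceDetector: any;

  constructor(private ngZone: NgZone) {}

  ngAfterViewInit() {
    // filled in below
  }
}
```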

Not a lot to talk about here; we are simply setting up our Angular component by importing dependencies we will need and defining a few variables.

Notice the @ViewChild definitions. These allow us to get a reference to the canvas and the video elements we created earlier. They return ElementRef instances, which wrap the native HTML elements, so in addition we define variables to hold references to the underlying native elements, ‘video’ and ‘canvas’, as well as a variable ‘ctx’ to hold the canvas rendering context.

We also define a variable to hold our face detector, but because this is still an experimental feature there is no built-in type definition yet, so for the time being we will use the ‘any’ type to prevent TypeScript from complaining.

Finally we inject NgZone into the constructor of our component. I’ll explain why this is required later.
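Here’s roughly what that hook looks like, again assuming the field names from the sketch above, plus an initVideo helper whose name is my own for the next step (and a maximum of two faces, which is what the finished demo tracks):

```typescript
ngAfterViewInit() {
  // The view exists now, so it's safe to unwrap the native elements
  this.video = this.videoRef.nativeElement;
  this.canvas = this.canvasRef.nativeElement;
  this.ctx = this.canvas.getContext('2d');

  // FaceDetector is experimental, hence the cast through `any`
  this.faceDetector = new (window as any).FaceDetector({
    fastMode: true,        // favour speed over accuracy
    maxDetectedFaces: 2    // stop looking after two faces
  });

  this.initVideo();
}
```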

In the code above we implement the component lifecycle hook ngAfterViewInit. This hook fires after the component has loaded and the views have been created. We make use of this hook to set the video, canvas and ctx variables we defined earlier. We cannot do this in the constructor as these elements don’t exist until the view is created.

We also create our face detector here. We pass it a few options to increase performance. ‘fastMode’ tells the face detector to favour speed over accuracy. ‘maxDetectedFaces’ simply limits the number of faces the detector will search for.

We are now ready to initialise the video feed.
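Sketched out, with the promise-based navigator.mediaDevices.getUserMedia standing in for the older callback form, it looks something like this:

```typescript
initVideo() {
  // Ask for webcam access; the browser shows the permission prompt here
  navigator.mediaDevices.getUserMedia({ video: true }).then(stream => {
    // Bind the returned stream to our (hidden) video element
    this.video.srcObject = stream;
    this.video.play();

    this.video.addEventListener('loadeddata', () => {
      // Match the canvas to the video so drawing coordinates line up
      this.canvas.width = this.video.videoWidth;
      this.canvas.height = this.video.videoHeight;

      // Run the render loop outside Angular's zone so change detection
      // isn't triggered on every animation frame
      this.ngZone.runOutsideAngular(() => this.update());
    });
  });
}
```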

First we request access to the webcam with navigator.getUserMedia(). This should show the popup allowing the user to accept or decline access to the webcam.

Assuming this is successful, we then bind the returned media stream to our video element. If our video element wasn’t hidden at this point, we would see the webcam feed in it.

We then wait for the stream to load in our video element, and once it has, we get hold of its width and height, and set our canvas’ width and height to match. This means that once we draw on the canvas it should marry up nicely with the coordinate space of the video. The only downside is that the canvas will be stretched so it won’t look sharp, but for the purpose of this demo that’s cool.

Next we get to the exciting part! We call our update function that will handle detecting faces and marking them on the video. Notice we make use of the NgZone we injected into our component constructor. We tell Angular to run this update function outside of its zone. To cut a long story short, this tells Angular that we don’t want it to perform change detection when this function is called, which will hopefully increase performance. The update function will run repeatedly using requestAnimationFrame and so it could get very expensive if we run it within Angular.

The face detector detect function returns a Promise with the detected faces so we can use async/await to make our code a bit cleaner. This could cause performance issues as it’s within a requestAnimationFrame, but for simplicity we will stick with it.
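Here’s a sketch of that update loop, in the order described below:

```typescript
async update() {
  // Ask the detector for faces in whatever is currently on the canvas
  const faces = await this.faceDetector.detect(this.canvas);

  // Draw the latest webcam frame onto the canvas
  this.ctx.drawImage(this.video, 0, 0, this.canvas.width, this.canvas.height);

  // Outline any detected faces, then queue up the next frame
  this.markFaces(faces);
  requestAnimationFrame(() => this.update());
}
```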

The first thing we do is detect our faces by passing the face detector our canvas element. This will return an array of the faces it detects, if any.

Next we take the latest frame from the video and draw it to the canvas. This is why we set our video element to display none.

We then call a function called markFaces which will draw any detected faces on the canvas with a rectangle, and once it is done, we request the next animation frame to repeat this process.

All pretty straightforward!
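The markFaces helper itself is just a few lines — something like this, with the stroke styling a matter of taste:

```typescript
markFaces(faces: any[]) {
  this.ctx.strokeStyle = '#fff';
  this.ctx.lineWidth = 4;

  faces.forEach(face => {
    // boundingBox is a DOMRect in canvas coordinates
    const { x, y, width, height } = face.boundingBox;
    this.ctx.strokeRect(x, y, width, height);
  });
}
```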

Above is the markFaces function. Here we take the array of faces from the face detector, loop over them and draw a rectangle using the bounding box returned by the face detector.

Working real-time face detection 🎉 😁

That should be it! You should now have a live video with rectangles surrounding up to two faces in the feed. Not bad for under 100 lines of code!

Performance of the face detection seems to be pretty variable, but it’s not bad for an experimental feature.

In the next post we will build upon our working face detection and try to extract and process the faces in a way that makes them digestible by a neural network!

In the meantime, feel free to get in contact with me on Twitter: @jozefmaxted

Part Two can now be found here:

Jozef Maxted 👻
The Unit

Senior Developer @TheUnitGB Prolific maker of digital things #js #computerart http://jozefmaxted.co.uk