Building a real-time smile detection app with deeplearn.js and the web shape detection API — Part 2: Image Processing

Jozef Maxted 👻
Published in The Unit
Jan 31, 2018 · 5 min read

This is the second post in a series explaining how to achieve real-time smile detection using deeplearn.js.

In this tutorial we are extracting the detected faces, ready to feed them into our neural network

In the previous post we successfully made use of the Shape Detection API to find faces in real time from a video feed. Now we need to extract those faces from the feed and format them in a way that a neural network can process efficiently.

This requires a number of steps. We will need to:

  1. Crop the video so we are left with only the detected faces
  2. Resize the cropped faces so they are much smaller
  3. Convert the cropped resized image to greyscale
  4. Get the normalised pixel data for the images

Each of these steps will make the process of feeding the image into the neural network much more efficient.

We want to use the smallest image possible so that the number of pixels the network has to process is kept to a minimum. This is achieved by cropping the feed down to just the face and scaling it to 50 x 50 pixels.

You don’t really need colour in an image to determine if someone is smiling. By converting the faces to greyscale we also save a bit of processing power, as the three red, green and blue values for each pixel are reduced to one.

Finally we will normalise the pixel values that will eventually be passed into the network. This means instead of having an array of values between 0 and 255, we have an array of values between 0 and 1.

If you have enabled Chrome’s experimental features you can see a demo here: https://face-extraction.netlify.com/

Notice the little grey face in the top left.

The finished code for this section can also be found here: https://github.com/zefman/smiley/tree/feature/extract_faces

If you haven’t read the first post in this series please do so now, otherwise the following will make no sense 🤓

Cropping the faces and resizing

To crop out and resize the faces we will use an additional smaller canvas. For the time being we will continue to place everything in the App component “src/app/app.component.ts”.

Here you can see the additional canvas we add to the app component. We set it to 50 x 50 pixels, which means our resized image will have 2500 individual pixels. This should be small enough for the neural network to consume.
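
The original post embeds the full template here; a minimal sketch of the relevant addition to “src/app/app.component.html” might look like this (the #faceCanvas template reference and the class name are assumptions):

```html
<!-- The small canvas the detected faces will be cropped and resized onto.
     The #faceCanvas reference and class name are assumptions. -->
<canvas #faceCanvas class="face-canvas" width="50" height="50"></canvas>
```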

The rest of the HTML is untouched other than the addition of a few classes.

Moving on to “src/app/app.component.css”:

We will place the smaller canvas above everything in the top left. This canvas doesn’t actually need to be visible, but it will be useful to see the results of our image processing while in development.
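
A minimal sketch of styles that would achieve this, assuming the class name used in the template sketch above:

```css
/* Pin the small face canvas above everything else in the top left
   (class name assumed) */
.face-canvas {
  position: fixed;
  top: 0;
  left: 0;
  z-index: 10;
}
```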

Defining variables in the app component

Now on to the JavaScript itself. In “src/app/app.component.ts” add the following above the constructor:
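
The original snippet is embedded in the post; a sketch of what it might look like, based on the description that follows (faceCanvas and faceCtx are referred to later in the post, the other property name is an assumption):

```typescript
// ViewChild and ElementRef come from '@angular/core', already imported in the previous post

// Reference to the new 50 x 50 canvas in the template (matches the #faceCanvas reference)
@ViewChild('faceCanvas') faceCanvas: ElementRef;

// The native canvas element and its 2D rendering context
faceCanvasElement: HTMLCanvasElement;
faceCtx: CanvasRenderingContext2D;
```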

We need to define another ViewChild to get a reference to the new faceCanvas. We also define two more variables to hold the native canvas element and its rendering context.

Referencing the DOM elements

We then get a reference to the new canvas and its context after the view initialises, just like with the other elements.
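
A sketch of that hook, assuming the property names used above:

```typescript
ngAfterViewInit() {
  // References to the video and the main canvas are set up here as in the previous post ...

  // Grab the native element behind the new small canvas and its 2D context
  this.faceCanvasElement = this.faceCanvas.nativeElement;
  this.faceCtx = this.faceCanvasElement.getContext('2d');
}
```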

Processing the faces

We then create a new function called processFace. This function takes a face object returned by the face detector, uses its bounding area to copy that portion of the larger canvas to our new 50 x 50 px one, and resizes it at the same time. This is achieved by passing the original canvas to the drawImage function of the faceCtx, along with the area to crop from.
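
A sketch of that part of processFace; the name of the full-size canvas element comes from the previous post’s setup and is an assumption here:

```typescript
processFace(face) {
  const { x, y, width, height } = face.boundingBox;

  // Copy just the face region of the big canvas onto the 50 x 50 canvas.
  // drawImage crops (source rectangle) and resizes (destination rectangle) in one call.
  this.faceCtx.drawImage(
    this.canvasElement,    // source: the full-size canvas (property name assumed)
    x, y, width, height,   // source rectangle: the detected face's bounding box
    0, 0, 50, 50           // destination rectangle: the whole small canvas
  );

  // Greyscale conversion follows — see the next sketch
}
```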

We then grab hold of the new smaller image’s image data and convert it to greyscale. The image data is originally a one-dimensional array of values between 0 and 255, with each pixel made up of 4 consecutive entries, e.g. [ 45, 200, 202, 255, 76, 98, 201, 255, 253, 222, 98, 0, … ] (the first four values, 45, 200, 202 and 255, make up one pixel). The first value is the red value for that pixel, the second is the green, the third is the blue, and the fourth is the alpha value.

To convert these individual pixels to greyscale we create a for loop that increments by 4 each time, allowing us to modify the rgba values of each pixel in turn. For each pixel we calculate a new brightness value to replace the original values, by taking a weighted sum of its r, g and b parts. You might notice the r value is multiplied by .3, the g by .59, and the b by .11. This accounts for the way our eyes are more sensitive to some colours than others, and produces a more natural-looking greyscale result.

Once we have set each pixel’s rgb values to its new brightness, we place the modified data back onto the canvas with the putImageData function.
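
Putting those steps together, the rest of processFace might look something like this:

```typescript
// Continuing inside processFace: read back the 50 x 50 crop as raw pixel data
const imageData = this.faceCtx.getImageData(0, 0, 50, 50);
const pixels = imageData.data;

// Step through the flat rgba array four entries (one pixel) at a time
for (let i = 0; i < pixels.length; i += 4) {
  // Weighted sum — the eye is most sensitive to green, least to blue
  const brightness = 0.3 * pixels[i] + 0.59 * pixels[i + 1] + 0.11 * pixels[i + 2];
  pixels[i] = brightness;     // red
  pixels[i + 1] = brightness; // green
  pixels[i + 2] = brightness; // blue
  // pixels[i + 3] (alpha) is left untouched
}

// Draw the modified greyscale data back onto the small canvas
this.faceCtx.putImageData(imageData, 0, 0);
```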

Finally we need to use the processFace function in our update to continually draw the detected faces to our small canvas.
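
A sketch of how that call might slot into the detection loop from the previous post (the surrounding structure and property names are assumed):

```typescript
// Inside the loop from the previous post, after the current video frame has been
// drawn to the main canvas (surrounding structure assumed)
this.faceDetector.detect(this.canvasElement).then(faces => {
  faces.forEach(face => {
    // Draw the bounding box as before, then crop, resize and greyscale the face
    this.processFace(face);
  });
});
```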

You should now have a strange black and white face updating in the top left of your browser! This won’t work well with multiple faces, but that’s not a problem, as we are only checking that the processFace function works.

The final thing we need to do is to get the normalised image data for the face. At the moment if we were to get the image data from the faceCtx we would have an array of values from 0 to 255, and each pixel would have the same value set for its rgb variables. This doesn’t provide the neural network with any additional information, so we may as well take only the r value of each pixel. On top of that we need to normalise the values so they fall between 0 and 1 rather than 0 and 255. To do this we will create two new functions to be used later on.
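
A sketch of those two functions; getNormalizedGreyScalePixels is named in the post, while the name of the small helper is an assumption:

```typescript
// Map a single channel value from the 0–255 range into 0–1
// (helper name is an assumption; only getNormalizedGreyScalePixels is named in the post)
normalize(value: number): number {
  return value / 255;
}

// Take the red channel of every pixel (r, g and b are identical after the greyscale
// pass) and return a flat array of 2500 values between 0 and 1
getNormalizedGreyScalePixels(imageData: ImageData): number[] {
  const normalizedPixels: number[] = [];
  for (let i = 0; i < imageData.data.length; i += 4) {
    normalizedPixels.push(this.normalize(imageData.data[i]));
  }
  return normalizedPixels;
}
```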

Hopefully this code makes sense after reading the description above. We will be passing getNormalizedGreyScalePixels the imageData from our faceCtx; we take the r value from each pixel, normalise it, and then add it to a new array.

And we’re done - phew 😅

Before starting this project I hadn’t realised how much effort would go into formatting the data correctly, before we even get to the machine learning element.

So at this point we have successfully taken the detected faces and modified them in a way that will make them more easily processable by the neural network, which we will be creating in a forthcoming post.

In the next post we will look into how we can save labeled training data to teach the neural network to recognise smiling faces. This will involve saving the image data into two sets: one smiling, one not smiling.

In the meantime, feel free to get in contact with me on Twitter: @jozefmaxted
