GitHub All-Stars #2: face-api.js
Welcome to the second edition of GitHub All-Stars. Last week, we had the opportunity to analyse a bot created for Saliens — the web game released to support the Steam Summer Sale. Now, we’ll take a look at something much “heavier” and far more challenging. Once again, we will do a codebase analysis, this time focusing on justadudewhohacks/face-api.js — a great example of how to use TensorFlow in a web application.
Disclaimer: I assume some familiarity with basic TensorFlow terminology. If you don’t know what Tensors are, please watch this video first. After watching, you should be fine to carry on.
Deep Learning is definitely a cool thing — the majority of you will probably agree with me. I have been interested in TensorFlow for quite some time. I even experimented a bit with it myself, trying to hack together an app with TensorFlow Mobile (now called TensorFlow Lite). Still, I never spent enough time to become even a mediocre user of this powerful Deep Learning library from Google — that’s why reading through the code of a real project, not just a sample usage, was very challenging, but also fascinating. I learned a lot just by preparing this blog post, and I hope you will have exactly the same feeling after reading it.
Before we start our review, let’s describe what the target of our analysis really is. face-api.js is a library that uses CNNs (Convolutional Neural Networks) for detecting (and matching) faces in pictures. It is a descendant of face-recognition.js, this time using the previously mentioned, recently released tensorflow.js. It’s worth mentioning that the library has a great tutorial written by its creator describing possible use cases — I highly suggest reading it. But what’s especially helpful is the great set of examples. Opening them reveals a lot of wonderful use cases (all of them Big Bang Theory-themed — I stopped watching this show years ago, but still have some fond memories). I cannot say I wasn’t impressed by the number of samples. There is quite basic face detection, both on a static image and on video. There is finding matches of specific faces. There is also the ability to draw landmarks showing how the computer sees the face.
To better understand the library itself, I checked what can be found in the source code of the examples. I chose the simplest one — face detection — as it’s the least complex while containing everything that is interesting in the context of this analysis: the integration with TensorFlow.
As an entry point, I started with the file examples/views/faceDetection.html. Apart from glue code, there are two interesting functions. The first of them — run — is executed at the start of the application.
It loads the models (faceapi.loadFaceDetectionModel) and sets a listener on the picture-change event (onSelectionChanged), which itself is not that interesting apart from the fact that it triggers recalculation of the results:
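The start-up flow boils down to something like the sketch below. Note that this is my own reconstruction, not the example’s literal code: the faceapi dependency and the listener registration are injected as parameters so the wiring is visible without a browser, while the real example uses the global faceapi object and DOM event listeners directly.

```typescript
// A minimal sketch of the example's start-up flow, with the
// dependencies injected instead of taken from the browser globals.
interface FaceApiLike {
  loadFaceDetectionModel(url: string): Promise<void>;
}

async function run(
  faceapi: FaceApiLike,
  registerOnSelectionChanged: (handler: () => void) => void,
  updateResults: () => void
): Promise<void> {
  // 1. Fetch the sharded model weights (served next to the page).
  await faceapi.loadFaceDetectionModel('/');
  // 2. Re-run detection whenever the user picks a different picture.
  registerOnSelectionChanged(updateResults);
}
```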
This is where the really interesting things happen:
Besides some canvas manipulation, two functions from the API are triggered — the first (faceapi.locateFaces) is (unsurprisingly) locating faces, and the second (faceapi.drawDetection) puts the results on a canvas created over the image we are looking for faces on. In the following part of this blog post, we will take a look at the three mentioned faceapi functions and how they were implemented.
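The recalculation step can be sketched as follows. Again, this is a reconstruction under assumptions: the faceapi calls and the canvas are stubbed behind an interface, and the default minConfidence value is my guess, not the example’s actual setting.

```typescript
// A sketch of what the example's result-update function boils down to:
// locate the faces, then draw the detections on the overlay canvas.
interface Detection { score: number }

interface DetectionApi {
  locateFaces(input: unknown, minConfidence: number): Promise<Detection[]>;
  drawDetection(canvas: unknown, detections: Detection[]): void;
}

async function updateResults(
  api: DetectionApi,
  image: unknown,
  overlayCanvas: unknown,
  minConfidence = 0.7 // assumed default, for illustration only
): Promise<Detection[]> {
  // Locate all faces scoring above the threshold...
  const detections = await api.locateFaces(image, minConfidence);
  // ...and draw their bounding boxes on the canvas laid over the image.
  api.drawDetection(overlayCanvas, detections);
  return detections;
}
```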
Let’s take a look into the library itself.
The first observation — the whole library is written in TypeScript. The second — around 90 MB of models need to be downloaded; they are stored in the weights directory. Another thing worth mentioning is that the library has only a single production dependency: TensorFlow.js itself.
Going deeper, the main API for the whole module, stored in globalApi.ts, is very short and concise — four methods for loading the models for specific kinds of computations, and three more for running those computations. As I wrote previously, we’ll focus only on the ones connected to the face detection process.
The method globalApi.loadFaceDetectionModel is just a proxy to the load method on an instance of FaceDetectionNet. We need to go deeper.
Here comes the first surprise and an interesting finding — a variable that looked like a simple URL to the models directory can be something more than just a string. It can also be a Float32Array. Strong typing is preserved with union types — a very powerful construct that allows creating complex types with multiple identities. While not as deeply embraced by the type system as in languages like Ceylon — we still need to use the instanceof operator — they are a good compromise between flexibility and safety.
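A toy version of such a union-typed parameter looks like this (the type and function names are mine, for illustration — the library’s actual declarations differ):

```typescript
// The model location can be either a URL string or the raw weights.
// An instanceof check is how TypeScript narrows the union at runtime.
type WeightsSource = string | Float32Array;

function describeSource(src: WeightsSource): string {
  if (src instanceof Float32Array) {
    // In this branch, src is known to be a Float32Array.
    return `raw weights (${src.length} values)`;
  }
  // Here TypeScript has narrowed src down to string.
  return `remote model at ${src}`;
}
```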
As no model was explicitly passed to our function, the default ones will be loaded. Both the manifest and the model are retrieved using the loadWeightMap function. Additionally, the models are sharded for better utilization of the browser cache. As a result, face_detection_model-shard1, face_detection_model-shard2, and face_detection_model-weights_manifest.json are loaded into the browser. That’s also the first usage of the TensorFlow API — tf.io.loadWeights — which is used to read the model data and populate the params. There are also some basic validations there — we can observe that our weights need to be in a tensor of a specific dimension.
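To make the sharding concrete, here is a sketch of how a TensorFlow.js weights manifest maps onto shard files. The group structure below mirrors the tf.io weights-manifest format (groups of weight specs, each listing the shard paths its data is split across); the helper function is my own illustration, not part of the library.

```typescript
// A simplified view of the tf.io weights-manifest structure.
interface WeightsGroup {
  paths: string[]; // shard files this group's weights are split across
  weights: { name: string; shape: number[]; dtype: string }[];
}

// Resolve every shard file a manifest refers to against a base URL.
function shardUrls(manifest: WeightsGroup[], baseUrl: string): string[] {
  return manifest.flatMap(group =>
    group.paths.map(path => `${baseUrl}/${path}`)
  );
}
```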
Now that we know how our model is populated, we can move on to the locateFaces method.
Once again, we have a union type, with support for three different types of input: a Tensor from the TensorFlow.js library, a NetInput, or a TNetInput. TNetInput is the one used in the analysed example — as we are using media input, in our case an image, it will be converted to a NetInput.
Three types of elements are supported there — HTMLImageElement, HTMLVideoElement and HTMLCanvasElement:
After they are loaded, they are converted to a canvas object in the constructor of NetInput:
Now that we have a NetInput instance, we can go forward.
In the further steps, the NetInput needs to be processed and converted to a Tensor. For this, the getImageTensor function is used:
There, we can find more usages of TensorFlow.js — the whole execution is orchestrated by tf.tidy, which controls memory and cleans up the intermediary tensors created while the final one was being calculated.
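The idea behind tf.tidy can be modelled in a few lines. This is a deliberately simplified toy, not TensorFlow.js code: real tensors are replaced by a minimal Disposable class, but the contract is the same — everything allocated inside the callback is freed, except the value the callback returns.

```typescript
// A toy model of tf.tidy's memory management.
class Disposable {
  disposed = false;
  dispose(): void { this.disposed = true; }
}

let tracked: Disposable[] | null = null;

// Register a freshly created resource with the enclosing tidy scope.
function track<T extends Disposable>(d: T): T {
  if (tracked) tracked.push(d);
  return d;
}

function tidy<T extends Disposable>(fn: () => T): T {
  const outer = tracked;
  tracked = [];
  const result = fn();
  // Free every intermediary created inside fn, but keep the result.
  for (const d of tracked) {
    if (d !== result) d.dispose();
  }
  tracked = outer;
  return result;
}
```

In the real library, the tensor conversions inside getImageTensor happen within such a scope, so no intermediary tensor leaks GPU memory.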
Inside that closure, if our input is a Tensor, some validation of the tensor rank (its number of dimensions) is done. As we can see, only specific ranks are allowed — we will see why further on.
In the case of a NetInput, we convert it to a Tensor using tf.fromPixels. Finally, we concatenate the Tensors created from the canvases to receive an instance of tf.Tensor4D — a Tensor with four dimensions: rows, columns, depth, and depth2.
Disclaimer: Tensors have values, a shape (how many values are in each dimension of the array), and a data type — we will be specifically interested in the shape in further parts.
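The shape notion is easy to illustrate on plain nested arrays (this helper is my own, purely for explanation — TensorFlow.js computes shapes internally):

```typescript
// The shape of a nested array is how many values sit in each
// dimension; the rank is simply the length of that shape.
function shapeOf(value: unknown): number[] {
  if (!Array.isArray(value)) return []; // a scalar has rank 0
  return [value.length, ...shapeOf(value[0])];
}
```

For example, a 2×3 matrix has shape [2, 3] and rank 2, while the tf.Tensor4D mentioned above has a rank-4 shape.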
Now that we finally have our tensors connected to the images, it is time to find faces on them. The whole magic starts — we need to retrieve the shape from the tensor and slice it into smaller Tensors.
More of the mentioned magic happens in the padToSquare function — we need to create prepending and appending Tensors to give the original tensor a “squared” shape, which will be needed by our model.
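The arithmetic behind such padding can be sketched as follows. I am only illustrating the idea here: given a width × height image, how many rows or columns must be added on each side to make it square? Splitting the difference keeps the original content centred (the library’s actual split may differ).

```typescript
// Compute how much padding makes a width x height image square.
function squarePadding(width: number, height: number): {
  top: number; bottom: number; left: number; right: number;
} {
  const diff = Math.abs(width - height);
  const before = Math.floor(diff / 2);
  const after = diff - before;
  return width < height
    ? { top: 0, bottom: 0, left: before, right: after } // pad the width
    : { top: before, bottom: after, left: 0, right: 0 }; // pad the height
}
```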
Now, as we have Tensors in the specific shape, we are ready to do the real work — our Tensor is first resized using the bilinear interpolation algorithm (that’s why we needed a Tensor4D instance in a “squared” shape) to receive not only a particular shape, but also a particular size. Finally, we are able to process it with mobileNetV1 — the heart of our application — a Deep Learning network specialised in looking for features in images. After receiving the features, the predictionLayer is calculated (in a complex process). Ultimately, the outputLayer is created from the predictionLayer.
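For readers unfamiliar with bilinear interpolation: a resized pixel is the distance-weighted average of its four nearest source pixels. A minimal single-sample version on a grayscale array (my own sketch of the algorithm, not the TensorFlow.js implementation) looks like this:

```typescript
// Sample img[row][col] at a fractional position (y, x) using
// bilinear interpolation over the four surrounding pixels.
function bilinearSample(img: number[][], y: number, x: number): number {
  const y0 = Math.floor(y), x0 = Math.floor(x);
  const y1 = Math.min(y0 + 1, img.length - 1);
  const x1 = Math.min(x0 + 1, img[0].length - 1);
  const dy = y - y0, dx = x - x0;
  // Interpolate horizontally along the top and bottom rows...
  const top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx;
  const bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx;
  // ...then vertically between the two.
  return top * (1 - dy) + bottom * dy;
}
```

The resize applies this sampling at every pixel of the target size, across the whole Tensor4D batch at once.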
The data is calculated — and if a prediction fulfills the given threshold, it is returned to the user. The easy steps come now — every FaceDetection object contains parameters which are used with the standard Canvas API to draw over the image.
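Conceptually, this final step filters predictions by confidence and scales their relative bounding boxes to pixel coordinates so the Canvas API can draw them. The sketch below is my own rendering of that idea; the field names and the relative-coordinate assumption are mine, not the library’s exact types.

```typescript
// A prediction with a confidence score and a box in relative
// (0..1) coordinates, as is common for detector outputs.
interface Prediction {
  score: number;
  box: { x: number; y: number; width: number; height: number };
}

// Keep only confident predictions and scale their boxes to pixels.
function toPixelBoxes(
  predictions: Prediction[],
  minConfidence: number,
  imgWidth: number,
  imgHeight: number
): Prediction[] {
  return predictions
    .filter(p => p.score >= minConfidence)
    .map(p => ({
      score: p.score,
      box: {
        x: p.box.x * imgWidth,
        y: p.box.y * imgHeight,
        width: p.box.width * imgWidth,
        height: p.box.height * imgHeight,
      },
    }));
}
```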
Nothing unusual happens there apart from some quite nicely executed error handling. Now we can be happy with the final results — we are able to see boxes drawn over the faces, which proves our code is working fine.
In comparison to Saliens, face-api.js was far more complex and required a lot of digging into the TensorFlow documentation. The author, justadudewhohacks, did a lot of work with these APIs to achieve the impressive final result. I’m sure many different projects will use this library to create spectacular results (I have my own idea of merging it with a web camera, Snapchat-style). As I wrote before, I learned a lot, and while I wouldn’t be able to write a copy of this tool’s functionality from scratch, it helped me understand a few fascinating parts of TensorFlow.
I hope you liked this short analysis of one of the supported flows. Please leave comments below and subscribe to be informed about every new GitHub All-Stars episode.