Implementing a face detection feature with ARKit and face recognition with a Core ML model

Omar M’Haimdat
Jul 28

Create a Single View Application

To begin, we need to create an iOS project with a single view app:

Create a single view app

Now that you have your project, and since I don’t like using storyboards, we’ll build the app programmatically, which means no buttons or switches to toggle, just pure code 🤗.

You have to delete Main.storyboard and set up your AppDelegate.swift file like so:

AppDelegate.swift
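Here’s a minimal sketch of what that file can look like, assuming the root view controller class is named ViewController (the exact code in the original gist may differ slightly):

    import UIKit

    @UIApplicationMain
    class AppDelegate: UIResponder, UIApplicationDelegate {

        var window: UIWindow?

        func application(_ application: UIApplication,
                         didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
            // Build the window in code since Main.storyboard is gone
            window = UIWindow(frame: UIScreen.main.bounds)
            window?.rootViewController = ViewController()
            window?.makeKeyAndVisible()
            return true
        }
    }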

Make sure to remove the storyboard “Main” from the deployment info.


Create Your Scene and Add It to the Subview

We only have one ViewController, which will be our main entry point for the application.

At this stage, we need to import ARKit and instantiate an ARSCNView that automatically renders the live video feed from the device camera as the scene background. It also automatically moves its SceneKit camera to match the real-world movement of the device, which means that we don’t need an anchor to track positions of objects we add to the scene.

We need to give it the screen bounds so that the camera session takes the whole screen:

Instantiate ARSCNView
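Something along these lines (a sketch, not the exact gist):

    import UIKit
    import ARKit

    class ViewController: UIViewController {

        // ARSCNView renders the camera feed and SceneKit content together.
        // Giving it the screen bounds makes the session fill the whole screen.
        let sceneView = ARSCNView(frame: UIScreen.main.bounds)
    }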

In the viewDidLoad method, we are going to set up a few things, such as the delegate, and we also want to see the frame statistics in order to monitor frame drops:

Setting the scene in the ViewDidLoad method
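A sketch of that setup:

    override func viewDidLoad() {
        super.viewDidLoad()

        view.addSubview(sceneView)

        // Receive ARSCNViewDelegate callbacks in this controller
        // (the conformance is added later via an extension)
        sceneView.delegate = self

        // Display frame statistics (fps, timing) to monitor frame drops
        sceneView.showsStatistics = true
    }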

Start an ARFaceTrackingConfiguration Session

Now we need to start a session with an ARFaceTrackingConfiguration. This configuration gives us access to the front-facing TrueDepth camera, which is only available on the iPhone X, XS, and XR. Here’s a more detailed explanation from Apple’s documentation:

A face tracking configuration detects the user’s face in view of the device’s front-facing camera. When running this configuration, an AR session detects the user’s face (if visible in the front-facing camera image) and adds to its list of anchors an ARFaceAnchor object representing the face. Each face anchor provides information about the face’s position and orientation, its topology, and features that describe facial expressions.

Source: Apple

The ViewDidLoad method should look like this:

ViewDidLoad() method
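Putting the pieces together, a sketch of the full method:

    override func viewDidLoad() {
        super.viewDidLoad()

        view.addSubview(sceneView)
        sceneView.delegate = self
        sceneView.showsStatistics = true

        // Face tracking needs the TrueDepth camera; bail out on unsupported devices
        guard ARFaceTrackingConfiguration.isSupported else { return }
        let configuration = ARFaceTrackingConfiguration()
        configuration.isLightEstimationEnabled = true

        // Start the face tracking session
        sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
    }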

Train a Face Recognition Model

There are multiple ways to create a .mlmodel file that is compatible with Core ML. These are the most common ones:

  1. Turicreate: it’s a Python library that simplifies the development of custom machine learning models and, more importantly, lets you export your model to a .mlmodel file that can be parsed by Xcode.
  2. MLImageClassifierBuilder(): it’s a built-in solution, available out of the box with Xcode, that gives you a drag-and-drop interface for training a relatively simple model.

I created multiple models to test both solutions. Since I don’t have a big dataset, I decided to use MLImageClassifierBuilder() with a set of 67 images labeled ‘Omar MHAIMDAT’ (which is my name) and a set of 261 faces labeled ‘Unknown’ that I found on Unsplash.

Open a playground and write this code:

MLImageClassifierBuilder
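The playground code is just a couple of lines (this requires a macOS playground, where the CreateMLUI framework is available):

    import CreateMLUI

    // Opens the drag-and-drop training UI in the playground's live view
    let builder = MLImageClassifierBuilder()
    builder.showInLiveView()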

I would recommend setting the max iterations to 20 and adding a Crop augmentation, which adds four cropped instances for each image.


Capture Camera Frames and Inject Them Into the Model

We need to extend our ViewController with the scene delegate, ARSCNViewDelegate. We need two delegate methods: one to set up the face detection and the other to update our scene when a face is detected:

Face detection:

Face detection
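A sketch of that delegate method, which returns a node carrying the detected face’s geometry:

    extension ViewController: ARSCNViewDelegate {

        // Called when ARKit adds an anchor; for the face anchor we return a node
        // that renders the face mesh on top of the camera feed.
        func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
            guard let device = sceneView.device,
                  let faceGeometry = ARSCNFaceGeometry(device: device) else { return nil }
            let node = SCNNode(geometry: faceGeometry)
            // Draw the mesh as a wireframe so the detection is visible
            node.geometry?.firstMaterial?.fillMode = .lines
            return node
        }
    }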

Unfortunately, the scene doesn’t update on its own when I open my eyes or mouth, so we need to update it ourselves whenever the face anchor changes.

Update the scene:

We take the whole face geometry and its mapping, and we update the node.
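A sketch of the update method:

    // Called whenever ARKit updates the face anchor (blinks, mouth movement, expressions)
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor,
              let faceGeometry = node.geometry as? ARSCNFaceGeometry else { return }

        // Re-map the SceneKit geometry onto the latest face topology
        faceGeometry.update(from: faceAnchor.geometry)
    }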

Get the camera frames:

This is where it gets interesting: ARSCNView exposes its ARSession, and each captured ARFrame carries a CVPixelBuffer that we can feed to our model.

Here’s the easy way to get it from our sceneView attribute:
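Roughly (the current frame can be nil early in the session, hence the guard):

    // The session's current ARFrame exposes the raw camera image as a CVPixelBuffer
    guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else { return }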

Inject camera frames into the model:

Now that we can detect a face and have every camera frame, we are ready to feed our model some content:

didUpdate Renderer
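A sketch of what that can look like, using Vision to drive the classifier. FaceRecognition is a placeholder name for the class Xcode generates from your exported .mlmodel, and the didUpdate callback shown earlier now also hands each frame to the classifier:

    import Vision

    // Classify the current camera frame. FaceRecognition is a placeholder for the
    // class Xcode generates from the .mlmodel file.
    func recognizeFace(in pixelBuffer: CVPixelBuffer) {
        guard let model = try? VNCoreMLModel(for: FaceRecognition().model) else { return }

        let request = VNCoreMLRequest(model: model) { request, _ in
            guard let results = request.results as? [VNClassificationObservation],
                  let best = results.first else { return }
            print("Recognized \(best.identifier) (confidence: \(best.confidence))")
        }

        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }

    // The didUpdate callback from earlier, now feeding the model as well
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor,
              let faceGeometry = node.geometry as? ARSCNFaceGeometry else { return }
        faceGeometry.update(from: faceAnchor.geometry)

        if let pixelBuffer = sceneView.session.currentFrame?.capturedImage {
            recognizeFace(in: pixelBuffer)
        }
    }

In practice you would want to throttle this, since the delegate fires many times per second, and dispatch the Vision work off the render thread.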

Show the Name Above the Recognized Face

The last, and probably most frustrating, part is projecting 3D text above the recognized face. Our configuration is not as powerful as ARWorldTrackingConfiguration, which gives access to numerous methods and classes; we are instead using the front-facing camera, so far fewer things can be achieved.

Nevertheless, we can still project 3D text on the screen, though it won’t track the face’s movement and change accordingly.

Instantiate SCNText
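A sketch (the string, font, and sizes are just example values):

    // 3D text carrying the recognized name; a small extrusion gives it depth
    let text = SCNText(string: "Omar MHAIMDAT", extrusionDepth: 2)
    text.font = UIFont(name: "Avenir-Heavy", size: 16)
    text.firstMaterial?.diffuse.contents = UIColor.white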

Now that we have the SCNText object, we need to update it with the corresponding face and add it to the rootNode:

Update the scene with the name associated with the face
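A sketch of that step, run from the didUpdate callback once the model returns a name (node here is the face node the delegate hands us; the scale and offset are example values):

    // Wrap the text in a node, shrink it (SCNText units are large), and park it
    // just above the face node before attaching it to the scene.
    let textNode = SCNNode(geometry: text)
    textNode.scale = SCNVector3(0.002, 0.002, 0.002)
    textNode.position = SCNVector3(node.position.x,
                                   node.position.y + 0.15,   // roughly 15 cm above the face
                                   node.position.z)
    sceneView.scene.rootNode.addChildNode(textNode)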

Final Result:

Here’s the final result with the face detection and recognition.

If you liked this piece, please clap and share it with your friends. If you have any questions don’t hesitate to send me an email at omarmhaimdat@gmail.com.

This project is available to download from my GitHub account.

