Face Recognition and Detection on iOS Using Native Swift Code, Core ML, and ARKit
Leveraging the native Swift library to perform face recognition and detection in an iOS app
Create a Single View Application
To begin, we need to create an iOS project with a single view app:
Now that the project is created, we'll build the app programmatically: since I don't like using storyboards, there are no buttons or switches to toggle, just pure code 🤗.
You have to delete main.storyboard and set up your AppDelegate.swift file like so:
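The original snippet isn't reproduced here; a minimal sketch of a storyboard-free app delegate, assuming the root view controller class is named `ViewController`:

```swift
import UIKit

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Since Main.storyboard is gone, create the window and root controller by hand
        window = UIWindow(frame: UIScreen.main.bounds)
        window?.rootViewController = ViewController()
        window?.makeKeyAndVisible()
        return true
    }
}
```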
Make sure to remove the storyboard “Main” from the deployment info.
Create Your Scene and Add It to the Subview
We only have one ViewController, which will be our main entry point for the application.
At this stage, we need to import ARKit and instantiate an ARSCNView that automatically renders the live video feed from the device camera as the scene background. It also automatically moves its SceneKit camera to match the real-world movement of the device, which means that we don't need an anchor to track the positions of objects we add to the scene.
We need to give it the screen bounds so that the camera session takes the whole screen:
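This could look like the following sketch, with the scene view held as a property on the view controller (the property name `sceneView` is an assumption):

```swift
import ARKit
import UIKit

class ViewController: UIViewController {

    // Renders the live camera feed as the scene background, sized to the full screen
    let sceneView = ARSCNView(frame: UIScreen.main.bounds)
}
```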
In the viewDidLoad method, we are going to set up a few things, such as the delegate, and we also enable the frame statistics in order to monitor frame drops:
Now we need to start a session with an ARFaceTrackingConfiguration. This configuration gives us access to the front-facing TrueDepth camera, which is only available on the iPhone X, XS, and XR. Here's a more detailed explanation from Apple's documentation:
A face tracking configuration detects the user’s face in view of the device’s front-facing camera. When running this configuration, an AR session detects the user’s face (if visible in the front-facing camera image) and adds to its list of anchors an ARFaceAnchor object representing the face. Each face anchor provides information about the face’s position and orientation, its topology, and features that describe facial expressions.
The viewDidLoad method should look like this:
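A sketch of the setup described above, assuming the ARSCNView is stored in a `sceneView` property and the controller conforms to ARSCNViewDelegate:

```swift
override func viewDidLoad() {
    super.viewDidLoad()

    view.addSubview(sceneView)
    sceneView.delegate = self
    sceneView.showsStatistics = true  // show FPS and timing info to monitor frame drops

    // Face tracking requires a TrueDepth camera
    guard ARFaceTrackingConfiguration.isSupported else {
        fatalError("Face tracking is not supported on this device")
    }
    sceneView.session.run(ARFaceTrackingConfiguration())
}
```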
Train a Face Recognition Model
There are multiple ways to create a .mlmodel file that is compatible with Core ML. These are the most common ones:
- Turi Create: a Python library that simplifies the development of custom machine learning models. More importantly, you can export your model as a .mlmodel file that can be parsed by Xcode.
- MLImageClassifierBuilder(): a built-in solution, available out of the box in Xcode, that provides a drag-and-drop interface for training a relatively simple model.
I created multiple models to test both solutions. Since I don't have a big dataset, I decided to use MLImageClassifierBuilder() with a set of 67 images labeled 'Omar MHAIMDAT' (which is my name) and a set of 261 faces labeled 'Unknown' that I found on Unsplash.
Open a macOS playground and write this code:
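The playground code isn't shown in this version; the standard way to launch the builder UI (Xcode 10 era) is:

```swift
import CreateMLUI

// Opens the drag-and-drop image classifier trainer in the playground's live view
let builder = MLImageClassifierBuilder()
builder.showInLiveView()
```

You then drag your labeled image folders ('Omar MHAIMDAT' and 'Unknown') onto the live view to train, and export the resulting .mlmodel file.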
I would recommend setting the max iterations to 20 and adding a Crop augmentation, which adds 4 cropped instances for each image.
Capture Camera Frames and Inject Them Into the Model
We need to extend our ViewController with the scene delegate, ARSCNViewDelegate. We need two delegate methods: one to set up face detection, and another to update the scene when a face is detected:
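The first delegate method could be sketched like this, attaching a wireframe face-geometry node when a face anchor is detected (assuming the `sceneView` property from earlier):

```swift
extension ViewController: ARSCNViewDelegate {

    // Called when ARKit adds an anchor; for a face anchor, return a node
    // carrying the face geometry so the mesh is rendered over the face
    func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
        guard anchor is ARFaceAnchor, let device = sceneView.device else { return nil }
        let faceGeometry = ARSCNFaceGeometry(device: device)
        let node = SCNNode(geometry: faceGeometry)
        node.geometry?.firstMaterial?.fillMode = .lines  // draw as wireframe
        return node
    }
}
```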
Unfortunately, the scene doesn't update when I open my eyes or my mouth, so we need to update the scene accordingly.
Update the scene:
We take the whole face geometry and its mapping, and update the node.
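A sketch of the update callback, which re-maps the node's geometry to the latest face topology on every anchor update:

```swift
// Called whenever the face anchor changes (blinks, mouth movement, etc.)
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let faceAnchor = anchor as? ARFaceAnchor,
          let faceGeometry = node.geometry as? ARSCNFaceGeometry else { return }
    // Apply the latest face topology to the rendered mesh
    faceGeometry.update(from: faceAnchor.geometry)
}
```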
Get the camera frames:
This gets interesting because ARSCNView wraps an ARSession, meaning we can grab the current frame's captured image as a CVPixelBuffer that we can feed to our model. Here's the easy way to get it from our session:
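Inside a delegate callback, that looks like this (again assuming the `sceneView` property):

```swift
// The session's current ARFrame exposes the raw camera image as a CVPixelBuffer
guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else { return }
```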
Inject camera frames into the model:
Now that we can detect a face and have every camera frame, we are ready to feed our model some content:
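A sketch of that step using Vision to wrap the Core ML model; the class name `FaceClassifier` is a placeholder for whatever your exported .mlmodel is called:

```swift
import Vision

// Wrap the Core ML model in a Vision request; `FaceClassifier` is a
// placeholder name for the .mlmodel trained earlier
private lazy var classificationRequest: VNCoreMLRequest = {
    guard let model = try? VNCoreMLModel(for: FaceClassifier().model) else {
        fatalError("Failed to load the Core ML model")
    }
    return VNCoreMLRequest(model: model) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let best = results.first, best.confidence > 0.9 else { return }
        print("Recognized: \(best.identifier)")  // e.g. "Omar MHAIMDAT" or "Unknown"
    }
}()

// Feed a camera frame into the model
func classify(pixelBuffer: CVPixelBuffer) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([classificationRequest])
}
```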
Show the Name Above the Recognized Face
The last and probably the most frustrating part is to project 3D text above the recognized face. If you think about it, our configuration is not as powerful as ARWorldTrackingConfiguration, which gives access to numerous methods and classes. Since we are using the front-facing camera instead, very few things can be achieved.
Nevertheless, we can still project a 3D text on the screen, though it won’t track the face movement and change accordingly.
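Creating the text could look like this sketch (the name string and scale values are illustrative):

```swift
// Extruded 3D text carrying the recognized name
let text = SCNText(string: "Omar MHAIMDAT", extrusionDepth: 1)
text.font = UIFont.systemFont(ofSize: 10)
text.firstMaterial?.diffuse.contents = UIColor.white

let textNode = SCNNode(geometry: text)
// SCNText units are large relative to the face scene, so scale it down
textNode.scale = SCNVector3(0.002, 0.002, 0.002)
```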
Now that we have the SCNText object, we need to update it with the corresponding face and add it to the scene:
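A sketch of that placement, assuming `textNode` is the text node from the previous step and `faceAnchor` is the detected ARFaceAnchor:

```swift
// Place the text slightly above the node tracking the face, then add it to the scene
if let faceNode = sceneView.node(for: faceAnchor) {
    textNode.position = SCNVector3(faceNode.position.x,
                                   faceNode.position.y + 0.15,
                                   faceNode.position.z)
    sceneView.scene.rootNode.addChildNode(textNode)
}
```

As noted above, the text won't follow subsequent face movement, since the front-facing configuration doesn't give us world tracking.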
Here’s the final result with the face detection and recognition.
If you liked this piece, please clap and share it with your friends. If you have any questions don’t hesitate to send me an email at email@example.com.
This project is available to download from my GitHub account.