Firebase ML Kit: Building A Facial Gesture Detecting App In iOS (Part Two)

Amit Palo · Published in The Startup · Jun 8, 2020 · 7 min read

This article demonstrates how to detect different facial gestures (head nods, eye blinks, smiles, etc.) with the help of the Firebase ML Kit Face Detection API. We will mainly focus on using the Firebase ML Kit Vision API to detect these gestures. For the initial setup of the project, see Part One of this series.

Face Detection Using ML Kit

With ML Kit’s face detection API, you can detect faces in an image, identify key facial features, and get the contours of detected faces.

With face detection, you can get the information you need to perform tasks like embellishing selfies and portraits, or generating avatars from a user’s photo. Because ML Kit can perform face detection in real time, you can use it in applications like video chat or games that respond to the player’s expressions.
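
To get a feel for the API before wiring it up to a live camera feed, here is a minimal sketch of running the detector on a single still image. It is only an illustration (the detectSmile name and the uiImage parameter are placeholders of mine); the camera-based implementation we actually build follows below.

import UIKit
import FirebaseMLVision

func detectSmile(in uiImage: UIImage) {
    // Ask the detector for classification probabilities (smiling, eyes open).
    let options = VisionFaceDetectorOptions()
    options.performanceMode = .accurate
    options.classificationMode = .all

    let faceDetector = Vision.vision().faceDetector(options: options)
    let visionImage = VisionImage(image: uiImage)

    faceDetector.process(visionImage) { faces, error in
        guard error == nil, let face = faces?.first else { return }
        if face.smilingProbability > 0.8 {
            print("Smiling face detected")
        }
    }
}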

You can learn more about the Firebase Face Detection API by following the link here.

Let’s not waste any more time and get started.

Tutorial

In Part One of the series we completed the initial setup of the application. In this article we will see how to use the Firebase MLVision API to detect different facial gestures.

Creating a FacialGestureCameraView.swift File

1. Let’s create a “FacialGestureCameraView.swift” file containing a subclass of UIView, and import the frameworks below at the top of the file.
import AVFoundation
import FirebaseMLVision
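
The article never shows the class declaration itself, so here is a rough sketch of how the file could be laid out. The members mentioned in the comments come from the following steps, and the AVCaptureVideoDataOutputSampleBufferDelegate conformance is needed because step 3 registers the view as the video output’s sample buffer delegate (whether it lives on the class itself or in an extension is up to you).

import AVFoundation
import FirebaseMLVision

public class FacialGestureCameraView: UIView {
    // Threshold variables (step 2), the lazy AVFoundation/ML Kit properties (step 3)
    // and beginSession()/stopSession() (step 4) go here.
}

extension FacialGestureCameraView: AVCaptureVideoDataOutputSampleBufferDelegate {
    // captureOutput(_:didOutput:from:) and its helper methods (step 5) go here.
}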

2. Then let’s create the threshold variables below, which are used to determine the different facial gestures.

public var leftNodThreshold: CGFloat = 20.0
public var rightNodThreshold: CGFloat = -4
public var smileProbality: CGFloat = 0.8
public var openEyeMaxProbability: CGFloat = 0.95
public var openEyeMinProbability: CGFloat = 0.1
private var restingFace: Bool = true

These variables need no detailed explanation; their names are self-explanatory, and we will use them as thresholds when classifying the gestures.

3. Let’s create a few more lazy variables that will be used for computing the facial gestures, as shown below.

private lazy var vision: Vision = {
    return Vision.vision()
}()

private lazy var options: VisionFaceDetectorOptions = {
    // We only need classification probabilities (smile, eyes open),
    // so landmarks, contours and tracking are disabled.
    let option = VisionFaceDetectorOptions()
    option.performanceMode = .accurate
    option.landmarkMode = .none
    option.classificationMode = .all
    option.isTrackingEnabled = false
    option.contourMode = .none
    return option
}()

private lazy var videoDataOutput: AVCaptureVideoDataOutput = {
    let videoOutput = AVCaptureVideoDataOutput()
    videoOutput.alwaysDiscardsLateVideoFrames = true
    videoOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
    videoOutput.connection(with: .video)?.isEnabled = true
    return videoOutput
}()

private let videoDataOutputQueue: DispatchQueue = DispatchQueue(label: Constants.videoDataOutputQueue)

private lazy var previewLayer: AVCaptureVideoPreviewLayer = {
    let layer = AVCaptureVideoPreviewLayer(session: session)
    layer.videoGravity = .resizeAspectFill
    return layer
}()

// Front camera, since we want to track the user's own face.
private let captureDevice: AVCaptureDevice? = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                                      for: .video,
                                                                      position: .front)

private lazy var session: AVCaptureSession = {
    return AVCaptureSession()
}()

4. Now let’s write the logic to begin and end the session, as shown below.

func beginSession() {
    guard let captureDevice = captureDevice else { return }
    guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else { return }

    if session.canAddInput(deviceInput) {
        session.addInput(deviceInput)
    }
    if session.canAddOutput(videoDataOutput) {
        session.addOutput(videoDataOutput)
    }

    // Show the camera preview inside this view.
    layer.masksToBounds = true
    layer.addSublayer(previewLayer)
    previewLayer.frame = bounds

    session.startRunning()
}

func stopSession() {
    session.stopRunning()
}
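
One thing the article takes for granted: the capture session only delivers frames if the user has granted camera access and the NSCameraUsageDescription key is present in Info.plist. If your project from Part One doesn’t handle this yet, a minimal sketch could look like the helper below (requestCameraAccessIfNeeded is a hypothetical name, not part of the original project).

import AVFoundation

// Hypothetical helper, not part of the original project.
func requestCameraAccessIfNeeded(completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        // Prompts the user the first time; the callback may arrive off the main thread.
        AVCaptureDevice.requestAccess(for: .video) { granted in
            DispatchQueue.main.async { completion(granted) }
        }
    default:
        // .denied or .restricted — no camera access.
        completion(false)
    }
}

You could call this from the view controller and only invoke beginSession() when granted is true.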

5. Now let’s implement the “AVCaptureVideoDataOutputSampleBufferDelegate” delegate method and its dependent helper methods, as shown below.

public func captureOutput(_ output: AVCaptureOutput,
                          didOutput sampleBuffer: CMSampleBuffer,
                          from connection: AVCaptureConnection) {
    guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        print("Failed to get image buffer from sample buffer.")
        return
    }

    // Wrap the camera frame in a VisionImage with the correct orientation metadata.
    let visionImage = VisionImage(buffer: sampleBuffer)
    let metadata = VisionImageMetadata()
    let visionOrientation = visionImageOrientation(from: imageOrientation())
    metadata.orientation = visionOrientation
    visionImage.metadata = metadata

    let imageWidth = CGFloat(CVPixelBufferGetWidth(imageBuffer))
    let imageHeight = CGFloat(CVPixelBufferGetHeight(imageBuffer))

    // Run the detection off the capture queue so the camera pipeline is not blocked.
    DispatchQueue.global().async {
        self.detectFacesOnDevice(in: visionImage,
                                 width: imageWidth,
                                 height: imageHeight)
    }
}

private func visionImageOrientation(from imageOrientation: UIImage.Orientation) -> VisionDetectorImageOrientation {
    switch imageOrientation {
    case .up:
        return .topLeft
    case .down:
        return .bottomRight
    case .left:
        return .leftBottom
    case .right:
        return .rightTop
    case .upMirrored:
        return .topRight
    case .downMirrored:
        return .bottomLeft
    case .leftMirrored:
        return .leftTop
    case .rightMirrored:
        return .rightBottom
    @unknown default:
        fatalError()
    }
}

private func imageOrientation(fromDevicePosition devicePosition: AVCaptureDevice.Position = .front) -> UIImage.Orientation {
    var deviceOrientation = UIDevice.current.orientation
    if deviceOrientation == .faceDown ||
        deviceOrientation == .faceUp ||
        deviceOrientation == .unknown {
        deviceOrientation = currentUIOrientation()
    }
    switch deviceOrientation {
    case .portrait:
        return devicePosition == .front ? .leftMirrored : .right
    case .landscapeLeft:
        return devicePosition == .front ? .downMirrored : .up
    case .portraitUpsideDown:
        return devicePosition == .front ? .rightMirrored : .left
    case .landscapeRight:
        return devicePosition == .front ? .upMirrored : .down
    case .faceDown, .faceUp, .unknown:
        return .up
    @unknown default:
        fatalError()
    }
}

private func currentUIOrientation() -> UIDeviceOrientation {
    let deviceOrientation = { () -> UIDeviceOrientation in
        switch UIApplication.shared.windows.first?.windowScene?.interfaceOrientation {
        case .landscapeLeft:
            return .landscapeRight
        case .landscapeRight:
            return .landscapeLeft
        case .portraitUpsideDown:
            return .portraitUpsideDown
        case .portrait, .unknown, .none:
            return .portrait
        @unknown default:
            fatalError()
        }
    }
    // UIApplication.shared must be accessed on the main thread.
    guard Thread.isMainThread else {
        var currentOrientation: UIDeviceOrientation = .portrait
        DispatchQueue.main.sync {
            currentOrientation = deviceOrientation()
        }
        return currentOrientation
    }
    return deviceOrientation()
}

6. Now let’s create the delegate protocol whose methods will be triggered when a particular gesture is detected, as shown below.

@objc public protocol FacialGestureCameraViewDelegate: class {
    @objc optional func doubleEyeBlinkDetected()
    @objc optional func smileDetected()
    @objc optional func nodLeftDetected()
    @objc optional func nodRightDetected()
    @objc optional func leftEyeBlinkDetected()
    @objc optional func rightEyeBlinkDetected()
}

7. Now let’s add a weak “delegate” property to the “FacialGestureCameraView” class; whichever object conforms to the protocol will implement the delegate methods, as shown below.

public weak var delegate: FacialGestureCameraViewDelegate?

8. Now let’s write the most important method, where the face gesture detection logic is implemented.

private func detectFacesOnDevice(in image: VisionImage, width: CGFloat, height: CGFloat) {
    let faceDetector = vision.faceDetector(options: options)
    faceDetector.process(image, completion: { features, error in
        if let error = error {
            print(error.localizedDescription)
            return
        }
        guard error == nil, let features = features, !features.isEmpty else {
            return
        }
        if let face = features.first {
            let leftEyeOpenProbability = face.leftEyeOpenProbability
            let rightEyeOpenProbability = face.rightEyeOpenProbability

            if face.headEulerAngleZ > self.leftNodThreshold {
                // Left head nod
                if self.restingFace {
                    self.restingFace = false
                    self.delegate?.nodLeftDetected?()
                }
            } else if face.headEulerAngleZ < self.rightNodThreshold {
                // Right head nod
                if self.restingFace {
                    self.restingFace = false
                    self.delegate?.nodRightDetected?()
                }
            } else if leftEyeOpenProbability > self.openEyeMaxProbability &&
                        rightEyeOpenProbability < self.openEyeMinProbability {
                // Right eye blink (left eye open, right eye closed)
                if self.restingFace {
                    self.restingFace = false
                    self.delegate?.rightEyeBlinkDetected?()
                }
            } else if rightEyeOpenProbability > self.openEyeMaxProbability &&
                        leftEyeOpenProbability < self.openEyeMinProbability {
                // Left eye blink (right eye open, left eye closed)
                if self.restingFace {
                    self.restingFace = false
                    self.delegate?.leftEyeBlinkDetected?()
                }
            } else if face.smilingProbability > self.smileProbality {
                // Smile detected
                if self.restingFace {
                    self.restingFace = false
                    self.delegate?.smileDetected?()
                }
            } else if leftEyeOpenProbability < self.openEyeMinProbability &&
                        rightEyeOpenProbability < self.openEyeMinProbability {
                // Both eyes blinked
                if self.restingFace {
                    self.restingFace = false
                    self.delegate?.doubleEyeBlinkDetected?()
                }
            } else {
                // Face is back to its resting position
                self.restingFace = true
            }
        }
    })
}

I know the article is getting lengthy, but we are almost done with our logic. The only thing pending is implementing the delegate methods in our “ViewController.swift” class. Let’s do that as well.

Implementing Logic In ViewController.swift File

1. In this file we need to implement the FacialGestureCameraViewDelegate methods so that we receive callbacks when a particular facial gesture is detected. Create an extension of ViewController and implement the delegate methods as shown below.
extension ViewController: FacialGestureCameraViewDelegate {

    func doubleEyeBlinkDetected() {
        print("Double Eye Blink Detected")
    }

    func smileDetected() {
        print("Smile Detected")
    }

    func nodLeftDetected() {
        print("Nod Left Detected")
    }

    func nodRightDetected() {
        print("Nod Right Detected")
    }

    func leftEyeBlinkDetected() {
        print("Left Eye Blink Detected")
    }

    func rightEyeBlinkDetected() {
        print("Right Eye Blink Detected")
    }
}

2. Add the remaining code to the “ViewController.swift” file, which starts and stops the camera session and conforms to the “FacialGestureCameraViewDelegate” protocol.

class ViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        addCameraViewDelegate()
    }

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        startGestureDetection()
    }

    override func viewDidDisappear(_ animated: Bool) {
        super.viewDidDisappear(animated)
        stopGestureDetection()
    }
}

extension ViewController {

    func addCameraViewDelegate() {
        cameraView.delegate = self
    }

    func startGestureDetection() {
        cameraView.beginSession()
    }

    func stopGestureDetection() {
        cameraView.stopSession()
    }
}

3. Then we need to create an IBOutlet of “FacialGestureCameraView” in our view controller. To do that, first add a view to the ViewController scene in the “Main.storyboard” file and set its class to “FacialGestureCameraView” as shown below.

4. Once done, create an IBOutlet in the “ViewController.swift” file as shown below.

@IBOutlet weak var cameraView: FacialGestureCameraView!
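
Optionally, since the threshold variables from step 2 are public, the view controller can tune them before the session starts. The sketch below is purely illustrative (tuneGestureThresholds is a hypothetical helper and the values are examples, not recommendations).

extension ViewController {
    func tuneGestureThresholds() {
        cameraView.smileProbality = 0.7       // trigger smileDetected() a little earlier
        cameraView.leftNodThreshold = 25.0    // require a stronger left head tilt
        cameraView.rightNodThreshold = -10.0  // require a stronger right head tilt
    }
}

You could call it from viewDidLoad(), right after addCameraViewDelegate().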

Awesome! We are finally done implementing the delegate methods that will be triggered when a particular face gesture is detected.

5. Now run the code and check whether the delegate methods are being triggered. If everything works, you will see the corresponding outputs (for example, “Smile Detected”) printed in the console.

Conclusion

In this article we made use of the Firebase ML Kit Vision API and implemented custom delegate methods that get triggered when a particular face gesture is detected. In Part Three of the series we will learn how to use these delegate methods to implement some of the use cases.

The source code for this tutorial can be found here; don’t forget to run the “pod install” command before building the project.

If you found this useful, feel free to share it. Thanks for reading!

Till Then

Image Credit: https://keepcalms.com/p/keep-learning-and-happy-coding/
