Quick Look: Face detection on Android using ML Kit
ML Kit’s face detection SDK allows you to detect faces in an image and get information about facial landmarks and other features of each face.
High-Level Capabilities:
- Locate facial features (eyes, ears, cheeks, nose, and mouth)
- Classify facial expressions (probability that the user is smiling or has their eyes closed)
- Track faces in a video (each “Face” detected has a unique identifier)
Setting up a project
- Add Firebase to your app — You may have done this already if you’re using another Firebase component
- Add the dependency to your app's build.gradle
(implementation 'com.google.firebase:firebase-ml-vision:16.0.0')
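For context, here's a minimal sketch of where that line lives in the module-level build.gradle (the surrounding block is illustrative; your file will have other dependencies too):

```groovy
dependencies {
    // ML Kit's vision APIs (version from this article; newer releases may exist)
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}
```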
That’s all that’s needed to start using the ML Kit APIs!
Local Inference
ML Kit allows you to run local, on-device inference on images without sending them to a server.
By default, ML Kit downloads the model being used for local inference the first time you run the detector. So if you tried using the SDK immediately, you might have noticed that you get an error along the lines of “The model hasn’t downloaded yet”.
Google recommends you configure your app to download the model as soon as your app is installed from the Play Store by adding the following meta-data to your AndroidManifest.xml file:
<application ...>
<meta-data
android:name="com.google.firebase.ml.vision.DEPENDENCIES"
android:value="face" />
</application>
android:value="face" means it will download the face detection model. You can change this value or include multiple models.
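The value accepts a comma-separated list of model names, so a manifest that pre-downloads more than one model might look like this (the second model name here is just for illustration):

```xml
<meta-data
    android:name="com.google.firebase.ml.vision.DEPENDENCIES"
    android:value="face,barcode" />
```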
Using the SDK
The library is straightforward and easy to use, and the classes you will interact with most are FirebaseVisionFaceDetector and FirebaseVisionImage.
Main Use Case:
- Build a FirebaseVisionFaceDetector (detector)
- Convert your image into a FirebaseVisionImage (visionImage)
- Use the detector to get the result, a List<FirebaseVisionFace> (visionFaces)
val detector = getDetector()
val visionImage = getVisionImage()
val visionFaces = detector.detectInImage(visionImage)
Build a FirebaseVisionFaceDetector
The simplest and quickest way to get a detector is to use the defaults for all of the options.
val detector = FirebaseVision.getInstance().getVisionFaceDetector()
You could also customize the detector to detect different things, or to opt in or out of features that fit your specific use case, by supplying a FirebaseVisionFaceDetectorOptions.
val options = FirebaseVisionFaceDetectorOptions.Builder()
.setModeType(ACCURATE_MODE)
.setLandmarkType(ALL_LANDMARKS)
.setClassificationType(ALL_CLASSIFICATIONS)
.setMinFaceSize(0.15f)
.setTrackingEnabled(true)
    .build()

val detector = FirebaseVision.getInstance().getVisionFaceDetector(options)
Most of these options are self-explanatory, but we’ll look at a few of the common ones (the rest can be found in the well-written docs).
- modeType: FAST_MODE or ACCURATE_MODE (trade-off between speed and accuracy)
- landmarkType: NO_LANDMARKS or ALL_LANDMARKS (location of eyes, nose, etc.)
- classificationType: NO_CLASSIFICATIONS or ALL_CLASSIFICATIONS (probability of smiling/eyes open)
Now that we have a detector (default or customized), let’s check out all of its public API.
As you can see, the API is straightforward. You can either detectInImage or close. To be able to detect, we need a FirebaseVisionImage.
Convert an image to a FirebaseVisionImage
The SDK provides functions to transform most image formats you’ll deal with into a FirebaseVisionImage.
- Bitmap: FirebaseVisionImage.fromBitmap(bitmap)
- media.Image: FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
- ByteBuffer/byte array: FirebaseVisionImage.fromByteBuffer(buffer, metadata)
- File: FirebaseVisionImage.fromFilePath(context, uri)
Each format is handled slightly differently. For example, media.Image, which is used to get images from the device’s camera, requires you to supply the image’s rotation, and that rotation is specific to the device.
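The byte-buffer path needs the most hand-holding, since you have to describe the raw bytes yourself via a metadata object. A sketch, assuming a 640x480 NV21 camera frame rotated 90 degrees (the dimensions and rotation are placeholders for whatever your camera actually produces):

```kotlin
// Describe the raw bytes so ML Kit can interpret them correctly.
val metadata = FirebaseVisionImageMetadata.Builder()
    .setWidth(480)                                            // frame width in px
    .setHeight(640)                                           // frame height in px
    .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21) // raw camera format
    .setRotation(FirebaseVisionImageMetadata.ROTATION_90)     // device-specific
    .build()

val visionImage = FirebaseVisionImage.fromByteBuffer(byteBuffer, metadata)
```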
Get detected FirebaseVisionFace(s)
So we now have a detector (whether or not you customized it) and we’ve converted our image into a format that the detector understands. All that’s left is to get the result. So, let’s detect!
val result: Task<MutableList<FirebaseVisionFace>> = detector.detectInImage(visionImage)
    .addOnSuccessListener { faces: MutableList<FirebaseVisionFace> -> /* ... */ }
    .addOnFailureListener { e: Exception -> /* ... */ }
The failure listener is where you would get the “model hasn’t finished downloading” error if … the model hasn’t finished downloading.
A couple things to note:
- the detect method returns a Task, and
- it detects multiple faces.
A Task is a common Google API class that represents an asynchronous operation.
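Besides attaching listeners, a Task can also be chained or transformed. A small sketch (continueWith is part of the play-services Tasks API; the log tag is made up):

```kotlin
detector.detectInImage(visionImage)
    .continueWith { task -> task.result?.size ?: 0 } // transform faces -> count
    .addOnSuccessListener { count ->
        Log.d("FaceDemo", "Detected $count face(s)")
    }
```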
Let’s look at a couple ways to pull data out of a face.
fun onFace(face: FirebaseVisionFace) {
face.boundingBox // Rect
face.headEulerAngleY // head tilt
face.headEulerAngleZ // head tilt
// classifications (setClassificationType(ALL_CLASSIFICATIONS))
face.leftEyeOpenProbability
face.rightEyeOpenProbability
face.smilingProbability
// tracking (options.setTrackingEnabled(true))
face.trackingId
// landmarks (options.setLandmarkType(ALL_LANDMARKS))
face.getLandmark(FirebaseVisionFaceLandmark.BOTTOM_MOUTH) // and other landmarks
}
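Note that a landmark lookup can return null (the landmark may simply not have been found in that face), and classification probabilities are only meaningful when classification was enabled. A defensive sketch, where the 0.7f threshold is an arbitrary choice:

```kotlin
// getLandmark returns null when that landmark wasn't detected.
val mouth = face.getLandmark(FirebaseVisionFaceLandmark.BOTTOM_MOUTH)
mouth?.position?.let { p ->
    // p.x / p.y are coordinates in the image's coordinate space
}

// Only meaningful when ALL_CLASSIFICATIONS was set on the detector options.
val isSmiling = face.smilingProbability >= 0.7f // arbitrary threshold
```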
Summary
ML Kit’s face detection API is extremely simple but has the potential to enable powerful, never-before-seen applications.
A use case that naturally flows from being able to detect a face in an image is the ability to crop those images down to just a user’s face. Although this was previously possible, I don’t think it has ever been this easy for any app developer to implement.
Another feature that could be implemented with this SDK is the ability to overlay 3D models or UI elements over certain sections (landmarks) of faces detected.
Given this simple SDK and these use cases I think it’s easy to see how these can be implemented in an intuitive way.
The only thing that seems a little difficult right now is detecting faces in a video, but even this becomes easier by using the CameraSource class from the vision API, which provides helpful abstractions for plugging in a detector and getting a stream of frames.
cameraSource = CameraSource.Builder(this, detector)
.setRequestedPreviewSize(640, 480)
.setFacing(CameraSource.CAMERA_FACING_BACK)
.setRequestedFps(24.0f)
.build()
CameraSource is actually a class from the now-deprecated Mobile Vision API (the predecessor to ML Kit), but it is still possible to wrap ML Kit’s detector in a detector class that is compatible with it.
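That wrapping could be sketched roughly like this. This is an assumption-heavy illustration: the adapter class name is made up, the NV21 format and frame-metadata fields reflect what the legacy Frame typically carries, and Tasks.await blocks, which is tolerable here only because CameraSource invokes detect() off the UI thread:

```kotlin
// Hypothetical adapter: exposes an ML Kit face detector through the
// legacy com.google.android.gms.vision.Detector interface.
class MlKitFaceDetectorAdapter(
    private val mlKitDetector: FirebaseVisionFaceDetector
) : Detector<FirebaseVisionFace>() {

    override fun detect(frame: Frame): SparseArray<FirebaseVisionFace> {
        // Re-describe the legacy frame's bytes for ML Kit.
        val metadata = FirebaseVisionImageMetadata.Builder()
            .setWidth(frame.metadata.width)
            .setHeight(frame.metadata.height)
            .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
            .setRotation(frame.metadata.rotation)
            .build()
        val image = FirebaseVisionImage.fromByteBuffer(frame.grayscaleImageData, metadata)

        // The legacy API is synchronous, so block until the Task completes.
        val faces = Tasks.await(mlKitDetector.detectInImage(image))

        return SparseArray<FirebaseVisionFace>(faces.size).apply {
            faces.forEachIndexed { index, face -> put(index, face) }
        }
    }
}
```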