Enhanced selfie experience via MLKit

Ashwini Kumar
Published in InCred Technopedia
8 min read · Sep 1, 2020

At InCred, we are reinventing the lending business by investing heavily in cutting-edge technology to solve real-world problems. Our aim is a hassle-free lending process with a great customer experience. This has become even more relevant in these challenging times (Covid-19), when fewer human interactions mean a safer world. To eliminate paperwork, we introduced customer tasks that users can complete by themselves, without any human interaction: taking a selfie, signing the agreement, submitting documents, NACH registration and so on. Taking a selfie is the most important step in our lending flow, because an unclear picture can delay the loan application or get it rejected. This article describes how we use MLKit to improve the selfie experience for our users.

Face Detection via MLKit

When we set out to improve the selfie experience, the aim was simple: our customers should be able to capture clear selfies so that their loans are processed faster. To solve that, we used machine learning, specifically MLKit face detection (previously offered via Firebase). We went with the unbundled face model because the bundled one would increase the app size by 16MB. With the unbundled approach, the face model is not shipped inside the app; it is pulled in lazily when the app is initialised and opened for the first time.

<!-- Add this inside the <application> element of AndroidManifest.xml to automatically download the face model from the Play Store after the app is installed -->
<meta-data
    android:name="com.google.mlkit.vision.DEPENDENCIES"
    android:value="face" />
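
For reference, the unbundled model is delivered through Google Play services, so its Gradle dependency differs from the bundled one. A minimal sketch in the Gradle Kotlin DSL follows. The artifact names are the ones we understand the ML Kit documentation to use, and the version numbers are examples only, so check them against the current release notes before copying.

```kotlin
// build.gradle.kts (app module)
dependencies {
    // Unbundled: the face model is downloaded via Google Play services (smaller APK).
    // Example version only; verify against the ML Kit release notes.
    implementation("com.google.android.gms:play-services-mlkit-face-detection:17.1.0")

    // Bundled alternative (model shipped inside the app, larger APK):
    // implementation("com.google.mlkit:face-detection:16.1.5")
}
```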

Setting up Camera 📸

The choice of camera library determines the implementation effort. Until CameraX becomes stable, we wanted a library with easy-to-use APIs that works across Android OS versions, and we found CameraView. CameraView is well documented and is powered by the Camera1 (<API 21) and Camera2 (≥API 21) engines, which makes it fast and reliable. It has all the features we needed, the most important being frame processing support. It also supports a LifecycleOwner: CameraView handles the lifecycle events on its own, mainly asking for permissions in onResume, and clearing frame processors and listeners and destroying the cameraView in onDestroy of the fragment/activity. By default, CameraView offloads frame processing to a background thread, so the frames can be consumed synchronously by the face detector.

cameraView.apply {
    setLifecycleOwner(viewLifecycleOwner)
    facing = Facing.FRONT
    addCameraListener(cameraListener())
    addFrameProcessor {
        faceTrackerViewModel.processCameraFrame(it)
        cameraOverlay.setCameraInfo(
            it.size.height,
            it.size.width,
            Facing.FRONT
        )
    }
}

It is important to set the width and height of cameraOverlay to the width and height of the received frame. When we first tested on different devices, the detected face rectangle was drawn away from where the face actually was. More details on the problem and how we fixed it are in the issue.
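
To illustrate why the overlay needs the frame dimensions, here is a minimal sketch of mapping a face box from frame coordinates to overlay coordinates. Box and mapToOverlay are hypothetical names, not part of CameraView or MLKit. The sketch assumes the frame dimensions are already rotated to match the view and that the front camera preview is mirrored horizontally.

```kotlin
// Hypothetical helper types for illustration; not CameraView or MLKit APIs.
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

/**
 * Scales a box detected in frame coordinates to overlay (view) coordinates.
 * When [mirror] is true (front camera), the box is flipped horizontally.
 */
fun mapToOverlay(
    box: Box,
    frameWidth: Int,
    frameHeight: Int,
    viewWidth: Int,
    viewHeight: Int,
    mirror: Boolean
): Box {
    val scaleX = viewWidth.toFloat() / frameWidth
    val scaleY = viewHeight.toFloat() / frameHeight

    val scaledLeft = box.left * scaleX
    val scaledRight = box.right * scaleX

    // Flipping swaps the roles of the left and right edges.
    val left: Float
    val right: Float
    if (mirror) {
        left = viewWidth - scaledRight
        right = viewWidth - scaledLeft
    } else {
        left = scaledLeft
        right = scaledRight
    }
    return Box(left, box.top * scaleY, right, box.bottom * scaleY)
}
```

If the overlay is given the wrong frame size, scaleX and scaleY are off and the rectangle drifts away from the face, which matches the symptom described above.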

Reactive Frame Processing for Face Detection 📽️

Each frame is then passed via faceTrackerViewModel to the processor, which detects faces in it.

fun processCameraFrame(it: Frame) {
    val byteBuffer = ByteBuffer.wrap(it.getData())
    val frameMetadata =
        FrameMetadata(it.size.width, it.size.height, it.rotationToUser, Facing.FRONT)
    compositeDisposable.add(
        rxFaceDetectionProcessor.process(byteBuffer, frameMetadata)
            .subscribe()
    )
}

All of this keeps the reactive nature of the app intact via RxFaceDetectionProcessor. The ViewModel passes each frame to the processor, which processes it asynchronously, off the UI thread.
RxFaceDetectionProcessor is the reactive layer written over FaceDetectionProcessor. It receives face detection results via FaceDetectionResultListener and publishes them to faceDetectionResultLiveData, a LiveData<List<Face>>. The view layer observes faceDetectionResultLiveData through the viewModel to draw the rectangular bounding box over the face.

class RxFaceDetectionProcessor
@Inject
constructor(private val faceDetectionProcessor: FaceDetectionProcessor) :
    FlowableOnSubscribe<List<Face>>,
    FaceDetectionResultListener {
    private lateinit var emitter: FlowableEmitter<List<Face>>
    private lateinit var data: ByteBuffer
    private lateinit var frameMetadata: FrameMetadata
    private lateinit var faceDetectionResultLiveData: MutableLiveData<List<Face>>

    fun setFaceDetectionResultLiveData(faceDetectionResultLiveData: MutableLiveData<List<Face>>) {
        this.faceDetectionResultLiveData = faceDetectionResultLiveData
    }

    fun process(
        data: ByteBuffer,
        frameMetadata: FrameMetadata
    ): Flowable<List<Face>> {
        this.data = data
        this.frameMetadata = frameMetadata
        return Flowable.create(this, BackpressureStrategy.LATEST)
    }

    override fun subscribe(emitter: FlowableEmitter<List<Face>>) {
        this.emitter = emitter
        faceDetectionProcessor.process(data, frameMetadata, this)
    }

    override fun onSuccess(
        results: List<Face>
    ) {
        faceDetectionResultLiveData.value = results
    }

    override fun onFailure(e: Exception) {
        Timber.d(e)
        faceDetectionResultLiveData.value = emptyList()
    }

    fun stop() {
        faceDetectionProcessor.stop()
        if (::emitter.isInitialized)
            emitter.setDisposable(Disposables.disposed())
    }
}

FaceDetectionProcessor is the class where actual frame processing happens. This involves creating the FaceDetector with FaceDetectorOptions to start processing each frame for our use case.

val options = FaceDetectorOptions.Builder()
    .apply {
        setClassificationMode(FaceDetectorOptions.CLASSIFICATION_MODE_NONE)
        setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_NONE)
        setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_FAST)
        enableTracking()
    }
    .build()
detector = FaceDetection.getClient(options)

We wanted face detection to be as fast as possible. We therefore chose the fast performance mode and switched off the features we did not need: classification of faces (smiling, eyes open, etc.) and landmark detection.

Why are the frames getting processed even if the face detector is closed?
When we started pushing frames for face detection, we ran into one more blocker. Even after the face detector was closed and the camera view was destroyed, frames kept getting processed, which hurt battery life. Anything that degrades performance like this cannot go to production, so we set out to fix it. We found that we were feeding the faceDetector more frames than it could consume. Because the faceDetector took time to compute the result for each frame, the frames already queued in its buffer were still being processed after the faceDetector was closed. Here is the processing logic after the fix.

class FaceDetectionProcessor
@Inject
constructor() {
    private val detector: FaceDetector
    private var latestImage: ByteBuffer? = null
    private var latestImageMetaData: FrameMetadata? = null
    private var processingImage: ByteBuffer? = null
    private var processingMetaData: FrameMetadata? = null

    init {
        // Face detector initialisation here
    }

    fun process(
        data: ByteBuffer,
        frameMetadata: FrameMetadata,
        detectionResultListener: FaceDetectionResultListener
    ) {
        latestImage = data
        latestImageMetaData = frameMetadata
        // Process the image only when the last frame processing has been completed
        if (processingImage == null && processingMetaData == null) {
            processLatestImage(detectionResultListener)
        }
    }

    private fun processLatestImage(detectionResultListener: FaceDetectionResultListener) {
        processingImage = latestImage
        processingMetaData = latestImageMetaData
        latestImage = null
        latestImageMetaData = null
        if (processingImage != null && processingMetaData != null) {
            processImage(
                requireNotNull(processingImage),
                requireNotNull(processingMetaData),
                detectionResultListener
            )
        }
    }

    private fun processImage(
        data: ByteBuffer,
        frameMetadata: FrameMetadata,
        detectionResultListener: FaceDetectionResultListener
    ) {
        detectInVisionImage(
            InputImage.fromByteBuffer(
                data,
                frameMetadata.width,
                frameMetadata.height,
                frameMetadata.rotation,
                InputImage.IMAGE_FORMAT_NV21
            ),
            detectionResultListener
        )
    }

    private fun detectInVisionImage(
        image: InputImage,
        detectionResultListener: FaceDetectionResultListener
    ) {
        detector.process(image)
            .addOnSuccessListener {
                detectionResultListener.onSuccess(it)
            }
            .addOnFailureListener {
                detectionResultListener.onFailure(it)
            }
            .addOnCompleteListener {
                // Process the next available frame for face detection
                processLatestImage(detectionResultListener)
            }
    }

    fun stop() {
        try {
            Timber.d("Face detector closed")
            detector.close()
        } catch (e: IOException) {
            Timber.e("Exception thrown while trying to close Face Detector: $e")
        }
    }
}
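
The idea behind the fix, keeping only the most recent frame while one is in flight, can be shown without any Android or MLKit dependency. The following is a minimal sketch under stated assumptions: LatestFrameGate and demoLatestFrameGate are hypothetical names, the detector is a plain callback, and the closed flag is an addition of this sketch that the class above does not have.

```kotlin
/**
 * Hypothetical sketch: holds at most one pending frame while another is being
 * processed, so older unprocessed frames are dropped instead of queued.
 */
class LatestFrameGate<T : Any>(
    private val detect: (frame: T, onDone: () -> Unit) -> Unit
) {
    private var pending: T? = null
    private var busy = false
    private var closed = false

    @Synchronized
    fun submit(frame: T) {
        if (closed) return
        pending = frame              // overwrite any frame that was still waiting
        if (!busy) processNext()
    }

    @Synchronized
    fun close() {
        closed = true                // assumption of this sketch: ignore frames after close
        pending = null
    }

    @Synchronized
    private fun processNext() {
        val frame = pending
        pending = null
        if (frame == null || closed) {
            busy = false
            return
        }
        busy = true
        val onDone: () -> Unit = { processNext() }
        detect(frame, onDone)
    }
}

/** Feeds four frames through a fake detector and returns the ones actually processed. */
fun demoLatestFrameGate(): List<Int> {
    val processed = mutableListOf<Int>()
    val callbacks = mutableListOf<() -> Unit>()
    val gate = LatestFrameGate<Int> { frame, onDone ->
        processed.add(frame)
        callbacks.add(onDone)        // completion is triggered manually below
    }
    gate.submit(1)                   // starts processing frame 1
    gate.submit(2)                   // waits
    gate.submit(3)                   // replaces frame 2
    callbacks.removeAt(0).invoke()   // frame 1 done, frame 3 starts
    gate.close()
    gate.submit(4)                   // ignored after close
    callbacks.removeAt(0).invoke()   // frame 3 done, nothing left
    return processed
}
```

In this run only frames 1 and 3 reach the detector: frame 2 is overwritten before the detector is free, and frame 4 arrives after close.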
Amit Randhawa from InCred’s Android team showing the old selfie experience

Quality Selfies via Auto Capture 🤳

We launched the newly developed selfie experience to our users, and it received very good feedback. At InCred, we always aim to make good things better and then strive for the best. So we enhanced the selfie experience further by introducing auto-capture: once a face is detected, the app takes a high-quality snapshot on its own. We drew an oval overlay on top of the camera feed and asked users to keep their faces inside it. Once the face was detected inside the oval, we auto-captured the selfie. To guide users, we showed real-time feedback ("You are too near to the camera", "You look too far from the camera", etc.) on top of the overlay, so that they could complete the task themselves without any manual support.

Detecting face inside the oval and the feedback

We added an oval overlay on top of the GraphicOverlay, the same view that earlier drew the rectangular bounding box once a face was detected. We wanted to give users three kinds of feedback:

  • Face inside oval: To check whether the detected face is inside the oval, we check whether the face bounding box lies within the oval's bounds. Here, this refers to the oval's bounds and the parameters are the sides of the face bounding box.
fun sidesInsideOval(top: Float, right: Float, bottom: Float, left: Float): Boolean {
    return top >= this.top && bottom <= this.bottom && left >= this.left && right <= this.right
}
  • Face inside oval but zoomed out: The user's face could be inside the oval, but if it is too far from the camera, the captured selfie will not be clear. To catch this case, we compared the height of the face bounding box with the height of the oval. If the face was half the oval's height or less, we asked the user to move closer to the camera.
fun isFaceZoomedOut(top: Float, bottom: Float): Boolean {
    return ((bottom - top) / (this.bottom - this.top)) <= 0.5
}
  • Face zoomed in: The user could be holding the camera too close to the face, which results in poor-quality selfies. To catch this, we counted how many sides of the face bounding box lie within the oval's bounds. If at most one side was inside, we treated the face as zoomed in and asked the user to move away from the camera.
fun isFaceZoomedIn(top: Float, right: Float, bottom: Float, left: Float): Boolean {
    var sidesInside = 0
    if (top >= this.top)
        sidesInside += 1
    if (bottom <= this.bottom)
        sidesInside += 1
    if (left >= this.left)
        sidesInside += 1
    if (right <= this.right)
        sidesInside += 1
    if (sidesInside <= 1)
        return true
    return false
}
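
The three checks can be combined into a single decision per frame. The sketch below is one possible way to do that, not the production code: Bounds, SelfieFeedback, OvalGuide and the order in which the checks are applied are all assumptions made for illustration. The thresholds (at most one side inside, height ratio of 0.5) are taken from the snippets above.

```kotlin
// Hypothetical types for illustration only.
data class Bounds(val left: Float, val top: Float, val right: Float, val bottom: Float)

enum class SelfieFeedback { MOVE_AWAY, ALIGN_FACE, MOVE_CLOSER, READY_TO_CAPTURE }

class OvalGuide(private val oval: Bounds) {

    fun feedbackFor(face: Bounds): SelfieFeedback {
        // Count how many sides of the face box lie within the oval's bounds.
        val sidesInside = listOf(
            face.top >= oval.top,
            face.bottom <= oval.bottom,
            face.left >= oval.left,
            face.right <= oval.right
        ).count { it }

        // Height of the face box relative to the height of the oval.
        val heightRatio = (face.bottom - face.top) / (oval.bottom - oval.top)

        return when {
            sidesInside <= 1 -> SelfieFeedback.MOVE_AWAY      // zoomed in
            sidesInside < 4 -> SelfieFeedback.ALIGN_FACE      // partly outside the oval
            heightRatio <= 0.5f -> SelfieFeedback.MOVE_CLOSER // inside but zoomed out
            else -> SelfieFeedback.READY_TO_CAPTURE
        }
    }
}
```

For an oval of Bounds(100f, 100f, 300f, 400f), a face box of Bounds(150f, 150f, 250f, 350f) is ready to capture, while Bounds(180f, 200f, 220f, 300f) is inside the oval but too small and triggers MOVE_CLOSER.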

Once the corner cases had been tested, we ran the new selfie experience alongside the old one via Firebase A/B testing. Within a month, we found that the oval overlay with auto-capture was performing far better than the old experience. This gave us confidence in the new feature, and we launched it to all our users.

Amit Randhawa from InCred’s Android team showing the new Selfie Experience

The fallback approach

Android is heavily fragmented, and with so many OEMs it is increasingly difficult to test a feature on every Android device. This is an even bigger problem in a country like India, where a large number of OEMs customise the behaviour of the Android OS to suit their own needs. If your feature does not work on some devices, you are effectively blocking the user journey in your app. For those users, we provided a fallback: if a face was not detected within 10s, we moved to the native camera experience. This way we never blocked the user's loan application journey, and it gave us time to fix the issues in the background.
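
A minimal sketch of the timeout decision follows, assuming a hypothetical FaceDetectionTimeout class and an injectable clock so the logic can be tested without waiting. Launching the native camera is left out because it depends on the app's navigation.

```kotlin
/**
 * Hypothetical helper: reports when no face has been seen for [timeoutMillis]
 * since this object was created. The clock is injectable for testing.
 */
class FaceDetectionTimeout(
    private val timeoutMillis: Long = 10_000L,
    private val clock: () -> Long = { System.currentTimeMillis() }
) {
    private val startedAt: Long = clock()
    private var faceSeen = false

    /** Call this when the detector reports at least one face. */
    fun onFaceDetected() {
        faceSeen = true
    }

    /** True once the timeout has elapsed without any face being detected. */
    fun shouldFallBack(): Boolean =
        !faceSeen && clock() - startedAt >= timeoutMillis
}
```

The screen could poll shouldFallBack() on each frame or from a timer and switch to the native camera once it returns true.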

Always provide a fallback for your feature. In the end, you never want to block your users and then receive bad reviews on the Play Store.

We truly believe that necessity is the mother of invention. The current tough times are pushing us to re-imagine and build every feature so that it works without human intervention. We are on a path not only to rebuild and revamp our tech stack but also to strengthen the features that improve the user experience. If this excites you, we will be more than happy to discuss. Check out the open positions here or reach out to me via Twitter or LinkedIn.
