Exploring Text Recognition and Face Detection with Google’s ML Kit for Firebase on iOS

Google’s recent introduction of the ML Kit for Firebase beta at Google I/O 2018 brings on-device and in-the-cloud machine learning to its mobile development platform, Firebase. ML Kit for Firebase includes “production-ready” support for several common use cases:

  • Recognizing text
  • Detecting faces
  • Scanning barcodes
  • Labeling images
  • Recognizing landmarks

Landmark recognition is available only through the cloud API; text recognition and image labeling are available both on device and in the cloud; and face detection and barcode scanning are available only on device. The on-device APIs are free to use, while the cloud-based APIs incur a cost once usage exceeds a threshold.

In this article, we will explore ML Kit for Firebase by building two small proof-of-concept iOS apps (ML Kit is available for both Android and iOS):

  1. Face Replace: Uses face detection to find faces in an image and superimpose an emoji which represents how much each person is smiling.
  2. Credit Card Scanner: Uses text recognition to extract the credit card number, the credit card holder’s name and the expiration date from the front of a credit card.

Face Replace

Our Face Replace app will begin by capturing an image which includes one or more faces. We’ll use UIImagePickerController with the source type set to .camera to provide our app with our target image.

Meet our test subjects: The Beatles
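A minimal sketch of the capture step follows, assuming Swift 4.2 or later. CaptureViewController and process(_:) are hypothetical names, not the app’s actual classes; the UIImagePickerController calls are standard UIKit.

```swift
import UIKit

/// Hypothetical view controller: presents the camera and hands the captured
/// photo to `process(_:)`, where face or text detection would run.
final class CaptureViewController: UIViewController,
    UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    /// The most recently captured photo, if any.
    private(set) var capturedImage: UIImage?

    func presentCamera() {
        // The simulator has no camera, so check availability before presenting.
        guard UIImagePickerController.isSourceTypeAvailable(.camera) else { return }
        let picker = UIImagePickerController()
        picker.sourceType = .camera
        picker.delegate = self
        present(picker, animated: true)
    }

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true)
        // The unedited photo as captured by the camera.
        guard let image = info[.originalImage] as? UIImage else { return }
        capturedImage = image
        process(image)
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true)
    }

    /// Placeholder for the ML Kit call (fix the orientation first).
    func process(_ image: UIImage) {}
}
```

On a device, the app’s Info.plist also needs an NSCameraUsageDescription entry; without it, the system terminates the app when it tries to access the camera.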

Once we have the image, detecting faces in it is straightforward. Before we do, however, it is important to make sure that the imageOrientation property of the UIImage is .up. If it is not, correct the orientation of the image (for example, by redrawing it upright) before passing it to ML Kit; otherwise ML Kit will fail to detect any faces.
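One way to do the orientation fix is to redraw the image with UIGraphicsImageRenderer, since UIImage.draw(in:) applies imageOrientation while drawing. This is a sketch, and normalizedToUpOrientation() is a hypothetical helper name rather than part of UIKit or ML Kit:

```swift
import UIKit

extension UIImage {
    /// Returns an equivalent image whose `imageOrientation` is `.up`.
    /// `draw(in:)` honors the image's orientation, so the rendered copy
    /// has its pixels physically rotated upright.
    func normalizedToUpOrientation() -> UIImage {
        // Nothing to do if the pixels are already upright.
        guard imageOrientation != .up else { return self }

        // Keep the original scale so the pixel dimensions are preserved.
        let format = UIGraphicsImageRendererFormat()
        format.scale = self.scale

        // `size` already accounts for orientation (width and height are
        // swapped for .left/.right), so it is the upright size.
        let renderer = UIGraphicsImageRenderer(size: self.size, format: format)
        return renderer.image { _ in
            self.draw(in: CGRect(origin: .zero, size: self.size))
        }
    }
}
```

Call it on the picked photo before wrapping it in a VisionImage.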

The heart of the face detection code is the call to VisionFaceDetector detect(in:), which finds the faces in the image. The full sequence of steps is:

  1. We get an instance of the Firebase Vision service and configure our VisionFaceDetectorOptions.
  2. We instantiate a VisionFaceDetector with our options. Note that the detector is stored in a property rather than a local variable; a local variable would go out of scope when the method returns, so the detector would no longer be available by the time detection completes.
  3. We create the VisionImage that will be passed to the VisionFaceDetector.
  4. The call to VisionFaceDetector detect(in:) does the heavy lifting of scanning the image for faces. Its completion handler is invoked with an optional array of VisionFace objects and an optional error object; any error is checked and presented to the user.
  5. We iterate through each VisionFace and check whether it has smiling probability data. If it does, we use the face’s CGRect to create a label that superimposes the appropriate emoji over the face. Because the image is scaled when it is rendered onscreen, we scale the label’s frame by the same amount so the emoji lands in the correct location. For simplicity, the logic that converts smilingProbability to the appropriate emoji is encapsulated in the FaceReplaceViewController.newEmojiLabel(frame:, smilingProbability:) function.
  6. If the face is rotated about the Y-axis, VisionFace.headEulerAngleY reports the rotation in degrees. We create a CGAffineTransform that rotates the label by the corresponding number of radians, then add the label as a subview of the UIImageView.
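Steps 5 and 6 involve some coordinate math. The sketch below assumes the UIImageView uses the .scaleAspectFit content mode and that the face frame and image size are expressed in the same units; aspectFitRect(_:imageSize:viewSize:) and radians(fromDegrees:) are hypothetical helpers, not ML Kit APIs. It uses only Foundation types so it can be tested off-device:

```swift
import Foundation  // CGRect, CGSize, CGPoint and CGFloat

/// Maps a rect given in image coordinates (such as a detected face's frame)
/// into the coordinate space of a view that shows the image aspect-fit.
/// Assumes the image is scaled uniformly and centered in the view.
func aspectFitRect(_ rect: CGRect, imageSize: CGSize, viewSize: CGSize) -> CGRect {
    // Aspect-fit picks the smaller of the two ratios so the whole image fits.
    let scale = min(viewSize.width / imageSize.width,
                    viewSize.height / imageSize.height)
    // Empty margins left on each side after scaling (letterboxing).
    let xOffset = (viewSize.width - imageSize.width * scale) / 2
    let yOffset = (viewSize.height - imageSize.height * scale) / 2
    return CGRect(
        origin: CGPoint(x: rect.origin.x * scale + xOffset,
                        y: rect.origin.y * scale + yOffset),
        size: CGSize(width: rect.size.width * scale,
                     height: rect.size.height * scale))
}

/// Converts an angle in degrees (as VisionFace reports Euler angles)
/// to radians, the unit CGAffineTransform(rotationAngle:) expects.
func radians(fromDegrees degrees: CGFloat) -> CGFloat {
    return degrees * .pi / 180
}

// On iOS, a label could then be placed and rotated roughly like this:
//   label.frame = aspectFitRect(face.frame, imageSize: image.size,
//                               viewSize: imageView.bounds.size)
//   label.transform = CGAffineTransform(
//       rotationAngle: radians(fromDegrees: face.headEulerAngleY))
```

For example, a 200×100 image shown in a 100×100 view is scaled by 0.5 and shifted down 25 points, so a face at (20, 40) with size 60×20 maps to (10, 45) with size 30×10.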

What we end up with is a new image with the appropriate emoji placed over each face:

Meet “The Beatles”

As this walkthrough shows, face detection and smile detection are both quite straightforward with ML Kit for Firebase. The only difficulty was determining which smilingProbability values should map to which emoji, and we settled on those thresholds by trial and error on a few test images.
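The smile-to-emoji mapping can be as simple as a switch over probability ranges. The thresholds below are hypothetical placeholders, not the values the app settled on, and would need the same trial-and-error tuning:

```swift
import Foundation  // CGFloat

/// Maps a smiling probability (0.0 to 1.0) to an emoji.
/// Thresholds are hypothetical placeholders to be tuned on real images.
func emoji(forSmilingProbability probability: CGFloat) -> String {
    switch probability {
    case ..<0.2: return "😐"  // neutral
    case ..<0.5: return "🙂"  // slight smile
    case ..<0.8: return "😀"  // smiling
    default:     return "😁"  // big grin
    }
}
```

Keeping the thresholds in one switch makes them easy to adjust as more test images are tried.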

Credit Card Scanner

As was the case with our Face Replace app, our credit card scanner will begin by capturing an image of the front of the credit card using UIImagePickerController.

Our “credit card”

Once we have the credit card image, detecting the text in it is straightforward. Again, it is important to make sure that the imageOrientation property of the UIImage is .up, or the results will be gibberish.

The heart of the text detection code is the call to VisionTextDetector detect(in:), which detects text in the image. As with face detection, this call is part of a short sequence of steps:

  1. We get an instance of the Firebase Vision service and use it to create a VisionTextDetector. As with the face detector, this is assigned to a property so it is still available when detection completes.
  2. The call to VisionTextDetector detect(in:), which takes a VisionImage created from the UIImage, is what actually performs the text detection. The completion handler is invoked with an optional array of VisionText objects and an optional error object; any error is checked and presented to the user. Each VisionText object includes a String with the detected text and a CGRect with the location of that text in the image.
  3. Now that we have all the text in the image, we just have to figure out which text is the credit card number, which is the cardholder’s name and which is the expiration date. This logic is encapsulated in the CreditCardInfoExtractor.cardholderName(in:, imageSize:), CreditCardInfoExtractor.creditCardNumber(in:, imageSize:) and CreditCardInfoExtractor.creditCardExpirationDate(in:, imageSize:) methods, which use regular expressions and the rough location of the detected text in the image to classify it (credit card numbers are towards the middle of the card, while cardholder names and expiration dates are towards the bottom).
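The classification in step 3 can be sketched with regular expressions plus a vertical position check. DetectedText is a hypothetical stand-in for VisionText (carrying just its text and frame), and CardTextClassifier, its patterns and its position bands are assumptions, not the app’s CreditCardInfoExtractor:

```swift
import Foundation

/// Hypothetical stand-in for VisionText: recognized string plus its frame.
struct DetectedText {
    let text: String
    let frame: CGRect
}

/// Hypothetical classifier: regex match plus a rough vertical position band.
enum CardTextClassifier {
    /// Vertical center of a frame as a fraction of image height (0 = top, 1 = bottom).
    static func relativeY(_ frame: CGRect, imageSize: CGSize) -> CGFloat {
        return (frame.origin.y + frame.size.height / 2) / imageSize.height
    }

    /// First item whose text matches `pattern` and whose center lies in `band`.
    static func firstMatch(in items: [DetectedText], pattern: String,
                           imageSize: CGSize, band: ClosedRange<CGFloat>) -> String? {
        return items.first { item in
            item.text.range(of: pattern, options: .regularExpression) != nil &&
                band.contains(relativeY(item.frame, imageSize: imageSize))
        }?.text
    }

    /// Four groups of four digits, optionally space-separated, mid-card.
    static func cardNumber(in items: [DetectedText], imageSize: CGSize) -> String? {
        return firstMatch(in: items, pattern: "^[0-9]{4}( ?[0-9]{4}){3}$",
                          imageSize: imageSize, band: 0.35...0.7)
    }

    /// MM/YY near the bottom of the card.
    static func expirationDate(in items: [DetectedText], imageSize: CGSize) -> String? {
        return firstMatch(in: items, pattern: "^(0[1-9]|1[0-2])/[0-9]{2}$",
                          imageSize: imageSize, band: 0.6...1.0)
    }

    /// Two upper-case words near the bottom of the card.
    static func cardholderName(in items: [DetectedText], imageSize: CGSize) -> String? {
        return firstMatch(in: items, pattern: "^[A-Z]+ [A-Z]+$",
                          imageSize: imageSize, band: 0.6...1.0)
    }
}
```

Because a label such as “VALID THRU” also consists of two upper-case words near the bottom of the card, cardholderName(in:imageSize:) simply returns whichever matching string comes first.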

This process was somewhat successful in determining the information we wanted:

Note that in this case the app incorrectly identified “VALID THRU” as the cardholder’s name. The cardholder’s name itself was detected correctly, but our app picked the wrong two-word string because it had no way to tell the two apart. To improve this, we could take the width of the detected text into account: credit card numbers tend to be quite wide, names a bit narrower, and expiration dates narrower still. This may help somewhat, but it is unlikely to be 100% accurate across all types of credit cards.
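The width idea can be sketched as a tie-breaker: among strings that look like a name, keep the widest. This assumes that small printed labels such as “VALID THRU” come out narrower than the embossed name, which will not hold for every card; widestNameCandidate(in:) is a hypothetical helper:

```swift
import Foundation  // CGRect, CGFloat

/// Hypothetical tie-breaker: of all detected strings that look like a name
/// (two upper-case words), return the one with the widest bounding box.
func widestNameCandidate(in items: [(text: String, frame: CGRect)]) -> String? {
    let candidates = items.filter {
        $0.text.range(of: "^[A-Z]+ [A-Z]+$", options: .regularExpression) != nil
    }
    // max(by:) expects an "is less than" predicate.
    return candidates.max { $0.frame.size.width < $1.frame.size.width }?.text
}
```

Digits fail the name pattern, so the (wider) card number never competes with the name here.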

But the biggest issue is that ML Kit does not always recognize text accurately. For example, the digit 0 in the credit card number and expiration date is sometimes reported as the letter D or O. Such results fail to match our regular expression, which looks for groups of four digits in the credit card number. We could make the regular expressions accept letters as well as digits, but that increases the likelihood of false matches.
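One middle ground, which the app did not implement, is to accept O and D only where a digit is expected and then convert them to 0, rejecting strings with too many substitutions. In this sketch, normalizedCardNumber(_:maxSubstitutions:) and the default cap of four are hypothetical:

```swift
import Foundation

/// Accepts O and D in place of digits in a card-number-shaped string,
/// converts them to 0, and rejects strings needing too many substitutions.
func normalizedCardNumber(_ raw: String, maxSubstitutions: Int = 4) -> String? {
    // Four groups of four characters drawn from digits plus the common misreads.
    let pattern = "^[0-9OD]{4}( ?[0-9OD]{4}){3}$"
    guard raw.range(of: pattern, options: .regularExpression) != nil else { return nil }

    // Too many letters suggests this was never a number (a false match).
    let misreads = raw.filter { $0 == "O" || $0 == "D" }.count
    guard misreads <= maxSubstitutions else { return nil }

    return String(raw.map { ch -> Character in
        (ch == "O" || ch == "D") ? "0" : ch
    })
}
```

The cap trades recall for precision: a genuine number containing many zeros that are all misread would be rejected, so the limit would need tuning against real scans.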

Conclusion

ML Kit for Firebase, announced at Google I/O 2018 and still in beta, is definitely a boon to mobile developers: it can enable some pretty amazing machine learning-based apps with very little code.

Unlike Apple’s Core ML, which runs only on Apple platforms, ML Kit for Firebase is cross-platform. That is a step forward for the larger audience of Android and iOS developers, since it lets even more of them take advantage of machine learning in their apps, including apps built for both platforms. ML Kit for Firebase will provide a powerful foundation on which developers can explore new ideas and new techniques, built on Google’s work in artificial intelligence and now part of every developer’s arsenal.