Exploring Text Recognition and Face Detection with Google’s ML Kit for Firebase on iOS
Google’s recent introduction of the ML Kit for Firebase beta at Google I/O 2018 brings on-device and in-the-cloud machine learning to their mobile development platform, Firebase. ML Kit for Firebase includes “production-ready” support for several common use cases:
- Recognizing text
- Detecting faces
- Scanning barcodes
- Labeling images
- Recognizing landmarks
Landmark recognition is available exclusively from the cloud API, while text recognition and image labeling are available on device or from the cloud, and face detection and barcode scanning are only available on device. On-device use cases may be used free of charge, while cloud-based use cases will incur a cost when the number of uses is above a threshold.
In this article, we will explore ML Kit for Firebase by building two small proof of concept iOS apps (ML Kit is available for both Android and iOS):
- Face Replace: Uses face detection to find faces in an image and superimpose an emoji which represents how much each person is smiling.
- Credit Card Scanner: Uses text recognition to extract the credit card number, the credit card holder’s name and the expiration date from the front of a credit card.
Face Replace
Our Face Replace app will begin by capturing an image which includes one or more faces. We’ll use `UIImagePickerController` with the source type set to `.camera` to provide our app with our target image.
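A minimal sketch of that capture step might look like the following; the class and method names are illustrative, not taken from the actual app.

```swift
import UIKit

// Sketch: present the camera and receive the captured photo.
class CaptureViewController: UIViewController,
    UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    func presentCamera() {
        let picker = UIImagePickerController()
        picker.sourceType = .camera   // capture a new photo rather than browsing the library
        picker.delegate = self
        present(picker, animated: true)
    }

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true)
        guard let image = info[.originalImage] as? UIImage else { return }
        process(image)   // hand the image off to ML Kit
    }

    func process(_ image: UIImage) { /* ML Kit processing goes here */ }
}
```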
Once we have the image, it’s very straightforward to detect any faces in it. Before we do, however, it is important to make sure that the `imageOrientation` property of the `UIImage` is `.up`. If it is not, be sure to correct the orientation of the image before passing it to ML Kit, or ML Kit will fail to detect any faces.
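One common way to normalize the orientation is to redraw the image into a fresh graphics context, which bakes the EXIF orientation into the pixels. This extension (the name `orientedUp` is our own, for illustration) is one such sketch:

```swift
import UIKit

extension UIImage {
    // Returns a copy of the image whose imageOrientation is .up,
    // redrawing only when the orientation actually needs fixing.
    func orientedUp() -> UIImage {
        guard imageOrientation != .up else { return self }
        let format = UIGraphicsImageRendererFormat.default()
        format.scale = scale
        return UIGraphicsImageRenderer(size: size, format: format).image { _ in
            draw(in: CGRect(origin: .zero, size: size))
        }
    }
}
```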
The heart of this code is the call to `VisionFaceDetector`’s `detect(in:)`, which does the heavy lifting of finding faces in the image. Before we get to that call, there are a few other steps:
- We instantiate the Firebase `Vision` service and configure our `VisionFaceDetectorOptions`.
- We instantiate a `VisionFaceDetector` with our options. Note that this is a property instead of a local variable, as a local variable would otherwise be `nil` by the time we need it.
- We create the `VisionImage` which will be passed to the `VisionFaceDetector`.
- The call to `VisionFaceDetector`’s `detect(in:)` does the heavy lifting of scanning the specified image for faces. The completion handler is invoked with an optional array of `VisionFace` objects as well as an optional error object. Any error is checked and presented to the user.
- We iterate through each `VisionFace` and check whether it has smiling probability data associated with it. If it does, we use the `CGRect` from the `VisionFace` to create a label which superimposes the appropriate emoji over the face in the image (because the image is scaled when rendered onscreen, we scale the frame for the emoji so it is placed in the correct location). For simplicity, we’ve encapsulated the logic to convert the `smilingProbability` to the appropriate emoji in the `FaceReplaceViewController.newEmojiLabel(frame:smilingProbability:)` function.
- If the face in the image is rotated relative to the Y-axis, this is indicated, in degrees, by `VisionFace.headEulerAngleY`. We create a `CGAffineTransform` to rotate the label by the appropriate number of radians and then add the label as a subview of the `UIImageView`.
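The steps above can be sketched roughly as follows. This assumes the ML Kit for Firebase beta API (`Vision`, `VisionFaceDetectorOptions`, `VisionFaceDetector` from the `FirebaseMLVision` pod) and omits the frame scaling and the error-presentation UI for brevity:

```swift
import UIKit
import FirebaseMLVision

class FaceReplaceViewController: UIViewController {
    @IBOutlet var imageView: UIImageView!

    // A property, not a local: a local detector could be deallocated
    // before the asynchronous completion handler runs.
    private var faceDetector: VisionFaceDetector?

    func detectFaces(in image: UIImage) {
        let options = VisionFaceDetectorOptions()
        options.classificationType = .all   // enables smilingProbability

        faceDetector = Vision.vision().faceDetector(options: options)

        let visionImage = VisionImage(image: image)  // imageOrientation must be .up
        faceDetector?.detect(in: visionImage) { [weak self] faces, error in
            if let error = error {
                print(error.localizedDescription)  // present to the user in the real app
                return
            }
            guard let self = self, let faces = faces else { return }
            for face in faces where face.hasSmilingProbability {
                let label = self.newEmojiLabel(frame: face.frame,
                                               smilingProbability: face.smilingProbability)
                // headEulerAngleY is reported in degrees; convert to radians.
                label.transform = CGAffineTransform(
                    rotationAngle: CGFloat(face.headEulerAngleY) * .pi / 180)
                self.imageView.addSubview(label)
            }
        }
    }

    func newEmojiLabel(frame: CGRect, smilingProbability: CGFloat) -> UILabel {
        // Maps the probability to an emoji and positions a label over the face.
        let label = UILabel(frame: frame)
        // ...
        return label
    }
}
```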
What we end up with is a new image with the appropriate emoji placed over each face:
As is clear from the code snippet above, face detection and smile detection are both quite straightforward with ML Kit for Firebase. The only difficulty was in determining which `smilingProbability` values mapped to the various emoji, and those were determined by trial and error on a few test images.
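A mapping of that shape might look like the function below. The cutoff values here are illustrative assumptions, not the thresholds the app actually settled on:

```swift
import Foundation

// Illustrative probability-to-emoji mapping; the cutoffs are assumptions.
func emoji(forSmilingProbability p: Double) -> String {
    switch p {
    case ..<0.1: return "😐"
    case ..<0.4: return "🙂"
    case ..<0.8: return "😊"
    default:     return "😁"
    }
}
```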
Credit Card Scanner
As was the case with our Face Replace app, our credit card scanner will begin by capturing an image of the front of the credit card using `UIImagePickerController`.
Once we have the credit card image, it’s very straightforward to detect text in that image. Again, it is important to make sure that the `imageOrientation` property on the `UIImage` is `.up` or the results will be gibberish.
The heart of this code is the call to `VisionTextDetector`’s `detect(in:)`, which detects text in the image. As with face detection, a few other steps are necessary before we can call this function:
- We instantiate the Firebase `Vision` service and use it to create a `VisionTextDetector`. Note that this is assigned to a property so it does not become `nil` before we need it.
- The call to `VisionTextDetector`’s `detect(in:)` uses a `VisionImage` created from the `UIImage` and is what actually performs the text detection. The completion handler is invoked with an optional array of `VisionText` objects as well as an optional error object. Any error is checked and presented to the user. Each `VisionText` object includes a `String` with the detected text and a `CGRect` with the location of the text in the image.
- Now that we have all the text in the image, we just have to figure out which text is the credit card number, which is the cardholder’s name, and which is the expiration date. This logic is encapsulated in the `CreditCardInfoExtractor.cardholderName(in:imageSize:)`, `CreditCardInfoExtractor.creditCardNumber(in:imageSize:)`, and `CreditCardInfoExtractor.creditCardExpirationDate(in:imageSize:)` methods, which use regular expressions and the rough location of the detected text in the image to classify it (credit card numbers are toward the middle of the image, while cardholder names and expiration dates are toward the bottom).
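The detection portion of that flow might be sketched as follows, again assuming the ML Kit for Firebase beta API and leaving out the classification and error-presentation details:

```swift
import UIKit
import FirebaseMLVision

class CardScannerViewController: UIViewController {
    // A property, so the detector survives until the async callback fires.
    private var textDetector: VisionTextDetector?

    func detectText(in image: UIImage) {
        textDetector = Vision.vision().textDetector()

        let visionImage = VisionImage(image: image)  // imageOrientation must be .up
        textDetector?.detect(in: visionImage) { features, error in
            if let error = error {
                print(error.localizedDescription)  // present to the user in the real app
                return
            }
            for feature in features ?? [] {
                // Each VisionText carries the recognized string and its frame,
                // which feed the CreditCardInfoExtractor classification step.
                print(feature.text, feature.frame)
            }
        }
    }
}
```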
This process was somewhat successful in determining the information we wanted:
Note that the app incorrectly identified “VALID THRU” as the cardholder’s name in this case. The cardholder’s name was properly detected, but our app selected the wrong two-word text string as the name, as it had no way to differentiate between the two. To improve this, we could consider including some information about the width of the detected text: credit card numbers tend to be quite wide, names a bit narrower, and expiration dates narrower still. This may help somewhat, but it is unlikely to be 100% accurate across all types of credit cards.
But the biggest issue is that ML Kit does not always recognize text accurately. For example, the number 0 in the credit card number and expiration date is sometimes reported as the letter D or O. This fails to match our regular expression, which was written to detect groups of four digits for the credit card number. We could make the regular expressions accept letters as well as numbers, but that increases the likelihood of false matches.
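To make the failure mode concrete, here is one possible four-groups-of-four-digits check of the kind described above (the app’s actual regular expressions are not shown, so this pattern is our own sketch). A single OCR misread, such as a trailing 0 becoming D, is enough to make the match fail:

```swift
import Foundation

// Sketch: does this string look like a 16-digit card number
// in groups of four, separated by spaces or hyphens?
func looksLikeCardNumber(_ text: String) -> Bool {
    let pattern = "^\\d{4}([ -]?\\d{4}){3}$"
    return text.range(of: pattern, options: .regularExpression) != nil
}
```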
Conclusion
Google’s announcement of ML Kit for Firebase at Google I/O 2018 was definitely a boon to mobile developers: even in beta, it can enable some pretty amazing machine learning-based apps with very little code.
The cross-platform nature of ML Kit for Firebase, in contrast to Apple’s iOS-only Core ML, is also a step forward, as it enables a larger audience of Android and iOS developers to take advantage of machine learning in their apps, including apps built for both platforms. ML Kit for Firebase provides a powerful foundation on which developers can explore new ideas and new techniques, all built upon the power of Google’s work in artificial intelligence, and now included in every developer’s arsenal.