Firebase ML Kit, a quick overview

Kartik Nema · Published in Analytics Vidhya · Dec 8, 2019 · 11 min read

So after learning how to store, retrieve, and manipulate data or media on Firebase, you might wonder what else you can do. One obvious answer is that Firebase lets you handle authentication with ease. After around four months of learning this sort of stuff and building quite a few apps around it, I started to explore some other Firebase features, and the most striking one was ML Kit.

ML Kit lets you bring powerful machine learning features to your app whether it’s for Android or iOS, and whether you’re an experienced machine learning developer or you’re just getting started.

ML Kit provides many ready-made ML models (which we will discuss in this article), and in addition you can define your own models, which Firebase can host. ML Kit offers both on-device and cloud APIs, all behind a common, simple interface covering common mobile use cases: recognizing text, detecting faces, scanning barcodes, labeling images, and recognizing landmarks. All of these functionalities can be achieved in just a few lines of code.

Now we will go through a basic overview of all these functionalities and how you can quickly implement them in your Android apps. But first of all you need to add Firebase to your Android app; if you don't know how to do that, refer to this. Also, be sure to provide the SHA-1 fingerprint while registering your app. You can do this quickly by generating a signing report in Android Studio, or just follow these instructions.
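If you prefer the command line, the same signing report can be produced with a single Gradle task run from the project root; the SHA-1 for each build variant appears in its output:

./gradlew signingReport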

1. Text Recognition

Text recognition is used to extract the text from an image; in other words, it can detect text in any Latin-based language. The image can be a Bitmap, a ByteBuffer, or a byte[]. In this article I will stick with a Bitmap image for all references.

Begin by adding the following dependency to the app-level build.gradle:

implementation 'com.google.firebase:firebase-ml-vision:24.0.1'

First, we will cover a few basic terms that will also be useful when implementing the other functionalities.

We start with a Bitmap of the image and initialise a FirebaseVisionImage object from it by passing the bitmap to FirebaseVisionImage.fromBitmap(). FirebaseVisionImage represents an image object that can be used by both on-device and cloud API detectors. We then process the image using a FirebaseVisionTextRecognizer, an on-device or cloud text recognizer that recognizes the text in an image: we call processImage() on the FirebaseVisionTextRecognizer, passing the FirebaseVisionImage as a parameter. Because processImage() is asynchronous, we attach an onSuccess listener to find out when the detection is complete. If successful, we get a FirebaseVisionText object, which contains the text found in the image. Here is the necessary code snippet.

val bitmap = MediaStore.Images.Media.getBitmap(this.contentResolver, contentURI)
val firebaseVisionImage = FirebaseVisionImage.fromBitmap(bitmap)
val textRecognizer = FirebaseVision.getInstance().onDeviceTextRecognizer
textRecognizer.processImage(firebaseVisionImage)
    .addOnSuccessListener { firebaseVisionText ->
        // If successful, a FirebaseVisionText object is returned,
        // containing all the text found in the image.
    }
    .addOnFailureListener {
        // Handle the failure, e.g. show an error message.
    }
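As a side note, fromBitmap() is not the only way to build the image object; FirebaseVisionImage.fromFilePath(), for instance, can wrap the content URI directly and reads the EXIF rotation for you. A minimal sketch (the error handling is illustrative):

// fromFilePath can throw an IOException, so it is wrapped here.
val imageFromUri = try {
    FirebaseVisionImage.fromFilePath(this, contentURI)
} catch (e: IOException) {
    null // in a real app, surface the error to the user
}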

After this is done, we can process the FirebaseVisionText object to extract the text it contains. FirebaseVisionText has two useful methods:

  1. getText(): Returns a single string of all the text recognized in the image.
  2. getTextBlocks(): Returns a list of FirebaseVisionText.TextBlock objects, each of which is a block of text recognized in the image, i.e. a group or paragraph of text. Each text block has its own getText() method, which returns all the text in that block, and a getLines() method, which returns a list of FirebaseVisionText.Line objects, each representing a single line of text within the block. Following the same pattern, FirebaseVisionText.Line has a getText() method that returns all the text in the line and a getElements() method that returns a list of the individual words in the line.

Here is a code snippet taken from the codelab linked in the resources, which shows how to extract text from the text block objects.

private void processTextRecognitionResult(FirebaseVisionText texts) {
    List<FirebaseVisionText.TextBlock> blocks = texts.getTextBlocks();
    if (blocks.size() == 0) {
        Toast.makeText(this, "No text", Toast.LENGTH_SHORT).show();
        return;
    }
    for (int i = 0; i < blocks.size(); i++) {
        List<FirebaseVisionText.Line> lines = blocks.get(i).getLines();
        for (int j = 0; j < lines.size(); j++) {
            List<FirebaseVisionText.Element> elements = lines.get(j).getElements();
            for (int k = 0; k < elements.size(); k++) {
                // We can now print each word or highlight it.
            }
        }
    }
}
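Since the rest of the snippets in this article are in Kotlin, here is a sketch of the same traversal translated to Kotlin (the logic is equivalent, not taken from the codelab):

private fun processTextRecognitionResult(texts: FirebaseVisionText) {
    val blocks = texts.textBlocks
    if (blocks.isEmpty()) {
        Toast.makeText(this, "No text", Toast.LENGTH_SHORT).show()
        return
    }
    for (block in blocks) {
        for (line in block.lines) {
            for (element in line.elements) {
                // element.text holds a single recognized word.
            }
        }
    }
}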

So that's it: you have successfully achieved text recognition in your Android app. Here is a list of resources you should check out:

(i) Firecast by Jen Person

(ii) Here is a GitHub gist from a simple Android app I created to achieve the above functionality. I am also providing the link to the app itself, which you can import and run to gain a better understanding; the app is very simple with no high-end UI, just enough code to achieve the core functionality.

(iii) Codelab

2. Face Detection

With ML Kit’s face detection API, you can detect faces in an image, identify key facial features, and get the contours of detected faces. Because ML Kit can perform face detection in real time, you can use it in applications like video chat or games that respond to the player’s expressions.

To begin, we first need an image (in bitmap format, in our case). To get started, add the following dependencies to the app-level build.gradle:

implementation 'com.google.firebase:firebase-ml-vision:24.0.1'
implementation 'com.google.firebase:firebase-ml-vision-face-model:19.0.0'

First, create the FirebaseVisionImage object as discussed above, passing the bitmap of the image. To process the image, we create a FirebaseVisionFaceDetector object and call its detectInImage() method, passing the FirebaseVisionImage as a parameter. Because detectInImage() is asynchronous, we attach an onSuccess listener to find out when detection is complete. If the face detection operation succeeds, a list of FirebaseVisionFace objects is passed to the success listener, each representing a face that was detected in the image. For each face, you can get its bounding coordinates in the input image, as well as any other information you configured the face detector to find. Here are a few code snippets.

val bitmap = MediaStore.Images.Media.getBitmap(this.contentResolver, contentURI)
val firebaseVisionImage = FirebaseVisionImage.fromBitmap(bitmap)
val faceDetector = FirebaseVision.getInstance().visionFaceDetector
faceDetector.detectInImage(firebaseVisionImage)
    .addOnSuccessListener { faces ->
        // faces is a List<FirebaseVisionFace>, one entry per detected face.
    }
    .addOnFailureListener {
        // Handle the failure.
    }

Each FirebaseVisionFace object provides the coordinates of several components such as the left eye, right eye, nose base, left ear, and right ear. These components are called landmarks. In addition, the object also reports the probability, or confidence, of certain attributes, such as the person smiling, the left eye being open, and the right eye being open. You can access any landmark, for example, like this:

for (face in faces!!) {
    // getLandmark() returns null if that landmark was not detected.
    val rightEye = face.getLandmark(FirebaseVisionFaceLandmark.RIGHT_EYE)
    val smilingProbability = face.smilingProbability
    val leftEyeOpenProbability = face.leftEyeOpenProbability
    val rightEyeOpenProbability = face.rightEyeOpenProbability
}
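Each landmark also exposes its position as a FirebaseVisionPoint, so you can, for instance, mark the detected features on a mutable copy of the bitmap. This is a minimal sketch; the paint setup and the bitmap copy are my own illustrative scaffolding:

val paint = Paint().apply {
    color = Color.RED
    style = Paint.Style.STROKE
    strokeWidth = 4f
}
val mutableBitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true)
val canvas = Canvas(mutableBitmap)

for (face in faces!!) {
    // Bounding box of the whole face.
    canvas.drawRect(face.boundingBox, paint)
    // A landmark's position is a FirebaseVisionPoint (x and y as floats).
    face.getLandmark(FirebaseVisionFaceLandmark.RIGHT_EYE)?.position?.let {
        canvas.drawCircle(it.x, it.y, 8f, paint)
    }
}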

Before you apply face detection to an image, if you want to change any of the face detector’s default settings, specify those settings with a FirebaseVisionFaceDetectorOptions object. For example,

// High-accuracy landmark detection and face classification
val highAccuracyOpts = FirebaseVisionFaceDetectorOptions.Builder()
    .setPerformanceMode(FirebaseVisionFaceDetectorOptions.ACCURATE)
    .setLandmarkMode(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)
    .setClassificationMode(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
    .build()

// Real-time contour detection of multiple faces
val realTimeOpts = FirebaseVisionFaceDetectorOptions.Builder()
    .setContourMode(FirebaseVisionFaceDetectorOptions.ALL_CONTOURS)
    .build()

When we define custom options like these, we initialise the FirebaseVisionFaceDetector as follows:

val faceDetector = FirebaseVision.getInstance().getVisionFaceDetector(highAccuracyOpts)
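If you pass realTimeOpts instead, each detected face also carries contours: ordered lists of points tracing facial features. A small sketch of reading them (FirebaseVisionFaceContour.FACE is the outline of the whole face):

for (face in faces!!) {
    // A contour is an ordered list of FirebaseVisionPoint objects.
    val faceOutline = face.getContour(FirebaseVisionFaceContour.FACE).points
    for (point in faceOutline) {
        // point.x and point.y give the position within the image.
    }
}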

That’s it, with this much code you can achieve the core functionality.

Here is a sample screenshot from the app. Since the FirebaseVisionFaceDetector provides a list of FirebaseVisionFace objects, we can also use photos containing multiple people.

Resources:

(i) Official Documentation

(ii) Github Gist, link to the project

3. Scanning Barcodes

With ML Kit’s barcode scanning API, you can read data encoded using most standard barcode formats. Barcode scanning happens on the device, and doesn’t require a network connection.

Barcodes are a convenient way to pass information from the real world to your app. In particular, when using 2D formats such as QR codes, you can encode structured data such as contact information or WiFi network credentials. Because ML Kit can automatically recognize and parse this data, your app can respond intelligently when a user scans a barcode.

Barcode scanning is relatively simple. First of all, add the required dependencies:

implementation 'com.google.firebase:firebase-ml-vision:24.0.1'
implementation 'com.google.firebase:firebase-ml-vision-barcode-model:16.0.1'

Now, as usual, create a FirebaseVisionImage object, passing the bitmap of the image. To process the barcode in the image, we first create a FirebaseVisionBarcodeDetector object, then call detectInImage() on it, passing the FirebaseVisionImage as a parameter. Because detectInImage() is asynchronous, we attach an onSuccess listener to find out when detection is complete. Here is the required code snippet.

val bitmap = MediaStore.Images.Media.getBitmap(this.contentResolver, contentURI)
val firebaseVisionImage = FirebaseVisionImage.fromBitmap(bitmap)
val barcodeDetector = FirebaseVision.getInstance().visionBarcodeDetector
barcodeDetector.detectInImage(firebaseVisionImage)
    .addOnSuccessListener { barcodes ->
        // barcodes is a List<FirebaseVisionBarcode>, one entry per detected barcode.
    }
    .addOnFailureListener {
        // Handle the failure.
    }
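One optional refinement from the official docs: if you know in advance which formats you expect, you can restrict the detector to them, which can make detection faster. A minimal sketch:

val barcodeOptions = FirebaseVisionBarcodeDetectorOptions.Builder()
    .setBarcodeFormats(
        FirebaseVisionBarcode.FORMAT_QR_CODE,
        FirebaseVisionBarcode.FORMAT_AZTEC
    )
    .build()
val qrDetector = FirebaseVision.getInstance().getVisionBarcodeDetector(barcodeOptions)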

If the barcode recognition operation succeeds, a list of FirebaseVisionBarcode objects will be passed to the success listener. Each FirebaseVisionBarcode object represents a barcode that was detected in the image. For each barcode, you can get its bounding coordinates in the input image, as well as the raw data encoded by the barcode. Also, if the barcode detector was able to determine the type of data encoded by the barcode, you can get an object containing parsed data.

We can process the data as follows

for (barcode in barcodes!!) {
    canvas.drawRect(barcode.boundingBox!!, rectPaint)
    val valueType = barcode.valueType
    when (valueType) {
        FirebaseVisionBarcode.TYPE_WIFI -> {
            val ssid = barcode.wifi!!.ssid
            val password = barcode.wifi!!.password
            val type = barcode.wifi!!.encryptionType
        }
        FirebaseVisionBarcode.TYPE_URL -> {
            val title = barcode.url!!.title
            val url = barcode.url!!.url
        }
    }
}

We can also obtain the entire information contained in the barcode in raw form, as follows:

val barcodeRaw = barcode.rawValue

Resources:

(i) Official Documentation

(ii) Github Gist, link to the project

4. Labelling Images

Image labeling gives you insight into the content of images. When you use the API, you get a list of the entities that were recognized: people, things, places, activities, and so on. Each label found comes with a score that indicates the confidence the ML model has in its relevance.

ML Kit provides both on-device and cloud-based APIs for image labelling. The on-device API requires no internet connection and is quite fast. The cloud-based API, on the other hand, requires an internet connection and is much more accurate, as it covers far more labels than the on-device API.

Let's discuss the on-device API. First of all, add the following dependencies:

implementation 'com.google.firebase:firebase-ml-vision:24.0.1'
implementation 'com.google.firebase:firebase-ml-vision-image-label-model:19.0.0'

Continuing the long-standing tradition, you first need to get the bitmap of the image and pass it to FirebaseVisionImage.fromBitmap(). To label the image, we first create a FirebaseVisionImageLabeler object, then call processImage() on it, passing the FirebaseVisionImage as a parameter. Because processImage() is asynchronous, we attach an onSuccess listener to find out when the labelling is complete. Here is the required code snippet.

val bitmap = MediaStore.Images.Media.getBitmap(this.contentResolver, contentURI)
val firebaseVisionImage = FirebaseVisionImage.fromBitmap(bitmap)
val options = FirebaseVisionOnDeviceImageLabelerOptions.Builder()
    .setConfidenceThreshold(0.7f)
    .build()
val labelDetector = FirebaseVision.getInstance().getOnDeviceImageLabeler(options)
labelDetector.processImage(firebaseVisionImage)
    .addOnSuccessListener { labels ->
        // labels is a List<FirebaseVisionImageLabel>.
    }
    .addOnFailureListener {
        // Handle the failure.
    }

Here, as you can see, we have customised some settings with a FirebaseVisionOnDeviceImageLabelerOptions object. For example:

val options = FirebaseVisionOnDeviceImageLabelerOptions.Builder()
    .setConfidenceThreshold(0.7f)
    .build()

If the labelling was successful, a list of FirebaseVisionImageLabel objects is returned; each object in this list contains two components: the name of the label and the confidence value (a float). Here is a snippet showing the extraction of this data:

private fun labelImage(labels: List<FirebaseVisionImageLabel>?, image: Bitmap?) {
    for (label in labels!!) {
        val name = label.text
        val confidence = label.confidence
    }
}

Now you can store all the labels in an ArrayList and display them using a ListView and an adapter, as I did in the sample app.
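For completeness, here is what the cloud-based labeler mentioned at the start of this section might look like; a sketch assuming your project is on a plan with Cloud Vision access, since the calls mirror the on-device API:

// Sketch: FirebaseVisionCloudImageLabelerOptions is the cloud counterpart
// of the on-device options class.
val cloudOptions = FirebaseVisionCloudImageLabelerOptions.Builder()
    .setConfidenceThreshold(0.7f)
    .build()
val cloudLabeler = FirebaseVision.getInstance().getCloudImageLabeler(cloudOptions)
cloudLabeler.processImage(firebaseVisionImage)
    .addOnSuccessListener { labels ->
        // Same List<FirebaseVisionImageLabel> as before, but drawn from a
        // much larger set of labels.
    }
    .addOnFailureListener { }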

Resources:

(i) Official Documentation

(ii) Github Gist, link to the project

5. Landmark Recognition

With ML Kit's landmark recognition API, you can recognize well-known landmarks in an image. When you pass an image to this API, you get the landmarks that were recognized in it, along with each landmark's geographic coordinates and the region of the image where the landmark was found.

Before we begin, you must know that this API is cloud-based and no on-device version is available, so this feature is most probably not available in the free plan, i.e. the Spark plan in Firebase; it is available in the paid plans such as the Flame plan or the Blaze plan. Refer here for Firebase pricing. So if you do all the right things but still don't see any output, don't be disheartened; it's probably due to this. All the other functionalities are included in the free Spark plan.

Begin by adding the dependency

implementation 'com.google.firebase:firebase-ml-vision:24.0.1'

Now you know how it goes,

Create a FirebaseVisionImage object, passing the bitmap of the image. To process the image, we first create a FirebaseVisionCloudLandmarkDetector object, then call its detectInImage() method, passing the FirebaseVisionImage as a parameter. Because detectInImage() is asynchronous, we attach an onSuccess listener to find out when detection is complete. Here is the required code snippet:

val bitmap = MediaStore.Images.Media.getBitmap(this.contentResolver, contentURI)
val firebaseVisionImage = FirebaseVisionImage.fromBitmap(bitmap)
val landmarkDetector = FirebaseVision.getInstance().visionCloudLandmarkDetector
landmarkDetector.detectInImage(firebaseVisionImage)
    .addOnSuccessListener { landmarks ->
        // landmarks is a List<FirebaseVisionCloudLandmark>.
    }
    .addOnFailureListener {
        // Handle the failure.
    }
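As with the other detectors, the cloud landmark detector accepts an options object; the official docs describe settings such as the model type and the maximum number of results. A minimal sketch:

val cloudOptions = FirebaseVisionCloudDetectorOptions.Builder()
    .setModelType(FirebaseVisionCloudDetectorOptions.LATEST_MODEL)
    .setMaxResults(15)
    .build()
val tunedLandmarkDetector = FirebaseVision.getInstance()
    .getVisionCloudLandmarkDetector(cloudOptions)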

If the landmark recognition operation succeeds, a list of FirebaseVisionCloudLandmark objects will be passed to the success listener. Each FirebaseVisionCloudLandmark object represents a landmark that was detected in the image. For each landmark, you can get its name, geographic coordinates, bounding polygon, and the associated confidence score. Here is an example of extracting the landmark data:

for (landmark in firebaseVisionCloudLandmarks) {
    val bounds = landmark.boundingBox
    val landmarkName = landmark.landmark
    val entityId = landmark.entityId
    val confidence = landmark.confidence

    for (loc in landmark.locations) {
        val latitude = loc.latitude
        val longitude = loc.longitude
    }
}

I don't have any screenshots to attach for this since, like many of you, I use the basic free plan, but as you can see, its implementation is basically the same.

Resources

(i) Official Documentation

And we are done for now. Phew! That was long, but I hope it wasn't difficult to follow. We have now covered all the functionalities offered by the default ML models provided by Firebase. As you learn (or continue to learn) about neural networks and related concepts, you will acquire the skill of designing your own models, and Firebase can host them for you. One more important point before I sign off: in this article we always used images available in the gallery (or file storage) of the device, but these features can also be configured to work with live pictures taken by the mobile camera, since the on-device detectors we discussed work in real time.
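To give a flavour of the camera case, here is a hedged sketch: FirebaseVisionImage can wrap a camera frame directly via fromMediaImage(). The mediaImage variable and the rotation constant are placeholders; in a real app they come from your camera pipeline:

// Sketch: wrapping a live camera frame. mediaImage is assumed to be an
// android.media.Image delivered by the camera; the rotation constant must
// match the device orientation in a real app.
val visionImage = FirebaseVisionImage.fromMediaImage(
    mediaImage,
    FirebaseVisionImageMetadata.ROTATION_0
)
textRecognizer.processImage(visionImage)
    .addOnSuccessListener { /* update the UI with the live result */ }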

Thanks for reading

Hope you liked this piece.

About Me: I am Kartik Nema, a sophomore Android developer from IIIT Allahabad.
