Firebase ML Kit 101 : Text Recognition
Text Recognition is the process of detecting and recognising textual information in images, videos, documents and other sources.
There are many apps, like Google Translate, Google Keep and CamScanner, which use the power of text recognition to provide some awesome and useful features.
With ML Kit’s Text Recognition API, you can recognise text in any Latin-based language (and more, with cloud-based text recognition).
Firebase ML Kit Series
In this series of articles, we will take a deep dive into the different APIs that ML Kit offers…
- Firebase ML Kit 101 : Introduction
- Firebase ML Kit 101 : Text Recognition (you’re here)
- Firebase ML Kit 101 : Face Detection
- Firebase ML Kit 101 : Barcode Scanning
- Firebase ML Kit 101 : Image Labeling
- Firebase ML Kit 101 : Landmark Recognition
- Firebase ML Kit 101 : Language Identification
- Firebase ML Kit 101 : Smart Reply
Let’s look into ML Kit’s Text Recognition API and how we can integrate it into our apps.
ML Kit’s Text Recognition
ML Kit’s Text Recognition provides both on-device and cloud-based APIs.
You can choose which one to use depending on your use case.
ML Kit’s Text Recogniser segments text into blocks, lines, and elements.
- A Block is a contiguous set of text lines, such as a paragraph or column.
- A Line is a contiguous set of words on the same vertical axis.
- An Element is a contiguous set of alphanumeric characters on the same vertical axis.
Note: Firebase ML Kit is in beta as of January ‘19.
Let’s Code!
Step 1 : Add Firebase to your app
Of course! You can add Firebase to your app by following the steps mentioned here.
Step 2 : Include the dependency
You need to include the ML Kit dependency in your app-level build.gradle file.
dependencies {
    // ...
    implementation 'com.google.firebase:firebase-ml-vision:19.0.2'
}
Step 2.5 : Specify the ML models (optional)
For on-device APIs, you can configure your app to automatically download the ML models after it is installed from the Play Store. Otherwise, the model will be downloaded the first time you run the on-device detector.
To enable this feature, you need to specify your models in your app’s AndroidManifest.xml file.
<application ...>
...
<meta-data
android:name="com.google.firebase.ml.vision.DEPENDENCIES"
android:value="ocr" />
<!-- To use multiple models: android:value="ocr,model2,model3" -->
</application>
Step 3 : Get! — the Image
ML Kit provides an easy way to recognise text from a variety of image types like Bitmap, media.Image, ByteBuffer, byte[], or a file on the device. You just need to create a FirebaseVisionImage object from any of the above-mentioned image types and pass it to the model.
In my sample app, I’ve used a Bitmap image to create a FirebaseVisionImage object.
val image = FirebaseVisionImage.fromBitmap(bitmap)
To create a FirebaseVisionImage object from other image types, please refer to the official documentation.
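For instance, if you’re capturing frames from the camera, you can build the image from a media.Image instead. Here’s a minimal sketch, assuming you already have a mediaImage frame (the variable name and the rotation constant are illustrative; the rotation must match your device’s orientation):

```kotlin
// mediaImage: android.media.Image, e.g. from a camera frame callback
val image = FirebaseVisionImage.fromMediaImage(
    mediaImage,
    FirebaseVisionImageMetadata.ROTATION_90 // rotation of the captured frame
)
```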
Step 4 : Set! — the Model
Now it’s time to prepare our Text Recognition model.
ML Kit provides both on-device and cloud-based models for Text Recognition.
On Device Model
val textRecognizer = FirebaseVision.getInstance().onDeviceTextRecognizer
Cloud Based Model
val textRecognizer = FirebaseVision.getInstance().cloudTextRecognizer
For the cloud-based Text Recognition model, you can also specify the languages that you want the model to detect.
val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
    .setLanguageHints(Arrays.asList("en", "hi"))
    .build()

val textRecognizer = FirebaseVision.getInstance().getCloudTextRecognizer(options)
Step 5 : Gooo!
Finally, we can pass our image to the model for Text Recognition.
textRecognizer.processImage(image)
    .addOnSuccessListener {
        // Task completed successfully
    }
    .addOnFailureListener {
        // Task failed with an exception
    }
Step 6 : Extract the information
Voilà! That’s it!
If the text recognition was successful, you’ll get a FirebaseVisionText object in the success listener. This FirebaseVisionText object contains all the textual information present in the image.
As discussed above, ML Kit’s Text Recogniser segments text into blocks, lines, and elements.
An image contains zero or more TextBlock objects. Each TextBlock object contains zero or more Line objects, and each Line object contains zero or more Element objects.
Image
|
|___ TextBlock
| |
| |___ Line
| |
| |___ Element
|
|___ TextBlock
|
|___ Line
| |
| |___ Element
| |
| |___ Element
|
|___ Line
|
|___ Element
You can extract all this information like this.
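As a minimal sketch, you can walk the hierarchy with nested loops; the extractText function and the blockText/lineText/elementText names here are illustrative (each level also exposes details like its bounding box):

```kotlin
fun extractText(result: FirebaseVisionText) {
    for (block in result.textBlocks) {
        val blockText = block.text             // text of the whole block
        for (line in block.lines) {
            val lineText = line.text           // text of a single line
            for (element in line.elements) {
                val elementText = element.text // a single word-like element
            }
        }
    }
}
```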
Have a Look!
This is what you can achieve with ML Kit’s Text Recognition API.
Here’s the source code for the above app…
Firebase ML Kit Series
Don’t forget to have a look at the other ML Kit APIs covered in this series of articles.
- Firebase ML Kit 101 : Introduction
- Firebase ML Kit 101 : Text Recognition
- Firebase ML Kit 101 : Face Detection
- Firebase ML Kit 101 : Barcode Scanning
- Firebase ML Kit 101 : Image Labeling
- Firebase ML Kit 101 : Landmark Recognition
- Firebase ML Kit 101 : Language Identification
- Firebase ML Kit 101 : Smart Reply