Recognizing Text with Firebase ML Kit on iOS & Android

A practical guide to implementing the text recognition feature with Firebase ML Kit.

In my previous article, I covered what Firebase ML Kit is and briefly walked through all of its features.

In this article, I’ll go over how to implement the text recognition feature in ML Kit for your iOS and Android apps.

Before we begin, make sure that:

  • you’ve included Firebase in your project. You can find out how to do that in iOS here and in Android here, and
  • you’ve enabled cloud-based APIs if you plan on using them.

As mentioned in the previous article, you’ll have to upgrade to the Blaze plan to use the cloud-based APIs. Once you upgrade, you will find the option to enable cloud-based APIs on the ML Kit page of your project’s Firebase dashboard.

On-device APIs for Text Recognition & Barcode scanning; Cloud-based APIs for Image labeling

iOS

Step 1: Include the pods

For iOS, you need to include the MLVision pod. If you plan on using the on-device API, also include MLVisionTextModel, which provides the on-device text recognition model.

Include them in your Podfile like so:

pod 'Firebase/Core'
# Required for both the on-device and cloud-based APIs
pod 'Firebase/MLVision'
# Only needed for the on-device API
pod 'Firebase/MLVisionTextModel'

Once you’ve included these pods in your Podfile, run the pod install command to install them.

Step 2: Import Firebase

In your app, import Firebase wherever you need to use it like this:

import Firebase

Step 3: Get an instance of Vision

Here, Vision refers to Firebase’s Vision class, which is the entry point to ML Kit’s vision APIs:

let vision = Vision.vision()

Step 4: Get an instance of the text recognizer

Once you get an instance of Vision, you need to get your text recognizer, and how you do this depends on which API you use.

// On-device API
let recognizer = vision.onDeviceTextRecognizer()
// Or, for the cloud-based API (use one or the other)
let recognizer = vision.cloudTextRecognizer()

This is the component that is responsible for processing your image and recognizing text in it.

Step 5 (Optional): Configure your text recognizer to detect certain languages

If you use the cloud-based API, you can also give your text recognizer language hints, which tell it which language or set of languages to expect in the image.

let options = VisionCloudTextRecognizerOptions()
// Setting language hints to English, French & Hindi
options.languageHints = ["en", "fr", "hi"]
// Create your cloud text recognizer with the above options
let recognizer = vision.cloudTextRecognizer(options: options)

Step 6: Get your image as a VisionImage

Pass a UIImage as a parameter to VisionImage like so:

// uiImage is the UIImage you want to process
let visionImage = VisionImage(image: uiImage)

There are alternative ways to get VisionImage here.

Step 7: Process your VisionImage with your text recognizer

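Now that you have both your text recognizer and your image, you can process the image by passing it to the recognizer’s process(_:completion:) method.

This method is asynchronous. Instead of returning a value, it calls your completion handler with a VisionText object if recognition succeeds, or with an error if it fails.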

This was a tutorial on how to recognize text in images in general. ML Kit also has a separate option for recognizing text in an image that is a picture of a document.

Learn more about it here.

Android

Step 1: Add Firebase ML Vision as a dependency

For Android, you need to include the ML Vision dependency in your app-level build.gradle file’s dependencies block like so:

implementation 'com.google.firebase:firebase-ml-vision:19.0.3'

Step 2: Auto-download the text recognition ML model

Follow this step only if you’re using the on-device API.

Adding the following block to your AndroidManifest.xml file ensures that the text recognition ML model gets downloaded automatically when your app is downloaded from the Play Store:

<application ...>
    ...
    <meta-data
        android:name="com.google.firebase.ml.vision.DEPENDENCIES"
        android:value="ocr" />
</application>

Step 3: Get an instance of the text recognizer

Since your text recognizer is the component that’s responsible for processing your image and recognizing text in it, you need to get an instance of this before you do anything else:

// On-device API
val recognizer = FirebaseVision.getInstance().onDeviceTextRecognizer
// Or, for the cloud-based API (use one or the other)
val recognizer = FirebaseVision.getInstance().cloudTextRecognizer

Step 4 (Optional): Configure your text recognizer to detect certain languages

If you use the cloud-based API, you can give your text recognizer language hints, which tell it which language or set of languages to expect in the image.

// Setting language hints to English, French & Hindi
val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
    .setLanguageHints(listOf("en", "fr", "hi"))
    .build()
// Create your cloud text recognizer with the above options
val recognizer = FirebaseVision.getInstance().getCloudTextRecognizer(options)
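
Steps 3 and 4 can be folded into one small helper. This is only a sketch: the function name createRecognizer and its parameters are my own and not part of the SDK. It passes language hints only on the cloud path, since the options class used in this step belongs to the cloud recognizer.

```kotlin
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.text.FirebaseVisionCloudTextRecognizerOptions
import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer

// Hypothetical helper (not part of the SDK): picks the recognizer to use.
fun createRecognizer(
    useCloud: Boolean,
    languageHints: List<String> = emptyList()
): FirebaseVisionTextRecognizer {
    val vision = FirebaseVision.getInstance()

    // On-device API: no options are passed here.
    if (!useCloud) return vision.onDeviceTextRecognizer

    // Cloud-based API without hints: use the default cloud recognizer.
    if (languageHints.isEmpty()) return vision.cloudTextRecognizer

    // Cloud-based API with language hints.
    val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
        .setLanguageHints(languageHints)
        .build()
    return vision.getCloudTextRecognizer(options)
}
```

For example, createRecognizer(useCloud = true, languageHints = listOf("en", "fr", "hi")) would give you the same recognizer as the snippet in this step.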

Step 5: Get your image as a FirebaseVisionImage

Here’s how you can get your FirebaseVisionImage from a Bitmap:

val image = FirebaseVisionImage.fromBitmap(bitmap)
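
If your image lives on the device as a file rather than as a Bitmap in memory, FirebaseVisionImage.fromFilePath() is another way in. Here is a minimal sketch, assuming you already have a Context and a content or file Uri (for example, from an image picker). The helper name imageFromUri and the log tag are made up for illustration.

```kotlin
import android.content.Context
import android.net.Uri
import android.util.Log
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import java.io.IOException

// Hypothetical helper: returns null when the file cannot be read.
fun imageFromUri(context: Context, uri: Uri): FirebaseVisionImage? =
    try {
        // fromFilePath() throws an IOException if the image cannot be loaded.
        FirebaseVisionImage.fromFilePath(context, uri)
    } catch (e: IOException) {
        Log.e("TextRecognition", "Could not load image from $uri", e)
        null
    }
```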

Step 6: Process your FirebaseVisionImage with your text recognizer

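After you get your FirebaseVisionImage and your text recognizer, you can process your image by calling the processImage() method on the text recognizer. The call is asynchronous: it returns a Task that delivers a FirebaseVisionText object when recognition succeeds.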
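
Here is a minimal sketch of that call, assuming the recognizer and image come from the previous steps. The function names recognizeText and logBlocks, along with the log tag, are hypothetical. The second function shows one way to walk the result, which is organized as blocks that contain lines.

```kotlin
import android.util.Log
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.text.FirebaseVisionText
import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer

private const val TAG = "TextRecognition" // illustrative log tag

// Hypothetical helper: runs recognition and logs the outcome.
fun recognizeText(recognizer: FirebaseVisionTextRecognizer, image: FirebaseVisionImage) {
    recognizer.processImage(image)
        .addOnSuccessListener { result ->
            // result.text holds all of the recognized text as one string.
            Log.d(TAG, "Recognized text: ${result.text}")
            logBlocks(result)
        }
        .addOnFailureListener { e ->
            Log.e(TAG, "Text recognition failed", e)
        }
}

// Hypothetical helper: logs each block and the lines inside it.
fun logBlocks(result: FirebaseVisionText) {
    for (block in result.textBlocks) {
        // boundingBox may be null, so it is only logged, not dereferenced.
        Log.d(TAG, "Block at ${block.boundingBox}: ${block.text}")
        for (line in block.lines) {
            Log.d(TAG, "  Line: ${line.text}")
        }
    }
}
```

The listeners run once the Task completes, so anything that depends on the recognized text (updating a TextView, for instance) belongs inside the success listener rather than after the processImage() call.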

Again, this was a tutorial on how to recognize text in images in general. ML Kit also has a separate option for recognizing text in an image that is a picture of a document.

You can learn more about it here.

Conclusion

That’s it for the text recognition feature for Firebase ML Kit!
