How to extract text from images using Firebase Machine Learning Kit in Android with Kotlin: Part 1

Feb 25, 2020

Machine learning is one of the most talked-about topics in the world. It is a subset of artificial intelligence and plays a key role in giving AI its cognitive abilities. In this tutorial, I will walk you through simple steps to run a machine learning model on Android devices using Kotlin. The model comes from Firebase ML Kit and Google Cloud Platform. Because it is already trained, we save the time it would take to build and train one ourselves. Here is a snapshot of what we will be doing:

Requirements:

Android Studio (Fully Setup)
Basic Knowledge of Kotlin
Basic Interaction with Firebase
Google Cloud account (Billed, but has $300 free credit)

Step 1: Set up Firebase & Google Cloud:

1. Visit https://firebase.google.com/ and create an account if you don’t have one. Then go to Get started → Add project and give the project a name (I named mine my-project).

create a new project then give it any desired name

2. Open the console of your Firebase account and navigate to ML Kit → Get started. The interface that follows lists the ML APIs available from Google, such as Text Detection, Face Detection, etc. We are only interested in Text Detection.

3. To enable text detection, we need to enable the vision API from the Google Cloud platform console found at https://console.cloud.google.com/ (Make sure to click on console).

On the interface that appears, click on Search, then type ‘Cloud Vision API’ as shown.

Once it's loaded, click on ‘Enable API’.

Step 2: Android Studio Setup.

In order for us to use machine learning in Android Studio, we need a couple of libraries to help us to achieve our first ML Project.

1. Create a new project: Open Android Studio and create a new project. It can use a blank activity or any other template you prefer. Select API 21 or higher as the minimum SDK, and select Kotlin as the language, since we will be doing this tutorial in Kotlin.

Select Kotlin as Language and API 21 or Higher

2. Add permissions: Once that’s done, open AndroidManifest.xml from the manifests folder in the project tree and add these permissions and a meta-data tag.

<!-- These go before the <application> tag -->
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>

<!-- Add the meta-data tag inside <application>, before your <activity> -->
<meta-data
    android:name="com.google.firebase.ml.vision.DEPENDENCIES"
    android:value="text,face,label" />

The meta-data tag tells the app to download the listed ML models when the user installs the application, instead of waiting until a model is first needed. The permissions allow our app to access the internet and check the network state on the user’s phone.

3. Connect app to Firebase: To use Firebase features in our app, we need to connect it to the project we already created on Firebase by navigating to:

Tools → Firebase → (side menu appears) → Analytics → Log an Analytics event → Connect to Firebase (wait for it to sync) → Add Analytics to your app (let it sync as well)

We will also need the Picasso library (to load our images) and the Anko library (to keep our Kotlin syntax simple), alongside the Firebase ML Vision library itself. Open build.gradle (Module: app) and add the following lines to the dependencies block. These lines import all three libraries.

implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
implementation 'com.squareup.picasso:picasso:2.5.2'
implementation 'org.jetbrains.anko:anko-commons:0.10.5'
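
If your project uses the Gradle Kotlin DSL (a build.gradle.kts file) rather than the Groovy file shown above, the same three dependencies would be declared roughly as follows. This is only a sketch under that assumption; the coordinates and versions are copied unchanged from the lines above.

```kotlin
// build.gradle.kts (Module: app), assuming the project uses the Gradle Kotlin DSL
dependencies {
    // Firebase ML Kit vision APIs (text detection)
    implementation("com.google.firebase:firebase-ml-vision:16.0.0")
    // Image loading from a URL
    implementation("com.squareup.picasso:picasso:2.5.2")
    // Anko helpers such as alert() and ctx
    implementation("org.jetbrains.anko:anko-commons:0.10.5")
}
```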

Make sure Android Studio syncs correctly after adding the above.

Your dependencies should look like this (how your Gradle file should look).

4. Build the views: We need a Button, an ImageView and an EditText to interact with the app. The user enters an image URL in the EditText, the ImageView displays the loaded image, and the Button detects the text once the image is loaded. Open your activity_main.xml and replace its contents with the following code (customize as you wish).

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">
    <EditText
        android:layout_width="match_parent"
        android:imeOptions="actionDone"
        android:id="@+id/etUrl"
        android:inputType="textUri"
        android:hint="Enter url"
        android:layout_marginLeft="10dp"
        android:layout_marginRight="10dp"
        android:layout_alignParentTop="true"
        android:layout_marginTop="20dp"
        android:layout_height="wrap_content"/>
    <ImageView
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_margin="10dp"
        android:scaleType="centerCrop"
        android:layout_below="@+id/etUrl"
        android:id="@+id/image_holde"
        />

    <Button
        android:layout_width="match_parent"
        android:id="@+id/btnDetect"
        android:text="DETECT TEXT"
        android:textColor="#ffffff"
        android:layout_margin="20dp"
        android:backgroundTint="@color/colorAccent"
        android:background="@drawable/button_background"
        android:layout_alignParentBottom="true"
        android:layout_height="wrap_content"/>

</RelativeLayout>

To make the button slightly rounded, I created a drawable resource file, button_background.xml, and played around with the shape and corners as shown:

<?xml version="1.0" encoding="utf-8"?>
<shape xmlns:android="http://schemas.android.com/apk/res/android">

    <corners android:radius="16dp" android:topLeftRadius="25dp" android:bottomRightRadius="25dp"/>
</shape>

5. Wire up the code to get ML in the app: We start by getting a text detector from FirebaseVision, which can read text from an image. Declare it as a property of the activity, above the onCreate() function, as shown:

val detector = FirebaseVision.getInstance().visionTextDetector

Next, we let the user enter an image URL into the input field and press the Done button on the virtual keyboard. We listen for that key press, and once the user presses Done, our app loads the image from the entered URL and displays it using the Picasso library we added earlier. We use the id of the EditText (etUrl) and add an editor action listener to it as below:

etUrl.setOnEditorActionListener { _, action, _ ->
    if (action == EditorInfo.IME_ACTION_DONE) {
        // Load the image from the entered URL into the ImageView.
        // The listener stays open here and continues in the next snippets.
        Picasso.with(ctx)
            .load(etUrl.text.toString())
            .into(image_holde)
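
The snippet above passes whatever the user typed straight to Picasso, so an empty or malformed string can make the load fail. As a sketch of one way to guard against that, the hypothetical helper below (normalizeImageUrl is not part of Picasso or the original app) trims the input, assumes https when no scheme is given, and returns null for anything that is not an http or https URL with a host.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical helper: returns a cleaned-up http(s) URL, or null if the
// input cannot be used as an image URL.
fun normalizeImageUrl(raw: String): String? {
    val trimmed = raw.trim()
    if (trimmed.isEmpty()) return null

    // Assumption: a missing scheme means the user meant https
    val candidate = if ("://" in trimmed) trimmed else "https://$trimmed"

    return try {
        val uri = URI(candidate)
        val isHttp = uri.scheme.equals("http", ignoreCase = true) ||
            uri.scheme.equals("https", ignoreCase = true)
        if (isHttp && !uri.host.isNullOrEmpty()) candidate else null
    } catch (e: URISyntaxException) {
        // Spaces or other illegal characters end up here
        null
    }
}

fun main() {
    println(normalizeImageUrl("  example.com/sign.png "))
    println(normalizeImageUrl("ftp://example.com/sign.png"))
}
```

In the listener, you could call the helper on etUrl.text.toString() and only hand the result to Picasso when it is not null.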

Once the image is loaded into our ImageView, we ask the detector we declared earlier to look for readable text in it. First, we take the bitmap that Picasso placed in the ImageView and wrap it in a FirebaseVisionImage, which is the input type the detector expects, as shown.

btnDetect.setOnClickListener {
    // The drawable is null until Picasso finishes loading, so cast safely
    val bitmap = (image_holde.drawable as? BitmapDrawable)?.bitmap
        ?: return@setOnClickListener
    val textImage = FirebaseVisionImage.fromBitmap(bitmap)

Once we have the required image, in this case textImage, we tell the detector to detect the text in it and attach listeners that report either success or failure. On success, we loop through the blocks of text in the response and append each block's text to a variable, detectedText, then show the result once the loop has finished.

    detector.detectInImage(textImage)
        .addOnSuccessListener { result ->
            // Collect the text of every block, then show a single alert
            var detectedText = ""
            result.blocks.forEach { block ->
                detectedText += block.text + "\n"
            }
            runOnUiThread {
                alert(detectedText, "DETECTED TEXT").show()
            }
        }
        .addOnFailureListener {
            runOnUiThread {
                alert("Unable to detect Text", "ERROR DURING DETECTION").show()
            }
        }
}
        // Do not call detector.close() here: it would close the detector
        // before the button is tapped. Close it in onDestroy() instead.
        return@setOnEditorActionListener true
    }
    false
}
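
The loop that builds detectedText can be tried out on its own, without a device. The sketch below uses a hypothetical helper, joinBlockTexts, which is not part of ML Kit: it takes the block texts as plain strings, trims them, skips empty ones, and joins the rest with newlines.

```kotlin
// Hypothetical helper, not part of ML Kit: joins recognized block texts,
// one per line, skipping blocks that are empty after trimming.
fun joinBlockTexts(blockTexts: List<String>): String =
    blockTexts
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .joinToString(separator = "\n")

fun main() {
    // Stand-in data for the text of three detected blocks
    val sample = listOf("STOP", "  ", "Main Street ")
    println(joinBlockTexts(sample))
}
```

In the app, the input list would be the text of each detected block. Joining this way also avoids the trailing newline that appending "\n" after every block leaves behind.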

We have used the Anko library to create the alerts that show our detected text, and we show them on the UI thread.

The full code for our MainActivity.kt will be as shown below:

package bensalcie.likesyou.org.textdetector

import android.graphics.drawable.BitmapDrawable
import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import android.view.inputmethod.EditorInfo
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.squareup.picasso.Picasso
import kotlinx.android.synthetic.main.activity_main.*
import org.jetbrains.anko.alert
import org.jetbrains.anko.ctx

class MainActivity : AppCompatActivity() {
    // On-device text detector, created once for the lifetime of the activity
    private val detector = FirebaseVision.getInstance().visionTextDetector

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        etUrl.setOnEditorActionListener { _, action, _ ->
            if (action == EditorInfo.IME_ACTION_DONE) {
                // Load the image from the entered URL into the ImageView
                Picasso.with(ctx).load(etUrl.text.toString()).into(image_holde)

                btnDetect.setOnClickListener {
                    // The drawable is null until Picasso finishes loading
                    val bitmap = (image_holde.drawable as? BitmapDrawable)?.bitmap
                        ?: return@setOnClickListener
                    val textImage = FirebaseVisionImage.fromBitmap(bitmap)

                    detector.detectInImage(textImage)
                        .addOnSuccessListener { result ->
                            // Collect every block's text, then show one alert
                            var detectedText = ""
                            result.blocks.forEach { block ->
                                detectedText += block.text + "\n"
                            }
                            runOnUiThread {
                                alert(detectedText, "DETECTED TEXT").show()
                            }
                        }
                        .addOnFailureListener {
                            runOnUiThread {
                                alert("Unable to detect Text", "ERROR DURING DETECTION").show()
                            }
                        }
                }
                // Tell the keyboard we handled the Done action
                return@setOnEditorActionListener true
            }
            false
        }
    }

    override fun onDestroy() {
        // Release the detector only when the activity is going away
        detector.close()
        super.onDestroy()
    }
}

Having gone through all this, when you run your application, you should be able to detect the text in an image loaded from any URL.

Remember, when using Firebase ML Kit, detection is done either on the device or on Google Cloud Platform. In our case, we have been using on-device detection, which is convenient because it responds quickly. You could also train your own model, upload it to Firebase ML Kit and use it just as we have used the text detection one.

Below is an illustration of the app we have just made: