[Firebase MLKit] TextDetection in Android using Firebase ML Vision APIs with Live Camera

Ajeet Kumar
Aug 31, 2018 · 6 min read

In recent times, Google has pushed a lot of common ML-related functionality to Android, making it far easier for developers to use it in their apps. One step in that direction was the introduction of ML Kit at Google I/O.

ML Kit provides the following options for developers:

  1. Text Recognition (OCR) — For recognizing text in images
  2. Object Detection — Identifying objects, locations, activities, animal species, products, etc.
  3. Face Detection — Identifying faces in images.
  4. Landmark Detection — For detecting famous landmarks.
  5. Barcode Scanning — APIs for barcode and QRCode scanning.

In this article, we will explore the Text Recognition (OCR) feature and how it can be applied to a live feed from the device camera in real time.

Let’s Start — Creating a google-services.json

Every Android app that wants to use Firebase features needs to include a google-services.json file. To get one, go to the Firebase console and create a new project. Then click the Android button to add an Android app and download the google-services.json. Use any package name you like, but remember that it must match the package name of our Android app. A quick guide to the steps involved is shown below.

New Android Project

Let’s create a new Android project with an empty activity and name it according to the package name provided above. As shown above, I used the same name — textdetectionusingmlkit, with the domain ajeetkumar.com.

Copy the google-services.json file to the app folder of the Android project. Now let’s create a few empty files that we will fill in later.

  1. CameraSource and CameraSourcePreview — These classes handle the images captured by the camera and display them in the UI. They also pass each image along for text detection.
  2. FrameMetadata — Contains camera information such as width, height, rotation, etc. (a sketch of this class follows the list).
  3. GraphicOverlay — Creates the text overlay over the camera image.
  4. TextRecognitionProcessor — The main class involved in text recognition. It holds a detector that detects the text in an image.
  5. TextGraphic — A helper class that creates the text graphic to be added to GraphicOverlay for display.
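
To make the supporting pieces concrete, here is a minimal sketch of FrameMetadata, modeled on the equivalent class in Google's ML Kit quickstart sample; the exact fields and accessors in the final repo may differ slightly.

public class FrameMetadata {

    private final int width;
    private final int height;
    private final int rotation;
    private final int cameraFacing;

    private FrameMetadata(int width, int height, int rotation, int cameraFacing) {
        this.width = width;
        this.height = height;
        this.rotation = rotation;
        this.cameraFacing = cameraFacing;
    }

    public int getWidth() { return width; }
    public int getHeight() { return height; }
    public int getRotation() { return rotation; }
    public int getCameraFacing() { return cameraFacing; }

    // Builder used by CameraSource when packaging a preview frame.
    public static class Builder {

        private int width, height, rotation, cameraFacing;

        public Builder setWidth(int width) { this.width = width; return this; }
        public Builder setHeight(int height) { this.height = height; return this; }
        public Builder setRotation(int rotation) { this.rotation = rotation; return this; }
        public Builder setCameraFacing(int cameraFacing) { this.cameraFacing = cameraFacing; return this; }

        public FrameMetadata build() {
            return new FrameMetadata(width, height, rotation, cameraFacing);
        }
    }
}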

Since these files are long, I will focus on the portions of the code that contribute to text recognition. A link to the GitHub repo, where you can download the code to analyse and improve upon, is provided at the end of this article.

build.gradle (Project: <project name>)

Add the following line to the dependencies section.

classpath 'com.google.gms:google-services:4.0.1'

build.gradle (Module: app)

Add the libraries below to the dependencies section.

// ML Kit dependencies
implementation 'com.google.firebase:firebase-core:16.0.3'
implementation 'com.google.firebase:firebase-ml-vision:17.0.0'

Then add the line below to the bottom of the file.

apply plugin: 'com.google.gms.google-services'

activity_main.xml

We have added a CameraSourcePreview element with a GraphicOverlay inside it. CameraSourcePreview shows the output from the camera sensor, and GraphicOverlay displays the detected text on top of it.

<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <com.ajeetkumar.textdetectionusingmlkit.camera.CameraSourcePreview
        android:id="@+id/camera_source_preview"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:layout_marginBottom="8dp"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent">

        <com.ajeetkumar.textdetectionusingmlkit.others.GraphicOverlay
            android:id="@+id/graphics_overlay"
            android:layout_width="match_parent"
            android:layout_height="match_parent" />

    </com.ajeetkumar.textdetectionusingmlkit.camera.CameraSourcePreview>

</android.support.constraint.ConstraintLayout>

MainActivity.java

We instantiate the UI elements in onCreate() and create and start the camera source through the methods below (a sketch of the lifecycle wiring follows them). This starts the loop of camera capture => text recognition => display on the graphic overlay.

private void createCameraSource() {

    if (cameraSource == null) {
        cameraSource = new CameraSource(this, graphicOverlay);
        cameraSource.setFacing(CameraSource.CAMERA_FACING_BACK);
    }

    cameraSource.setMachineLearningFrameProcessor(new TextRecognitionProcessor());
}

private void startCameraSource() {
    if (cameraSource != null) {
        try {
            if (preview == null) {
                Log.d(TAG, "resume: Preview is null");
            }
            if (graphicOverlay == null) {
                Log.d(TAG, "resume: graphOverlay is null");
            }
            preview.start(cameraSource, graphicOverlay);
        } catch (IOException e) {
            Log.e(TAG, "Unable to start camera source.", e);
            cameraSource.release();
            cameraSource = null;
        }
    }
}
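
For completeness, here is a minimal sketch of how these methods can be wired into the activity lifecycle. The view IDs match the layout above; runtime camera-permission handling is omitted here, and CameraSourcePreview is assumed to expose a stop() method as in the full source.

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // Grab the preview and overlay declared in activity_main.xml.
    preview = findViewById(R.id.camera_source_preview);
    graphicOverlay = findViewById(R.id.graphics_overlay);

    createCameraSource();
}

@Override
protected void onResume() {
    super.onResume();
    startCameraSource();
}

@Override
protected void onPause() {
    super.onPause();
    // Stop the preview while the activity is in the background.
    if (preview != null) {
        preview.stop();
    }
}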

CameraSourcePreview.java

This class manages the CameraSource. It starts the camera sensor, manages the GraphicOverlay, and displays the output from the camera sensor on the screen. The most important method in this class is:

@SuppressLint("MissingPermission")
private void startIfReady() throws IOException {
    if (startRequested && surfaceAvailable) {
        cameraSource.start(surfaceView.getHolder());
        if (overlay != null) {
            Size size = cameraSource.getPreviewSize();
            int min = Math.min(size.getWidth(), size.getHeight());
            int max = Math.max(size.getWidth(), size.getHeight());
            if (isPortraitMode()) {
                // Swap width and height sizes when in portrait, since it will be rotated by
                // 90 degrees
                overlay.setCameraInfo(min, max, cameraSource.getCameraFacing());
            } else {
                overlay.setCameraInfo(max, min, cameraSource.getCameraFacing());
            }
            overlay.clear();
        }
        startRequested = false;
    }
}
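
The start() method called from MainActivity is, by comparison, mostly bookkeeping before startIfReady(). A sketch, assuming the same field names as above:

public void start(CameraSource cameraSource, GraphicOverlay overlay) throws IOException {
    this.overlay = overlay;
    this.cameraSource = cameraSource;
    if (this.cameraSource != null) {
        // Defer the actual start until the SurfaceView's surface exists.
        startRequested = true;
        startIfReady();
    }
}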

CameraSource.java

The primary role of this class is to get the image preview from the camera sensor and pass it along to the TextRecognitionProcessor class for text recognition.

It does this with the help of a thread and a Runnable object, FrameProcessingRunnable. The thread waits until a preview image is available and then sends it to TextRecognitionProcessor for processing (the surrounding loop is sketched after the snippet below).

try {
    synchronized (processorLock) {
        Log.d(TAG, "Process an image");
        frameProcessor.process(
                data,
                new FrameMetadata.Builder()
                        .setWidth(previewSize.getWidth())
                        .setHeight(previewSize.getHeight())
                        .setRotation(rotation)
                        .setCameraFacing(facing)
                        .build(),
                graphicOverlay);
    }
} catch (Throwable t) {
    Log.e(TAG, "Exception thrown from receiver.", t);
} finally {
    camera.addCallbackBuffer(data.array());
}
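
For context, the snippet above sits inside the runnable's run() method. A simplified sketch of that loop is below; field names such as lock, active and pendingFrameData are assumptions based on the quickstart-style implementation.

@Override
public void run() {
    ByteBuffer data;
    while (true) {
        synchronized (lock) {
            while (active && (pendingFrameData == null)) {
                try {
                    // Wait for onPreviewFrame() to deliver the next frame.
                    lock.wait();
                } catch (InterruptedException e) {
                    return;
                }
            }
            if (!active) {
                return;
            }
            data = pendingFrameData;
            pendingFrameData = null;
        }
        // ... hand `data` to frameProcessor.process(...) as shown above ...
    }
}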

TextRecognitionProcessor.java

Now we have arrived at the brain of this project. As soon as this class receives the image data as a ByteBuffer, it converts it to a FirebaseVisionImage object and passes it on for further processing.

public void process(ByteBuffer data, FrameMetadata frameMetadata, GraphicOverlay graphicOverlay) throws FirebaseMLException {

    if (shouldThrottle.get()) {
        return;
    }
    FirebaseVisionImageMetadata metadata =
            new FirebaseVisionImageMetadata.Builder()
                    .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
                    .setWidth(frameMetadata.getWidth())
                    .setHeight(frameMetadata.getHeight())
                    .setRotation(frameMetadata.getRotation())
                    .build();

    detectInVisionImage(FirebaseVisionImage.fromByteBuffer(data, metadata), frameMetadata, graphicOverlay);
}

After we have converted the image data to a FirebaseVisionImage, we let a detector, FirebaseVisionTextRecognizer, process it. The detector analyzes the image for text and returns a FirebaseVisionText result containing the detected text blocks.
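
detectInVisionImage() itself is short. Here is a sketch of how it might hand the frame to the on-device recognizer; the exact listener wiring is an assumption, so check the repo for the final version.

private void detectInVisionImage(FirebaseVisionImage image,
                                 final FrameMetadata metadata,
                                 final GraphicOverlay graphicOverlay) {

    FirebaseVisionTextRecognizer detector =
            FirebaseVision.getInstance().getOnDeviceTextRecognizer();

    // Drop incoming frames until this one has been processed.
    shouldThrottle.set(true);

    detector.processImage(image)
            .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
                @Override
                public void onSuccess(FirebaseVisionText results) {
                    shouldThrottle.set(false);
                    TextRecognitionProcessor.this.onSuccess(results, metadata, graphicOverlay);
                }
            })
            .addOnFailureListener(new OnFailureListener() {
                @Override
                public void onFailure(@NonNull Exception e) {
                    shouldThrottle.set(false);
                    Log.e(TAG, "Text detection failed.", e);
                }
            });
}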

If the result contains any text, onSuccess() creates one TextGraphic object for each detected element and adds it to the GraphicOverlay that was passed in.

protected void onSuccess(@NonNull FirebaseVisionText results,
                         @NonNull FrameMetadata frameMetadata,
                         @NonNull GraphicOverlay graphicOverlay) {

    graphicOverlay.clear();

    List<FirebaseVisionText.TextBlock> blocks = results.getTextBlocks();

    for (int i = 0; i < blocks.size(); i++) {
        List<FirebaseVisionText.Line> lines = blocks.get(i).getLines();
        for (int j = 0; j < lines.size(); j++) {
            List<FirebaseVisionText.Element> elements = lines.get(j).getElements();
            for (int k = 0; k < elements.size(); k++) {
                GraphicOverlay.Graphic textGraphic = new TextGraphic(graphicOverlay, elements.get(k));
                graphicOverlay.add(textGraphic);
            }
        }
    }
}
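
TextGraphic itself is small. A rough sketch of its draw() method follows; translateX() and translateY() are coordinate-mapping helpers assumed to be inherited from GraphicOverlay.Graphic, and rectPaint/textPaint are Paint objects set up in the constructor.

@Override
public void draw(Canvas canvas) {
    if (element == null) {
        throw new IllegalStateException("Attempting to draw a null text element.");
    }

    // Map the element's image-space bounding box into overlay coordinates.
    RectF rect = new RectF(element.getBoundingBox());
    rect.left = translateX(rect.left);
    rect.top = translateY(rect.top);
    rect.right = translateX(rect.right);
    rect.bottom = translateY(rect.bottom);

    // Draw the box, then the recognized text along its bottom edge.
    canvas.drawRect(rect, rectPaint);
    canvas.drawText(element.getText(), rect.left, rect.bottom, textPaint);
}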

Final Output

Realtime text detection using MLKit

Once text detection works, there are many possibilities for using it. For example, adding a translation feature would let the app read sign boards in different languages.



If you like this article make sure to give it a 👏. And if you would like to support me, please consider buying me a coffee :)
