Android’s CameraX and ML Kit

Paul Trebilcox-Ruiz
May 29, 2019


Announced at Google I/O 2019, CameraX aims to alleviate some of the pain of using a camera in an Android app by wrapping common use cases in a simple-to-use API. In this tutorial we'll go over how to display a preview from the camera, and then analyze the camera input in real time to display the most likely object seen, using the Firebase ML Kit image labeling API. It's worth noting that while we're only using the labeling API here, ML Kit uses very similar code across its various features, so you should be able to scan barcodes, detect faces, or recognize famous landmarks with only a few changed lines of code.

Setup

For this tutorial, start by creating a new Android application with an empty Activity.

Once your initial project has built, go to the Firebase Console and create a new project.

From the Project Overview screen, you’ll notice a section at the top that says Get started by adding Firebase to your app. Click on the Android face icon to add your Android app to the Firebase console.

On the next page you will want to add your package name into the form. You can skip the nickname and SHA-1 key, as we won’t be using any features that require these.

The next few steps on this page will guide you through downloading your google-services.json file and placing it into your app, then initializing the Firebase SDK in your app.
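As a reference point, that setup typically has you add the google-services Gradle plugin so that your google-services.json file is picked up at build time. A minimal sketch of what the console walks you through is below; the version number here is only an example, so use whatever the console suggests.

// Project-level build.gradle
buildscript {
    dependencies {
        classpath 'com.google.gms:google-services:4.2.0'
    }
}

// Module-level build.gradle, at the bottom of the file
apply plugin: 'com.google.gms.google-services'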

After you've finished the Firebase setup process and verified that your app is able to connect to Firebase, open your module-level build.gradle file. Under the dependencies node, add the following two lines to include on-device Firebase image labeling in your app

implementation 'com.google.firebase:firebase-ml-vision:20.0.0'
implementation 'com.google.firebase:firebase-ml-vision-image-label-model:17.0.2'

Depending on the template used for creating your app, you may need to remove the old constraint-layout and appcompat dependencies, which will look like

implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.android.support.constraint:constraint-layout:1.1.3'

and replace them with

implementation 'androidx.appcompat:appcompat:+'
implementation 'androidx.constraintlayout:constraintlayout:1.1.3'

Additionally, add the dependencies for CameraX (at the time of this writing, it is still in an alpha stage)

def camerax_version = "1.0.0-alpha01"
implementation "androidx.camera:camera-core:${camerax_version}"
implementation "androidx.camera:camera-camera2:${camerax_version}"

and Kotlin extensions

implementation 'androidx.core:core-ktx:1.0.2'

In order to use AndroidX components, you will also need to go into your gradle.properties file and ensure that AndroidX and jetifier are enabled.

android.enableJetifier=true
android.useAndroidX=true

Next, go into your AndroidManifest.xml file and add the following meta-data tag within your application node.

<meta-data
    android:name="com.google.firebase.ml.vision.DEPENDENCIES"
    android:value="label" />

Additionally, add the following permissions-related code within the manifest tag.

<uses-permission android:name="android.permission.CAMERA" />

<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />

Finally, go into the activity_main.xml file and replace the layout code with the following

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <TextureView
        android:id="@+id/view_finder"
        android:layout_width="640px"
        android:layout_height="640px"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintEnd_toEndOf="parent" />

    <TextView
        android:id="@+id/label"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_margin="24dp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>

We will use the TextureView for our image preview, and the TextView for displaying the label provided by our machine learning component.

Permissions

Now that we're done with the initial setup, it's time to open MainActivity.kt for the bulk of our work. You'll need to start by supporting runtime permissions, so add the following two lines to the top of the class

private val REQUEST_CODE_PERMISSIONS = 42

private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)

Next, add the following two methods within MainActivity.

override fun onRequestPermissionsResult(
    requestCode: Int, permissions: Array<String>, grantResults: IntArray) {
    if (requestCode == REQUEST_CODE_PERMISSIONS) {
        if (allPermissionsGranted()) {
            //Start camera
        } else {
            Toast.makeText(this,
                "Permissions not granted by the user.",
                Toast.LENGTH_SHORT).show()
            finish()
        }
    }
}

private fun allPermissionsGranted(): Boolean {
    for (permission in REQUIRED_PERMISSIONS) {
        if (ContextCompat.checkSelfPermission(
                this, permission) != PackageManager.PERMISSION_GRANTED) {
            return false
        }
    }
    return true
}

Finally, update onCreate() to check if the camera permission has been granted, otherwise start the request flow.

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    if (allPermissionsGranted()) {
        //Start camera
    } else {
        ActivityCompat.requestPermissions(
            this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS)
    }
}

At this point it's worth running your app to make sure everything compiles. If you had to replace your appcompat dependency in your build.gradle file, you will probably also need to update the imports in MainActivity.kt so that AppCompatActivity and ContextCompat come from their AndroidX packages. If everything goes according to plan, you should be able to open the app and be prompted to grant camera permissions.
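If you hit unresolved references after the switch, the relevant AndroidX imports at the top of MainActivity.kt would look something like the sketch below; your IDE's quick-fix should suggest the same packages.

import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat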

Displaying an Image Preview

Now we can get into the more interesting parts of this tutorial. Let's start by taking the camera stream and displaying it on the screen. We can begin by creating two properties at the top of MainActivity to reference the TextureView in our layout file and to keep track of the device's rotation.

private lateinit var viewFinder: TextureView
private var rotation = 0

Next, in onCreate(), initialize viewFinder after calling setContentView().

viewFinder = findViewById(R.id.view_finder)

You'll also notice that we left two placeholder comments in our class that currently say //Start camera. Replace both of these with

viewFinder.post { startCamera() }

We'll define the startCamera() method soon. For now, return to the end of onCreate() and add the following lines to recompute the preview transform whenever the layout of the TextureView changes.

viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
    updateTransform()
}

where updateTransform() is defined as

private fun updateTransform() {
    val matrix = Matrix()

    //Find the center
    val centerX = viewFinder.width / 2f
    val centerY = viewFinder.height / 2f

    //Get correct rotation
    rotation = when (viewFinder.display.rotation) {
        Surface.ROTATION_0 -> 0
        Surface.ROTATION_90 -> 90
        Surface.ROTATION_180 -> 180
        Surface.ROTATION_270 -> 270
        else -> return
    }

    matrix.postRotate(-rotation.toFloat(), centerX, centerY)

    viewFinder.setTransform(matrix)
}

This method finds the center of the TextureView and rotates the preview content around it based on the rotation of the display.

To wrap up the preview, we'll want to define the startCamera() method. In this method we will set up a PreviewConfig object, which is where we can define various properties for our display, turn that into a Preview object, and then associate it with our TextureView. This method will also bind our CameraX use cases to the Android lifecycle for proper initialization and teardown. We'll revisit the lifecycle line in the next section of the tutorial.

private fun startCamera() {

    val previewConfig = PreviewConfig.Builder().apply {
        setTargetAspectRatio(Rational(1, 1))
    }.build()

    val preview = Preview(previewConfig)

    preview.setOnPreviewOutputUpdateListener {
        viewFinder.surfaceTexture = it.surfaceTexture
        updateTransform()
    }

    CameraX.bindToLifecycle(this, preview)
}

If you run the app now, you should be able to see a square preview of whatever your camera is able to view, such as the Star Wars snowtrooper armor I have next to my desk.

Firebase Labeling

As we view objects with the camera, we’ll want to display what Firebase has detected. To do this, declare a TextView at the top of MainActivity.

private lateinit var label: TextView

You can initialize it in onCreate() immediately below the initialization of TextureView.

label = findViewById(R.id.label)

At the bottom of MainActivity, create a new inner class named LabelAnalyzer that implements ImageAnalysis.Analyzer and populate it with the default structure. This inner class will accept the TextView that you just declared as a constructor parameter.

private class LabelAnalyzer(val textView: TextView) : ImageAnalysis.Analyzer {
    override fun analyze(image: ImageProxy, rotationDegrees: Int) {

    }
}

Back in startCamera(), between where you set up the preview and the lifecycle binding, you will want to create an ImageAnalysisConfig object. While building it, we spin up a new background thread for analysis and set the image reader mode to only return the latest acquired image.

val analyzerConfig = ImageAnalysisConfig.Builder().apply {
    val analyzerThread = HandlerThread(
        "LabelAnalysis").apply { start() }
    setCallbackHandler(Handler(analyzerThread.looper))

    setImageReaderMode(
        ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE)
}.build()

Next, create an ImageAnalysis object, which will wrap our new LabelAnalyzer class.

val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
    analyzer = LabelAnalyzer(label)
}

Finally, at the end of startCamera(), add the analyzerUseCase object to your lifecycle binding method call.

CameraX.bindToLifecycle(this, preview, analyzerUseCase)

Now we can start fleshing out our analyzer, which is where things get a bit more interesting. At the top of the inner class, add the following property to keep track of elapsed time, as we don't want to run Firebase's analysis on every frame.

private var lastAnalyzedTimestamp = 0L

Next, go into the analyze() method and get the current system time. This is compared against the lastAnalyzedTimestamp value to decide whether we should analyze the current frame. Note that the rest of the code in this section goes inside this if block.

val currentTimestamp = System.currentTimeMillis()
if (currentTimestamp - lastAnalyzedTimestamp >=
        TimeUnit.SECONDS.toMillis(1)) {
    lastAnalyzedTimestamp = currentTimestamp

    //The image conversion and labeling code shown below goes here
}

The ImageProxy object passed into the analyze() method contains information about our latest image in YUV format. This means the image is broken into three planes: the first (Y) is a measure of brightness, while the second (U) and third (V) are the blue-difference and red-difference chroma planes. We can retrieve these three planes like so

val y = image.planes[0]
val u = image.planes[1]
val v = image.planes[2]

Then we can get the number of bytes remaining in each plane's buffer

val Yb = y.buffer.remaining()
val Ub = u.buffer.remaining()
val Vb = v.buffer.remaining()

and copy them into a single YUV-formatted ByteArray

val data = ByteArray(Yb + Ub + Vb)

y.buffer.get(data, 0, Yb)
u.buffer.get(data, Yb, Ub)
v.buffer.get(data, Yb + Ub, Vb)

Once we have the ByteArray, we can create a FirebaseVisionImageMetadata object with details about how we should configure our image labeling, and then request that Firebase generate labels for our image. Once the labels are generated, we can display them in the TextView that we created earlier.

val metadata = FirebaseVisionImageMetadata.Builder()
    .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_YV12)
    .setHeight(image.height)
    .setWidth(image.width)
    .setRotation(getRotation(rotationDegrees))
    .build()

val labelImage = FirebaseVisionImage.fromByteArray(data, metadata)

val labeler = FirebaseVision.getInstance().getOnDeviceImageLabeler()
labeler.processImage(labelImage)
    .addOnSuccessListener { labels ->
        textView.run {
            if (labels.isNotEmpty()) {
                text = "${labels[0].text} ${labels[0].confidence}"
            }
        }
    }
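If labeling fails for any reason, the success listener simply never fires, so it can be useful to log errors as well. Here is a minimal sketch that chains a failure listener onto the same processImage() call; it assumes android.util.Log is imported, and the tag name is arbitrary.

labeler.processImage(labelImage)
    .addOnSuccessListener { labels ->
        textView.run {
            if (labels.isNotEmpty()) {
                text = "${labels[0].text} ${labels[0].confidence}"
            }
        }
    }
    .addOnFailureListener { e ->
        //Log the failure so problem frames show up in Logcat
        Log.e("LabelAnalyzer", "Image labeling failed", e)
    }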

At this point you may notice that the metadata Builder calls a method named getRotation(). That method is defined within our inner class as

private fun getRotation(rotationCompensation: Int): Int {
    return when (rotationCompensation) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> FirebaseVisionImageMetadata.ROTATION_0
    }
}

Once you're done filling out the analyzer class, it's time to run your app. As you point the camera at different objects, you'll see the TextView update with different labels as Firebase attempts to determine what it's looking at. Using the same general logic as above, you should be able to use most, if not all, of ML Kit's image-based machine learning offerings to enhance your apps and provide value to your users with relatively little code.
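As a rough illustration, here is a sketch of what the labeling portion of analyze() might look like if you swapped the image labeler for ML Kit's barcode detector; the byte array, metadata, and FirebaseVisionImage setup stay the same. Treat this as a starting point rather than a drop-in replacement, since barcode scanning has its own options, and you would likely also list "barcode" in the DEPENDENCIES meta-data value.

//A sketch only: labelImage is the same FirebaseVisionImage built above
val detector = FirebaseVision.getInstance().visionBarcodeDetector
detector.detectInImage(labelImage)
    .addOnSuccessListener { barcodes ->
        textView.run {
            if (barcodes.isNotEmpty()) {
                //rawValue holds the raw string encoded in the barcode
                text = barcodes[0].rawValue
            }
        }
    }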
