Android’s CameraX and ML Kit

Paul Trebilcox-Ruiz
May 29, 2019

Announced at Google I/O 2019, CameraX aims to alleviate some of the pain of using a camera in an Android app by wrapping common use cases in a simple-to-use API. In this tutorial we’ll go over how to display a preview from the camera, and then analyze the camera input in real time to display the most likely object seen, using the Firebase ML Kit labeling API. It’s worth noting that while we’re only using the labeling API here, ML Kit does an amazing job of using very similar code across its various functionalities, so you should be able to scan barcodes, detect faces, or recognize famous landmarks with only a few changed lines of code.


For this tutorial, start by creating a new Android application with an empty Activity.

Once your initial project has built, go to the Firebase Console and create a new project.

From the Project Overview screen, you’ll notice a section at the top that says Get started by adding Firebase to your app. Click on the Android face icon to add your Android app to the Firebase console.

On the next page you will want to add your package name into the form. You can skip the nickname and SHA-1 key, as we won’t be using any features that require these.

The next few steps on this page will guide you through downloading your google-services.json file and placing it into your app, then initializing the Firebase SDK in your app.

After you’ve finished the Firebase setup process and verified that your app is able to connect to Firebase, open your module level build.gradle file. Under the dependencies node, add the following two lines to include local Firebase labeling in your app
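At the time of writing, the on-device labeling dependencies looked roughly like this (the version numbers here are illustrative and will likely have moved on):

```groovy
implementation 'com.google.firebase:firebase-ml-vision:20.0.0'
implementation 'com.google.firebase:firebase-ml-vision-image-label-model:17.0.2'
```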

Depending on the template used for creating your app, you may need to remove the old constraint-layout and appcompat dependencies, which will look like
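Something like the following, with versions varying by template:

```groovy
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.android.support.constraint:constraint-layout:1.1.3'
```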

and replace them with
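The AndroidX equivalents (again, versions are illustrative):

```groovy
implementation 'androidx.appcompat:appcompat:1.0.2'
implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
```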

Additionally, add the dependencies for CameraX (at the time of this writing, it is still in an alpha stage)
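The alpha artifacts looked like this at the time; check the CameraX release notes for the current version:

```groovy
def camerax_version = '1.0.0-alpha01'
implementation "androidx.camera:camera-core:$camerax_version"
implementation "androidx.camera:camera2:$camerax_version"
```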

and Kotlin extensions
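For example (version illustrative):

```groovy
implementation 'androidx.core:core-ktx:1.0.2'
```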

In order to use AndroidX components, you will also need to go into your gradle.properties file and ensure that AndroidX and Jetifier are enabled.
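These are the two standard flags in gradle.properties:

```properties
android.useAndroidX=true
android.enableJetifier=true
```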

Next, go into your AndroidManifest.xml file and add the following meta-data tag within your application node.
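This tag asks the Play Store to download the on-device labeling model at install time, rather than on first use:

```xml
<meta-data
    android:name="com.google.firebase.ml.vision.DEPENDENCIES"
    android:value="label" />
```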

Additionally, add the following permissions-related code within the manifest tag.
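At minimum, the app needs the camera permission declared:

```xml
<uses-permission android:name="android.permission.CAMERA" />
```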

Finally, go into the activity_main.xml file and replace the layout code with the following
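A minimal layout sketch. The IDs `view_finder` and `label_text` are assumptions that the rest of this tutorial's code sketches will refer back to:

```xml
<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res/auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <TextureView
        android:id="@+id/view_finder"
        android:layout_width="640dp"
        android:layout_height="640dp"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintStart_toStartOf="parent" />

    <TextView
        android:id="@+id/label_text"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:textSize="24sp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintEnd_toEndOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>
```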

We will use the TextureView for our image preview, and the TextView for displaying the label provided by our machine learning component.


Now that we’re done with the initial setup, it’s time to open MainActivity.kt for the bulk of our work. You’ll need to start by supporting runtime permissions, so add the following two lines to the top of the class
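A sketch of the two constants, here declared as top-level private values (the request code is arbitrary):

```kotlin
private const val REQUEST_CODE_PERMISSIONS = 10
private val REQUIRED_PERMISSIONS = arrayOf(Manifest.permission.CAMERA)
```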

Next, add the following two methods within MainActivity.
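A sketch of a helper that checks whether every required permission has been granted, plus the standard permission-result callback:

```kotlin
private fun allPermissionsGranted() = REQUIRED_PERMISSIONS.all {
    ContextCompat.checkSelfPermission(baseContext, it) ==
        PackageManager.PERMISSION_GRANTED
}

override fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<String>,
    grantResults: IntArray
) {
    if (requestCode == REQUEST_CODE_PERMISSIONS) {
        if (allPermissionsGranted()) {
            //Start camera
        } else {
            // Close the app if the user denies the camera permission
            finish()
        }
    }
}
```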

Finally, update onCreate() to check if the camera permission has been granted, otherwise start the request flow.
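Roughly like this:

```kotlin
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    if (allPermissionsGranted()) {
        //Start camera
    } else {
        // Kick off the runtime permission request flow
        ActivityCompat.requestPermissions(
            this, REQUIRED_PERMISSIONS, REQUEST_CODE_PERMISSIONS)
    }
}
```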

At this point it’s worth running your app to make sure everything compiles. If you had to replace your appcompat dependency in your build.gradle file, you will probably need to replace that imported dependency in MainActivity.kt for AppCompatActivity and ContextCompat, as well. If everything goes according to plan, you should be able to open the app and be prompted to grant camera permissions.

Displaying an Image Preview

Now we can get into the more interesting parts of this tutorial. Let’s start by taking the camera stream and displaying it on the screen. We can begin by creating two values at the top of MainActivity to reference the TextureView in our layout file and keep track of the device’s rotation.
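A sketch of the two properties; the names `viewFinder` and `displayRotation` are placeholders:

```kotlin
// Reference to the TextureView declared in activity_main.xml
private lateinit var viewFinder: TextureView

// Last known rotation of the device's display
private var displayRotation = 0
```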

Next, in onCreate(), initialize viewFinder after calling setContentView().
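Assuming the TextureView has the ID `view_finder`:

```kotlin
viewFinder = findViewById(R.id.view_finder)
```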

You’ll also notice that we had commented out two lines in our class that currently say //Start camera. Replace both of these with
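Posting to the view ensures the TextureView has been laid out before the camera starts:

```kotlin
viewFinder.post { startCamera() }
```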

We’ll define the startCamera() method soon. For now, return to the end of onCreate() and add the following lines to recompute the layout whenever the TextureView changes.
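A layout-change listener that simply re-runs the transform:

```kotlin
viewFinder.addOnLayoutChangeListener { _, _, _, _, _, _, _, _, _ ->
    updateTransform()
}
```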

where updateTransform() is defined as
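A sketch of the method, following the pattern from the CameraX preview examples of the time:

```kotlin
private fun updateTransform() {
    val matrix = Matrix()

    // Find the center of the TextureView
    val centerX = viewFinder.width / 2f
    val centerY = viewFinder.height / 2f

    // Convert the current display rotation into degrees
    val rotationDegrees = when (viewFinder.display.rotation) {
        Surface.ROTATION_0 -> 0
        Surface.ROTATION_90 -> 90
        Surface.ROTATION_180 -> 180
        Surface.ROTATION_270 -> 270
        else -> return
    }

    // Rotate the preview content around the view's center
    matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)
    viewFinder.setTransform(matrix)
}
```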

This method will attempt to find the center of the TextureView, and rotate the content around it based on the orientation of the view.

To wrap up preview, we’ll want to define the startCamera() method. In this method we will set up the PreviewConfig object, which is where we can define various properties for our display, turn that into a Preview object, and then associate that with our TextureView. This method will also bind our CameraX implementation to the Android lifecycle for proper initialization and teardown. We’ll revisit the lifecycle line in the next section of our tutorial.
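A sketch against the CameraX alpha API of the time (a square 640x640 target is assumed to match the layout):

```kotlin
private fun startCamera() {
    // Configure the preview use case with a square target
    val previewConfig = PreviewConfig.Builder().apply {
        setTargetAspectRatio(Rational(1, 1))
        setTargetResolution(Size(640, 640))
    }.build()

    val preview = Preview(previewConfig)

    // Each time the preview output updates, re-attach it to the TextureView
    preview.setOnPreviewOutputUpdateListener {
        // Re-adding the view is needed for the SurfaceTexture swap to take effect
        val parent = viewFinder.parent as ViewGroup
        parent.removeView(viewFinder)
        parent.addView(viewFinder, 0)

        viewFinder.surfaceTexture = it.surfaceTexture
        updateTransform()
    }

    // Tie the camera's lifecycle to this Activity
    CameraX.bindToLifecycle(this, preview)
}
```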

If you run the app now, you should be able to see a square preview of whatever your camera is able to view, such as the Star Wars snowtrooper armor I have next to my desk.

Firebase Labeling

As we view objects with the camera, we’ll want to display what Firebase has detected. To do this, declare a TextView at the top of MainActivity.
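The name `labelText` is a placeholder used throughout the remaining sketches:

```kotlin
private lateinit var labelText: TextView
```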

You can initialize it in onCreate() immediately below the initialization of TextureView.
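Assuming the TextView has the ID `label_text`:

```kotlin
labelText = findViewById(R.id.label_text)
```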

At the bottom of MainActivity, create a new inner class named LabelAnalyzer that implements the ImageAnalysis.Analyzer interface and populate it with the default structure. This inner class will accept the TextView that you just declared as a constructor parameter.
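The skeleton looks roughly like this (the analyze() signature shown is the one from the CameraX alpha releases):

```kotlin
private inner class LabelAnalyzer(val textView: TextView) : ImageAnalysis.Analyzer {

    override fun analyze(image: ImageProxy, rotationDegrees: Int) {
        // Analysis code will go here
    }
}
```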

Back in startCamera(), between where you set up the preview and the lifecycle binding, you will want to create an ImageAnalysisConfig object that will initialize a new background thread for analysis, and set the image reader mode to only return the latest acquired image.
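A sketch of the analysis configuration:

```kotlin
val analyzerConfig = ImageAnalysisConfig.Builder().apply {
    // Run the analyzer callbacks on a dedicated background thread
    val analyzerThread = HandlerThread("LabelAnalysis").apply { start() }
    setCallbackHandler(Handler(analyzerThread.looper))

    // Drop frames that arrive while we're busy; always analyze the latest one
    setImageReaderMode(ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE)
}.build()
```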

Next, create an ImageAnalysis object, which will wrap our new LabelAnalyzer class.
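Assuming the TextView field is named `labelText`:

```kotlin
val analyzerUseCase = ImageAnalysis(analyzerConfig).apply {
    analyzer = LabelAnalyzer(labelText)
}
```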

Finally, at the end of startCamera(), add the analyzerUseCase object to your lifecycle binding method call.
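The binding call now takes both use cases:

```kotlin
CameraX.bindToLifecycle(this, preview, analyzerUseCase)
```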

Now we can start fleshing out our analyzer, which is where things get a bit more interesting. At the top of the inner class, add the following value to keep track of elapsed time, as we don’t want to run Firebase’s analysis on every frame.
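```kotlin
private var lastAnalyzedTimestamp = 0L
```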

Next, go into the analyze() method and get the current system time. This will be compared to the lastAnalyzedTimestamp value to see if we should analyze the frame.
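A sketch that throttles analysis to roughly once per second:

```kotlin
override fun analyze(image: ImageProxy, rotationDegrees: Int) {
    val currentTimestamp = System.currentTimeMillis()

    // Only run Firebase labeling about once per second
    if (currentTimestamp - lastAnalyzedTimestamp >= TimeUnit.SECONDS.toMillis(1)) {
        lastAnalyzedTimestamp = currentTimestamp
        // ... image processing goes here
    }
}
```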

The ImageProxy object sent into the analyze() method contains information about our latest image in YUV format. This means the image is broken into three planes: the first is a measure of brightness (luma), while the second and third are chroma planes carrying the blue-difference and red-difference color information. We can retrieve these three planes like so
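```kotlin
val y = image.planes[0] // luminance (brightness)
val u = image.planes[1] // chrominance
val v = image.planes[2] // chrominance
```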

Then we can get the number of bytes in each plane
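```kotlin
val Yb = y.buffer.remaining()
val Ub = u.buffer.remaining()
val Vb = v.buffer.remaining()
```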

and convert them into a single YUV formatted ByteArray
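Copying the three plane buffers back-to-back into one array:

```kotlin
val data = ByteArray(Yb + Ub + Vb)
y.buffer.get(data, 0, Yb)
u.buffer.get(data, Yb, Ub)
v.buffer.get(data, Yb + Ub, Vb)
```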

Once we have the ByteArray, we can create a FirebaseVisionImageMetadata object with details about how we should configure our image labeling, and then request that Firebase generate labels for our image. Once the labels are generated, we can display them in the TextView that we created earlier.
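A sketch of the labeling call. Note that ML Kit's naming changed across releases (earlier versions used FirebaseVisionLabelDetector rather than the image labeler shown here), so match this to the version of firebase-ml-vision you depend on:

```kotlin
val metadata = FirebaseVisionImageMetadata.Builder()
    .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_YV12)
    .setWidth(image.width)
    .setHeight(image.height)
    .setRotation(getRotation(rotationDegrees))
    .build()

val visionImage = FirebaseVisionImage.fromByteArray(data, metadata)
val labeler = FirebaseVision.getInstance().onDeviceImageLabeler

labeler.processImage(visionImage)
    .addOnSuccessListener { labels ->
        // Show the most confident label, if any were returned
        labels.firstOrNull()?.let { label ->
            textView.text = "${label.text} ${label.confidence}"
        }
    }
```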

At this point you may notice that metadata Builder calls a method named getRotation(). That method is defined within our inner class as
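A simple mapping from degrees to the metadata rotation constants:

```kotlin
private fun getRotation(rotationCompensation: Int): Int =
    when (rotationCompensation) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> FirebaseVisionImageMetadata.ROTATION_0
    }
```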

Once you’re done filling out the analyzer class, it’s time to run your app. As you point the camera at different objects, you’ll see the TextView update with different labels as Firebase attempts to determine what it’s looking at. Using the same general logic as above, you should be able to use most, if not all, of ML Kit’s image based machine learning offerings to enhance your apps and provide value to your users with relatively little code.