Getting Started: Android CameraX

A quick guide (and sample code) to get you started on using the new Android Jetpack CameraX API as a Barcode Scanner with MLKit.


As the name implies, Jetpack (a collection of modern APIs for Android development) truly gives developers a productivity boost — and the introduction of the Jetpack CameraX API is no different.

With CameraX, developing camera-based Android views is a more streamlined process, with some standout features compared to its predecessor (Camera2):

  • Abstracts away the complexities of differences in OEM sensor hardware drivers.
  • Supports Android API 21 and up.
  • As with other Jetpack components, it is lifecycle aware, and takes care of bringing up and breaking down the required resources in response to app lifecycle events.
  • To me, the most exciting feature is the introduction of what can be called the “Use Case Pipeline”.

A Use Case pipeline?

Essentially, a UseCase in the context of the CameraX API is a class that accepts an image frame from the camera, does something with it, and notifies the API when it is done, at which point the next Use Case is called. Rinse and repeat.

As an example, this is how you would initialize and set up a camera instance with CameraX:

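A minimal Kotlin sketch of this setup using ProcessCameraProvider (from androidx.camera.lifecycle); names such as lifecycleOwner and previewView are placeholders for objects supplied by the hosting Activity:

```kotlin
// Obtain the process-wide camera provider and bind use cases to a lifecycle.
val cameraProviderFuture = ProcessCameraProvider.getInstance(context)
cameraProviderFuture.addListener({
    val cameraProvider = cameraProviderFuture.get()

    val preview = Preview.Builder().build()
    preview.setSurfaceProvider(previewView.surfaceProvider)

    cameraProvider.unbindAll()
    cameraProvider.bindToLifecycle(
        lifecycleOwner,                     // the Activity hosting the camera
        CameraSelector.DEFAULT_BACK_CAMERA, // which hardware sensor to use
        preview                             // vararg of UseCase instances
    )
}, ContextCompat.getMainExecutor(context))
```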

As you can see, the bindToLifecycle function is passed:

  • The Activity (LifecycleOwner) that will host the camera instance.
  • A helper (CameraSelector) that is used to nominate which hardware sensor (camera) to use.
  • A vararg list of UseCase instances to run each frame through.

This last parameter allows you to chain together multiple Use Cases as required.

Moreover, as mentioned earlier, this is bound to a lifecycle-aware Camera instance, so, for example, if you background the app, CameraX will take care of suspending the hardware and making the associated resources available for garbage collection.

Base Use Cases

As of now, the CameraX API has three distinct base Use Cases, namely:

  • Preview: accepts a surface for displaying a preview — Preview
  • Image analysis: provides CPU-accessible buffers for analysis, such as for machine learning inference :) — ImageAnalysis
  • Image capture: captures and saves a photo — ImageCapture

It is safe to assume that, most of the time, the Preview Use Case will be placed in the pipeline. This takes the received image frame and renders it to a Surface (androidx.camera.view.PreviewView) for the user to see.

Similarly, ImageCapture provides the functionality to save the received frame as a photo.

The most exciting is obviously ImageAnalysis: this gives you the raw frame buffer to run inference on with an ML model of your choosing.

For the purpose of this article, I will use the BarcodeScanning API (available as part of Google’s MLKit framework) to run inference on the image frame to recognize and decode the different barcode standards found in it.

Suffice it to say, using this approach to scan barcodes blows everything else (for example, the ZXing library) out of the water in terms of accuracy and speed.

In my tests I got recognition times of <200 ms. As a comparison, deterministic approaches from libraries like ZXing et al. take an order of magnitude longer (>2 seconds) under ideal conditions, not to mention the major difference in CPU utilization.

Digging In

I now introduce a demo Android project that uses CameraX and MLKit to demonstrate running inference on an image frame (as mentioned, a barcode scanner).

Here is a video of the result, looking at some packaging that contains both an EAN barcode and a QR code (notice the decoded barcode string at the bottom of the screen).

Getting Started

Although you can use the example code on GitHub, I will point out some (albeit obvious) things to remember:

  • Add the CameraX (and optionally the MLKit) dependencies to your app module build.gradle file.
  • Remember to declare the AndroidManifest.xml permissions and features for using the device camera hardware (both are sketched after this list).
  • The project uses dataBinding, so ensure your version of Android Studio is able to auto-generate the binding classes.
  • In the example project, the actual integration is placed in CameraHelper.kt to keep it portable.
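A sketch of both; the artifact versions shown are roughly those current as of this writing, so check for newer releases:

```groovy
// app/build.gradle — CameraX and (optionally) the MLKit barcode scanner
dependencies {
    implementation "androidx.camera:camera-camera2:1.0.0"
    implementation "androidx.camera:camera-lifecycle:1.0.0"
    implementation "androidx.camera:camera-view:1.0.0-alpha24"
    implementation "com.google.mlkit:barcode-scanning:16.1.1"
}
```

```xml
<!-- AndroidManifest.xml — camera permission and feature declaration -->
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera.any" />
```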

A general paradigm to be aware of: image processing and inference on a mobile device can be relatively expensive operations, and pipelining multiple Use Cases (especially inefficient ones) together can have a drastic impact on device battery and resources.

At the end of this article, I discuss how to let Use Cases skip frames above a certain load threshold, trading reduced throughput for better resource utilization.

Main Activity

The first order of business is creating the view where the preview (the camera feed the user can see) is rendered:
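A minimal layout sketch using androidx.camera.view.PreviewView (the view ID is an assumption, not necessarily the one in the sample project):

```xml
<!-- The CameraX preview surface fills its parent; the framework scales
     the rendered frames according to the computed aspect ratio. -->
<androidx.camera.view.PreviewView
    android:id="@+id/previewView"
    android:layout_width="match_parent"
    android:layout_height="match_parent" />
```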

Here I use a view specifically designed for rendering camera previews. Note that although the view fills the parent, the actual dimensions of the rendered frames are determined by an aspect ratio calculation based on device screen size and orientation.

Next up, the activity that binds this view:
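A sketch of the hosting Activity; the CameraHelper constructor parameters and view IDs (previewView, resultText) are assumptions that may differ from the sample repo:

```kotlin
class MainActivity : AppCompatActivity() {

    private lateinit var cameraHelper: CameraHelper

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        val binding = ActivityMainBinding.inflate(layoutInflater)
        setContentView(binding.root)

        // Wire the helper to the lifecycle, the preview surface, and a result callback.
        cameraHelper = CameraHelper(
            owner = this,
            context = this,
            previewView = binding.previewView,
            onBarcode = { value -> binding.resultText.text = value }
        )
        cameraHelper.start()
    }

    override fun onRequestPermissionsResult(
        requestCode: Int,
        permissions: Array<out String>,
        grantResults: IntArray
    ) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        // Hand the camera permission result back to the helper.
        cameraHelper.onRequestPermissionsResult(requestCode, permissions, grantResults)
    }
}
```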

Nothing fancy here:

  • The CameraHelper class is initialized with owner, context, the view to render the preview on, and a callback function to receive decoded barcode recognition results.
  • A permission handler override passes the permission result back to the helper.

The CameraHelper itself is where the action is. You can have a look at the class directly on GitHub:

Although not essential, for convenience we create a typealias for the listener that the analyzer will use to call back with barcode inference results:
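A one-line sketch (the name is an assumption):

```kotlin
// Invoked by the analyzer with each decoded barcode string.
typealias BarcodeListener = (barcodeValue: String) -> Unit
```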

Next we create an ExecutorService to run the ImageAnalysis Analyzer on its own background thread, along with a few class-level properties:
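An excerpt-style sketch of those properties (names assumed):

```kotlin
// Dedicated thread so analysis never blocks the main thread.
private val analysisExecutor: ExecutorService = Executors.newSingleThreadExecutor()

// Which sensor to bind; updated if the user toggles cameras.
private var lensFacing = CameraSelector.LENS_FACING_BACK

// Set once the provider future resolves in startCamera().
private var cameraProvider: ProcessCameraProvider? = null
```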

The start function checks whether permissions are granted; if not, it launches the camera permission request. The result lands in the Activity above, which passes it back to the onRequestPermissionsResult of this class:
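Roughly like this (the request code constant and the Activity cast are assumptions):

```kotlin
fun start() {
    if (ContextCompat.checkSelfPermission(context, Manifest.permission.CAMERA)
        == PackageManager.PERMISSION_GRANTED
    ) {
        startCamera()
    } else {
        // The result arrives in the Activity, which forwards it back here.
        ActivityCompat.requestPermissions(
            context as Activity,
            arrayOf(Manifest.permission.CAMERA),
            REQUEST_CODE_CAMERA
        )
    }
}

fun onRequestPermissionsResult(
    requestCode: Int,
    permissions: Array<out String>,
    grantResults: IntArray
) {
    if (requestCode == REQUEST_CODE_CAMERA) {
        start() // re-checks the permission; re-requests if still denied
    }
}
```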

As you can see, this process will loop indefinitely until the user grants the camera permission, as without the camera the app cannot do what it is designed to do.

Next, the startCamera function sets up the camera provider future and resolves it on the main context executor:
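A sketch of that function:

```kotlin
private fun startCamera() {
    val providerFuture = ProcessCameraProvider.getInstance(context)
    providerFuture.addListener({
        // Runs on the main-thread executor once the provider is ready.
        cameraProvider = providerFuture.get()
        bindCameraUseCases()
    }, ContextCompat.getMainExecutor(context))
}
```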

Here we “waterfall” the selected camera (back camera preferred) and then proceed to bind the use cases to the provider; a sketch follows the list below:

  • As per the bindToLifecycle method signature described at the start of this article, a CameraSelector instance is required to indicate which hardware sensor (camera) to use. If you wanted to add a button on the view to toggle the camera, you would update lensFacing with the required sensor and call bindCameraUseCases again; for the purpose of scanning barcodes, though, the “selfie” camera is not generally used.
  • Next, we nominate the use cases to run the image frames through.
  • And finally, we set the surface to show the preview on.
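A sketch of bindCameraUseCases covering those three steps (BarcodeAnalyzer is the analyzer class described in the next section; names here are assumptions):

```kotlin
private fun bindCameraUseCases() {
    val provider = cameraProvider ?: return

    // 1. Nominate the hardware sensor (back camera preferred).
    val cameraSelector = CameraSelector.Builder()
        .requireLensFacing(lensFacing)
        .build()

    // 2. The use cases every frame will be run through.
    val preview = Preview.Builder().build()
    val analysis = ImageAnalysis.Builder()
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build()
        .also { it.setAnalyzer(analysisExecutor, BarcodeAnalyzer(listener)) }

    provider.unbindAll()
    provider.bindToLifecycle(owner, cameraSelector, preview, analysis)

    // 3. Attach the preview use case to the on-screen surface.
    preview.setSurfaceProvider(previewView.surfaceProvider)
}
```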

At this point — the app is running, the camera is capturing frames and the CameraX API is sending them to the nominated Use Cases.

Reading Barcodes (The ImageAnalysis Use Case)

This is the exciting part, and it demonstrates what CameraX and its concept of Use Cases can do:

  • We implement the ImageAnalysis.Analyzer interface for the ImageAnalysis base Use Case…
  • … and override the analyze function, which is passed an ImageProxy instance (sketched below).
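A sketch of the analyzer; the class name and listener are assumptions, while BarcodeScanning and InputImage are the real MLKit entry points:

```kotlin
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.common.InputImage

private class BarcodeAnalyzer(private val listener: BarcodeListener) : ImageAnalysis.Analyzer {

    private val scanner = BarcodeScanning.getClient()

    @androidx.camera.core.ExperimentalGetImage
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image ?: run { imageProxy.close(); return }

        // Pass the rotation along so MLKit sees the frame the right way up.
        val input = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

        scanner.process(input)
            .addOnSuccessListener { barcodes ->
                barcodes.firstOrNull()?.rawValue?.let(listener)
            }
            .addOnCompleteListener {
                // Crucial: release the frame so the pipeline can continue.
                imageProxy.close()
            }
    }
}
```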

The ImageProxy contains helpers to get and set data on the image. In this example we simply get the frame as an Image instance, but the proxy also:

  • Exposes the raw pixel buffer
  • Enables setting and getting a cropped rectangle (sub image)
  • Exposes the image format, dimensions, and rotation, among others.

Note: Rotation (or orientation) of the image frame is important for inference tasks as many CV models are trained on data in a certain orientation. You can also see it is passed to the BarcodeScanner (MLKit) API.

Finally, we run the frame through the BarcodeScanner API provided by MLKit, whose callback delivers all the recognized barcodes in the frame; we in turn call our listener(s) with the result (if any).

Lastly, and most importantly, we call imageProxy.close(). This is required to let the CameraProvider know we are done processing the frame. Not doing this will cause other Use Cases (including the Preview) to freeze, as the API assumes the Use Case is still busy processing.

A Note on Performance

When we initialize the analyzer, we stipulate a few flags that define the frame buffer behaviour for the CameraProvider:
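For example (illustrative values; both are real methods on ImageAnalysis.Builder):

```kotlin
val analysis = ImageAnalysis.Builder()
    // Drop stale frames; the analyzer always receives the newest one.
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    // Only honoured under STRATEGY_BLOCK_PRODUCER (see below):
    // .setImageQueueDepth(6)
    .build()
```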

Backpressure

This is perhaps the most useful flag with regard to performance, and depends on whether you want to process each frame from the camera, or if you are happy only receiving the latest frame.

ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST will send your Use Case the latest available frame.

ImageAnalysis.STRATEGY_BLOCK_PRODUCER will “pause” the Camera until the provided frame is analyzed (and imageProxy.close() is called)

Queue Depth

Of course, if you need to process every frame, this will have a performance hit if your code is not optimized or takes too long. To further help in this case, an image queue depth can be specified via setImageQueueDepth. As per the official documentation:

Sets the number of images available to the camera pipeline for ImageAnalysis.STRATEGY_BLOCK_PRODUCER mode.

The image queue depth is the number of images available to the camera to fill with data. This includes the image currently being analyzed by ImageAnalysis.Analyzer.analyze(ImageProxy). Increasing the image queue depth may make camera operation smoother, depending on the backpressure strategy, at the cost of increased memory usage.

When the backpressure strategy is set to ImageAnalysis.STRATEGY_BLOCK_PRODUCER, increasing the image queue depth may make the camera pipeline run smoother on systems under high load. However, the time spent analyzing an image should still be kept under a single frame period for the current frame rate, on average, to avoid stalling the camera pipeline.

The value only applies to ImageAnalysis.STRATEGY_BLOCK_PRODUCER mode. For ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST the value is ignored.

If not set, and this option is used by the selected backpressure strategy, the default will be a queue depth of 6 images.

Advanced Note: Tracking FPS

As hinted at in the official Google example app code for CameraX, an ImageAnalysis Use Case can calculate the current FPS rate as it’s running.

Although the executor is usually running on its own background thread, it could, in some cases, be useful to simply call imageProxy.close() if your FPS drops below a certain threshold instead of processing the frame.
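A sketch of that idea inside analyze(), using an exponential moving average and an assumed MIN_FPS threshold (this is not the Google sample’s exact bookkeeping):

```kotlin
private val MIN_FPS = 10.0 // illustrative threshold
private var lastFrameMs = 0L
private var fps = 0.0

override fun analyze(imageProxy: ImageProxy) {
    val now = SystemClock.elapsedRealtime()
    if (lastFrameMs != 0L) {
        // Smooth the instantaneous rate into a rolling estimate.
        fps = 0.9 * fps + 0.1 * (1000.0 / (now - lastFrameMs))
    }
    lastFrameMs = now

    if (fps > 0 && fps < MIN_FPS) {
        imageProxy.close() // skip this frame to let the pipeline recover
        return
    }
    // ...run inference as usual...
}
```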

Using this technique together with the buffer flags set above, you are able to fine-tune performance as needed.

Conclusion

This article gives you a basic example of using the new CameraX API. There are many more possibilities around the ImageAnalysis base Use Case, which I would like to explore in future posts.

Thanks for reading and I hope you got some useful insights!
