Using multiple camera streams simultaneously

Oscar Wahltinez · Android Developers · Oct 11, 2018 · 6 min read

This blog post is the latest in our ongoing series about the camera on Android; we have previously covered camera enumeration and camera capture sessions and requests.

Use cases for multiple camera streams

A camera application might want to use more than one stream of frames simultaneously. In some cases, different streams even require a different frame resolution or pixel format. Some typical use cases include:

  • Video recording: one stream for preview, another being encoded and saved into a file
  • Barcode scanning: one stream for preview, another for barcode detection
  • Computational photography: one stream for preview, another for face / scene detection

As we discussed in our previous blog post, there is a non-trivial performance cost when we process frames, and the cost is multiplied when doing parallel stream / pipeline processing.

Resources like the CPU, GPU, and DSP might be able to take advantage of the framework’s reprocessing capabilities, but demands on resources like memory will grow linearly with the number of streams.

Multiple targets per request

Multiple camera streams can be combined into a single CaptureRequest by performing a somewhat bureaucratic procedure. This code snippet illustrates how to set up a camera session with one stream for camera preview and another stream for image processing:
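
Here is a minimal sketch of that procedure, assuming the capture session has already been created with both output surfaces, and that previewSurface (e.g. from a SurfaceView) and imReaderSurface (from an ImageReader) already exist:

```kotlin
// Assumption: `session` comes from CameraCaptureSession.StateCallback after
// createCaptureSession() was called with both surfaces; `previewSurface` and
// `imReaderSurface` are the two configured output Surfaces
val requestTemplate = CameraDevice.TEMPLATE_PREVIEW
val combinedRequest = session.device.createCaptureRequest(requestTemplate).apply {
    // Link both Surface targets to the same repeating request
    addTarget(previewSurface)
    addTarget(imReaderSurface)
}

// The SurfaceView is updated automatically, and the ImageReader delivers frames
// through its own listener, so no capture callback is needed for the request
session.setRepeatingRequest(combinedRequest.build(), null, null)
```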

If you configure the target surfaces correctly, this code will only produce streams that meet the minimum FPS determined by StreamConfigurationMap.getOutputMinFrameDuration(int, Size) and StreamConfigurationMap.getOutputStallDuration(int, Size). Actual performance will vary from device to device, although Android gives us some guarantees for supporting specific combinations depending on three variables: output type, output size, and hardware level. Using an unsupported combination of parameters may work at a low frame rate, or it may not work at all, triggering one of the failure callbacks. The documentation describes in great detail what is guaranteed to work, and it is strongly recommended to read it in full, but we will cover the basics here.

Output type

Output type refers to the format in which the frames are encoded. The possible values described in the documentation are PRIV, YUV, JPEG and RAW. The documentation best explains them:

  • PRIV refers to any target whose available sizes are found using StreamConfigurationMap.getOutputSizes(Class) with no direct application-visible format.
  • YUV refers to a target Surface using the ImageFormat.YUV_420_888 format.
  • JPEG refers to the ImageFormat.JPEG format.
  • RAW refers to the ImageFormat.RAW_SENSOR format.

When choosing your application’s output type, if the goal is to maximize compatibility, then the recommendation is to use ImageFormat.YUV_420_888 for frame analysis and ImageFormat.JPEG for still images. For preview and recording scenarios, you will likely be using a SurfaceView, TextureView, MediaRecorder, MediaCodec, or RenderScript.Allocation. In those cases, do not specify an image format; for compatibility purposes it will count as ImageFormat.PRIVATE (regardless of the actual format used under the hood). To query the formats supported by a device given its CameraCharacteristics, use the following code:
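
A small helper along these lines would do; the function name getSupportedOutputFormats is just illustrative:

```kotlin
// Sketch: list the output formats (ImageFormat constants) that a camera device
// supports, given its CameraCharacteristics
fun getSupportedOutputFormats(characteristics: CameraCharacteristics): List<Int> {
    val config = characteristics.get(
        CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)!!
    return config.outputFormats.toList()
}
```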

Output size

All available output sizes are listed when we call StreamConfigurationMap.getOutputSizes(), but as far as compatibility goes, we only need to worry about two of them: PREVIEW and MAXIMUM. We can think of these sizes as upper bounds: if the documentation says something of size PREVIEW works, then anything smaller than PREVIEW also works. The same applies to MAXIMUM. Here’s a relevant excerpt from the documentation:

For the maximum size column, PREVIEW refers to the best size match to the device’s screen resolution, or to 1080p (1920x1080), whichever is smaller. RECORD refers to the camera device’s maximum supported recording resolution, as determined by CamcorderProfile. And MAXIMUM refers to the camera device’s maximum output resolution for that format or target from StreamConfigurationMap.getOutputSizes(int).

Note that the available output sizes depend on the choice of format. Given the CameraCharacteristics and a format, we can query for the available output sizes like this:
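
For example, a sketch using the SCALER_STREAM_CONFIGURATION_MAP key; the helper name getOutputSizes is our own:

```kotlin
// Sketch: query the available output sizes for a given pixel format
fun getOutputSizes(characteristics: CameraCharacteristics, format: Int): List<Size> {
    val config = characteristics.get(
        CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)!!
    // getOutputSizes() returns null for unsupported formats
    return config.getOutputSizes(format).orEmpty().toList()
}
```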

In the camera preview and recording use cases, we should be using the target class to determine supported sizes since the format will be handled by the camera framework itself:
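
A class-based overload of the same helper might look like this:

```kotlin
// Sketch: for preview and recording targets, query sizes by target class;
// the framework chooses an opaque (PRIV) format on our behalf
fun <T> getOutputSizes(
    characteristics: CameraCharacteristics, targetClass: Class<T>): List<Size> {
    val config = characteristics.get(
        CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)!!
    // getOutputSizes() returns null for unsupported target classes
    return config.getOutputSizes(targetClass).orEmpty().toList()
}
```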

Getting MAXIMUM size is easy — just sort the output sizes by area and return the largest one:
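
For instance, building on the getOutputSizes helper sketched above:

```kotlin
// Sketch: MAXIMUM is simply the largest available output size by pixel area
fun getMaximumOutputSize(characteristics: CameraCharacteristics, format: Int): Size =
    getOutputSizes(characteristics, format)
        .sortedByDescending { it.width * it.height }
        .first()
```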

Getting PREVIEW size requires a little more thinking. Recall that PREVIEW refers to the best size match to the device’s screen resolution, or to 1080p (1920x1080), whichever is smaller. Keep in mind that the aspect ratio may not match the screen’s aspect ratio exactly, so we may need to apply letter-boxing or cropping to the stream if we plan on displaying it in full screen mode. In order to get the right preview size, we need to compare the available output sizes with the display size while taking into account that the display may be rotated. In this code, we also define a helper class SmartSize that will make size comparisons a little easier:
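
One possible sketch, building on the class-based getOutputSizes helper above; SmartSize, SIZE_1080P, and getPreviewOutputSize are our own illustrative names:

```kotlin
import android.graphics.Point
import android.hardware.camera2.CameraCharacteristics
import android.util.Size
import android.view.Display
import kotlin.math.max
import kotlin.math.min

// Helper class that compares sizes by their long and short edges, so that
// display rotation does not affect the comparison
class SmartSize(width: Int, height: Int) {
    val size = Size(width, height)
    val long = max(size.width, size.height)
    val short = min(size.width, size.height)
}

val SIZE_1080P = SmartSize(1920, 1080)

// Sketch: PREVIEW is the largest output size that fits within the display
// size or 1080p, whichever is smaller
fun <T> getPreviewOutputSize(
    display: Display,
    characteristics: CameraCharacteristics,
    targetClass: Class<T>
): Size {
    // Query the real display size, which accounts for the current rotation
    val displayPoint = Point()
    display.getRealSize(displayPoint)
    val displaySize = SmartSize(displayPoint.x, displayPoint.y)

    // If the display is at least as large as 1080p, bound the preview by 1080p
    val maxSize = if (displaySize.long >= SIZE_1080P.long ||
        displaySize.short >= SIZE_1080P.short) SIZE_1080P else displaySize

    // Pick the largest output size that fits within the upper bound
    return getOutputSizes(characteristics, targetClass)
        .map { SmartSize(it.width, it.height) }
        .sortedByDescending { it.long * it.short }
        .first { it.long <= maxSize.long && it.short <= maxSize.short }
        .size
}
```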

Hardware level

To determine the available capabilities at runtime, the most important piece of information a camera application needs is the supported hardware level. Once again, we can lean on the documentation to explain this to us:

The supported hardware level is a high-level description of the camera device’s capabilities, summarizing several capabilities into one field. Each level adds additional features to the previous one, and is always a strict superset of the previous level. The ordering is LEGACY < LIMITED < FULL < LEVEL_3.

With a CameraCharacteristics object, we can retrieve the hardware level with a single statement:
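
For example, assuming characteristics was obtained from CameraManager.getCameraCharacteristics():

```kotlin
// Retrieve the supported hardware level: LEGACY, LIMITED, FULL or LEVEL_3
val hardwareLevel = characteristics.get(
    CameraCharacteristics.INFO_SUPPORTED_HARDWARE_LEVEL)
```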

Putting all the pieces together

Once we understand output type, output size, and hardware level, we can determine which combinations of streams are valid. For instance, here’s a snapshot of the configurations supported by a CameraDevice with LEGACY hardware level. The snapshot is taken from the documentation for the createCaptureSession method:
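
(Reproduced from the createCaptureSession documentation; each target is listed as output type and maximum size.)

| Target 1 | Target 2 | Target 3 | Sample use case |
| --- | --- | --- | --- |
| PRIV, MAXIMUM | | | Simple preview, GPU video processing, or no-preview video recording |
| JPEG, MAXIMUM | | | No-viewfinder still image capture |
| YUV, MAXIMUM | | | In-application video / image processing |
| PRIV, PREVIEW | JPEG, MAXIMUM | | Standard still imaging |
| YUV, PREVIEW | JPEG, MAXIMUM | | In-app processing plus still capture |
| PRIV, PREVIEW | PRIV, PREVIEW | | Standard recording |
| PRIV, PREVIEW | YUV, PREVIEW | | Preview plus in-app processing |
| PRIV, PREVIEW | YUV, PREVIEW | JPEG, MAXIMUM | Still capture plus in-app processing |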

Since LEGACY is the lowest possible hardware level, we can infer from the previous table that every device that supports Camera2 (i.e. API level 21 and above) can output up to three simultaneous streams using the right configuration — that’s pretty cool! However, it may not be possible to achieve the maximum available throughput on many devices, because your own code will likely add processing overhead and run into other constraints that limit performance, such as memory, CPU, and even thermal limits.

Now that we have the knowledge necessary to set up two simultaneous streams with support guaranteed by the framework, we can dig a little deeper into the configuration of the target output buffers. For example, if we were targeting a device with LEGACY hardware level, we could set up two target output surfaces: one using ImageFormat.PRIVATE and another one using ImageFormat.YUV_420_888. This is a supported combination per the table above, as long as we use the PREVIEW size. Using the function defined above, getting the required preview sizes for a camera ID is now very simple:
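
For example, assuming a cameraManager obtained via getSystemService() and the current display (both names are illustrative):

```kotlin
// Sketch: compute PREVIEW-bounded sizes for both output targets
val characteristics = cameraManager.getCameraCharacteristics(cameraId)
val previewSize = getPreviewOutputSize(
    display, characteristics, SurfaceHolder::class.java)
val imageReaderSize = getPreviewOutputSize(
    display, characteristics, ImageReader::class.java)
```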

We must wait until the SurfaceView is ready using the provided callbacks, like this:
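
A sketch of the callback registration; the camera session should only be created once surfaceCreated() has fired:

```kotlin
surfaceView.holder.addCallback(object : SurfaceHolder.Callback {
    override fun surfaceCreated(holder: SurfaceHolder) {
        // The Surface is now valid; it is safe to use holder.surface as a
        // capture target (e.g. open the camera and create the session here)
    }
    override fun surfaceChanged(
        holder: SurfaceHolder, format: Int, width: Int, height: Int) = Unit
    override fun surfaceDestroyed(holder: SurfaceHolder) = Unit
})
```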

We can even force the SurfaceView to match the camera output size by calling SurfaceHolder.setFixedSize(), but it may be better in terms of UI to take an approach similar to FixedAspectSurfaceView from the HDR viewfinder sample on GitHub, which sets an absolute size taking into consideration both the aspect ratio and the available space, while automatically adjusting when activity changes are triggered.
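
In the simple case, that is a one-liner, assuming the previewSize computed earlier:

```kotlin
// Force the SurfaceView buffer to match the chosen camera output size exactly
surfaceView.holder.setFixedSize(previewSize.width, previewSize.height)
```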

Setting up the other surface from ImageReader with the desired format is even easier, since there are no callbacks to wait for:
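
For example, using the imageReaderSize computed earlier; a maxImages value of 3 is just an illustrative buffer depth:

```kotlin
// Create an ImageReader with the desired size and format; its Surface can be
// handed to the capture session immediately
val imageReader = ImageReader.newInstance(
    imageReaderSize.width, imageReaderSize.height, ImageFormat.YUV_420_888, 3)
val imReaderSurface = imageReader.surface
```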

When using a blocking target buffer like ImageReader, we need to discard the frames after we have used them:
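
A sketch, assuming frames are handled on a background imageReaderHandler (a hypothetical Handler):

```kotlin
imageReader.setOnImageAvailableListener({ reader ->
    // acquireLatestImage() drops older frames and returns the newest one,
    // or null if no frame is currently available
    val image = reader.acquireLatestImage() ?: return@setOnImageAvailableListener
    // ... analyze the frame here ...
    image.close()  // release the buffer back to the camera, or capture will stall
}, imageReaderHandler)
```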

We should keep in mind that we are targeting the lowest common denominator — devices with LEGACY hardware level. We could add conditional branching and use RECORD size for one of the output target surfaces in devices with LIMITED hardware level or even bump that up to MAXIMUM size for devices with FULL hardware level.
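
A sketch of that branching, reusing the hardwareLevel value and size helpers from earlier; recordSize (which would be derived from CamcorderProfile) is left as an assumption:

```kotlin
// Pick the size bound for the YUV analysis stream based on hardware level,
// following the guaranteed-configuration tables (PRIV preview + YUV stream)
val analysisSize = when (hardwareLevel) {
    CameraMetadata.INFO_SUPPORTED_HARDWARE_LEVEL_FULL,
    CameraMetadata.INFO_SUPPORTED_HARDWARE_LEVEL_3 ->
        getMaximumOutputSize(characteristics, ImageFormat.YUV_420_888)  // MAXIMUM
    CameraMetadata.INFO_SUPPORTED_HARDWARE_LEVEL_LIMITED ->
        recordSize  // RECORD, as determined by CamcorderProfile
    else ->  // LEGACY: stick to the guaranteed PREVIEW size
        getPreviewOutputSize(display, characteristics, ImageReader::class.java)
}
```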

Summary

In this article, we have covered:

  1. Using a single camera device to output multiple streams simultaneously
  2. The rules for combining different targets in a single capture request
  3. Querying and selecting the appropriate output type, output size and hardware level
  4. Setting up and using a Surface provided by SurfaceView and ImageReader

With this knowledge, now we can create a camera app that has the ability to display a preview stream while performing asynchronous analysis of incoming frames in a separate stream.
