Getting the Most from the New Multi-Camera API
This blog post complements our Android Developer Summit 2018 talk, done in collaboration with Vinit Modi, the Android Camera PM, and Emilie Roberts, from the Partner Developer Relations team. Check out our previous blog posts in the series, including camera enumeration, camera capture sessions and requests, and using multiple camera streams simultaneously.
Multi-camera was introduced with Android Pie, and in the few months since launch, devices that support the API have started coming to market, such as the Google Pixel 3 and Huawei Mate 20 series. Many multi-camera use-cases are tightly coupled with a specific hardware configuration; in other words, not all use-cases will be compatible with every device, which makes multi-camera features a great candidate for dynamic delivery of modules. Some typical use-cases include:
- Zoom: switching between cameras depending on crop region or desired focal length
- Depth: using multiple cameras to build a depth map
- Bokeh: using inferred depth information to simulate a DSLR-like narrow focus range
Logical and physical cameras
To understand the multi-camera API, we must first understand the difference between logical and physical cameras; the concept is best illustrated with an example. For instance, we can think of a device with three back-facing cameras and no front-facing cameras as a reference. In this example, each of the three back cameras is considered a physical camera. A logical camera is then a grouping of two or more of those physical cameras. The output of the logical camera can be a stream that comes from one of the underlying physical cameras, or a fused stream coming from multiple underlying physical cameras simultaneously; either way that is handled by the camera HAL.
Many phone manufacturers also develop their own first-party camera applications (which usually come pre-installed on their devices). To utilize all of the hardware’s capabilities, these applications sometimes made use of private or hidden APIs, or received special treatment from the driver implementation that other applications did not have access to. Some devices even implemented the concept of logical cameras by providing a fused stream of frames from the different physical cameras but, again, this was only available to certain privileged applications. Often, only one of the physical cameras would be exposed to the framework. The situation for third party developers prior to Android Pie is illustrated in the following diagram:
Beginning in Android Pie, a few things have changed. For starters, private APIs are no longer OK to use in Android apps. Secondly, with the inclusion of multi-camera support in the framework, Android has been strongly recommending that phone manufacturers expose a logical camera for all physical cameras facing the same direction. As a result, this is what third party developers should expect to see on devices running Android Pie and above:
It is worth noting that what the logical camera provides is entirely dependent on the OEM implementation of the Camera HAL. For example, a device like Pixel 3 implements its logical camera in such a way that it will choose one of its physical cameras based on the requested focal length and crop region.
The multi-camera API
The new API consists of the following new constants, classes and methods:
Thanks to changes to the Android CDD, the multi-camera API also comes with certain expectations that developers can rely on. Devices with dual cameras existed prior to Android Pie, but opening more than one camera simultaneously involved trial and error; multi-camera on Android now gives us a set of rules that specify when we can open a pair of physical cameras, provided they are part of the same logical camera.
As stated above, we can expect that, in most cases, new devices launching with Android Pie will expose all physical cameras (the exception being more exotic sensor types such as infrared) along with an easier-to-use logical camera. Also, and crucially, we can expect that for every combination of streams that is guaranteed to work, one stream belonging to a logical camera can be replaced by two streams from the underlying physical cameras. Let’s cover that in more detail with an example.
Multiple streams simultaneously
In our last blog post, we covered extensively the rules for using multiple streams simultaneously in a single camera. The exact same rules apply for multiple cameras with a notable addition explained in the documentation:
For each guaranteed stream combination, the logical camera supports replacing one logical YUV_420_888 or raw stream with two physical streams of the same size and format, each from a separate physical camera, given that the size and format are supported by both physical cameras.
In other words, each stream of type YUV or RAW can be replaced with two streams of identical type and size. So, for example, we could start with a camera stream of the following guaranteed configuration for single-camera devices:
- Stream 1: YUV type, MAXIMUM size from logical camera `id = 0`
Then, a device with multi-camera support will allow us to create a session replacing that logical YUV stream with two physical streams:
- Stream 1: YUV type, MAXIMUM size from physical camera `id = 1`
- Stream 2: YUV type, MAXIMUM size from physical camera `id = 2`
The trick is that we can replace a YUV or RAW stream with two equivalent streams if and only if those two cameras are part of a logical camera grouping — i.e. listed under CameraCharacteristics.getPhysicalCameraIds().
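To make the replacement rule concrete, here is a minimal plain-Java sketch of the check described above. It is a conceptual model rather than framework code: `StreamReplacementRule`, its `Format` enum and the method name are hypothetical names introduced for illustration, and the set of physical IDs is assumed to have been read from `CameraCharacteristics.getPhysicalCameraIds()` on the logical camera. The sketch deliberately does not model the additional requirement that both physical cameras support the requested size and format.

```java
import java.util.Set;

/** Conceptual model of the "one logical stream -> two physical streams" rule. */
final class StreamReplacementRule {

    /** Stand-in stream types for this sketch (not the android.graphics.ImageFormat constants). */
    enum Format { YUV_420_888, RAW, JPEG, PRIV }

    /**
     * Returns true when a logical stream of the given format may be replaced by
     * two streams coming from physicalIdA and physicalIdB.
     *
     * physicalIdsOfLogicalCamera is assumed to be the result of
     * CameraCharacteristics.getPhysicalCameraIds() for the logical camera.
     */
    static boolean canReplaceWithPhysicalPair(
            Format format,
            String physicalIdA,
            String physicalIdB,
            Set<String> physicalIdsOfLogicalCamera) {
        // Only YUV and RAW streams are covered by the guarantee.
        boolean replaceableFormat = format == Format.YUV_420_888 || format == Format.RAW;
        // The two streams must come from separate physical cameras...
        boolean distinctCameras = !physicalIdA.equals(physicalIdB);
        // ...and both cameras must belong to the same logical camera grouping.
        boolean sameLogicalGroup = physicalIdsOfLogicalCamera.contains(physicalIdA)
                && physicalIdsOfLogicalCamera.contains(physicalIdB);
        return replaceableFormat && distinctCameras && sameLogicalGroup;
    }
}
```

With the example above, a YUV stream on a logical camera whose group contains physical cameras `1` and `2` passes the check, while a JPEG stream, or a pair that includes a camera outside the group, does not.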
Another thing to consider is that the guarantees provided by the framework are just the bare minimum required to get frames from more than one physical camera simultaneously. We can expect additional streams to be supported on most devices, sometimes even letting us open multiple physical camera devices independently. Unfortunately, since that is not a hard guarantee from the framework, relying on it will require per-device testing and tuning via trial and error.
Creating a session with multiple physical cameras
When we interact with physical cameras in a multi-camera enabled device, we should open a single CameraDevice (the logical camera) and interact with it within a single session, which must be created using the API CameraDevice.createCaptureSession(SessionConfiguration config) available since SDK level 28. Then, the session configuration will have a number of output configurations, each of which will have a set of output targets and, optionally, a desired physical camera ID.
Later, when we dispatch a capture request, said request will have an output target associated with it. The framework will determine which physical (or logical) camera the request will be sent to based on what output target is attached to the request. If the output target corresponds to one of the output targets that was sent as an output configuration along with a physical camera ID, then that physical camera will receive and process the request.
Using a pair of physical cameras
One of the most important developer-facing additions to the camera APIs for multi-camera is the ability to identify logical cameras and find the physical cameras behind them. Now that we understand that we can open physical cameras simultaneously (again, by opening the logical camera and as part of the same session) and the rules for combining streams are clear, we can define a function to help us identify potential pairs of physical cameras that can be used to replace one of the logical camera streams:
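One possible shape for such a function is sketched below in plain Java so that the pairing logic can be run off-device. `DualCamera` and `DualCameraFinder` are hypothetical names introduced for this sketch. The input map is assumed to have been built on the device by iterating over `CameraManager.getCameraIdList()`, keeping the cameras that report the `REQUEST_AVAILABLE_CAPABILITIES_LOGICAL_MULTI_CAMERA` capability, and reading `CameraCharacteristics.getPhysicalCameraIds()` for each of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Two physical cameras that sit behind the same logical camera. */
final class DualCamera {
    final String logicalId;
    final String physicalId1;
    final String physicalId2;

    DualCamera(String logicalId, String physicalId1, String physicalId2) {
        this.logicalId = logicalId;
        this.physicalId1 = physicalId1;
        this.physicalId2 = physicalId2;
    }

    @Override
    public String toString() {
        return logicalId + ":(" + physicalId1 + "," + physicalId2 + ")";
    }
}

final class DualCameraFinder {

    /**
     * Lists every unordered pair of physical cameras for each logical camera.
     *
     * physicalIdsByLogicalId is assumed to map a logical camera ID to the IDs
     * returned by CameraCharacteristics.getPhysicalCameraIds() for that camera.
     */
    static List<DualCamera> findDualCameras(Map<String, List<String>> physicalIdsByLogicalId) {
        List<DualCamera> pairs = new ArrayList<>();
        // Copy into a TreeMap so logical IDs are visited in a deterministic order.
        Map<String, List<String>> sorted = new TreeMap<>(physicalIdsByLogicalId);
        for (Map.Entry<String, List<String>> entry : sorted.entrySet()) {
            List<String> physicalIds = entry.getValue();
            // Cameras with fewer than two physical IDs contribute no pairs.
            for (int i = 0; i < physicalIds.size(); i++) {
                for (int j = i + 1; j < physicalIds.size(); j++) {
                    pairs.add(new DualCamera(
                            entry.getKey(), physicalIds.get(i), physicalIds.get(j)));
                }
            }
        }
        return pairs;
    }
}
```

For a logical camera `0` backed by physical cameras `2`, `3` and `4`, this yields the three candidate pairs (2,3), (2,4) and (3,4), all of which must be opened through logical camera `0`.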
State handling of the physical cameras is controlled by the logical camera. So, to open our “dual camera” we just need to open the logical camera corresponding to the physical cameras that we are interested in:
Up until this point, besides selecting which camera to open, nothing is different compared to what we have been doing to open any other camera in the past. Now it’s time to create a capture session using the new session configuration API so we can tell the framework to associate certain targets with specific physical camera IDs:
At this point, we can refer back to the documentation or our previous blog post to understand which combinations of streams are supported. We just need to remember that those are for multiple streams on a single logical camera, and that the compatibility extends to using the same configuration and replacing one of those streams with two streams from two physical cameras that are part of the same logical camera.
With the camera session ready, all that is left to do is dispatch our desired capture requests. Each target of the capture request will receive its data from its associated physical camera, if any, or fall back to the logical camera.
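The routing behavior described above can be modeled in a few lines of plain Java. This is only a conceptual sketch of what the framework does for us, not Android code: `OutputRouting` and its methods are hypothetical names, and targets are represented by strings instead of `Surface` objects. On a device, the association would instead be declared by calling `OutputConfiguration.setPhysicalCameraId(String)` on each output configuration before creating the session.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of how output targets map to physical cameras within one session. */
final class OutputRouting {
    private final String logicalCameraId;
    // Target name -> requested physical camera ID (empty when none was requested).
    private final Map<String, Optional<String>> physicalIdByTarget = new LinkedHashMap<>();

    OutputRouting(String logicalCameraId) {
        this.logicalCameraId = logicalCameraId;
    }

    /** Registers a target; pass null when no physical camera ID is desired. */
    OutputRouting addOutput(String target, String physicalCameraIdOrNull) {
        physicalIdByTarget.put(target, Optional.ofNullable(physicalCameraIdOrNull));
        return this;
    }

    /** Returns the camera that would serve requests attached to this target. */
    String cameraFor(String target) {
        Optional<String> physicalId = physicalIdByTarget.get(target);
        if (physicalId == null) {
            throw new IllegalArgumentException(
                    "Target was not configured in this session: " + target);
        }
        // Targets without a physical camera ID fall back to the logical camera.
        return physicalId.orElse(logicalCameraId);
    }
}
```

For example, a session on logical camera `0` with one target bound to physical camera `2`, one bound to physical camera `3` and a third left unbound would route requests to `2`, `3` and `0` respectively.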
Zoom example use-case
To tie all of that back to one of the initially discussed use-cases, let’s see how we could implement a feature in our camera app so that users can switch between the different physical cameras to experience a different field-of-view — effectively capturing a different “zoom level”.
First, we must select the pair of physical cameras that we want to allow users to switch between. For maximum effect, we can search for the pair of cameras that provide the minimum and maximum focal length available, respectively. That way, assuming comparable sensor sizes, we select the camera with the widest field of view and the camera with the narrowest one (the most “zoomed in”):
A sensible architecture for this would be to have two SurfaceViews, one for each stream, that get swapped upon user interaction so that only one is visible at any given time. In the following code snippet, we demonstrate how to open the logical camera, configure the camera outputs, create a camera session and start two preview streams, leveraging the functions defined previously:
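A minimal sketch of that selection, again in plain Java so it can run off-device, might look like the following. `ZoomPair` and `ZoomPairSelector` are hypothetical names. The input is assumed to map each physical camera ID to the `float[]` read from `CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS` for that camera.

```java
import java.util.Map;
import java.util.TreeMap;

/** The two physical cameras with the shortest and longest focal length. */
final class ZoomPair {
    final String wideId;  // camera offering the shortest focal length
    final String teleId;  // camera offering the longest focal length

    ZoomPair(String wideId, String teleId) {
        this.wideId = wideId;
        this.teleId = teleId;
    }
}

final class ZoomPairSelector {

    /**
     * Picks the cameras with the minimum and maximum focal length.
     *
     * focalLengthsByPhysicalId is assumed to hold, per physical camera ID, the
     * values of CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS.
     */
    static ZoomPair select(Map<String, float[]> focalLengthsByPhysicalId) {
        String wideId = null;
        String teleId = null;
        float minFocal = Float.MAX_VALUE;
        float maxFocal = -Float.MAX_VALUE;
        // Visit IDs in sorted order so that ties are resolved deterministically.
        for (Map.Entry<String, float[]> entry
                : new TreeMap<>(focalLengthsByPhysicalId).entrySet()) {
            for (float focalLength : entry.getValue()) {
                if (focalLength < minFocal) {
                    minFocal = focalLength;
                    wideId = entry.getKey();
                }
                if (focalLength > maxFocal) {
                    maxFocal = focalLength;
                    teleId = entry.getKey();
                }
            }
        }
        if (wideId == null) {
            throw new IllegalArgumentException("No focal lengths were supplied");
        }
        return new ZoomPair(wideId, teleId);
    }
}
```

Note that if only one camera reports focal lengths, both IDs will be the same, so callers may want to check that `wideId` and `teleId` differ before offering a switch in the UI.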
Now all we need to do is provide a UI for the user to switch between the two surfaces, like a button or double-tapping the SurfaceView; if we wanted to get fancy we could try performing some form of scene analysis and switch between the two streams automatically.
Lens distortion
All lenses produce a certain amount of distortion. In Android, we can query the distortion created by lenses using CameraCharacteristics.LENS_DISTORTION (which replaces the now-deprecated CameraCharacteristics.LENS_RADIAL_DISTORTION). For logical cameras, it is reasonable to expect that the distortion will be minimal and our application can use the frames more-or-less as they come from the camera. However, for physical cameras, we should expect potentially very different lens configurations — especially on wide lenses.
Some devices may implement automatic distortion correction via CaptureRequest.DISTORTION_CORRECTION_MODE. It is good to know that distortion correction is on by default on devices that support it. The documentation has some more detailed information:
FAST/HIGH_QUALITY both mean camera device determined distortion correction will be applied. HIGH_QUALITY mode indicates that the camera device will use the highest-quality correction algorithms, even if it slows down capture rate. FAST means the camera device will not slow down capture rate when applying correction. FAST may be the same as OFF if any correction at all would slow down capture rate […] The correction only applies to processed outputs such as YUV, JPEG, or DEPTH16 […] This control will be on by default on devices that support this control.
If we wanted to take a still shot from a physical camera using the highest possible quality, then we should try to set the correction mode to HIGH_QUALITY if it’s available. Here’s how we should be setting up our capture request:
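The decision itself can be sketched in plain Java as follows. `DistortionCorrection` and `preferredStillCaptureMode` are hypothetical names, and the integer constants are assumed to mirror `CameraMetadata.DISTORTION_CORRECTION_MODE_OFF`, `_FAST` and `_HIGH_QUALITY`, so they are worth double-checking against the SDK. The input is assumed to be the `int[]` read from `CameraCharacteristics.DISTORTION_CORRECTION_AVAILABLE_MODES`, which may be null on devices that do not support the control.

```java
import java.util.OptionalInt;

final class DistortionCorrection {
    // Assumed to mirror CameraMetadata.DISTORTION_CORRECTION_MODE_* (API 28).
    static final int MODE_OFF = 0;
    static final int MODE_FAST = 1;
    static final int MODE_HIGH_QUALITY = 2;

    /**
     * Returns HIGH_QUALITY when the device lists it as available; otherwise
     * returns empty, meaning the capture request should keep its default mode.
     *
     * availableModes is assumed to come from
     * CameraCharacteristics.DISTORTION_CORRECTION_AVAILABLE_MODES (may be null).
     */
    static OptionalInt preferredStillCaptureMode(int[] availableModes) {
        if (availableModes == null) {
            return OptionalInt.empty();
        }
        for (int mode : availableModes) {
            if (mode == MODE_HIGH_QUALITY) {
                return OptionalInt.of(MODE_HIGH_QUALITY);
            }
        }
        return OptionalInt.empty();
    }
}
```

On a device, a non-empty result would then be applied to the still capture request builder with `set(CaptureRequest.DISTORTION_CORRECTION_MODE, mode)`, while preview requests are left untouched.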
Keep in mind that setting a capture request in this mode will have a potential impact on the frame rate that can be produced by the camera, which is why we are only setting the distortion correction in still image captures.
To be continued
Phew! We covered a bunch of things related to the new multi-camera APIs:
- Potential use-cases
- Logical vs physical cameras
- Overview of the multi-camera API
- Extended rules for opening multiple camera streams
- How to set up camera streams for a pair of physical cameras
- Example “zoom” use-case swapping cameras
- Correcting lens distortion
Note that we have not covered frame synchronization and computing depth maps. That is a topic worthy of its own blog post 🙂