iPhone XR: A Deep Dive Into Depth

Upadevi Kapoor
5 min read · Nov 5, 2018


With the introduction of iPhone XR, every phone in Apple’s lineup now supports depth capture. But the XR is unique: it’s the first iPhone to do it with a single lens. As we were starting to test and optimize Halide for it, we found both advantages and disadvantages.

In this post we’ll take a look at three different ways iPhones generate depth data, what makes the iPhone XR so special, and show off Halide’s new 1.11 update, which enables you to do things with the iPhone XR that the regular camera app won’t.

Depth Capture Method 1: Dual Camera Disparity

Humans perceive depth with the help of two eyes. Our eyes may be only a few inches apart, but our brains detect subtle differences between the images they see. The greater the difference, or disparity, the closer an object is.

Disparity illustrated. This image is from Professor David Heeger’s page on Perception.

The iPhone 7 Plus introduced a dual-camera system, which can construct depth in a similar way. By taking two photos at the same time, each from a slightly different position, it can build a disparity map.
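As a rough sketch of the geometry involved (illustrative numbers, not Apple’s or Halide’s actual calibration): with two lenses a known distance apart, depth falls out of simple triangulation, and the bigger the disparity, the closer the subject.

```swift
// Classic stereo triangulation: depth = (focal length × baseline) / disparity.
// The larger the disparity between the two views, the closer the object.
// All values below are hypothetical, chosen only to illustrate the relationship.
func depthInMeters(disparityPixels: Double,
                   focalLengthPixels: Double,
                   baselineMeters: Double) -> Double {
    return (focalLengthPixels * baselineMeters) / disparityPixels
}

let focal = 2800.0    // focal length expressed in pixels (hypothetical)
let baseline = 0.0095 // ~9.5 mm between the two lenses (hypothetical)
print(depthInMeters(disparityPixels: 10, focalLengthPixels: focal, baselineMeters: baseline)) // ≈ 2.7 m
print(depthInMeters(disparityPixels: 40, focalLengthPixels: focal, baselineMeters: baseline)) // ≈ 0.7 m
```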

There’s a lot of guesswork involved when matching the images. Add video noise and things get even rougher. A great deal of effort goes into filtering the data, with additional post-processing that guesses how to smooth edges and fill holes.
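You can see that filtering step in the API itself: when capturing a photo with depth, you can ask for either the raw disparity map, holes and all, or the smoothed version. A minimal sketch, assuming photoOutput belongs to a running dual-camera session with depth delivery already enabled:

```swift
import AVFoundation

// Minimal sketch: request depth alongside a photo and choose whether the
// system applies its hole-filling/smoothing filter. Assumes `photoOutput`
// already has depth data delivery enabled, and `delegate` implements
// AVCapturePhotoCaptureDelegate.
func captureWithDepth(photoOutput: AVCapturePhotoOutput,
                      delegate: AVCapturePhotoCaptureDelegate,
                      filtered: Bool) {
    let settings = AVCapturePhotoSettings()
    settings.isDepthDataDeliveryEnabled = photoOutput.isDepthDataDeliveryEnabled
    settings.isDepthDataFiltered = filtered // false = raw disparity, holes and all
    settings.embedsDepthDataInPhoto = true  // keep the map in the output file
    photoOutput.capturePhoto(with: settings, delegate: delegate)
}
```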

Via WWDC 2017

This requires a lot of computation. It wasn’t until the iPhone X that iOS could achieve 30 frames per second. It also takes a lot of memory. For a while, our top crash was running out of memory because the system was using most of it on depth.
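One small mitigation worth sketching (an illustration, not a fix Apple prescribes): AVDepthData can be converted between 32-bit and 16-bit formats, and half-precision is usually plenty for such a low-resolution map.

```swift
import AVFoundation

// Sketch: shrink a depth map’s memory footprint by converting 32-bit
// disparity to 16-bit before holding on to it.
func compacted(_ depthData: AVDepthData) -> AVDepthData {
    guard depthData.depthDataType == kCVPixelFormatType_DisparityFloat32 else {
        return depthData
    }
    return depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat16)
}
```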

Dual Camera Drawbacks

One limitation and quirk of this method is that you can only generate depth for the parts of the two images that overlap. In other words, with a wide-angle and a telephoto lens, you can only generate depth data for the telephoto’s narrower field of view.

Another limitation is that you can’t use manual controls. The system needs to perfectly synchronize frame delivery, and each camera’s exposure. Trying to manage these settings by hand would be like trying to drive two cars at once.

Finally, while the 2D color data may be twelve megapixels, the disparity map is only about half a megapixel. If you tried to use it for portrait mode as-is, you’d end up with blurry edges that ruin the effect. We can sharpen edges by observing contrast in the 2D image, but that’s not good enough for fine details like human hair.
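You can check that resolution gap yourself by inspecting the pixel buffer behind the depth data. A quick sketch:

```swift
import AVFoundation
import CoreVideo

// Sketch: compare the depth map’s resolution to the ~12 MP color photo.
// On dual-camera iPhones the map comes back at roughly half a megapixel.
func logDepthResolution(of photo: AVCapturePhoto) {
    guard let depthData = photo.depthData else { return }
    let map = depthData.depthDataMap
    let width = CVPixelBufferGetWidth(map)
    let height = CVPixelBufferGetHeight(map)
    print("Depth map: \(width)×\(height) ≈ \(Double(width * height) / 1_000_000) MP")
}
```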

Depth Capture Method 2: TrueDepth Sensor

With the iPhone X, Apple introduced the TrueDepth camera. Instead of measuring disparity, it uses an infrared emitter to project over 30,000 dots.

However, depth data isn’t entirely based off the IR dots. Sit in a pitch-black room, and take a peek at the TrueDepth data:

Left: Lights on. Right: Lights out.

Clearly the system uses color data as part of its calculations.
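If you want to repeat this experiment, you can stream depth from the TrueDepth camera with AVCaptureDepthDataOutput. A minimal sketch, with delegate wiring and error handling left out:

```swift
import AVFoundation

// Sketch: stream live depth from the front-facing TrueDepth camera.
// The caller keeps a strong reference to the session and supplies an
// object conforming to AVCaptureDepthDataOutputDelegate.
func makeTrueDepthSession(delegate: AVCaptureDepthDataOutputDelegate,
                          queue: DispatchQueue) -> AVCaptureSession? {
    guard let device = AVCaptureDevice.default(.builtInTrueDepthCamera,
                                               for: .video,
                                               position: .front),
          let input = try? AVCaptureDeviceInput(device: device) else { return nil }

    let session = AVCaptureSession()
    session.beginConfiguration()
    guard session.canAddInput(input) else { return nil }
    session.addInput(input)

    let depthOutput = AVCaptureDepthDataOutput()
    depthOutput.isFilteringEnabled = true // let the system smooth over holes
    depthOutput.setDelegate(delegate, callbackQueue: queue)
    guard session.canAddOutput(depthOutput) else { return nil }
    session.addOutput(depthOutput)
    session.commitConfiguration()
    return session
}
```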

TrueDepth Drawbacks

One drawback of TrueDepth is its sensitivity to infrared interference, which means bright sunlight degrades quality.

Why not include a TrueDepth sensor on the back of the XR? I think it comes down to three simple reasons: cost, range, and complexity.

People will pay extra for Face ID, which requires the IR sensor. People will pay extra for a telephoto lens. People aren’t ready to pay a premium for better depth photos.

Add to that the fact that infrared gets much worse at sensing depth over greater distances, plus the ever-growing camera bump people already resent, and you can see why Apple would be hesitant to use this solution for the rear-facing camera(s).

Depth Capture Method 3: Focus Pixels and PEM

In the iPhone XR keynote, Apple said:

What the team is able to do is combine hardware and software to create a depth segmentation map using the focus pixels and neural net software so that we can create portrait mode photos on the brand-new iPhone XR.

Apple’s marketing department invented the term “Focus Pixels.” The real term is Dual Pixel Auto Focus (DPAF); it’s a common sight on cameras and phones today, and it first came to the iPhone with the iPhone 6.

DPAF was invented to find focus in a scene really fast, which is important when shooting video with moving subjects. However, the way it’s designed opens the door to disparity mapping.

Using them to capture depth is pretty new. The Pixel 2 was the first phone to capture depth with a single camera using DPAF. In fact, Google published a detailed article describing exactly how it works.

In a DPAF system, each pixel on the sensor is made up of two sub-pixels, each with its own tiny lens. The hardware finds focus much like a rangefinder camera: when the two sub-pixels see identical images, it knows that pixel is in focus. Imagine the disparity diagram we showed earlier, but on an absolutely minuscule scale.

Via Canon Marketing

If you captured two separate images, one for each set of sub-pixels, you’d get two images about one millimeter apart. It turns out this is just enough disparity to get very rough depth information.
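Plugging a roughly one-millimeter baseline into the same triangulation sketched earlier shows just how rough: at portrait distances the disparity is on the order of a pixel, so a small measurement error swings the depth estimate wildly. The numbers below are purely illustrative.

```swift
// Same triangulation as before, but with a millimeter-scale baseline.
// All values are illustrative, not real iPhone XR calibration.
let focalPx = 2800.0     // hypothetical focal length in pixels
let dpafBaseline = 0.001 // ~1 mm between the two sets of sub-pixels

func disparityPixels(forDepthMeters depth: Double) -> Double {
    return (focalPx * dpafBaseline) / depth
}

print(disparityPixels(forDepthMeters: 1.0)) // ≈ 2.8 px
print(disparityPixels(forDepthMeters: 3.0)) // ≈ 0.9 px — roughly half a pixel of
                                            // disparity separates 2 m from 3 m
```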

As I mentioned, this technology is also used by Google, and the Pixel team had to do a bunch of work to get it in a usable state:

One more detail: because the left-side and right-side views captured by the Pixel 2 camera are so close together, the depth information we get is inaccurate, especially in low light, due to the high noise in the images. To reduce this noise and improve depth accuracy we capture a burst of left-side and right-side images, then align and average them before applying our stereo algorithm.
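The averaging idea is easy to sketch. This is just an illustration of the principle, not Google’s actual pipeline, and it assumes the burst frames are already aligned:

```swift
// Sketch: average a burst of already-aligned disparity maps to reduce
// per-frame noise before any stereo or segmentation step runs.
// Each map is a flat [Float] of the same length.
func averagedDisparity(_ burst: [[Float]]) -> [Float]? {
    guard let first = burst.first,
          burst.allSatisfy({ $0.count == first.count }) else { return nil }
    var sum = [Float](repeating: 0, count: first.count)
    for frame in burst {
        for i in frame.indices {
            sum[i] += frame[i]
        }
    }
    let count = Float(burst.count)
    return sum.map { $0 / count }
}
```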
