Vulkan on Android 6 — VR Stereo Rendering Part 3: Implementation

Jin-Long Wu
4 min read · Jan 31, 2019

--


In this post, we are going to look at the big picture of the rendering workflow and prepare the resources we need for rendering.

Workflow

Before we dive into the code, let's take a look at the steps we are about to go through; a sketch of the per-frame loop follows the list.

  • Draw the scene to off-screen images.
  • Apply barrel distortion to the off-screen images.
  • Draw the barrel-distorted images (both eyes) to a swapchain image, as in normal rendering.
  • Retrieve orientation from the device sensor and apply it to the virtual camera.
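
Putting those steps together, a per-frame loop might look like the following minimal C++ sketch; all of the function names here are hypothetical stand-ins for the stages above:

```cpp
#include <cstdint>

// Hypothetical helpers standing in for the steps described above.
void     UpdateCameraFromSensor();
void     RenderSceneOffscreen(int eye);
uint32_t AcquireSwapchainImage();
void     RenderBarrelDistortion(uint32_t imageIndex);
void     PresentSwapchainImage(uint32_t imageIndex);

// One frame of the VR pipeline: sensor -> off-screen eyes -> distortion -> present.
void DrawFrame() {
    UpdateCameraFromSensor();               // apply device orientation to the camera
    for (int eye = 0; eye < 2; ++eye)
        RenderSceneOffscreen(eye);          // draw the scene for each eye off-screen
    uint32_t imageIndex = AcquireSwapchainImage();
    RenderBarrelDistortion(imageIndex);     // sample both eyes with barrel distortion
    PresentSwapchainImage(imageIndex);      // hand the swapchain image to the display
}
```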

Barrel Distortion

Barrel distortion is one type of distortion caused by a physical lens. In an HMD we view the scene on a small display screen. We want a more immersive experience, and we could simply enlarge the HMD, but nobody wants to put such a cumbersome device on their head. So instead a lens, typically a Fresnel lens, is placed in front of the display to bend the light and widen our field of view, but it introduces geometric distortion at the same time. A typical VR lens produces a distortion named pincushion distortion.

To cancel that distortion, we pre-apply the inverse distortion, named barrel distortion, so the two cancel out when the image is viewed through the lens.
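
A common way to express this pre-distortion is a radial polynomial applied to lens-centered coordinates. Here is a minimal C++ sketch of the idea; the coefficients k1 and k2 are illustrative and would need to be tuned to the actual lens:

```cpp
// Radial distortion sketch: given a lens-centered coordinate in [-1, 1],
// push the sampling position outward so the rendered image is squeezed
// toward the center (the barrel shape). k1/k2 are assumed coefficients.
struct Vec2 { float x, y; };

Vec2 BarrelDistort(Vec2 uv) {
    const float k1 = 0.22f, k2 = 0.24f;           // illustrative lens constants
    float r2 = uv.x * uv.x + uv.y * uv.y;         // squared distance from center
    float scale = 1.0f + k1 * r2 + k2 * r2 * r2;  // grows with the radius
    return { uv.x * scale, uv.y * scale };        // distorted sample position
}
```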

Rendering Resources

The number of resolve images equals the number of swapchain images. Each resolve image is rendered with the model, view, and projection matrices, which differ only slightly from frame to frame.

Images, image views, and memories all come in pairs, one per eye. In addition, a VkSampler is used to sample the off-screen images.
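
One possible way to group those paired resources, purely as an illustration (the names here are hypothetical; the post itself keeps separate per-eye arrays):

```cpp
#include <vulkan/vulkan.h>
#include <vector>

// Per-eye off-screen targets; one entry per swapchain image.
struct EyeTargets {
    std::vector<VkImage>        images;
    std::vector<VkImageView>    views;
    std::vector<VkDeviceMemory> memories;
};

EyeTargets _leftEye, _rightEye;
VkSampler  _offscreenSampler = VK_NULL_HANDLE; // shared by both eyes
```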

We create color, depth, and resolve images for both eyes, and the process is pretty much the same as in the previous post. The sampler for the off-screen images uses the mipmap mode VK_SAMPLER_MIPMAP_MODE_NEAREST, since the off-screen images have no mipmap versions; we simply choose the most performant mode.
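
A minimal sketch of such a sampler, assuming a valid VkDevice; the linear min/mag filtering is an assumption, but the NEAREST mipmap mode is the point:

```cpp
#include <vulkan/vulkan.h>

// Sampler for the off-screen color images: they have a single mip level,
// so VK_SAMPLER_MIPMAP_MODE_NEAREST is the cheapest correct choice.
VkSampler CreateOffscreenSampler(VkDevice device) {
    VkSamplerCreateInfo info = {};
    info.sType        = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
    info.magFilter    = VK_FILTER_LINEAR;
    info.minFilter    = VK_FILTER_LINEAR;
    info.mipmapMode   = VK_SAMPLER_MIPMAP_MODE_NEAREST;
    info.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    info.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    info.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
    info.maxLod       = 0.0f; // only mip level 0 exists
    VkSampler sampler = VK_NULL_HANDLE;
    vkCreateSampler(device, &info, nullptr, &sampler);
    return sampler;
}
```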

Uniform Buffers and Dynamic Uniform Buffers

UpdateImpl creates viewing frustums for both eyes based on the given _fov, _zNear, _zFar, _focalLength, and _eyeSeparation; the detailed derivation is explained in the previous post.
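
For reference, an off-axis stereo projection along those lines might be computed as below. This sketch assumes glm, the parameter names mirror the members mentioned above, and the aspect ratio is an assumed extra input:

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cmath>

// Off-axis stereo frustums: each eye's frustum is skewed toward the
// shared focal plane, so the two views converge at _focalLength.
void BuildEyeProjections(float _fov, float _zNear, float _zFar,
                         float _focalLength, float _eyeSeparation,
                         float aspect,
                         glm::mat4& leftProj, glm::mat4& rightProj) {
    float top   = _zNear * std::tan(glm::radians(_fov) * 0.5f);
    float half  = aspect * top;                              // half frustum width
    float shift = 0.5f * _eyeSeparation * _zNear / _focalLength;

    leftProj  = glm::frustum(-half + shift, half + shift, -top, top, _zNear, _zFar);
    rightProj = glm::frustum(-half - shift, half - shift, -top, top, _zNear, _zFar);
}
```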

Creating a dynamic uniform buffer is almost identical to creating an ordinary uniform buffer, except for the memory alignment.

We query the value of minUniformBufferOffsetAlignment, the minimum alignment requirement for uniform buffers: every offset into the buffer must be a multiple of it. We also need to take the size of the struct stored in the dynamic buffer into account. The per-element alignment is therefore the smallest multiple of minUniformBufferOffsetAlignment that is no smaller than the size of the struct.
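
In code, the general form could look like this sketch; EyeMatrices is a hypothetical stand-in for the struct stored in the dynamic buffer:

```cpp
#include <vulkan/vulkan.h>

// Hypothetical struct holding one eye's matrices in the dynamic buffer.
struct EyeMatrices { float view[16]; float projection[16]; };

// Smallest multiple of minUniformBufferOffsetAlignment that can hold the struct.
VkDeviceSize ComputeDynamicAlignment(const VkPhysicalDeviceLimits& limits) {
    VkDeviceSize minAlignment = limits.minUniformBufferOffsetAlignment;
    return ((sizeof(EyeMatrices) + minAlignment - 1) / minAlignment) * minAlignment;
}
```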

You might have seen expressions like:

alignment = (size of struct + minAlignment - 1) & ~(minAlignment - 1)

which is almost identical to the general form above but a little more performant thanks to the bit-wise operations; it safely assumes that the alignment requirement is a power of two, which the Vulkan specification guarantees.

We prepare view and projection matrices for both eyes, so _dynamicBufferSize will be _dynamicBufferAlignment * 2.
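
At draw time, each eye then selects its half of the buffer with a dynamic offset; a sketch with hypothetical handle names:

```cpp
#include <vulkan/vulkan.h>

// Bind the shared set once per eye, selecting that eye's slice of the
// dynamic uniform buffer with an offset of eye * alignment.
void BindEyeDescriptors(VkCommandBuffer cmd, VkPipelineLayout layout,
                        VkDescriptorSet set, VkDeviceSize dynamicAlignment,
                        uint32_t eye /* 0 = left, 1 = right */) {
    uint32_t dynamicOffset = static_cast<uint32_t>(eye * dynamicAlignment);
    vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout,
                            0 /* first set */, 1, &set,
                            1, &dynamicOffset);
}
```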

Descriptor Sets

Two descriptor sets are required to create the barrel-distorted images. The first is used to render the scene as usual, except that the render targets are the off-screen images for both eyes; the second takes those images as the input for sampling.

We use different sets of resources in the off-screen and multiview stages, so two types of descriptor set layouts are needed. Each descriptor set in _lDescriptorSets and _rDescriptorSets takes the corresponding image view in _lMsaaResolvedViews and _rMsaaResolvedViews as its sampling target.

In order to apply different view and projection transformations to each eye, we separate them from the usual model, view, projection triplet structure when creating _msaaDescriptorSetLayout. They now have the type of dynamic uniform buffer rather than an ordinary uniform buffer, and they are bound with the per-eye dynamic offsets shown earlier.
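
The corresponding layout binding might look like this sketch; the binding index and stage flags are assumptions:

```cpp
#include <vulkan/vulkan.h>

// The per-eye matrices become a dynamic uniform buffer binding, so one
// buffer can serve both eyes via dynamic offsets.
VkDescriptorSetLayoutBinding eyeMatricesBinding = {};
void DeclareEyeMatricesBinding() {
    eyeMatricesBinding.binding         = 0; // assumed binding index
    eyeMatricesBinding.descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
    eyeMatricesBinding.descriptorCount = 1;
    eyeMatricesBinding.stageFlags      = VK_SHADER_STAGE_VERTEX_BIT;
}
```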

_multiviewDescriptorSetLayout needs only one descriptor of type combined image sampler, for the off-screen images. Unlike _msaaDescriptorSetLayout, we do not care about the model, view, and projection transformations, since we specify the clip-space coordinates directly.
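
A sketch of what that layout could look like; the binding index is an assumption:

```cpp
#include <vulkan/vulkan.h>

// _multiviewDescriptorSetLayout: a single combined image sampler at binding 0,
// read by the fragment shader that applies the barrel distortion.
VkDescriptorSetLayout CreateMultiviewLayout(VkDevice device) {
    VkDescriptorSetLayoutBinding samplerBinding = {};
    samplerBinding.binding         = 0; // assumed binding index
    samplerBinding.descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    samplerBinding.descriptorCount = 1;
    samplerBinding.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    VkDescriptorSetLayoutCreateInfo layoutInfo = {};
    layoutInfo.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    layoutInfo.bindingCount = 1;
    layoutInfo.pBindings    = &samplerBinding;

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &layout);
    return layout;
}
```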

After allocating descriptor sets for _multiviewDescriptorSetLayout, each descriptor set's pImageInfo->imageView should be an element of _lMsaaResolvedViews or _rMsaaResolvedViews, depending on the eye parameter we specify, because the off-screen images differ each frame.
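
Updating the set each frame might look like the following sketch; the helper and its parameters are hypothetical, while the array names follow the post:

```cpp
#include <vulkan/vulkan.h>

// Point the multiview set at this frame's resolved image for one eye.
void WriteEyeImage(VkDevice device, VkDescriptorSet dstSet,
                   VkImageView resolvedView, VkSampler sampler) {
    VkDescriptorImageInfo imageInfo = {};
    imageInfo.sampler     = sampler;
    imageInfo.imageView   = resolvedView; // _lMsaaResolvedViews[i] or _rMsaaResolvedViews[i]
    imageInfo.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

    VkWriteDescriptorSet write = {};
    write.sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    write.dstSet          = dstSet;
    write.dstBinding      = 0;
    write.descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    write.descriptorCount = 1;
    write.pImageInfo      = &imageInfo;
    vkUpdateDescriptorSets(device, 1, &write, 0, nullptr);
}
```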
