VR on iPhone with AVPlayer and Metal

CINEMUR
CINEMUR Engineering
Mar 14, 2017

Watching videos is one of the best entertainment use cases for VR. CINEVR, an app we have been developing at Cinemur since the end of 2015, aims to create a realistic experience through a virtual movie theater.
Porting our VR experience to iOS has proven technically challenging, especially for the two features key to immersion: video and audio.

The technical stack

Unity provides the necessary 3D features, and the core app is shared across all platforms, which saved us a lot of time. The GoogleVR plugin for Unity, available for both iOS and Android, handles the 3D-to-VR conversion: it uses the accelerometer and gyroscope to position and orient the virtual camera, renders the left and right eye views, and applies lens distortion.

As on Android, where we use ExoPlayer for audio/video playback, we created a custom Unity plugin to play videos through a native player. On iOS, the AVFoundation framework offers great flexibility for processing audio and video, supporting streaming formats as well as DRM-protected content.

Video playback

While OpenGL is possible, we focused on Metal. Apple's API takes better advantage of the GPU hardware, with a performance improvement of about 20% over OpenGL, and it works on all devices supporting iOS 9 or later.

AVPlayer / Unity / Metal interaction

AVFoundation provides a convenient way to intercept video output with the AVPlayerItemVideoOutput class; it just needs to be attached to the AVPlayerItem once it becomes available. We enable its suppressesPlayerRendering property, since the video output will be entirely redirected to a texture:
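The original snippet is not reproduced here, but a minimal Swift sketch of this setup could look as follows (the function name is illustrative):

```swift
import AVFoundation

// Sketch: attach a video output to an AVPlayerItem so that frames can be
// consumed by our plugin instead of being rendered by the player itself.
func attachVideoOutput(to item: AVPlayerItem) -> AVPlayerItemVideoOutput {
    // Request BGRA pixel buffers, matching the Metal texture format used later,
    // and ask for Metal-compatible buffers.
    let attributes: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA,
        kCVPixelBufferMetalCompatibilityKey as String: true
    ]
    let output = AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
    // The video output is entirely redirected to a texture, so suppress
    // the player's own rendering.
    output.suppressesPlayerRendering = true
    item.add(output)
    return output
}
```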

Pixel buffer and Metal texture attributes require particular attention to the pixel format: we had to make sure the format is consistent across all components in the chain. In our case, BGRA is used.

Once the video size is known, a Metal texture is instantiated with the help of the MTLDevice instance provided by Unity. The texture pointer is then passed back to Unity to update the placeholder.
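A sketch of that allocation, assuming `device` is the MTLDevice obtained from Unity's native plugin interface (the function name is illustrative):

```swift
import Metal

// Sketch: allocate the Metal texture whose pointer will be handed back to
// Unity (e.g. via Texture2D.CreateExternalTexture on the C# side).
func makeVideoTexture(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm,   // must match the BGRA pixel buffers
        width: width,
        height: height,
        mipmapped: true)            // mip levels needed for the ambient light
    desc.usage = .shaderRead        // sampled by Unity's shaders
    return device.makeTexture(descriptor: desc)
}
```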

To refresh the video texture, we take advantage of Unity's GL.IssuePluginEvent method to call into native code. The callback is invoked during the frame update, on the Unity render thread.
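The native side of that callback might look like the following sketch, which polls the video output for a new frame; `videoOutput` and `updateTexture` are illustrative stand-ins for the plugin's state:

```swift
import AVFoundation
import CoreVideo
import QuartzCore

// Sketch of the native handler triggered by GL.IssuePluginEvent on Unity's
// render thread: grab the latest decoded frame, if any, and hand it over
// to the texture-update step.
func onRenderEvent(videoOutput: AVPlayerItemVideoOutput,
                   updateTexture: (CVPixelBuffer) -> Void) {
    // Map the current host time to the item's timeline.
    let itemTime = videoOutput.itemTime(forHostTime: CACurrentMediaTime())
    guard videoOutput.hasNewPixelBuffer(forItemTime: itemTime),
          let pixelBuffer = videoOutput.copyPixelBuffer(
              forItemTime: itemTime, itemTimeForDisplay: nil)
    else { return }  // no new frame this tick
    updateTexture(pixelBuffer)
}
```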

Room with ambient light feature

In our scene, an “ambient light” feature makes the theater appear illuminated by the on-screen image. It is inspired by a post from the great John Carmack: we blend a mip level of the video texture (the 1x1 level, which holds the average screen color) with the room textures.

Unity does not automatically generate mipmaps for our native texture. Instead, the internal-to-external texture copy and the mipmap generation can be done with a MTLBlitCommandEncoder:
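A minimal sketch of that blit pass, assuming `queue` is a MTLCommandQueue created from Unity's MTLDevice and `source`/`destination` are the internal and Unity-visible textures (names are illustrative):

```swift
import Metal

// Sketch: copy the new frame into the texture shared with Unity, then
// regenerate its mip chain so the 1x1 level (average color) stays current.
func copyAndGenerateMipmaps(queue: MTLCommandQueue,
                            source: MTLTexture,
                            destination: MTLTexture) {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.copy(from: source, sourceSlice: 0, sourceLevel: 0,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
              sourceSize: MTLSize(width: source.width,
                                  height: source.height, depth: 1),
              to: destination, destinationSlice: 0, destinationLevel: 0,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    // Rebuild all mip levels; the ambient light samples the smallest one.
    blit.generateMipmaps(for: destination)
    blit.endEncoding()
    commandBuffer.commit()
}
```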

Sound spatialization

Since the goal of VR is to trick the user's senses, audio accounts for as much as half of a successful experience. Our work in progress is to recreate the sound experience of a real theater.

To achieve these features, AVFoundation has a great AVAudioEngine component, built on top of the AudioToolbox framework. It lets you set up effects and process audio samples by creating a graph of AVAudioNode objects.
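A sketch of a graph along those lines: a player node feeding a reverb and an EQ before the main mixer. The preset and parameter values here are illustrative, not the tuned values used in CINEVR:

```swift
import AVFoundation

// Sketch: build an AVAudioEngine graph of
// player -> reverb -> EQ -> main mixer -> output.
func makeAudioGraph() -> (engine: AVAudioEngine, player: AVAudioPlayerNode) {
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()

    // Slight room reverb.
    let reverb = AVAudioUnitReverb()
    reverb.loadFactoryPreset(.mediumRoom)
    reverb.wetDryMix = 15

    // Low-shelf EQ band to boost the bass.
    let eq = AVAudioUnitEQ(numberOfBands: 1)
    eq.bands[0].filterType = .lowShelf
    eq.bands[0].frequency = 120
    eq.bands[0].gain = 4
    eq.bands[0].bypass = false

    engine.attach(player)
    engine.attach(reverb)
    engine.attach(eq)

    let format = engine.mainMixerNode.outputFormat(forBus: 0)
    engine.connect(player, to: reverb, format: format)
    engine.connect(reverb, to: eq, format: format)
    engine.connect(eq, to: engine.mainMixerNode, format: format)
    return (engine, player)
}
```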

The current audio setup with effects post-processing.

Once the audio graph is initialized and its nodes are connected, the audio output of the AVPlayer is redirected to the custom effects chain through an MTAudioProcessingTap and buffered into the AVAudioPlayerNode.

Our current (naive) implementation simulates the sound coming from the center of the screen. This is achieved by simply adjusting the left/right headphone volumes according to the user's head orientation in the scene. In addition, we apply a slight reverb and boost the bass frequencies with an EQ in the audio mix, to give a more realistic room sound.
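The volume adjustment can be sketched as a small equal-power panning function of the head's yaw relative to the screen center. This formula is illustrative, not the exact one used in the app:

```swift
import Foundation

// Sketch: derive left/right channel gains from head yaw (radians).
// yaw = 0 means facing the screen; turning the head right (positive yaw)
// puts the screen toward the left ear, so the left channel gets louder.
func stereoGains(yaw: Double) -> (left: Double, right: Double) {
    // Apparent position of the screen relative to the head:
    // -1 = fully left, +1 = fully right.
    let pan = -sin(yaw)
    // Equal-power panning keeps the perceived loudness constant.
    let left = cos((pan + 1) * Double.pi / 4)
    let right = sin((pan + 1) * Double.pi / 4)
    return (left, right)
}
```

Facing the screen (`yaw: 0`) yields equal gains of about 0.707 on each side; turning the head 90° to the right drives the left gain to 1 and the right gain to 0.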

CINEVR is available on both the App Store and the Play Store. Give it a try and tell us what you think!
