On Facebook’s video improvements for VR

Shi Yan
4 min read · Jan 22, 2016

Today I saw this Facebook post on Hacker News. I think it's ingenious yet dead simple. A friend of mine actually did something similar to optimize a pixel shader for VR. I want to summarize what I've learned.

The computational complexity of an image processing algorithm, be it a pixel shader or a video compression algorithm, can be roughly measured by the pixel count, i.e. the area of the image. So when you want to improve the performance of an image processing algorithm, one idea is to reduce the pixel count of the input image. But you need to consider two things.

  1. Which pixels to remove. Pixels differ in importance: those in the center of your view are the most important, while those outside your field of focus matter less. Those are the ones you want to remove.
  2. After removing unimportant pixels, the resulting image must still be a rectangle. This is because current graphics architectures, as well as storage formats, handle rectangles more easily than irregular shapes. Representing an irregular shape requires extra metadata and more complex logic, which would eat into the performance gain.
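To make the cost argument concrete, here is a trivial sketch (function name is mine) of how per-pixel work scales with image area, and why shrinking a rectangle into a smaller rectangle is the easy win:

```python
def cost(width, height):
    # Per-pixel work (a shader pass, a compression pass) scales
    # linearly with the image area.
    return width * height

# Halving both dimensions cuts the work to a quarter,
# and the smaller image is still a rectangle.
assert cost(960, 540) * 4 == cost(1920, 1080)
```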

So the Facebook optimization, as well as the shader trick my friend did, are both based on these two ideas. I'd like to start with the Facebook idea for VR video streaming.

VR is getting more and more popular these days thanks to the availability of VR hardware. As a result, there is growing interest in streaming 360 videos. But current 360 video streaming is built on video technology that was designed mainly for 2D content. When the content is a 360 video on a sphere, a common method is to remap the sphere onto a 2D image (an equirectangular projection), the same way you would flatten a globe onto a 2D world map.
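The equirectangular remapping is just latitude and longitude turned into pixel coordinates. A minimal sketch, assuming a right-handed coordinate system with +y up and the axis conventions chosen by me:

```python
import math

def sphere_to_equirect(direction, width, height):
    """Map a unit view direction on the sphere to (x, y) pixel
    coordinates in an equirectangular image (the 'world map' layout)."""
    dx, dy, dz = direction
    lon = math.atan2(dx, -dz)   # longitude in [-pi, pi], 0 = straight ahead
    lat = math.asin(dy)         # latitude in [-pi/2, pi/2], +pi/2 = straight up
    x = (lon / math.pi + 1.0) * 0.5 * (width - 1)
    y = (0.5 - lat / math.pi) * (height - 1)
    return x, y
```

For example, looking straight ahead lands in the center of the image, and looking straight up lands on the top row, which is where the stretching problem described next begins.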

The problem with this "map" is that the polar areas are stretched, even though they contain no more information than any other region and deserve no special attention. One can easily see the problem by checking the true size of Greenland on a world map. The stretching exists only to flatten the sphere into a rectangle, but it inflates both computation and data size.
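The stretching can be quantified: a row of the equirectangular image at latitude `lat` covers a circle on the sphere whose circumference is proportional to cos(lat), yet it gets the same number of pixels as the equator. A quick sketch (function name is mine):

```python
import math

def oversampling(lat_deg):
    # Every row has the same pixel count, but a row at latitude `lat`
    # represents a circle proportional to cos(lat) in length, so it is
    # oversampled by a factor of 1/cos(lat) relative to the equator.
    return 1.0 / math.cos(math.radians(lat_deg))

# The equator is not oversampled at all; at 60 degrees latitude each
# pixel is already stretched to twice its useful width, and the factor
# blows up as you approach the poles.
```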

A 360 video as stretched by the "map" projection

So the first optimization Facebook made was to use a cube map. A cube map stretches the image less than the map projection, and it stretches the image more uniformly. This immediately saves bandwidth.
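The core of a cube map is picking which of the six faces a view direction hits: the dominant axis of the direction chooses the face, and the other two components become coordinates on it. A minimal sketch (the sign conventions are my own; real graphics APIs such as OpenGL define their own per-face orientations):

```python
def cube_face(d):
    """Return (face, u, v) for a direction vector d: the cube face it
    hits and coordinates in [-1, 1] on that face."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, u, v = ('+x' if x > 0 else '-x'), z / ax, y / ax
    elif ay >= az:
        face, u, v = ('+y' if y > 0 else '-y'), x / ay, z / ay
    else:
        face, u, v = ('+z' if z > 0 else '-z'), x / az, y / az
    return face, u, v
```

Because each face is a perspective projection onto a flat square, the worst-case stretching (at a face corner) is bounded, unlike the unbounded stretching at the poles of the equirectangular map.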

But when people use VR, the area that is visible at any moment is just a small portion of the sphere. Does it make sense to treat every pixel with the same amount of attention? One might argue that we could simply discard the faces of the cube map that are currently unseen. But this naive approach causes trouble for streaming: the user might rotate the viewing angle rapidly, and it's hard to deliver a previously discarded face in time when the user turns in a new direction. So it's better to still stream the entire sphere, but give the unseen pixels less attention.

The way they do it is to place the original sphere inside a pyramid and project it onto the pyramid's faces. The most visible area is projected onto the bottom of the pyramid, which is the largest face. The remaining areas, which are either to the side of or behind the user, are projected onto the four triangles. Pixels in the area opposite the viewing direction land near the tips of the four triangles, so they receive less attention when the result is compressed as video.
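The effect of the pyramid layout is that the pixel density allocated to a region falls off with its angle from the viewing direction. This is not Facebook's actual mapping; as a toy illustration of the falloff (function name and the linear curve are my own), one could write:

```python
import math

def relative_detail(view_dir, pixel_dir):
    """Toy sketch of the pyramid idea: detail falls off with the angle
    between the viewing direction and a pixel's direction. The pyramid
    base (angle near 0) keeps full resolution; the region directly
    behind the viewer (angle near pi) collapses toward the tips of the
    four triangular faces, i.e. toward zero area."""
    cos_a = sum(a * b for a, b in zip(view_dir, pixel_dir))
    angle = math.acos(max(-1.0, min(1.0, cos_a)))
    return max(0.0, 1.0 - angle / math.pi)
```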

The obvious trade-off is that if the user suddenly turns in a different direction, he or she will see a slightly blurry image, but this is soon corrected by adjusting the viewing angle of the pyramid projection.

In practice, instead of adjusting the viewing angle of the pyramid projection on the fly, they pre-generate a set of videos for several viewing angles and jump between those videos as the viewing angle changes.
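A minimal sketch of that selection step, assuming each pre-encoded stream is tagged with its pyramid's base direction (the function and representation are mine, not Facebook's): pick the stream whose base direction is closest to the current gaze, i.e. has the largest dot product with it.

```python
def pick_stream(gaze, stream_dirs):
    """Choose the index of the pre-encoded video whose pyramid base
    direction best matches the current gaze direction."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(range(len(stream_dirs)), key=lambda i: dot(gaze, stream_dirs[i]))
```

When the gaze drifts far enough that a different stream wins, the player switches; until the switch completes, the viewer sees the low-detail triangular faces of the old pyramid, which is the blurriness mentioned above.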

Now I want to talk about the method my friend used to optimize a pixel shader. When you render a real-time scene for VR, pixels far from the viewing direction should receive less computation. So he projected the curved surface in front of each eye onto a flat plane in homogeneous coordinates, which is essentially the same idea as a cube map.
