Physically Based Shading on Mobile

Michael Short
Published in spaceapetech
Oct 30, 2018

Introduction

About a year ago we noticed, more and more, that our art team weren’t happy with how our games were looking on device. They wanted, and we wanted, to push our game visuals above and beyond anything we’d previously been capable of. So, over the past year we developed an Uber Shader using the Unity shader variants toolchain.

A scene rendered with our Uber Shader

The Uber Shader was intended to be a one-stop shop for every surface we have in our games. It’s a PBR-inspired shader, in that it doesn’t strictly follow the PBR rule set, but uses it for guidance. For those that haven’t come across PBR yet, I can recommend Trent Reed’s post, Physically Based Shading and Image Based Lighting. As it was adopted throughout the studio, more and more features were requested by the art team and the shader grew into what we know it as today. It supports a whole host of features, including, but not limited to:

  • Detail maps
  • Texture blending
  • Reflections
  • Pseudo sub surface scattering
  • Fresnel
  • Normal maps
  • Metallic surfaces
  • Linear & gamma space workflows

Issues

Because there are so many shader variants, compile times can be astronomical. When you include all of our variants, shader LODs and shader passes, shader compile times can be upwards of 30 minutes. This is a killer for rapid iteration time.
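To give a sense of where that explosion comes from, here is a rough sketch of how feature toggles turn into variants in Unity. The keyword names below are illustrative, not our actual feature set; the point is that every independent shader_feature can double the number of variants the compiler has to generate, and that multiplies again across LODs and passes:

```
// Illustrative only: each shader_feature adds a keyword, and Unity compiles a
// separate variant for every keyword combination that materials end up using.
#pragma shader_feature _DETAIL_MAP
#pragma shader_feature _TEXTURE_BLENDING
#pragma shader_feature _REFLECTIONS
#pragma shader_feature _NORMALMAP
#pragma shader_feature _METALLIC
// 5 independent toggles -> up to 2^5 = 32 variants, before LODs and passes
```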

We have a custom shader inspector. The inspector enables and disables shader features when certain textures are set, or when toggles are enabled. We noticed that the inspector was getting overly complex and our initial thought was to modularize it, separating features out so that the artists could find things more easily. But it quickly became apparent that things had gotten a bit too complicated.

The Uber Shader Inspector

We gave the art team so much flexibility that it became almost impossible for them to balance materials in game. They were spending too much time creating and configuring materials. What’s more, our materials looked completely different throughout our toolchain (Maya, Substance Painter and Unity), meaning that the artists had to configure surfaces three times, with vastly different settings.

Substance Painter

Our art team use, and love, Substance Painter. For those that don’t know, it allows you to create and blend materials in a very visual and artistic manner. It’s used by hundreds of studios around the world. They also love the way in which their materials look inside Painter. Investigating a bit further we immediately noticed that Substance has a PBR pipeline. Ideally, the materials the art team produce in Substance should look identical, or as close to identical as possible, in Unity, on mobile devices.

Substance Painter

Research

There’s a lot of information out there when it comes to physically based rendering and shading; it’s been widely adopted over the past five or so years. I set about reading papers, watching talks and clicking through slides, collating as much information as humanly possible. The first thing that came to light was that almost all of the PBR information and research out there was designed for desktop and console hardware. However, we want to be able to achieve the same visuals on mobile. We had our work cut out! As a side note, I’ve added a lot of links to the research I found at the bottom of this post.

Decisions

Disney PBR Principles

PBR was first brought into the limelight when Disney published their paper titled Physically-Based Shading at Disney. They talk about the way they transitioned to PBR when creating Wreck-It Ralph in 2012. As well as defining their lighting models, they also defined a number of rules which you should use to create and establish a PBR pipeline:

  1. Intuitive rather than physical parameters should be used.
  2. There should be as few parameters as possible.
  3. Parameters should be zero to one over their plausible range.
  4. Parameters should be allowed to be pushed beyond their plausible range where it makes sense.
  5. All combinations of parameters should be plausible.

Workflow

There are two main workflows that people follow when developing PBR pipelines:

  1. Specular
    Comprises 7 channels:
    Albedo (RGB)
    Specular (RGB)
    Smoothness [0, 1]

Specular workflow controls

  2. Metallic
    Comprises 5 channels:
    Base Colour (RGB)
    Roughness [0, 1]
    Metallic [0, 1]

Metallic workflow controls

For dielectric materials, the specular highlight colour is a greyscale value, so using the specular model you’re wasting two channels in the specular texture. For metallic surfaces, the diffuse colour is almost black, so again in the specular model you’re wasting three channels in the albedo texture. Considering this, the fact that we’re on mobile and the fact that Disney’s 2nd principle states that “There should be as few parameters as possible”, we chose to go exclusively with the metallic workflow. Bungie (Destiny 2), Unreal Engine, Frostbite Engine and Unity all support the metallic workflow (Unity also supports the specular workflow).
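To make that trade-off concrete, here is a minimal sketch of how the five metallic-workflow channels expand into the diffuse and specular colours a shader actually lights with. The 0.04 dielectric reflectance is the commonly used default; the function names and exact constant are illustrative rather than taken from our shader:

```
// Expand metallic workflow inputs into lighting inputs (illustrative sketch).
// baseColour: RGB from the base colour texture (linear space)
// metallic:   0 = dielectric, 1 = metal
half3 DiffuseColour(half3 baseColour, half metallic)
{
    // Metals have an (almost) black diffuse term
    return baseColour * (1.0 - metallic);
}

half3 SpecularColour(half3 baseColour, half metallic)
{
    // Dielectrics reflect roughly 4% of light as a greyscale highlight;
    // metals tint the highlight with their base colour
    return lerp(half3(0.04, 0.04, 0.04), baseColour, metallic);
}
```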

ALU vs LUT

Our Uber Shader toolchain bakes specular and diffuse lighting results into a look-up texture. We then sample this texture using N.L for diffuse and [N.H, smoothness] for the specular highlight. This is really cheap and lets us use many expensive lighting models. However, we noticed that the highlights would sometimes not be sharp enough; we lost some data due to texture and data compression. We also know that, going forward, supporting multiple lights isn’t feasible with this approach. If we have four lights affecting a single surface, that would result in eight texture fetches from the lighting alone.
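As a rough illustration of the look-up approach, the per-light work boils down to two texture fetches. Exactly how diffuse and specular are packed into the LUT is an assumption here, as are the texture and function names:

```
// Illustrative LUT-based lighting: the baked lighting texture is sampled
// twice per light.
sampler2D _LightingLUT;

half3 LutLighting(half3 N, half3 L, half3 V, half smoothness,
                  half3 albedo, half3 lightColour)
{
    half3 H = normalize(L + V);
    half nl = saturate(dot(N, L));
    half nh = saturate(dot(N, H));

    // Diffuse response indexed by N.L
    half diffuse = tex2D(_LightingLUT, half2(nl, 0.0)).r;

    // Specular response indexed by (N.H, smoothness)
    half specular = tex2D(_LightingLUT, half2(nh, smoothness)).g;

    // Two fetches per light: four lights would mean eight fetches
    // from the lighting alone.
    return (albedo * diffuse + specular) * lightColour;
}
```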

Lighting Model

The most famous specular lighting model is quite possibly the Blinn-Phong model; it’s been around for decades. Moving to PBR, people have adopted a normalized Blinn-Phong lighting model. But we wanted to do better. Most console and desktop titles today go with the GGX specular model; it looks a lot more realistic when compared with a real world specular highlight. Unfortunately, this realism comes with a lot of extra cost! I did a lot of research and came across a paper published by Unity at SIGGRAPH 2015 titled Optimizing PBR. It’s a great read and sets out the formula we adopted: a very close approximation that actually runs on mobile devices.
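For reference, the GGX distribution term on its own looks like this (the textbook form, not our shader code); the full Cook-Torrance evaluation adds Fresnel and visibility terms on top, which is where the extra cost over Blinn-Phong comes from:

```
// Textbook GGX (Trowbridge-Reitz) normal distribution term.
// roughness is perceptual roughness; alpha = roughness^2.
half D_GGX(half roughness, half nh)
{
    half alpha = roughness * roughness;
    half a2 = alpha * alpha;
    half d = nh * nh * (a2 - 1.0) + 1.0;
    return a2 / (3.14159265 * d * d);
}
```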

Implementation

Here is a (simplified) outline of the fragment shader we developed for the ForwardBase lighting pass. All of our input colours and colour-based textures are gamma corrected, so we convert them and do our work in linear space. At the end, we write the final colour back to the frame buffer in gamma space.
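In outline, the pass looks something like this. The helper names other than sampleMaterialTexture(), initSurface() and SurfaceData (which are described below) are placeholders rather than our exact code:

```
// Simplified ForwardBase fragment shader outline (structure only; helpers other
// than sampleMaterialTexture, initSurface and SurfaceData are placeholders).
half4 fragForwardBase(VertexOutput i) : SV_Target
{
    // Fetch all texture data up front to avoid stalls later in the shader
    MaterialTexture mat = sampleMaterialTexture(i.uv);

    // Pre-calculate reusable surface data: roughness, normals,
    // diffuse & specular terms, reflection probe samples
    SurfaceData surface = initSurface(i, mat);

    // Direct lighting from the main directional light
    half3 colour = evaluateBRDF(surface, _WorldSpaceLightPos0.xyz, _LightColor0.rgb);

    // Indirect lighting: ambient and the prefetched reflection probe data
    colour += evaluateEnvironment(surface);

    // Inputs were converted to linear, so convert back to gamma for the frame buffer
    return half4(linearToGamma(colour), 1.0);
}
```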

We store smoothness, metallic and ambient occlusion parameters in a linear texture. We also store an emissive colour mask and a texture blend mask in a second texture, though we may look to squeeze this all into one texture at the expense of some ALU unpacking overhead. We prefetch as much texture data as possible inside the sampleMaterialTexture() and initSurface() function calls. We also pre-calculate a lot of reusable surface information and store it in the SurfaceData structure; things like:

  • Roughness
  • Normals
  • Diffuse & Specular terms
  • Reflection probe samples

Our BRDF is broken down into quite a few helper functions; there’s really too much to share, but here is the main part of it:
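As a sketch of the kind of specular term the Optimizing PBR approach leads to, here is a folded GGX approximation in that spirit. The exact constants and clamps below are an approximation for illustration, not a copy of our production code:

```
// Approximate GGX specular term in the spirit of Unity's "Optimizing PBR" talk.
// The distribution and visibility terms are combined into one cheap expression;
// constants and clamps here are illustrative.
half3 SpecularTerm(half3 specColour, half roughness, half3 N, half3 L, half3 V)
{
    half3 H = normalize(L + V);
    half nh = saturate(dot(N, H));
    half lh = saturate(dot(L, H));

    half a  = roughness * roughness;
    half a2 = a * a;

    // GGX distribution denominator, with a small epsilon to avoid dividing by zero
    half d = nh * nh * (a2 - 1.0) + 1.00001;

    // Folded distribution + visibility approximation
    half specular = a2 / ((d * d) * max(0.1, lh * lh) * (roughness * 4.0 + 2.0));

    return specColour * specular;
}
```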

We combine our environmental lighting like so. We prefetch our cubemap data at the beginning of our shader (in the initSurface() function call) to prevent stalls — you can read about this a little further down.
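A minimal sketch of that combination, assuming the raw cube map sample was stored on the surface structure during initSurface() and that the probe uses Unity’s HDR encoding. The SurfaceData field names here are assumptions:

```
// Combine ambient and reflection probe lighting (illustrative sketch).
// surface.reflectionProbe holds the raw cube map sample fetched in initSurface()
// to hide the fetch latency; it is only decoded here, much later in the shader.
half3 EnvironmentLighting(SurfaceData surface)
{
    // Decode the prefetched probe sample (Unity's HDR-encoded cube map data)
    half3 reflection = DecodeHDR(surface.reflectionProbe, unity_SpecCube0_HDR);

    // Ambient diffuse from spherical harmonics, weighted by occlusion
    half3 ambient = ShadeSH9(half4(surface.normal, 1.0)) * surface.occlusion;

    return surface.diffuseColour * ambient
         + surface.specularColour * reflection * surface.fresnel;
}
```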

Profiling

It’s definitely worth profiling your shaders; the best tool for this is Xcode’s GPU Debugger, which shows you line-by-line costs of your shader. We had a stall in the PBR shader that was consuming 40% of the entire shader time! This stall came from sampling the environment cube map and then decoding the value immediately. Sampling the cube map at the beginning of our shader, then decoding it much later in the shader, allowed us to take that stall down to just 3% of the total shader time.
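The pattern that removed the stall looks roughly like this (simplified; in our shader the early fetch lives inside initSurface(), and reflDir stands in for a reflection direction computed earlier):

```
// Before: sample and immediately decode, so the ALU waits on the texture unit.
half3 reflectionStalling = DecodeHDR(UNITY_SAMPLE_TEXCUBE(unity_SpecCube0, reflDir),
                                     unity_SpecCube0_HDR);

// After: issue the fetch early...
half4 probeSample = UNITY_SAMPLE_TEXCUBE(unity_SpecCube0, reflDir);

// ...do plenty of unrelated ALU work here (BRDF evaluation, etc.)...

// ...and only decode once the result is actually needed.
half3 reflection = DecodeHDR(probeSample, unity_SpecCube0_HDR);
```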

Xcode GPU Debugger — costs per line of shader code

Conclusion

We were able to develop a prototype that looks very similar across our tool chain and runs smoothly on our target platforms (Android and iOS).

Left to right: Substance Painter, Unity Windows, Unity iOS. The biggest discrepancy between iOS and the other platforms comes from the LDR reflection probes.

Performance: ALU vs Texture Fetch

We wanted to prove that an ALU lighting approach made more sense for us, especially when adding more and more lights to the scene. The table below shows the ALU instruction count and texture fetch count for each of our three implementations, for one, two and four lights.

We can clearly see that the ALU-based approach uses more ALU instructions and fewer texture fetches. This is more appropriate for our target platform: bandwidth is super expensive on mobile! Over the years mobile GPUs have gotten more and more powerful and are able to process lots of complex maths, but adding additional texture reads to a shader can cause the GPU to stall quite significantly.

Performance: ALU vs sRGB Gamma to Linear Conversion

Mobile devices supporting OpenGL ES 3.0 and above provide support for sRGB texture reads; the GPU texture sampler is able to convert texture data from gamma to linear space. For earlier devices we have to convert between gamma and linear space manually:
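A minimal version of that manual conversion uses the common 2.2 power approximation rather than the exact piecewise sRGB curve; the exact approximation used in production may differ:

```
// Approximate sRGB <-> linear conversion using a 2.2 gamma curve.
half3 GammaToLinear(half3 c)
{
    return pow(c, 2.2);
}

half3 LinearToGamma(half3 c)
{
    return pow(c, 1.0 / 2.2);
}
```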

Before committing to either, we needed to find out the cost of each. Using the Xcode GPU Debugger I found that, rendering a whole bunch of spheres, the ALU conversion took 3.05ms, whereas the sRGB method took 2.79ms. If you can drop support for OpenGL ES 2.0, or maybe just work in gamma space on those devices 😢, then adopt the sRGB method.

Next Steps

So far all of this work has come from our own trials and research. We’re still in the prototype stage and are finalizing the feature set that we will take forward into the production pipeline. We’ve made the shader code as modular and flexible as possible. Transitioning to scriptable render pipelines and ultimately deferred shading is our target.

References
