Bugs can be beautiful
Some features of GLFuzz, through the lens of ARM Mali GPUs
[Part of a series of stories on GPU shader compiler bugs.]
Creating a beautiful bug: live code injection
Check out this beautiful image:
Looks to me like a work of modern art. I like it more than this spacey image:
Yet the first image, with the beautiful stripes, is actually wrong: it arises due to a shader compiler bug present in the ARM Mali GPU drivers in our Samsung Galaxy S6 phone.
The bug was found by GLFuzz using live code injection. We’ve previously looked at dead code injection, where GLFuzz takes code from another shader and injects it into the target shader, enclosed in an “if(false) { … }” block so that it doesn’t get executed.
Live code injection is different. GLFuzz takes some code from another shader and injects it into a randomly chosen point in the target shader almost unmodified. The injected code is not enclosed in an “if(false) { … }” block, so it really does get executed.
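To make the difference concrete, here is a minimal Python sketch of the two kinds of injection, treating shaders as lists of source lines. The function name `inject` and its parameters are hypothetical, and this is not GLFuzz's actual implementation: GLFuzz picks the injection point at random, whereas the sketch takes it as an argument so the result is easy to inspect.

```python
def inject(target_lines, donor_lines, position, live):
    """Return a copy of target_lines with donor_lines inserted at position.

    Dead injection wraps the donor code in if(false) { ... } so it never runs;
    live injection inserts the donor code unguarded, so it really executes.
    """
    if live:
        block = list(donor_lines)
    else:
        block = ["if(false) {"] + ["    " + line for line in donor_lines] + ["}"]
    return target_lines[:position] + block + target_lines[position:]


# Toy target shader and donor code (illustrative only).
target = ["void main() {", "gl_FragColor = vec4(1.0);", "}"]
donor = ["float t = 1.0;", "t += 2.0;"]

dead_variant = inject(target, donor, 1, live=False)
live_variant = inject(target, donor, 1, live=True)
```

In both variants the original statements of the target shader are untouched; the only difference is whether the donor code sits behind a guard that is never true.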
This fragment shader is responsible for rendering the space scene in the second image above. Here’s the live code that GLFuzz injects:
float GLF_live13t = 1.0;
for(int GLF_live13i = 0; GLF_live13i < 1; GLF_live13i++) {
    if(GLF_live13t > 1.0) {
        continue;
    }
    GLF_live13t += GLF_live13map();
}
Notice that all the variables and functions referenced in this code block are prefixed “GLF_live13”. This renaming ensures that the variable names do not clash with any existing variables. I’ll provide more details of this in a parallel post soon.
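As a rough illustration of the renaming idea, the Python sketch below prefixes a given set of identifiers wherever they appear as whole words. How GLFuzz actually chooses and renames identifiers is not described in this post, so the helper `prefix_identifiers` and the hand-picked name set are assumptions made purely for illustration.

```python
import re


def prefix_identifiers(code, names, prefix):
    """Prefix every whole-word occurrence of each identifier in names."""
    if not names:
        return code
    # \b on both sides stops "t" from matching inside "float" or "int".
    alternatives = "|".join(re.escape(name) for name in sorted(names))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: prefix + match.group(1), code)


# Donor code before renaming (names chosen by hand for this example).
donor = "float t = 1.0; for(int i = 0; i < 1; i++) { t += map(); }"
renamed = prefix_identifiers(donor, {"t", "i", "map"}, "GLF_live13")
```

A text-level rename like this is fragile (it would also touch matching words in comments, for example); a real tool would more plausibly rename on a parsed representation of the shader.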
Because the live code calls a function, GLF_live13map, this function needs to be pulled into the target shader too. Here it is:
float GLF_live13map() {
    float GLF_live13d = 1.0;
    for(int GLF_live13i = 0; GLF_live13i < 1; GLF_live13i++) {
        GLF_live13d = length(
            vec3(float(GLF_live13i), 1.0, 1.0) -
            vec3(0.0, 0.3 + GLF_live13time, 0.0) -
            vec3(GLF_live13d, 1.0, 1.0));
    }
    return GLF_live13d;
}
We’ve reduced the injected live code as much as possible; simplifying it any further makes the bug disappear.
If you have a Samsung Galaxy S6, you might be able to reproduce the issue.
Here is a web page that renders the original shader:
And here is a page that renders the variant shader with the live code injection:
Note: on some iPhones we’ve found that both shaders render a bright pink space scene! Let us know what you find.
It is hopefully clear that the injected code should have no impact on what the shader renders: the injected block only reads and writes GLF_live13-prefixed variables, which the rest of the shader never uses, so side effects from the block cannot leak to the rest of the shader.
As usual, details of this bug are available through our GitHub GLSL issues page.
We reported this issue to ARM, who responded: “We’ve confirmed that we can reproduce this one in all driver versions up to and including our r11p0 driver release from October last year. We’ve been unable to reproduce in any newer drivers (r12p0 onwards), so there is a bug-fixed production driver available to OEM device manufacturers to take if they are willing to push a driver update.”
ARM also told us that the bug was related to register allocation: “It looks like there was indeed an obscure register allocation bug in the driver from 2015, which has been fixed in later versions of our DDK.”
Working around bugs using GLFuzz
GLFuzz aims to induce bugs by making semantics-preserving changes to a shader. But we’ve encountered cases where GLFuzz actually suppresses bugs: our transformations sometimes cause correct rendering where otherwise rendering is incorrect.
As an example, this shader leads to this image being rendered on an Intel HD Graphics 520 GPU:
We believe this image is the correct rendering of the shader: most other GPUs produce a visually identical image.
But on the ARM Mali GPU in our Samsung Galaxy S6 phone, and also on a Samsung Chromebook, Chrome with WebGL renders this image:
So it looks like the ARM Mali driver in these devices is mis-compiling the fragment shader.
GLFuzz tries changing this statement:
return - log(res) / k;
to this:
return - log(res) / (injectionSwitch.x > injectionSwitch.y ? 1.0 : k);
The change should have no effect, because injectionSwitch is set to (0.0, 1.0): the condition injectionSwitch.x > injectionSwitch.y is therefore false, and the ternary expression evaluates to k.
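The reasoning can be checked with a small Python model of the two return statements. This is only a sketch of the arithmetic, not shader code: the function names and the sample values for res and k are made up, and only the ordering x < y of the injectionSwitch components matters here.

```python
import math


def original_return(res, k):
    """Model of: return - log(res) / k;"""
    return -math.log(res) / k


def variant_return(res, k, injection_switch):
    """Model of the transformed statement with the injectionSwitch guard."""
    x, y = injection_switch
    divisor = 1.0 if x > y else k  # x < y, so the divisor is always k
    return -math.log(res) / divisor


injection_switch = (0.0, 1.0)  # assumed value; any pair with x < y behaves the same
res, k = 0.5, 8.0              # arbitrary sample inputs

same = original_return(res, k) == variant_return(res, k, injection_switch)
```

Because the guard is false at run time, both functions divide by k and return the same value. The point of the transformation is that the compiler cannot know the uniform's value ahead of time, so it has to compile the more complicated expression.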
Here is the variant shader with the above change, and here is the image we see when rendering using the variant:
It is practically identical, visually, to the images that we see rendered for the shader using our Intel GPU and several other GPUs. So it would seem that our semantics-preserving transformation acts as a work-around for the compiler bug!
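The comparisons in this post are made by eye. As a hedged sketch of how such a check might be automated, the Python below computes the mean absolute per-channel difference between two images and applies a tolerance. The image representation (flat lists of RGB tuples), the function names, and the tolerance of 2.0 are all assumptions for illustration, not a description of what GLFuzz does.

```python
def mean_abs_diff(img_a, img_b):
    """Mean absolute per-channel difference between two same-sized images.

    Each image is a flat list of (r, g, b) tuples with channels in 0..255.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    if not img_a:
        return 0.0
    total = sum(
        abs(chan_a - chan_b)
        for pix_a, pix_b in zip(img_a, img_b)
        for chan_a, chan_b in zip(pix_a, pix_b)
    )
    return total / (3 * len(img_a))


def looks_identical(img_a, img_b, tolerance=2.0):
    """True if the images differ by no more than tolerance on average."""
    return mean_abs_diff(img_a, img_b) <= tolerance


# Tiny two-pixel images standing in for real renderings.
reference = [(10, 20, 30), (40, 50, 60)]
close_variant = [(10, 21, 30), (40, 50, 59)]
wrong_variant = [(255, 0, 255), (0, 0, 0)]
```

A small tolerance allows for harmless floating-point differences between GPUs while still flagging renderings that are plainly wrong.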
Again, details of this bug are available through our GitHub GLSL issues page.
We reported this issue to ARM, who confirmed the issue: “We’ve managed to reproduce this one internally on the latest driver release for the Midgard GPU family (Mali-T* GPUs), but we don’t have an issue on the newer Bifrost GPU family (Mali-G* GPUs). We’ll try and get a fix into the next Midgard driver release (r16p0 is the earliest possible intersection point), but exactly when that bubbles into consumer devices is out of our control unfortunately.”
See whether you can reproduce the issue. Here’s a page that renders the original shader (the one we found to give the wrong image):
Note: on some iPhones we’ve found that both shaders render black images. Again, let us know what you find.
Next stop: Imagination Technologies.