Mantis Shrimp: Image Differences with Metal shaders

David Gavilan
Real Time Rendering
8 min read · Feb 26, 2024
Mantis Shrimp image diff tool for macOS

Image Diffs and Mantis Shrimp

An image diff is an image that visualizes the difference between 2 other images in some manner. There are many image diff tools around, but I often find myself wanting to write a custom difference operator, depending on what I'm looking for in the image.

A mentor of mine once created an internal tool to do image diffs where you could write a snippet of JavaScript. It was very useful, but the code ran on the CPU for each pixel, so it was quite slow. Also, it could only handle the types of images that the browser could, that is, usually just 8-bit images in sRGB color space. He called this web app Mantis Shrimp, after one of his favorite animals. The reason: mantis shrimps have up to 16 different types of photoreceptor cells. In comparison, humans have just 3 different types of cells (although some people have tetrachromacy). But who needs that many types of photoreceptors when we have technology and software to enhance what we see?

I borrowed that awesome name for my Mantis Shrimp app, although this app can do more than the original. It can compute image diffs of any 2 images that macOS supports, that is, up to 32 bits per color channel, and different types of color spaces. It does this in real time because the operations happen on the GPU, so pixel operations are done in parallel, not sequentially. The app comes with different preset operators, but you can write your own with Metal shaders as well.

Because you can write shaders with it, you can do much more than just image differences. You can even create animations with it, pretty much like what Shader Toy does in the WebGL world.

Here’s a 30-second video summary of what Mantis Shrimp can do: https://youtu.be/ijJRaahCF0c?si=w7anoRsiH0nb-tVi

In this article I’m going to give you some details about the actual implementation of Mantis Shrimp.

SwiftUI and Metal

At WWDC 2023 Apple announced new functions to modify a SwiftUI view with custom shaders: colorEffect, layerEffect, and distortionEffect. Distortions modify the location of each pixel, whereas the other two modify its color. I assume they must be fragment/pixel shaders. You can find some nice examples in How to add Metal shaders to SwiftUI views using layer effects — a free SwiftUI by Example tutorial.

However, if you want to do something more complex than that and you plan to use SwiftUI, you will need to create a custom UIViewRepresentable (or NSViewRepresentable on macOS). You can find an example of this in the Apple forums: MetalKit in SwiftUI.

For Mantis Shrimp I followed that route, and I encapsulated all the rendering using the Renderer class of my VidEngine, an open-source graphics engine I created a couple of years back. VidEngine uses Swift and Metal, but at the time of writing I haven't released the changes to make it work with macOS and SwiftUI.

Mantis Shrimp render passes

In VidEngine a “render graph” is simply an ordered list of what I call “plugins”. A plugin encapsulates one or more render passes. A real render graph should be a Directed Acyclic Graph where the nodes are render passes and each node is connected to others through read and write dependencies between the resources they use. One of the best explanations I found of render graphs is in this blog: Rendergraphs and how to implement one.

Because there are only 3 plugins in Mantis Shrimp, the dependencies are hard-coded. One of the plugins is for Mesh Shaders, which I will discuss in a separate article. Most of the time that plugin is disabled. The other two plugins are the DiffPlugin and the OutPlugin.

The DiffPlugin is where the actual operation happens. It consists of a simple vertex shader that draws a full-screen rectangle, and a fragment shader with the per-pixel operation. This fragment shader can be replaced with your own code. Apart from the texture coordinates of each pixel, I pass some other variables such as the time in seconds, so you can create animations. Read the manual for details.

The DiffPlugin writes the output to an image that is the same size, the same bit depth, and the same color space as the first input image. You can only export images as PNG at the moment, but the export should preserve that size, bit depth, and color space.

What you see on screen, though, is what the OutPlugin shows you. Its input is the output of the DiffPlugin, and it adapts it to the current view. By default it uses point sampling, so if your image is only a few pixels wide, you should see a pixelated image, not a blurred one (as you would with linear interpolation). This is important because in an image diff tool you want to see the details, not a blurred version of them! The view supports the Display P3 color space by default, but the pixel format that gets selected may vary depending on the hardware.

The OutPlugin may also apply the final gamma where necessary. Some pixel formats support the sRGB flag, which applies the gamma automatically when writing (or the inverse gamma when reading), but not all pixel formats support it and availability varies depending on the hardware, so sometimes the operation needs to be done in a shader.

A diff fragment shader

A simple difference operator looks like this:

fragment half4 main() {
    float4 a = texA.sample(sam, frag.uv);
    float4 b = texB.sample(sam, frag.uv);
    float4 diff = abs(a - b);
    float4 out = float4(uni.scale * diff.rgb, a.a);
    return half4(out);
}

The signature of the function is predefined, and "main" is just a shortcut I've defined in Mantis Shrimp, because the function signature can't be overridden. The actual signature looks like this:

fragment half4 diffFragment(
    VertexInOut frag [[stage_in]],
    texture2d<float> texA [[ texture(0) ]],
    texture2d<float> texB [[ texture(1) ]],
    sampler sam [[ sampler(0) ]],
    constant Uniforms& uni [[ buffer(0) ]])

So apart from the fragment texture coordinates, you get two textures, a texture sampler, and some extra variables. The operation above is just subtracting the RGB values of both textures and setting the output to be the absolute value of the difference. Here’s a summary of the different diff presets in Mantis Shrimp:

Mantis Shrimp image diff presets

When no image is assigned, a white texture is sampled by default. That means that the default RGB diff operator acts as a negative if you only assign one image. See the example below.

Negative painting from Iranian-British artist Soheila Sokhanvari. By default Mantis Shrimp will negate the input.

A shader sandbox

Mantis Shrimp can also be used to simply test shaders. People familiar with Shader Toy or TwiGL will know that you can create beautiful animations with just a fragment shader.

A common mathematical tool for that purpose is the use of Signed Distance Functions (SDF). An SDF is a function that tells you how far a point is from the surface of the object. When you are inside the object, the distance is negative, hence the “signed”. Because in a fragment shader you get the (u,v) texture coordinate of the output, you can use an SDF to draw simple 2D figures. For instance, a circle centered at (0,0) is just the length of the (u,v) vector minus the radius of the circle.

If you apply transforms to the (u,v) coordinates, you can do fancier things. One common transformation is to multiply the (u,v) coordinates by a number greater than one and then take the fractional part, the decimals. In this manner, you get repeating coordinates that go from 0 to 1, and then from 0 to 1 again. If you use the time variable to change these transforms over time, you can create some interesting animations. Mantis Shrimp comes with this SDF animation preset to get you started:

float sdCircle(float2 p, float r) {
    return length(p) - r;
}

float2x2 rotationMatrix(float angle) {
    float s = sin(angle), c = cos(angle);
    return float2x2(float2(c, -s), float2(s, c));
}

fragment half4 main() {
    float t = uni.time;
    float aspect = uni.resolution.x / uni.resolution.y;
    float2 uv0 = frag.uv * 2 - 1;
    uv0.x *= aspect;
    float2x2 r = rotationMatrix(cos(t));
    uv0 = r * uv0;
    float2 uv = fract(2 * uv0) - 0.5;
    float d = sdCircle(uv, 0.5) * exp(-length(uv0));
    float s = uni.scale + 1;
    d = sin(d * s + t) / s;
    d = 0.01 / abs(d);
    float2 uvImage = 0.5 * float2(sin(t) + 1, cos(t) + 1);
    float4 color = texA.sample(sam, uvImage);
    float4 out = float4(d * color.rgb, 1);
    return half4(out);
}

The output looks like this:

See video here: 2023–12–14-mantisshrimp-circles.mp4

SDFs can also be used to represent 3D surfaces. The SDF for a sphere is the same as for a circle, but we use the length of an (x,y,z) coordinate instead of a 2D coordinate. Usually these 3D SDFs are combined with a technique called Ray Marching, which consists of casting a ray for every (u,v) coordinate on the screen, starting at the near plane of the camera frustum, and advancing the ray along its direction based on the value of the SDF. Remember that the SDF tells you the distance to the surface, so you know exactly how far you can safely move.

There are plenty of resources online to learn about this. Check out Iñigo Quilez's home page. He's the creator of Shader Toy and he has many interesting resources. The important thing for this article is that you can use Metal shaders in Mantis Shrimp to create these kinds of animations (or "demos") as well. See these ray-marched cubes (not marching cubes!):

See video here: 2024–01–29-Genuary-SDF-ray-marching.mp4

Here’s the shader code for the cubes example: Genuary 29-sdf-raymarching.metal. You can find other examples I did for the #Genuary challenge in that folder: endavid/Genuary2024.

Beyond fragment shaders

Given that the original intent of this app was simply to compare 2 images, it felt strange to allow custom vertex shaders. What would be the point if you can't change the geometry? I would need a way to upload geometry to Mantis Shrimp. But then, that would be a model viewer rather than an image diff tool!

However, being able to play with shaders that do more generic things programmatically is still attractive. That’s why I added support for mesh shaders in version 1.1 of Mantis Shrimp. I will discuss this in the next article, but the basic idea is that you have a mesh shader with no geometry at all as input, and you create your own geometry programmatically in the shader. So you can create 3D graphics procedurally, without necessarily using ray marching and SDF functions in a fragment shader. Here’s an example of some cubes generated in a mesh shader: Genuary 10-cubes.metal.

See video here: 2024–01–10-Genuary-Hexagonal.mp4

If you use Mantis Shrimp and you like it, please leave me a comment in the App Store. And if you post creations on Twitter or Instagram, use the hashtag #MantisShrimpApp so I can find them 😊

Happy coding!

Originally published at http://endavid.com.


Ph.D. Graphics Engineer at Metail. Worked on several graphics engines in the past (Fox Engine, Disney Infinity, mobile VR, server-side rendering).