Custom Navigation Transitions with Metal
As someone who likes graphics programming and has been studying Metal, I often see that people can't find a reason to learn the Metal API or a place to apply it in their apps. Usually this comes from not knowing what the tool can provide, but also from the lack of learning material available online (often just the Apple docs). So here we will learn how to use Metal in an app by creating custom navigation transitions.
Just to refresh what Metal is:
Metal is a low-level API for GPGPU (General-Purpose computing on Graphics Processing Units) programming.
But what does this mean? Put simply, it means we can write programs that run on the GPU of our device, usually for graphical applications, but also for computation in general (AI, for instance).
The main advantage of the GPU is parallel processing. Unlike the CPU, the GPU is built to chew through huge amounts of independent data very quickly; how else could your screen refresh at 60 (or more) frames per second with so many pixels? So why don't we use only the GPU? Because the GPU shines when the math being done is independent and can be worked on at the same time. For something like a database, where we want ordering and control over how the data is processed, the CPU is the better fit.
With that out of the way, let's talk about the project we will build and start looking at some code.
So, the idea of this post is to teach how to create custom UINavigationController transitions. We will create a custom animation for the push that shows a new UIViewController, something anyone who has used an iOS device is very familiar with. Even though you probably know it already, let's refresh what the default push animation looks like:
Nothing new, right?
For most applications and most use cases, this animation is perfectly fine and there is no need to create anything new or different. But sometimes we want something more striking to tell the user what is happening. Perhaps your app is highly customized and doesn't closely follow Apple's standard UI. Or the designer wants a custom transition for a specific part of the app and you can't build it with what UIKit provides out of the box.
Here is the transition we will build, running in an app:
That's cool! Very unexpected and stylish. For those who never played Doom, this is known as the Doom screen melt. We will build this transition just because it looks exciting and different, but Metal has almost no limit on what you can do (yes, you could do a 3D transition if you wanted to).
UINavigationControllerDelegate
First, we will implement the UINavigationControllerDelegate:
extension FirstViewController: UINavigationControllerDelegate {
    func navigationController(
        _ navigationController: UINavigationController,
        animationControllerFor operation: UINavigationController.Operation,
        from fromVC: UIViewController,
        to toVC: UIViewController
    ) -> UIViewControllerAnimatedTransitioning? {
        return DoomTransition()
    }
}
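For this delegate method to be called at all, the view controller also has to be set as the navigation controller's delegate, and something has to trigger a push. A minimal sketch of that wiring, assuming a hypothetical SecondViewController as the destination screen:

// In FirstViewController (a sketch; SecondViewController is just a placeholder for the pushed screen).
override func viewDidLoad() {
    super.viewDidLoad()
    // Without this, animationControllerFor is never asked for our custom transition.
    navigationController?.delegate = self
}

func showSecondScreen() {
    // A regular push; the delegate above swaps the default animation for DoomTransition.
    navigationController?.pushViewController(SecondViewController(), animated: true)
}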
The delegate implementation itself is no different from the usual UIKit delegates. This method is where we return the object that implements the transition, which needs to conform to UIViewControllerAnimatedTransitioning. Now, let's take a deeper look at the DoomTransition class.
DoomTransition
The DoomTransition class is where we configure and write the code that executes the transition.
class DoomTransition: NSObject, UIViewControllerAnimatedTransitioning {
    let duration: TimeInterval = 2
    let view = MetalView(device: device)
    let queue = DispatchQueue.main

    func transitionDuration(
        using transitionContext: UIViewControllerContextTransitioning?
    ) -> TimeInterval {
        duration
    }

    func animateTransition(
        using transitionContext: UIViewControllerContextTransitioning
    ) {
        guard
            let from = transitionContext.viewController(forKey: .from),
            let to = transitionContext.viewController(forKey: .to)
        else { return }

        let container = transitionContext.containerView
        let frame = container.frame
        view.frame = CGRect(x: 0, y: 0, width: frame.width, height: frame.height)

        container.addSubview(to.view)
        container.addSubview(view)

        view.fromTexture = from.view.snapshot()
        view.toTexture = to.view.snapshot()

        queue.asyncAfter(deadline: .now() + duration) {
            self.view.removeFromSuperview()
            transitionContext.completeTransition(
                !transitionContext.transitionWasCancelled
            )
        }
    }
}
It inherits from NSObject so it can conform to the UIViewControllerAnimatedTransitioning protocol. We have the duration of the transition, a custom view that will run the animation, and a queue used later to schedule the cleanup. The custom view receives a device parameter, which is an MTLDevice: a reference to the GPU. We should have only one device in the application, and it can be created through the function MTLCreateSystemDefaultDevice.
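A minimal sketch of that shared device, assuming (as the rest of the code in this post does) that it lives in a global constant called device:

import Metal

// One MTLDevice shared by the whole app; force-unwrapped here for brevity,
// since a device without Metal support can't run this transition anyway.
let device = MTLCreateSystemDefaultDevice()!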
The first method, transitionDuration, as the name says, returns how long the transition will take. The animateTransition method is where we use our custom view to execute the animation.
We get a reference to both the UIViewController we are coming from and the one we are going to. We set the frame of the custom view that will execute the animation (yes, the frame; we can't use constraints here), and we add both the destination view and the animation view to the container. The order matters, since we want the custom view on top while the animation runs.
Snapshot
After setting up the views (so far nothing too complex), we take a snapshot of the from view and one of the to view. Why? Because Metal can't manipulate the UIViews themselves, but it can animate two images. If that sounds suboptimal, keep in mind that using images (textures) is extremely common in 3D applications, and Metal is built for dealing with them. The animation also needs both textures to decide which pixel to show at each point (one from the current view or one from the next view).
extension UIView {
    func snapshot() -> MTLTexture? {
        let width = Int(bounds.width)
        let height = Int(bounds.height)
        let context = CGContext(
            data: nil,
            width: width,
            height: height,
            bitsPerComponent: 8,
            bytesPerRow: 0,
            space: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
        )
        if let context, let data = context.data {
            layer.render(in: context)
            let descriptor = MTLTextureDescriptor.texture2DDescriptor(
                pixelFormat: .rgba8Unorm,
                width: width,
                height: height,
                mipmapped: false
            )
            descriptor.usage = [.shaderRead, .shaderWrite]
            if let texture = device.makeTexture(descriptor: descriptor) {
                texture.replace(
                    region: MTLRegionMake2D(0, 0, width, height),
                    mipmapLevel: 0,
                    withBytes: data,
                    bytesPerRow: context.bytesPerRow
                )
                return texture
            }
        }
        return nil
    }
}
The snapshot code is simple even though it looks like a lot. We “create an empty image” the size of our view (the whole screen) and ask the Core Animation layer to render its contents into it (the snapshot). After that, we create an MTLTextureDescriptor so that Metal can create an empty texture, and we replace its contents with the bytes of the snapshot we just rendered.
And now, to finish our animateTransition method, we schedule a block to be executed after the duration of the animation. This block removes the custom view and tells the transition context that the transition has completed. We use asyncAfter here because, unlike UIView.animate, we don't get a completion callback for free.
MetalView
So far the code has been very straightforward. But we haven't talked about the MetalView yet, a custom view that does all the heavy lifting necessary to render this animation. Let's take a look at it:
final class MetalView: MTKView {
    let positions = [
        SIMD3<Float>(-1,  1, 0), // Top Left
        SIMD3<Float>(-1, -1, 0), // Bottom Left
        SIMD3<Float>( 1, -1, 0), // Bottom Right
        SIMD3<Float>( 1,  1, 0), // Top Right
        SIMD3<Float>(-1,  1, 0), // Top Left
        SIMD3<Float>( 1, -1, 0), // Bottom Right
    ]

    let textureCoordinates = [
        SIMD2<Float>(0, 0), // Top Left
        SIMD2<Float>(0, 1), // Bottom Left
        SIMD2<Float>(1, 1), // Bottom Right
        SIMD2<Float>(1, 0), // Top Right
        SIMD2<Float>(0, 0), // Top Left
        SIMD2<Float>(1, 1), // Bottom Right
    ]

    var renderPipelineState: MTLRenderPipelineState!
    var commandQueue: MTLCommandQueue!
    var sampler: MTLSamplerState!
    var fromTexture: MTLTexture?
    var toTexture: MTLTexture?
    var timePassed: Float = 0

    override init(frame: CGRect = .zero, device: MTLDevice?) {
        super.init(frame: frame, device: device)
        setup()
    }

    // Required by UIView/NSCoding since we define our own initializer; we never load this view from a storyboard.
    required init(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
}
This is a big class, but we can break it down into smaller parts, and most of it is configuration and setup rather than actual rendering code.
Coordinates
First, let's take a look at two properties: positions and textureCoordinates. Metal uses a normalized coordinate system, where positioning follows the logic of the image below regardless of screen size or device orientation:
The texture coordinates identify positions within a texture. They follow a similar logic to UIKit's coordinate system, but normalized:
With that, we can understand the values for these properties.
SIMD3 and SIMD2 are vector types used with Metal, one with 3 components (x, y, z) and one with 2 components (x, y) (there is more to them, but it is outside the scope of this article). We have 6 values of each because the most complex primitive Metal renders is the triangle, the simplest shape from which we can build all others. So we divide the screen into 2 right triangles split along the diagonal:
The first three values of each property describe one triangle (the lower-left half of the screen) and the last three describe the other (the upper-right half).
Metal Objects
We have an MTLRenderPipelineState object, also known as a PSO (Pipeline State Object); it stores which shaders will be used and how to interpret the data we send to the GPU. If you don't know what shaders are, we will talk more about them soon.
Then we have an MTLCommandQueue, whose name says a lot already: it is the queue through which we submit the commands we want executed to render the current frame.
There is also the MTLSamplerState, which describes how we sample (read) the pixels of a given texture. It has a lot of configuration options, but in this case we will use the default values.
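Just to illustrate what those options look like, here is a sketch of a customized sampler descriptor; the values are arbitrary examples, not what this project uses (we stick to the MTLSamplerDescriptor() defaults):

// Purely illustrative; the project keeps the default MTLSamplerDescriptor values.
let samplerDescriptor = MTLSamplerDescriptor()
samplerDescriptor.minFilter = .linear         // how pixels are read when the texture is scaled down
samplerDescriptor.magFilter = .linear         // how pixels are read when the texture is scaled up
samplerDescriptor.sAddressMode = .clampToEdge // what happens for coordinates outside 0...1 on the x axis
samplerDescriptor.tAddressMode = .clampToEdge // same for the y axis
let customSampler = device.makeSamplerState(descriptor: samplerDescriptor)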
The last three properties are simple: two textures, one for the view currently presented and one for the view that will be presented, plus how much time has passed since the start of the animation (needed to calculate the pixel positions).
For the first setup, we will create the command queue and the sampler.
func setup() {
    commandQueue = device!.makeCommandQueue()
    sampler = device!.makeSamplerState(descriptor: MTLSamplerDescriptor())
    createRenderPipelineState()
}
For the renderPipelineState, let's give it its own method, since it requires more setup:
func createRenderPipelineState() {
    let library = device!.makeDefaultLibrary()!
    let vertexFunction = library.makeFunction(name: "main_vertex")!
    let fragmentFunction = library.makeFunction(name: "doom_melt")!

    let vertexDescriptor = MTLVertexDescriptor()
    vertexDescriptor.attributes[0].format = .float3
    vertexDescriptor.attributes[0].bufferIndex = 0
    vertexDescriptor.attributes[0].offset = 0
    vertexDescriptor.attributes[1].format = .float2
    vertexDescriptor.attributes[1].bufferIndex = 0
    vertexDescriptor.attributes[1].offset = MemoryLayout<SIMD3<Float>>.size
    vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride + MemoryLayout<SIMD2<Float>>.stride

    let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
    renderPipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    renderPipelineDescriptor.vertexFunction = vertexFunction
    renderPipelineDescriptor.fragmentFunction = fragmentFunction
    renderPipelineDescriptor.vertexDescriptor = vertexDescriptor

    renderPipelineState = try! device!.makeRenderPipelineState(descriptor: renderPipelineDescriptor)
}
Let’s walk through each part of this code.
First we get an MTLLibrary and two MTLFunctions. The library is where we get references to the shaders: one for the vertex shader and one for the fragment shader. We haven't written them yet, but the vertex shader is responsible for positioning on the screen and the fragment shader for coloring each pixel. We will go more in depth about them soon.
Then we create an MTLVertexDescriptor, which describes the vertex data we send to the vertex shader. Usually that is the position of the objects on the screen, but it can also include texture coordinates, matrices and other data relevant to positioning. In this case it is only positions and texture coordinates.
So the first attribute (at index 0) is a float3 (a vector with x, y and z) in buffer index 0 with an offset of 0. The offset tells how many bytes we need to “jump” from the start of the vertex to reach this attribute; in this case, zero. The second attribute is a float2 (a vector with x and y), also in buffer index 0, offset by the size of a SIMD3 of Float (the same as a float3). floatN and SIMDN with the same N represent the same type; we could go into the details, but for this project it doesn't make much difference.
After this, we set the layout stride for each vertex we send, which is the stride of a SIMD3 plus the stride of a SIMD2. But what is a stride? The stride is the number of bytes between the start of one element and the start of the next in contiguous memory. This can be confusing, so imagine that elements are spaced 32 bytes apart but each one only uses 24 bytes of actual data. In memory, each element still occupies 32 bytes; the remaining 8 are padding that exists only to satisfy the alignment requirements of the type (alignments are powers of 2 because of how computers address memory). A small example follows.
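A short sketch to make size, stride and padding concrete (the struct below is just an illustration, not something from the project):

// Purely illustrative: a type whose size and stride differ because of padding.
struct Example {
    var a: Int32 // 4 bytes
    var b: Bool  // 1 byte
}

print(MemoryLayout<Example>.size)      // 5 - bytes actually occupied by the data
print(MemoryLayout<Example>.alignment) // 4 - the type must start on a 4-byte boundary
print(MemoryLayout<Example>.stride)    // 8 - distance from one element to the next in an array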
Then we create an MTLRenderPipelineDescriptor, which ties together the shaders and how the data should be used. This is fairly trivial, except for colorAttachments[0].pixelFormat = .bgra8Unorm. The pixel format of the color attachment has to match the format of what we render into; .bgra8Unorm is MTKView's default colorPixelFormat, and a lot of Apple examples use it, so we will stick to it.
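If you'd rather not hard-code the format, one option is to read it straight from the view. A sketch, assuming this line runs inside MetalView itself (where colorPixelFormat is available):

// Keep the pipeline's color format in sync with the view's drawable format.
renderPipelineDescriptor.colorAttachments[0].pixelFormat = colorPixelFormat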
And finally, we use the device to make a new MTLRenderPipelineState. We don't focus on error handling here, but it is good practice to deal with errors accordingly.
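A sketch of what that could look like with do/catch instead of try! (this article keeps the force-try for brevity):

do {
    renderPipelineState = try device!.makeRenderPipelineState(descriptor: renderPipelineDescriptor)
} catch {
    // In a real app you might log, fall back to the default transition, etc.
    assertionFailure("Failed to create the render pipeline state: \(error)")
}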
draw
Now we will move forward to the render loop before talking about the shaders:
override func draw(_ rect: CGRect) {
    guard
        let drawable = currentDrawable,
        let renderPassDescriptor = currentRenderPassDescriptor
    else { return }

    timePassed += 1 / Float(preferredFramesPerSecond)

    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
    commandEncoder?.setRenderPipelineState(renderPipelineState)
    commandEncoder?.setVertexBytes(
        positions,
        length: MemoryLayout<SIMD3<Float>>.stride * positions.count,
        index: 0
    )
    commandEncoder?.setVertexBytes(
        textureCoordinates,
        length: MemoryLayout<SIMD2<Float>>.stride * textureCoordinates.count,
        index: 1
    )

    if let fromTexture, let toTexture {
        commandEncoder?.setFragmentTexture(fromTexture, index: 0)
        commandEncoder?.setFragmentTexture(toTexture, index: 1)
        commandEncoder?.setFragmentSamplerState(sampler, index: 0)
        commandEncoder?.setFragmentBytes(
            &timePassed,
            length: MemoryLayout<Float>.stride,
            index: 0
        )
    }

    commandEncoder?.drawPrimitives(
        type: .triangle,
        vertexStart: 0,
        vertexCount: positions.count
    )
    commandEncoder?.endEncoding()

    commandBuffer?.present(drawable)
    commandBuffer?.commit()
}
The draw method is where things actually get rendered to the screen, and it is usually called at the display's refresh rate (60, 120, etc. times per second). If the work being done is too heavy, some frames will take longer and the rendering will stutter, so in more complex workloads it is important to keep an eye on the performance of the code. In our case, we don't need to worry about that.
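For reference, MTKView exposes a few properties that control how, and how often, draw is called. A sketch with example values, placed for instance in setup (this project simply relies on the defaults):

// Purely illustrative; the project relies on MTKView's default behavior.
preferredFramesPerSecond = 60 // cap the internal timer at 60 draws per second
isPaused = false              // keep the internal timer running
enableSetNeedsDisplay = false // redraw continuously instead of only when explicitly asked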
We start by getting a reference to the drawable we will render into and to the current render pass. We could have multiple render passes, each adding something to the screen: one for basic rendering, another for shadows, and so forth. We don't need that, so we only use the one MTKView already provides.
Then we accumulate how much time has passed since the first frame. preferredFramesPerSecond is how many frames per second the view should render, so each call to draw advances the clock by one frame's duration.
We create an MTLCommandBuffer and an MTLRenderCommandEncoder. The first is a buffer of commands that will be submitted through the command queue; the encoder is where we set everything the GPU needs (the pipeline state, the data, etc.). From there, we are really just calling a bunch of set methods to hand over the data and describe how to interpret it.
Before we finish with the encoder, we tell it how to draw the data: in this case as triangles (points and lines are also options), since we use two triangles to build the rectangle that fills the screen. We say it starts at vertex 0 and uses as many vertices as we have positions.
Then we end encoding; in other cases we could create another encoder for another task, but we only need this one.
Finally, we tell the command buffer to present the drawable and commit the commands so they get executed.
I know, it has been a lot of code, but we are almost finished.
Lastly, we will talk about shaders. We have mentioned them a few times without explaining, and for those who don't know them, they can be somewhat confusing. Shaders are small programs (or functions) that we write to run on the GPU.
Shaders
There are three main kinds of shaders: kernel, vertex and fragment. The kernel is used for general computation rather than rendering, so we won't talk about it. The vertex shader is responsible for positioning the elements we want to render on the screen, and the fragment shader is responsible for deciding the color of each pixel. It is important to note that all of this runs in parallel across the GPU's many cores (some GPUs have thousands of them!), so we don't need to worry about how many pixels have to be calculated.
Before diving into the code, it is important to mention that shaders are written in the Metal Shading Language, a language based on C++14. Don't worry if you don't know C++; this should be easy to follow anyway. We can either write shaders as strings in our Swift code or put them in a .metal file that Xcode compiles for us. Using a .metal file means the shaders are compiled alongside the rest of the code instead of at runtime. In this example we use one of these files.
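For completeness, here is a sketch of the string-based alternative; shaderSource is a placeholder for the MSL source, and this project uses a .metal file with makeDefaultLibrary instead:

// Compiling MSL from a string at runtime (not what this project does).
let shaderSource = "..." // placeholder: the shader code as a Swift string
let library = try! device.makeLibrary(source: shaderSource, options: nil)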
Vertex Shader
Let's start with the vertex shader, which is very simple:
struct VertexOut {
    float4 position [[ position ]];
    float2 textureCoordinate;
};

vertex VertexOut main_vertex(constant float3 *positions [[ buffer(0) ]],
                             constant float2 *textureCoordinates [[ buffer(1) ]],
                             uint vertexID [[ vertex_id ]]) {
    VertexOut out {
        .position = float4(positions[vertexID], 1),
        .textureCoordinate = textureCoordinates[vertexID],
    };
    return out;
}
The vertex function starts with the vertex keyword, which marks it as a vertex function rather than any other kind of function (we can also create plain helper functions, etc.). Its return value is our custom struct VertexOut, which carries the values we return to the GPU so it knows how to position the geometry (the field marked with the position annotation) and that are also fed to the fragment shader later.
It is important to note that the value for position in the vertex shader has to be a float4 (x, y, z and w), even though we aren’t using all the coordinates.
We have three parameters: the positions and textureCoordinates, whose annotations tell Metal to fetch them from the buffers at the specified indices, and a vertex_id, which is the index of the current vertex being processed (remember that we send 6 values in each array). It is important not to assume anything about the order in which vertices are processed, since it is not guaranteed to always be the same.
Then we just create a new VertexOut, set its attributes, and return it. The position annotation on the position property tells Metal where to place each point and how to connect them (every three points form a triangle).
Fragment Shader
Now, we go to the last piece of code that is the fragment shader:
constant float START_SPEED = 2.7;
constant float MELT_SPEED = 1;

/// Adaptation of the Doom Effect shader from: https://www.shadertoy.com/view/XtlyDn
/// Created by k_kondrak in 2017-09-02
fragment float4 doom_melt(texture2d<float> from [[ texture(0) ]],
                          texture2d<float> to [[ texture(1) ]],
                          sampler sampler [[ sampler(0) ]],
                          VertexOut vertexIn [[ stage_in ]],
                          constant float &timePassed [[ buffer(0) ]]) {
    float2 uv = vertexIn.textureCoordinate;

    float velocity = START_SPEED * timePassed;
    if (velocity > 1) velocity = 1;

    uv.y -= velocity * 0.35 * fract(sin(dot(float2(uv.x, 0), float2(12.9898, 78.233))) * 43758.5453);
    if (velocity == 1) uv.y -= MELT_SPEED * (timePassed - velocity / START_SPEED);

    if (uv.y < 0) {
        return to.sample(sampler, vertexIn.textureCoordinate);
    }
    return from.sample(sampler, uv);
}
Before we dive into this code, note that it is an adaptation of a Shadertoy fragment shader made by k_kondrak.
First we have two constants, START_SPEED and MELT_SPEED, whose names make their purpose clear. You can tweak and play with these values.
Then we have the declaration of the fragment shader, which returns a float4 (other return types are possible but not relevant here) carrying the components of a pixel (red, green, blue and alpha).
We have a lot of parameters. The first two are the snapshot textures from earlier, plus the sampler used to read pixels from them. We also have the VertexOut value received from the vertex shader (via the stage_in annotation) and the time passed since the beginning of the animation.
We copy the texture coordinate into a variable called uv, a common name for texture coordinates. We calculate the current velocity and, if it is above 1 (remember that we use normalized values?), we clamp it to 1.
Then comes the long and fancy line. It can look scary, but it simply combines the velocity with a classic pseudo-random hash of the x coordinate to give each column a different starting offset. That offset is deterministic per column and stays the same for the whole animation, so the “randomization” effectively happens once. I put “randomization” in quotes because it is not really random, just pseudo-random. You can play with the values or even swap the math for something else if you want.
Once velocity has saturated at 1 (the initial ramp-up is over), every column keeps melting down at MELT_SPEED, using the time elapsed since the ramp-up finished.
If the shifted y value drops below zero, that pixel has already “melted away”, so we show the corresponding pixel of the next screen instead. Otherwise, we sample the current screen at the shifted coordinate.
Phew! That was a lot! But now, if we run our application, we should see the animation working properly.
Some things to note:
- The colored screens are just there to create an easy-to-see contrast between the two screens; they are not a requirement, and the transition can be applied to any app.
- This code should work on all Apple devices with some adaptations (using the Cocoa classes on macOS, etc.).
- Even though this is a lot of code, most of it stays the same across animations like this one; usually only the fragment shader changes.
Here is the repository with the source code of this project.
Hope that you use Metal in your apps in creative and innovative ways!
Thanks for reading this far.