Apple's Metal API tutorial (part 6 — Camera movement)

Samuel Žúbor
10 min read · Feb 27, 2024

Hello everyone! In the last tutorial, we added some lighting to our cube to make it look more realistic. Today, we are going to look at how to move around the world using the mouse and keyboard, and how to add some detail to the cube using light maps.

Camera class

Let's first encapsulate our camera in a separate class to make things easier. I will create a Camera.swift file that will contain the camera code.

Camera.swift:

import Foundation
import simd

class Camera {
    var position = simd_float3(4.5, 5.0, 0.0)
    var direction = simd_float3(0.0, 0.0, 1.0)

    func getViewMatrix() -> simd_float4x4 {
        return createViewMatrix(eyePosition: position, targetPosition: position + direction, upVec: simd_float3(0.0, 1.0, 0.0))
    }
}

The file is very simple; it's just a small abstraction layer. Note that we store the camera direction instead of a target position for convenience.

We will also have to change the renderer.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Camera
    var camera = Camera()

    ...

    func draw(in view: MTKView) {
        camera.position.x = Float(5.0 * sin(Date().timeIntervalSince1970))
        camera.position.z = Float(5.0 * cos(Date().timeIntervalSince1970))

        camera.direction = -normalize(camera.position)

        ...

        // Create the view matrix
        var viewMatrix = camera.getViewMatrix()
        renderEncoder.setVertexBytes(&viewMatrix, length: MemoryLayout.stride(ofValue: viewMatrix), index: 1)

        // Upload the view position
        renderEncoder.setFragmentBytes(&camera.position, length: MemoryLayout.stride(ofValue: camera.position), index: 0)

        ...
    }

    ...
}

We set the direction to the negated, normalized camera position so that the camera “looks” towards the origin (0, 0, 0).
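To make this concrete, here is a tiny standalone sketch (plain Swift, no simd, made-up numbers) verifying that walking from the camera position along -normalize(position) ends up at the origin:

```swift
// Made-up position, just to verify the claim: walking from `position`
// along -normalize(position) for |position| units lands at the origin.
let position: (x: Float, y: Float, z: Float) = (3.0, 0.0, 4.0)
let len = (position.x * position.x + position.y * position.y + position.z * position.z).squareRoot()   // 5.0
let direction = (x: -position.x / len, y: -position.y / len, z: -position.z / len)

// position + len * direction ≈ (0, 0, 0)
print(position.x + len * direction.x, position.y + len * direction.y, position.z + len * direction.z)
```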

If you run the code, there should be no visible changes.

User inputs

Getting user input in Swift is fairly easy thanks to the GameController framework. It provides ways to read controller, keyboard, and mouse inputs, as well as something called a “virtual controller”, which we will look at when we implement support for iOS.

Keyboard inputs

Let's start right away by printing a message when the user presses one of the WASD keys.

GameViewController.swift:

...
import GameController

// Our macOS specific view controller
class GameViewController: NSViewController {
    ...

    override func viewDidLoad() {
        ...

        // Input
        NotificationCenter.default.addObserver(forName: NSNotification.Name.GCKeyboardDidConnect, object: nil, queue: nil) {
            (note) in
            guard let _keyboard = note.object as? GCKeyboard else {
                return
            }

            // Register callbacks
            _keyboard.keyboardInput?.keyChangedHandler = {
                (keyboardInput, controllerButton, key, isPressed) in
                if keyboardInput.button(forKeyCode: .keyW)!.value > 0.5 {
                    print("Key W pressed")
                }
                if keyboardInput.button(forKeyCode: .keyA)!.value > 0.5 {
                    print("Key A pressed")
                }
                if keyboardInput.button(forKeyCode: .keyS)!.value > 0.5 {
                    print("Key S pressed")
                }
                if keyboardInput.button(forKeyCode: .keyD)!.value > 0.5 {
                    print("Key D pressed")
                }
            }
        }
    }
}

We first add a callback to get notified when a keyboard connects (it is also called for keyboards that are already connected). We then register a callback that fires whenever a key changes state. Note that the value of a key is a Float in the normalized range [0 … 1], since some keyboards support various levels of pressure.

If you run the code now and press the WASD keys, you will hear a “pop” sound, indicating that you pressed an invalid key. Fortunately, there is an easy workaround.

GameViewController.swift:

override func viewDidLoad() {
    ...

    // HACK: Disable "pop" sound
    NSEvent.addLocalMonitorForEvents(matching: .keyDown) { _ in
        return nil
    }
}

Moving the camera around

Let's first remove the code that moves the camera in a circle.

Renderer.swift:

func draw(in view: MTKView) {
    //camera.position.x = Float(5.0 * sin(Date().timeIntervalSince1970))
    //camera.position.z = Float(5.0 * cos(Date().timeIntervalSince1970))

    //camera.direction = -normalize(camera.position)

    ...
}

And adjust the initial camera position.

Camera.swift:

class Camera {
    var position = simd_float3(0.0, 1.0, -4.0)
    ...
}

We now need to keep track of connected keyboards.

Renderer.swift:

...
import GameController
...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Camera
    var camera = Camera()

    // Input devices
    var keyboards = Array<GCKeyboard>()

    ...
}

We create an array of keyboards, since we want to support more than one keyboard at a time.

Let's also add the connected keyboard to the list instead of just printing a message when a key gets pressed.

GameViewController.swift:

NotificationCenter.default.addObserver(forName: NSNotification.Name.GCKeyboardDidConnect, object: nil, queue: nil) {
    (note) in
    guard let _keyboard = note.object as? GCKeyboard else {
        return
    }

    self.renderer.keyboards.append(_keyboard)
}

And finally, we can use this to move the camera position.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Time
    var lastTime: Double

    init?(metalKitView: MTKView) {
        ...

        // Time
        lastTime = Date().timeIntervalSince1970

        super.init()
    }

    func draw(in view: MTKView) {
        // Delta time
        let crntTime = Date().timeIntervalSince1970
        let dt = Float(crntTime - lastTime)
        lastTime = crntTime

        var leftJoystick = simd_float2(0.0, 0.0)
        for keyboard in keyboards {
            if keyboard.keyboardInput!.button(forKeyCode: .keyW)!.value > 0.5 {
                leftJoystick.y += 1.0
            }
            if keyboard.keyboardInput!.button(forKeyCode: .keyA)!.value > 0.5 {
                leftJoystick.x -= 1.0
            }
            if keyboard.keyboardInput!.button(forKeyCode: .keyS)!.value > 0.5 {
                leftJoystick.y -= 1.0
            }
            if keyboard.keyboardInput!.button(forKeyCode: .keyD)!.value > 0.5 {
                leftJoystick.x += 1.0
            }
        }

        // Move the camera
        if leftJoystick.x != 0.0 || leftJoystick.y != 0.0 {
            leftJoystick = normalize(leftJoystick)
        }
        camera.position += -cross(camera.direction, simd_float3(0.0, 1.0, 0.0)) * leftJoystick.x * dt
        camera.position += camera.direction * leftJoystick.y * dt

        ...
    }

    ...
}

Every frame, we compute the delta time (dt), the time elapsed since the last frame. We then read the joystick values, normalize them if they are non-zero, and move the camera:

  • forward/back when the Y axis of the joystick is non-zero
  • right/left when the X axis of the joystick is non-zero (the camera's left direction is obtained by taking the cross product of the forward direction with the up vector)
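The normalization step matters: without it, holding two keys at once (e.g. W + D) would move the camera faster than holding one. A tiny plain-Swift sketch with made-up input:

```swift
// W + D held at once gives the raw joystick vector (1, 1), whose length
// is √2 — unnormalized, the camera would move ~41 % faster diagonally
// than straight ahead.
var joystick: (x: Float, y: Float) = (1.0, 1.0)
let rawSpeed = (joystick.x * joystick.x + joystick.y * joystick.y).squareRoot()   // ≈ 1.414

// After normalization, the speed is the same in every direction.
joystick = (x: joystick.x / rawSpeed, y: joystick.y / rawSpeed)
let speed = (joystick.x * joystick.x + joystick.y * joystick.y).squareRoot()      // ≈ 1.0
print(rawSpeed, speed)
```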

I am using controller-like names for the input variables (leftJoystick), since that suits our use case pretty well.

Also, don't forget to remove the keyboard from the list when it disconnects.

GameViewController.swift:

// Keyboard
...

NotificationCenter.default.addObserver(forName: NSNotification.Name.GCKeyboardDidDisconnect, object: nil, queue: nil) {
    (note) in
    guard let _keyboard = note.object as? GCKeyboard else {
        return
    }

    self.renderer.keyboards.removeAll { (value) in
        return value == _keyboard
    }
}

Try running the code to see if you can move around.

Mouse inputs

We retrieved the pressed keys using GCKeyboard. Similarly, we can track the mouse using GCMouse. However, the mouse works a bit differently: instead of checking the mouse position every frame, we will install a callback that fires whenever the mouse moves.

First, let's create a rightJoystick variable that the view controller will be able to access.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Input devices
    ...

    var rightJoystick = simd_float2(0.0, 0.0)

    ...

    func draw(in view: MTKView) {
        ...

        // Move the camera
        ...

        // Rotate the camera
        camera.rotate(rotationAngles: rightJoystick)
        rightJoystick = simd_float2(0.0, 0.0)

        ...
    }

    ...
}

The camera.rotate() function isn't implemented yet; we will look at it in just a moment. But first, let's register mouse callbacks for when a mouse connects.

GameViewController.swift:

override func viewDidLoad() {
    ...

    // Input

    // Keyboard
    ...

    // Mouse
    NotificationCenter.default.addObserver(forName: NSNotification.Name.GCMouseDidConnect, object: nil, queue: nil) {
        (note) in
        guard let _mouse = note.object as? GCMouse else {
            return
        }

        // Register callbacks
        _mouse.mouseInput?.mouseMovedHandler = {
            (mouseInput, deltaX, deltaY) in
            if mouseInput.leftButton.isPressed {
                let windowSize = self.view.window?.contentView?.frame.size
                let normX = deltaX / Float(windowSize!.width)
                let normY = deltaY / Float(windowSize!.height)

                self.renderer.rightJoystick.x += normX
                self.renderer.rightJoystick.y += normY
            }
        }
    }

    ...
}

When the left mouse button is pressed, we take how much the mouse has moved (deltaX, deltaY) and divide it by the window size to bring it into roughly the range [-1 … 1].

Let's now implement the rotate function for the camera and print the delta X and Y to see if everything works as expected.

Camera.swift:

...

class Camera {
    ...

    func rotate(rotationAngles: simd_float2) {
        print(rotationAngles)
    }
}

You should see the rotation being printed into the console every frame.

Rotating the camera

Without further ado, let's jump right into the code.

Camera.swift:

...

class Camera {
    ...

    var up = simd_float3(0.0, 1.0, 0.0)

    func getViewMatrix() -> simd_float4x4 {
        return createViewMatrix(eyePosition: position, targetPosition: position + direction, upVec: up)
    }

    func rotate(rotationAngles: simd_float2) {
        direction = rotateVectorAroundNormal(vec: direction, angle: rotationAngles.y, normal: normalize(cross(direction, up)))
        direction = rotateVectorAroundNormal(vec: direction, angle: rotationAngles.x, normal: up)
    }
}

We use a function rotateVectorAroundNormal (which we have yet to implement) to rotate the camera direction around the two axes separately:

  • up and down — the normal is the left direction (the cross product of forward and up)
  • left and right — the normal is the up direction

I also created a member variable up for convenience and flexibility.

The actual vector rotation function is, however, a bit more complicated.

Math.swift:

...

func rotateVectorAroundNormal(vec: simd_float3, angle: Float, normal: simd_float3) -> simd_float3 {
    let c = cos(angle)
    let s = sin(angle)

    let axis = normalize(normal)
    let tmp = (1.0 - c) * axis

    var rotationMat = simd_float3x3(1.0)
    rotationMat[0][0] = c + tmp[0] * axis[0]
    rotationMat[0][1] = tmp[0] * axis[1] + s * axis[2]
    rotationMat[0][2] = tmp[0] * axis[2] - s * axis[1]

    rotationMat[1][0] = tmp[1] * axis[0] - s * axis[2]
    rotationMat[1][1] = c + tmp[1] * axis[1]
    rotationMat[1][2] = tmp[1] * axis[2] + s * axis[0]

    rotationMat[2][0] = tmp[2] * axis[0] + s * axis[1]
    rotationMat[2][1] = tmp[2] * axis[1] - s * axis[0]
    rotationMat[2][2] = c + tmp[2] * axis[2]

    return rotationMat * vec
}

Once again, you don't need to fully understand the code. But if you are interested in the math behind it, you can read the Wikipedia page on rotation matrices.
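For the curious, the matrix assembled in rotateVectorAroundNormal is the matrix form of Rodrigues' rotation formula: for a unit axis k and angle θ,

```latex
R = \cos\theta\,I + \sin\theta\,[\mathbf{k}]_{\times} + (1-\cos\theta)\,\mathbf{k}\mathbf{k}^{\mathsf{T}},
\qquad
R\mathbf{v} = \mathbf{v}\cos\theta + (\mathbf{k}\times\mathbf{v})\sin\theta + \mathbf{k}\,(\mathbf{k}\cdot\mathbf{v})(1-\cos\theta)
```

where [k]× is the skew-symmetric cross-product matrix of k; the c, s, and tmp variables in the code correspond to cos θ, sin θ, and (1 − cos θ)·k.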

Now we can finally look around! The movement and rotation speed may feel a bit slow, so feel free to multiply them by some constant.

Back face culling

If you go inside the cube with your camera, you will notice that its faces are visible from the inside.

While it does not cause any visual problems, the GPU has to do more work (it usually has to render twice as many faces). Fortunately, there is a very easy fix: we can tell Metal to discard every back face. And how does Metal know which face is the front and which one is the back? We provide the order in which the vertices of every triangle are wound (either clockwise or counter-clockwise). I think it's best explained with an image:

credit: LearnOpenGL — Face culling
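As a side illustration (plain Swift, not a Metal API), the winding of a triangle can be read off the sign of the 2D cross product of two of its edges, i.e. twice its signed screen-space area, which is essentially the per-triangle test the GPU performs:

```swift
// The sign of the 2D cross product of edges a->b and a->c (twice the
// triangle's signed area) tells whether a -> b -> c winds clockwise
// or counter-clockwise.
func isClockwise(_ a: (Float, Float), _ b: (Float, Float), _ c: (Float, Float)) -> Bool {
    let signedArea = (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
    return signedArea < 0  // negative signed area = clockwise (with y pointing up)
}

print(isClockwise((0, 0), (0, 1), (1, 0)))  // true  — front face under .clockwise
print(isClockwise((0, 0), (1, 0), (0, 1)))  // false — counter-clockwise, culled as a back face
```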

We can tell the render encoder in which order our vertices are wound and which faces to cull.

Renderer.swift:

func draw(in view: MTKView) {
    ...

    // Bind depth stencil state
    ...

    // Discard back faces
    renderEncoder.setFrontFacing(.clockwise)
    renderEncoder.setCullMode(.back)
}

Note that this has the same effect as using .counterClockwise and .front.

If you run the code, you will notice that most of our cube has disappeared:

The reason is that the top and right faces are in counter-clockwise order. All we have to do is reorder the indices.

Renderer.swift:

...

let indices: [ushort] = [
    // Front
    0, 3, 2,
    2, 1, 0,

    // Back
    4, 5, 6,
    6, 7, 4,

    // Left
    11, 8, 9,
    9, 10, 11,

    // Right
    12, 14, 13,
    12, 15, 14,

    // Bottom
    16, 17, 18,
    18, 19, 16,

    // Top
    20, 22, 21,
    20, 23, 22
]

...

And the cube is there once again:

Light maps

Light maps are a way of adding more detail to our objects. We are actually already using one: a diffuse map. It's what we have been referring to as “diffuseTexture” and “colorTexture” in the code. We are now going to add another light map: a specular map. It looks like this:

planks_specular.png

You can see that the image contains just black-and-white values in the range between 0 and 1. We will multiply our computed specular term by this value in the shader.
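To make the idea concrete, here is a tiny sketch with made-up numbers showing how the sampled map value scales the highlight (the real multiplication happens in the fragment shader):

```swift
// Hypothetical values, just to illustrate what multiplying by the
// specular map does. `specTerm` stands for the Phong specular term
// the shader computes; the samples stand for texels of the map.
let specTerm: Float = 0.8
let groutSample: Float = 0.05   // dark texel (gap between planks)
let plankSample: Float = 0.9    // bright texel (plank surface)

// The map scales the highlight per pixel:
let dull  = specTerm * groutSample  // ≈ 0.04 — highlight almost gone
let shiny = specTerm * plankSample  // ≈ 0.72 — highlight mostly kept
print(dull, shiny)
```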

But let's first load the texture.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Textures
    var diffuseTexture: MTLTexture?
    var specularTexture: MTLTexture?

    ...

    init?(metalKitView: MTKView) {
        ...

        // Loading textures
        let loader = MTKTextureLoader(device: self.device)

        do {
            let url = Bundle.main.url(forResource: "planks", withExtension: "png")
            self.diffuseTexture = try loader.newTexture(URL: url!, options: nil)
        } catch {
            print("Failed to load image 'planks.png'")
        }

        do {
            let url = Bundle.main.url(forResource: "planks_specular", withExtension: "png")
            self.specularTexture = try loader.newTexture(URL: url!, options: nil)
        } catch {
            print("Failed to load image 'planks_specular.png'")
        }

        ...
    }

    func draw(in view: MTKView) {
        ...

        // Bind textures
        renderEncoder.setFragmentTexture(self.diffuseTexture, index: 0)
        renderEncoder.setFragmentTexture(self.specularTexture, index: 1)

        ...
    }

    ...
}

I also renamed colorTexture to diffuseTexture, since it is a better name in this case.

We can now pass the texture to the shader as an argument and multiply the specular term by it.

Shaders.metal:

...

inline float3 phongLighting(float3 worldPosition, float3 diffuseColor, float specularColor, float3 normal, float3 viewPosition) {
    ...

    // Specular
    ...
    float3 specular = lightSpecular * spec * specularColor;

    ...
}

fragment float4 fragmentFunction(VertexOut in [[stage_in]], constant float3& viewPosition [[buffer(0)]], texture2d<float> diffuseTexture [[texture(0)]], texture2d<float> specularTexture [[texture(1)]]) {
    ...
    float4 specularColor = specularTexture.sample(colorSampler, in.texCoord);

    return float4(phongLighting(in.worldPosition, diffuseColor.rgb, specularColor.r, normalize(in.normal), viewPosition), 1.0);
}

The difference might not be that obvious, but you can look at this side-by-side comparison:

left: without specular map, right: with specular map

As you can see, the difference is quite visible. Also note that MetalKit automatically loads the texture with only the R channel instead of the full RGBA. We can view the texture format using Metal Frame Capture:

Specular texture loaded with r16 unorm format

Conclusion

We finally added support for the mouse and keyboard using the GameController framework, so that we can move around and observe the various graphics effects we will implement in the future. We also added a specular light map, greatly enhancing the detail. In the next tutorial, we will load a real 3D model from a file instead of the hard-coded cube. Anyway, I hope you found this tutorial useful. See you next time!

Source code: https://github.com/SamoZ256/MetalTutorial
