Apple's Metal API tutorial (part 6 — Camera movement)

Samuel Žúbor
10 min read · Feb 27, 2024

Hello everyone! In the last tutorial, we added some lighting to our cube to make it look more realistic. Today, we are going to look at how to move around the world using the mouse and keyboard, and how to add some detail to the cube using light maps.

Camera class

Let's first encapsulate our camera in a separate class to make things easier. I will create a Camera.swift file that will contain the camera code.

Camera.swift:

import Foundation
import simd

class Camera {
    var position = simd_float3(4.5, 5.0, 0.0)
    var direction = simd_float3(0.0, 0.0, 1.0)

    func getViewMatrix() -> simd_float4x4 {
        return createViewMatrix(eyePosition: position, targetPosition: position + direction, upVec: simd_float3(0.0, 1.0, 0.0))
    }
}

The file is very simple; it's just a small abstraction layer. Note that we store the camera direction instead of a target position for convenience.

We will also have to change the renderer.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Camera
    var camera = Camera()

    ...

    func draw(in view: MTKView) {
        camera.position.x = Float(5.0 * sin(Date().timeIntervalSince1970))
        camera.position.z = Float(5.0 * cos(Date().timeIntervalSince1970))

        camera.direction = -normalize(camera.position)

        ...

        // Create the view matrix
        var viewMatrix = camera.getViewMatrix()
        renderEncoder.setVertexBytes(&viewMatrix, length: MemoryLayout.stride(ofValue: viewMatrix), index: 1)

        // Upload the view position
        renderEncoder.setFragmentBytes(&camera.position, length: MemoryLayout.stride(ofValue: camera.position), index: 0)

        ...
    }

    ...
}

We set the direction to the negated, normalized camera position so that the camera “looks” towards the origin (0, 0, 0).
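To make this concrete, here is a tiny standalone sketch (plain Swift, no simd, made-up numbers) verifying that walking from the camera position along -normalize(position) ends up at the origin:

```swift
// Made-up position, just to verify the claim: walking from `position`
// along -normalize(position) for |position| units lands at the origin.
let position: (x: Float, y: Float, z: Float) = (3.0, 0.0, 4.0)
let len = (position.x * position.x + position.y * position.y + position.z * position.z).squareRoot()   // 5.0
let direction = (x: -position.x / len, y: -position.y / len, z: -position.z / len)

// position + len * direction ≈ (0, 0, 0)
print(position.x + len * direction.x, position.y + len * direction.y, position.z + len * direction.z)
```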

If you run the code, there should be no visible changes.

User inputs

Getting user input in Swift is fairly easy thanks to the GameController framework. It provides ways to read controller, keyboard, and mouse inputs, as well as something called a “virtual controller”, which we will look at when we implement support for iOS.

Keyboard inputs

Let's start right away by printing a message when the user presses one of the WASD keys.

GameViewController.swift:

...
import GameController

// Our macOS specific view controller
class GameViewController: NSViewController {
    ...

    override func viewDidLoad() {
        ...

        // Input
        NotificationCenter.default.addObserver(forName: NSNotification.Name.GCKeyboardDidConnect, object: nil, queue: nil) {
            (note) in
            guard let _keyboard = note.object as? GCKeyboard else {
                return
            }

            // Register callbacks
            _keyboard.keyboardInput?.keyChangedHandler = {
                (keyboardInput, controllerButton, key, isPressed) in
                if keyboardInput.button(forKeyCode: .keyW)!.value > 0.5 {
                    print("Key W pressed")
                }
                if keyboardInput.button(forKeyCode: .keyA)!.value > 0.5 {
                    print("Key A pressed")
                }
                if keyboardInput.button(forKeyCode: .keyS)!.value > 0.5 {
                    print("Key S pressed")
                }
                if keyboardInput.button(forKeyCode: .keyD)!.value > 0.5 {
                    print("Key D pressed")
                }
            }
        }
    }
}

We first add a callback to get notified when a keyboard connects (it is also called for keyboards that are already connected). We then register a callback that fires whenever a key changes state. Note that the value of a key is a Float in the normalized range [0 … 1], since some keyboards support various levels of pressure.

If you run the code now and press the WASD keys, you will hear a “pop” sound, indicating that you pressed an invalid key. Fortunately, there is an easy workaround.

GameViewController.swift:

override func viewDidLoad() {
    ...

    // HACK: Disable "pop" sound
    NSEvent.addLocalMonitorForEvents(matching: .keyDown) { _ in
        return nil
    }
}

Moving the camera around

Let's first remove the code that moves the camera in a circle.

Renderer.swift:

func draw(in view: MTKView) {
    //camera.position.x = Float(5.0 * sin(Date().timeIntervalSince1970))
    //camera.position.z = Float(5.0 * cos(Date().timeIntervalSince1970))

    //camera.direction = -normalize(camera.position)

    ...
}

And adjust the initial camera position.

Camera.swift:

class Camera {
    var position = simd_float3(0.0, 1.0, -4.0)
    ...
}

We now need to keep track of connected keyboards.

Renderer.swift:

...
import GameController
...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Camera
    var camera = Camera()

    // Input devices
    var keyboards = Array<GCKeyboard>()

    ...
}

We create an array of keyboards, since we want to support more than one keyboard at a time.

Let's also add the connected keyboard to the list instead of just printing a message when a key gets pressed.

GameViewController.swift:

NotificationCenter.default.addObserver(forName: NSNotification.Name.GCKeyboardDidConnect, object: nil, queue: nil) {
    (note) in
    guard let _keyboard = note.object as? GCKeyboard else {
        return
    }

    self.renderer.keyboards.append(_keyboard)
}

And finally, we can use this to move the camera position.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Time
    var lastTime: Double

    init?(metalKitView: MTKView) {
        ...

        // Time
        lastTime = Date().timeIntervalSince1970

        super.init()
    }

    func draw(in view: MTKView) {
        // Delta time
        let crntTime = Date().timeIntervalSince1970
        let dt = Float(crntTime - lastTime)
        lastTime = crntTime

        var leftJoystick = simd_float2(0.0, 0.0)
        for keyboard in keyboards {
            if keyboard.keyboardInput!.button(forKeyCode: .keyW)!.value > 0.5 {
                leftJoystick.y += 1.0
            }
            if keyboard.keyboardInput!.button(forKeyCode: .keyA)!.value > 0.5 {
                leftJoystick.x -= 1.0
            }
            if keyboard.keyboardInput!.button(forKeyCode: .keyS)!.value > 0.5 {
                leftJoystick.y -= 1.0
            }
            if keyboard.keyboardInput!.button(forKeyCode: .keyD)!.value > 0.5 {
                leftJoystick.x += 1.0
            }
        }

        // Move the camera
        if leftJoystick.x != 0.0 || leftJoystick.y != 0.0 {
            leftJoystick = normalize(leftJoystick)
        }
        camera.position += -cross(camera.direction, simd_float3(0.0, 1.0, 0.0)) * leftJoystick.x * dt
        camera.position += camera.direction * leftJoystick.y * dt

        ...
    }

    ...
}

Every frame, we compute the delta time (dt), the time elapsed since the last frame. We then read the joystick values, normalize them if they are non-zero, and move the camera:

  • forward/back when the Y axis of the joystick is non-zero
  • right/left when the X axis of the joystick is non-zero (the camera's left direction is obtained by taking the cross product of the forward direction with the up vector)
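The normalization step matters: without it, holding two keys at once (e.g. W + D) would move the camera faster than holding one. A tiny plain-Swift sketch with made-up input:

```swift
// W + D held at once gives the raw joystick vector (1, 1), whose length
// is √2 — unnormalized, the camera would move ~41 % faster diagonally
// than straight ahead.
var joystick: (x: Float, y: Float) = (1.0, 1.0)
let rawSpeed = (joystick.x * joystick.x + joystick.y * joystick.y).squareRoot()   // ≈ 1.414

// After normalization, the speed is the same in every direction.
joystick = (x: joystick.x / rawSpeed, y: joystick.y / rawSpeed)
let speed = (joystick.x * joystick.x + joystick.y * joystick.y).squareRoot()      // ≈ 1.0
print(rawSpeed, speed)
```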

I am using controller-like names for the input variables (leftJoystick), since that suits our use case pretty well.

Also, don't forget to remove the keyboard from the list when it disconnects.

GameViewController.swift:

// Keyboard
...

NotificationCenter.default.addObserver(forName: NSNotification.Name.GCKeyboardDidDisconnect, object: nil, queue: nil) {
    (note) in
    guard let _keyboard = note.object as? GCKeyboard else {
        return
    }

    self.renderer.keyboards.removeAll { (value) in
        return value == _keyboard
    }
}

Try running the code to see if you can move around.

Mouse inputs

We retrieved the pressed keys using GCKeyboard. Similarly, we can track the mouse using GCMouse. However, the mouse works a bit differently: instead of checking the mouse position every frame, we will install a callback that fires whenever the mouse moves.

First, let's create a rightJoystick variable that the view controller will be able to access.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Input devices
    ...

    var rightJoystick = simd_float2(0.0, 0.0)

    ...

    func draw(in view: MTKView) {
        ...

        // Move the camera
        ...

        // Rotate the camera
        camera.rotate(rotationAngles: rightJoystick)
        rightJoystick = simd_float2(0.0, 0.0)

        ...
    }

    ...
}

The camera.rotate() function isn't implemented yet; we will look at it in just a moment. But first, let's register mouse callbacks for when a mouse connects.

GameViewController.swift:

override func viewDidLoad() {
    ...

    // Input

    // Keyboard
    ...

    // Mouse
    NotificationCenter.default.addObserver(forName: NSNotification.Name.GCMouseDidConnect, object: nil, queue: nil) {
        (note) in
        guard let _mouse = note.object as? GCMouse else {
            return
        }

        // Register callbacks
        _mouse.mouseInput?.mouseMovedHandler = {
            (mouseInput, deltaX, deltaY) in
            if mouseInput.leftButton.isPressed {
                let windowSize = self.view.window?.contentView?.frame.size
                let normX = deltaX / Float(windowSize!.width)
                let normY = deltaY / Float(windowSize!.height)

                self.renderer.rightJoystick.x += normX
                self.renderer.rightJoystick.y += normY
            }
        }
    }

    ...
}

When the left mouse button is pressed, we take how much the mouse has moved (deltaX, deltaY) and divide it by the window size to bring it into roughly the range [-1 … 1].

Let's now implement the rotate function for the camera and print the delta X and Y to see if everything works as expected.

Camera.swift:

...

class Camera {
    ...

    func rotate(rotationAngles: simd_float2) {
        print(rotationAngles)
    }
}

You should see the rotation being printed into the console every frame.

Rotating the camera

Without further ado, let's jump right into the code.

Camera.swift:

...

class Camera {
    ...

    var up = simd_float3(0.0, 1.0, 0.0)

    func getViewMatrix() -> simd_float4x4 {
        return createViewMatrix(eyePosition: position, targetPosition: position + direction, upVec: up)
    }

    func rotate(rotationAngles: simd_float2) {
        direction = rotateVectorAroundNormal(vec: direction, angle: rotationAngles.y, normal: normalize(cross(direction, up)))
        direction = rotateVectorAroundNormal(vec: direction, angle: rotationAngles.x, normal: up)
    }
}

We use a function rotateVectorAroundNormal (which we have yet to implement) to rotate the camera direction around the two axes separately:

  • up and down — the normal is the left direction (the cross product of forward and up)
  • left and right — the normal is the up direction

I also created a member variable up for convenience and flexibility.

The actual vector rotation function is, however, a bit more complicated.

Math.swift:

...

func rotateVectorAroundNormal(vec: simd_float3, angle: Float, normal: simd_float3) -> simd_float3 {
    let c = cos(angle)
    let s = sin(angle)

    let axis = normalize(normal)
    let tmp = (1.0 - c) * axis

    var rotationMat = simd_float3x3(1.0)
    rotationMat[0][0] = c + tmp[0] * axis[0]
    rotationMat[0][1] = tmp[0] * axis[1] + s * axis[2]
    rotationMat[0][2] = tmp[0] * axis[2] - s * axis[1]

    rotationMat[1][0] = tmp[1] * axis[0] - s * axis[2]
    rotationMat[1][1] = c + tmp[1] * axis[1]
    rotationMat[1][2] = tmp[1] * axis[2] + s * axis[0]

    rotationMat[2][0] = tmp[2] * axis[0] + s * axis[1]
    rotationMat[2][1] = tmp[2] * axis[1] - s * axis[0]
    rotationMat[2][2] = c + tmp[2] * axis[2]

    return rotationMat * vec
}

Once again, you don't need to fully understand the code. But if you are interested in the math behind it, you can read the Wikipedia page on rotation matrices.
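For the curious, the matrix assembled in rotateVectorAroundNormal is the matrix form of Rodrigues' rotation formula: for a unit axis k and angle θ,

```latex
R = \cos\theta\,I + \sin\theta\,[\mathbf{k}]_{\times} + (1-\cos\theta)\,\mathbf{k}\mathbf{k}^{\mathsf{T}},
\qquad
R\mathbf{v} = \mathbf{v}\cos\theta + (\mathbf{k}\times\mathbf{v})\sin\theta + \mathbf{k}\,(\mathbf{k}\cdot\mathbf{v})(1-\cos\theta)
```

where [k]× is the skew-symmetric cross-product matrix of k; the c, s, and tmp variables in the code correspond to cos θ, sin θ, and (1 − cos θ)·k.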

Now we can finally look around! The movement and rotation speed may feel a bit slow, so feel free to multiply them by some constant.

Back face culling

If you go inside the cube with your camera, you will notice that its faces are visible from the inside.

While it does not cause any visual problems, the GPU has to do more work (it usually has to render twice as many faces). Fortunately, there is a very easy fix: we can tell Metal to discard every back face. And how does Metal know which face is the front and which one is the back? We provide the order in which the vertices of every triangle are wound (either clockwise or counter-clockwise). I think it's best explained with an image:

credit: LearnOpenGL — Face culling
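As a side illustration (plain Swift, not a Metal API), the winding of a triangle can be read off the sign of the 2D cross product of two of its edges, i.e. twice its signed screen-space area, which is essentially the per-triangle test the GPU performs:

```swift
// The sign of the 2D cross product of edges a->b and a->c (twice the
// triangle's signed area) tells whether a -> b -> c winds clockwise
// or counter-clockwise.
func isClockwise(_ a: (Float, Float), _ b: (Float, Float), _ c: (Float, Float)) -> Bool {
    let signedArea = (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
    return signedArea < 0  // negative signed area = clockwise (with y pointing up)
}

print(isClockwise((0, 0), (0, 1), (1, 0)))  // true  — front face under .clockwise
print(isClockwise((0, 0), (1, 0), (0, 1)))  // false — counter-clockwise, culled as a back face
```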

We can tell the render encoder in which order our vertices are wound and which faces to cull.

Renderer.swift:

func draw(in view: MTKView) {
    ...

    // Bind depth stencil state
    ...

    // Discard back faces
    renderEncoder.setFrontFacing(.clockwise)
    renderEncoder.setCullMode(.back)
}

Note that this has the same effect as using .counterClockwise and .front.

If you run the code, you will notice that most of our cube has disappeared:

The reason is that the top and right faces are in counter-clockwise order. All we have to do is reorder the indices.

Renderer.swift:

...

let indices: [ushort] = [
    // Front
    0, 3, 2,
    2, 1, 0,

    // Back
    4, 5, 6,
    6, 7, 4,

    // Left
    11, 8, 9,
    9, 10, 11,

    // Right
    12, 14, 13,
    12, 15, 14,

    // Bottom
    16, 17, 18,
    18, 19, 16,

    // Top
    20, 22, 21,
    20, 23, 22
]

...

And the cube is there once again:

Light maps

Light maps are a way of adding more detail to our objects. We are actually already using one: a diffuse map. It's what we have been referring to as “diffuseTexture” and “colorTexture” in the code. We are now going to add another light map: a specular map. It looks like this:

planks_specular.png

You can see that the image contains just black-and-white values in the range between 0 and 1. We will multiply our computed specular term by this value in the shader.
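To make the idea concrete, here is a tiny sketch with made-up numbers showing how the sampled map value scales the highlight (the real multiplication happens in the fragment shader):

```swift
// Hypothetical values, just to illustrate what multiplying by the
// specular map does. `specTerm` stands for the Phong specular term
// the shader computes; the samples stand for texels of the map.
let specTerm: Float = 0.8
let groutSample: Float = 0.05   // dark texel (gap between planks)
let plankSample: Float = 0.9    // bright texel (plank surface)

// The map scales the highlight per pixel:
let dull  = specTerm * groutSample  // ≈ 0.04 — highlight almost gone
let shiny = specTerm * plankSample  // ≈ 0.72 — highlight mostly kept
print(dull, shiny)
```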

But let's first load the texture.

Renderer.swift:

...

class Renderer: NSObject, MTKViewDelegate {
    ...

    // Textures
    var diffuseTexture: MTLTexture?
    var specularTexture: MTLTexture?

    ...

    init?(metalKitView: MTKView) {
        ...

        // Loading textures
        let loader = MTKTextureLoader(device: self.device)

        do {
            let url = Bundle.main.url(forResource: "planks", withExtension: "png")
            self.diffuseTexture = try loader.newTexture(URL: url!, options: nil)
        } catch {
            print("Failed to load image 'planks.png'")
        }

        do {
            let url = Bundle.main.url(forResource: "planks_specular", withExtension: "png")
            self.specularTexture = try loader.newTexture(URL: url!, options: nil)
        } catch {
            print("Failed to load image 'planks_specular.png'")
        }

        ...
    }

    func draw(in view: MTKView) {
        ...

        // Bind textures
        renderEncoder.setFragmentTexture(self.diffuseTexture, index: 0)
        renderEncoder.setFragmentTexture(self.specularTexture, index: 1)

        ...
    }

    ...
}

I also renamed colorTexture to diffuseTexture, since it is a better name in this case.

We can now pass the texture to the shader as an argument and multiply the specular term by it.

Shaders.metal:

...

inline float3 phongLighting(float3 worldPosition, float3 diffuseColor, float specularColor, float3 normal, float3 viewPosition) {
    ...

    // Specular
    ...
    float3 specular = lightSpecular * spec * specularColor;

    ...
}

fragment float4 fragmentFunction(VertexOut in [[stage_in]], constant float3& viewPosition [[buffer(0)]], texture2d<float> diffuseTexture [[texture(0)]], texture2d<float> specularTexture [[texture(1)]]) {
    ...
    float4 specularColor = specularTexture.sample(colorSampler, in.texCoord);

    return float4(phongLighting(in.worldPosition, diffuseColor.rgb, specularColor.r, normalize(in.normal), viewPosition), 1.0);
}

The difference might not be that obvious, but you can look at this side-by-side comparison:

left: without specular map, right: with specular map

As you can see, the difference is quite visible. Also note that MetalKit automatically loads the texture with only the R channel instead of the full RGBA. We can view the texture format using Metal Frame Capture:

Specular texture loaded with r16 unorm format

Conclusion

We finally added support for the mouse and keyboard using the GameController framework, so that we can move around and observe the various graphics effects we will implement in the future. We also added a specular light map, greatly enhancing the detail. In the next tutorial, we will load a real 3D model from a file instead of the hard-coded cube. Anyway, I hope you found this tutorial useful. See you next time!

Source code: https://github.com/SamoZ256/MetalTutorial
