Getting Started with ARKit: Waypoints

Tuesday, July 4, 2017

Summary: This post covers my 2-day exploration of ARKit to build a virtual compass/waypoint system on iOS 11 Beta. The first section provides some backstory about myself and the motivations around this project. The second section is more technical and explains how ARKit works, how I brushed up on linear algebra, and how to get to a working demo!

I decided to spend my July 4 long weekend this year messing around with something I’ve been pretty excited about: Augmented Reality.

Just a month ago at WWDC 2017, Apple announced ARKit for iOS 11, a framework for building augmented reality apps. This is provided through the new beta OS version and Xcode for developers to play around with. The demos were fascinating and I immediately found myself thinking about all the interesting applications that were possible with this technology.

Most compelling to me was the idea of a waypoint navigation system. Fresh off a SE Asia backpacking trip, I was reminded of the countless hours I spent desperately trying to navigate the streets of foreign cities. Even though I had map apps available, transitioning back and forth between the 2D map world and the real world proved challenging.

This also reminded me of my college UX Design professor, who always said that one should look towards video game interfaces when it comes to designing intuitive user experiences. And it’s true: nobody ever really gets lost in video games, because the entire experience is designed to keep the player progressing towards the next destination. A lot of what enables this is the rich information shown through the HUD.

Example of a waypoint in Fallout 4

With augmented reality, this type of rich information is no longer limited to just video games. We can actually provide this type of interface directly on a user’s device, as they are walking around the real world! People may struggle with reading a map, but I bet they can easily follow a virtual icon in AR.

With this vision in mind, I decided my goal for the first version would be to render waypoints as an icon or label in the direction of the desired destination.

From this point on, the rest of this blog will explore some of the technical details of my little project, and how I got the initial version running.

ARKit Fundamentals

ARKit works by using a technique called visual-inertial odometry (VIO), a fancy term that means it combines iOS camera signals with physical CoreMotion signals to track the world. In doing so, it maintains a virtual 3D coordinate system of the world as the user looks around via the camera. By rendering virtual objects in this coordinate system, and overlaying them on the live images from the camera, the effect of “augmented reality” is produced.

Starting an Augmented Reality app is super easy

ARSCNView vs ARSKView

Through the Apple ARKit Documentation, I learned about the two out-of-the-box rendering approaches: ARSCNView and ARSKView. In short, ARSCNView is best used for rendering 3D objects in a 3D space, whereas ARSKView uses SpriteKit to render 2D sprites in a 3D space. Because they are 2D, the sprites are automatically rotated to face the user as they walk around the space (as opposed to a 3D object, which you can view from multiple angles). For a waypoint, the 2D implementation of ARSKView is what we need.

Right off the bat, the new Augmented Reality project template creates a ViewController that includes an ARSKView called sceneView. sceneView maintains a running session, which is the object responsible for tracking and state management, so we shouldn’t have to mess with it too much. Instead, the session is started simply by calling run, and we just need to pass in a configured ARSessionConfiguration describing how it should run.

SessionConfiguration

ARWorldTrackingSessionConfiguration is a type of ARSessionConfiguration that tells the session to track the world using the VIO technique described above, with 6 degrees of freedom. This allows the tracking system to follow not only the 3 rotational degrees of freedom (pitch, yaw, and roll), but also translation along x, y, and z (i.e. when the user walks around).

One extra setting we need to configure is worldAlignment:

configuration.worldAlignment = .gravityAndHeading

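This tells the session to define the coordinate system based on gravity and compass heading, so that the y-axis points up, the x-axis points east, and the z-axis points south. Otherwise, only the y-axis is fixed (by gravity) and the horizontal x- and z-axes are defined arbitrarily based on the initial camera angle.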

This should be the resulting configuration code:

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)

    // Create a session configuration
    let configuration = ARWorldTrackingSessionConfiguration()
    configuration.worldAlignment = .gravityAndHeading

    // Run the view's session
    sceneView.session.run(configuration)
}

Adding Sprites

Scene.swift shows how sprites are added to the Scene.

override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let sceneView = self.view as? ARSKView else {
        return
    }

    // Create anchor using the camera’s current position
    if let currentFrame = sceneView.session.currentFrame {
        // Create a transform with a translation of 0.2 meters in front
        // of the camera
        var translation = matrix_identity_float4x4
        translation.columns.3.z = -0.2
        let transform = simd_mul(
            currentFrame.camera.transform,
            translation
        )

        // Add a new anchor to the session
        let anchor = ARAnchor(transform: transform)
        sceneView.session.add(anchor: anchor)
    }
}

When the scene is tapped, this function captures the camera’s current position (as a transform) and initializes an ARAnchor at the new transform, 0.2 meters in front of the camera. (Note: We’ll discuss how transforms work in the next section).

ARAnchors act as reference points in 3D Space. Just like with CollectionViews, you can control how each is rendered by implementing methods of ARSKViewDelegate.

view(_:nodeFor:) returns an SKNode, which is the base class used to render Sprites. By default, the project template renders labels.

func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
    // Create and configure a node for the anchor added to the view's
    // session.
    let labelNode = SKLabelNode(text: "👾")
    labelNode.horizontalAlignmentMode = .center
    labelNode.verticalAlignmentMode = .center
    return labelNode
}

Just to mess around, I changed the label to show the 🌧 emoji. I also experimented with modifying the distance of new ARAnchors to be added 0.5 meters from the camera.

So far, I’ve only edited a few minor lines of code from the boilerplate project, but this was already quite entertaining.

At this point, I decided to jump straight into getting the navigation working.

Linear Algebra

The transform code above clued me in to the fact that, in order to build a navigational system, I would need to brush up on how 3D space is represented with matrices, as well as how to perform operations on those matrices. I was also confused about why objects in 3D space were represented by 4x4 matrices.

I found this great OpenGL tutorial on 3D Matrices.

Translation

Translation means to move a vertex in a direction.

I learned that 4x4 matrices are used for 3D vectors because the 4th dimension allows for translation operations.

Multiplying a 4x4 matrix by a vertex results in a transformed vertex

This is a translation matrix.

Multiplying this with a vertex {x, y, z, w} results in a translated vertex:

[1 0 0 X]   [x]   [ x + X*w ]
[0 1 0 Y] x [y] = [ y + Y*w ]
[0 0 1 Z]   [z]   [ z + Z*w ]
[0 0 0 1]   [w]   [    w    ]

Therefore, translating an object by a distance D along the x, y, or z axis is simply a matter of setting the corresponding X, Y, or Z value in the translation matrix T to D.
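As a small sanity check of the formula above, here is a sketch using the same simd types as the ARKit template. The helper name makeTranslation and the sample values are mine, purely for illustration. It also shows why w matters: a point (w = 1) gets moved, while a direction (w = 0) is left alone.

```swift
import simd

// Hypothetical helper: a 4x4 translation matrix with {X, Y, Z} stored in
// the fourth column (columns.3), matching the layout shown above.
func makeTranslation(x: Float, y: Float, z: Float) -> simd_float4x4 {
    var matrix = matrix_identity_float4x4
    matrix.columns.3 = SIMD4<Float>(x, y, z, 1)
    return matrix
}

let translation = makeTranslation(x: 10, y: 0, z: -5)

// A point has w = 1, so the offset (10, 0, -5) is applied in full:
// (1, 2, 3, 1) becomes (11, 2, -2, 1).
let point = SIMD4<Float>(1, 2, 3, 1)
let movedPoint = simd_mul(translation, point)

// A direction has w = 0, so X*w, Y*w and Z*w all vanish and the
// vector comes back unchanged.
let direction = SIMD4<Float>(1, 2, 3, 0)
let movedDirection = simd_mul(translation, direction)

print(movedPoint)
print(movedDirection)
```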

In the Scene code above, we want the ARAnchor to be initialized 0.2 meters in front of the camera, so we define a translation matrix T with its Z value (the z component of column 3) set to -0.2. The camera transform matrix C is then multiplied by T, and the result is the final transform we provide to ARAnchor.

Translated = C x T
Important: Order matters when it comes to Matrix Multiplication: 
T x C != C x T
Note that simd_mul(a, b) computes a x b, so simd_mul takes the arguments in the same order as the written product: the camera matrix C first, followed by the translation matrix T. Because T sits on the right, the offset is measured along the camera’s own axes rather than the world’s. More on this below.
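To see the difference between the two orders in numbers, here is a sketch with a made-up camera pose (standing at (2, 0, 3) and facing east in a gravityAndHeading session). The pose and variable names are assumptions for illustration; only simd_mul and the matrix types come from the code above.

```swift
import simd

// Assumed camera pose: positioned at (2, 0, 3) and facing east, so the
// camera's local -z axis points along world +x.
let cameraTransform = simd_float4x4(columns: (
    SIMD4<Float>(0, 0, 1, 0),   // camera's local x axis in world space
    SIMD4<Float>(0, 1, 0, 0),   // camera's local y axis in world space
    SIMD4<Float>(-1, 0, 0, 0),  // camera's local z axis in world space
    SIMD4<Float>(2, 0, 3, 1)    // camera position
))

var translation = matrix_identity_float4x4
translation.columns.3.z = -0.2

// C x T: the offset is expressed in the camera's own frame, so the anchor
// lands 0.2 m in front of the lens, at roughly (2.2, 0, 3).
let inFrontOfCamera = simd_mul(cameraTransform, translation)

// T x C: the offset is applied along the world's -z axis (north) instead,
// so the anchor lands at roughly (2, 0, 2.8) wherever the camera looks.
let alongWorldAxis = simd_mul(translation, cameraTransform)

print(inFrontOfCamera.columns.3)
print(alongWorldAxis.columns.3)
```

The template uses the first form, which is what makes tapped anchors appear in front of the camera.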

For the waypoint system, my initial approach would be to translate the marker node to be a few meters away from the camera (say 5 meters).

Rotation

In order to rotate about a certain axis, there are 3 different rotation transform matrices:

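For the scope of this project, we will not take into account a destination’s altitude, and instead simply point towards the horizontal bearing. This implies a rotation about the y-axis. Multiplying the matrix in the middle (Ry) with the camera transform gives a result transform that is rotated by ϴ about the vertical axis relative to the camera transform. With the usual right-handed convention, a positive ϴ turns counterclockwise when viewed from above, which is why the compass bearing is negated in the final code.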
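For reference, these are the standard right-handed rotation matrices for an angle θ about each axis, written in the same 4x4 homogeneous form as the translation matrix. I am assuming this is the convention used in the figure, since it is the usual one in OpenGL-style tutorials.

```latex
R_x(\theta) =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & -\sin\theta & 0 \\
0 & \sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\qquad
R_y(\theta) =
\begin{bmatrix}
\cos\theta & 0 & \sin\theta & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta & 0 & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\qquad
R_z(\theta) =
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 & 0 \\
\sin\theta & \cos\theta & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
```

As a quick check, applying Ry to a direction pointing along -z, {0, 0, -1, 0}, gives {-sin θ, 0, -cos θ, 0}. For θ = 90° that is {-1, 0, 0, 0}, a quarter turn towards -x.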

Rotated = Ry x C

For navigation however, rather than rotating based on the camera transform, we will need to use the world origin and rotate ϴ degrees from North.

Combining transforms

Combining transforms together was a little tricky. Because order matters with Matrix Multiplication, it’s important to do things in the right order.

For example, if you want to scale (S), rotate (R), and translate (T) an object, it’s important to perform the operations in this order. Otherwise, if you translated first, and then scaled, the scale operation would multiply each vertex, including the already translated ones, taking them incorrectly further from the origin.

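When a chain of matrices is applied to a vertex, the operations are actually performed from right to left: the matrix closest to the vertex acts first. Therefore, to abide by the order described above, this is how we would combine the transforms for a vertex v:

Transformed vertex = T x R x S x v

simd_mul(a, b) computes a x b, so nested calls read in the same left-to-right order as the written product. That is why, in the Scene code, the camera transform C was the first argument, followed by the translation: the product C x T applies the translation first, in the camera’s frame.

In our case there is no scaling, and we deliberately translate first and rotate second: the marker is pushed out along north and then swung around the origin to the desired bearing. Starting from the origin transform C, the call is:

simd_mul(C, simd_mul(R, T)) = C x R x T = Final Transform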
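To make the scale-versus-translate ordering concrete, here is a small sketch with made-up values (scale by 2, move 10 units along x). The variable names are mine; the only thing taken from the post is that simd_mul multiplies its arguments in the order given.

```swift
import simd

// Illustrative values: scale by 2 and move 10 units along x.
let scale = simd_float4x4(diagonal: SIMD4<Float>(2, 2, 2, 1))
var translation = matrix_identity_float4x4
translation.columns.3.x = 10

let vertex = SIMD4<Float>(1, 0, 0, 1)

// T x S x v: the rightmost matrix acts first, so the vertex is scaled
// (x: 1 -> 2) and then translated (x: 2 -> 12).
let scaleThenTranslate = simd_mul(simd_mul(translation, scale), vertex)

// S x T x v: the vertex is translated first (x: 1 -> 11), and the scale
// then doubles the offset as well (x: 11 -> 22).
let translateThenScale = simd_mul(simd_mul(scale, translation), vertex)

print(scaleThenTranslate)
print(translateThenScale)
```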

Putting it together

Just to orient myself with the world coordinate system, I added some extra SKNodes along each axis to help me visualize it:

Now, we can see basic compass functionality.

{ North: -z, East: +x, South: +z, West: -x }
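That axis mapping can be written as a tiny function from compass bearing to a direction in the world frame. This is a sketch under the gravityAndHeading assumption; worldDirection is a hypothetical name, not something from ARKit.

```swift
import Foundation

// Hypothetical helper: unit direction in the gravityAndHeading world
// frame for a compass bearing given in degrees clockwise from north.
func worldDirection(forBearing degrees: Double) -> (x: Double, z: Double) {
    let radians = degrees * .pi / 180
    // East is +x, so x grows with sin(bearing).
    // North is -z, so z is the negated cos(bearing).
    return (x: sin(radians), z: -cos(radians))
}

let north = worldDirection(forBearing: 0)    // close to (x: 0, z: -1)
let east = worldDirection(forBearing: 90)    // close to (x: 1, z: 0)
let south = worldDirection(forBearing: 180)  // close to (x: 0, z: 1)
let west = worldDirection(forBearing: 270)   // close to (x: -1, z: 0)
```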

Bearing

In order to determine ϴ, I first had to use CLLocationManager to acquire my current location, and then calculate the bearing to a destination location. (StackOverflow gave me some quick copy-pasta on calculating bearing between two points):

/**
 Initial bearing from startLocation to endLocation, in degrees clockwise
 from north (0 to 360).
 */
static func bearingBetween(startLocation: CLLocation, endLocation: CLLocation) -> Float {
    // Convert both coordinates from degrees to radians
    let lat1 = GLKMathDegreesToRadians(
        Float(startLocation.coordinate.latitude)
    )
    let lon1 = GLKMathDegreesToRadians(
        Float(startLocation.coordinate.longitude)
    )
    let lat2 = GLKMathDegreesToRadians(
        Float(endLocation.coordinate.latitude)
    )
    let lon2 = GLKMathDegreesToRadians(
        Float(endLocation.coordinate.longitude)
    )

    let dLon = lon2 - lon1
    let y = sin(dLon) * cos(lat2)
    let x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dLon)

    // atan2 returns radians between -pi and pi; convert to degrees and
    // wrap negative values into the 0 to 360 range
    var azimuth = GLKMathRadiansToDegrees(atan2(y, x))
    if azimuth < 0 { azimuth += 360 }
    return azimuth
}
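If you want to poke at the formula without CoreLocation or GLKit, here is a standalone sketch of the same math in Double. The Coordinate struct and the initialBearing name are hypothetical stand-ins, and the test points are picked so the answers are obvious.

```swift
import Foundation

// Hypothetical stand-in for CLLocationCoordinate2D so the sketch runs
// without CoreLocation.
struct Coordinate {
    let latitude: Double   // degrees
    let longitude: Double  // degrees
}

// Same formula as above: degrees clockwise from north, from 0 up to 360.
func initialBearing(from start: Coordinate, to end: Coordinate) -> Double {
    let lat1 = start.latitude * .pi / 180
    let lat2 = end.latitude * .pi / 180
    let dLon = (end.longitude - start.longitude) * .pi / 180

    let y = sin(dLon) * cos(lat2)
    let x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dLon)

    // atan2 gives a value between -180 and 180 degrees; wrap negatives.
    let degrees = atan2(y, x) * 180 / .pi
    return degrees < 0 ? degrees + 360 : degrees
}

let origin = Coordinate(latitude: 0, longitude: 0)
// One degree north, east and west of the origin: about 0, 90 and 270.
let dueNorth = initialBearing(from: origin, to: Coordinate(latitude: 1, longitude: 0))
let dueEast = initialBearing(from: origin, to: Coordinate(latitude: 0, longitude: 1))
let dueWest = initialBearing(from: origin, to: Coordinate(latitude: 0, longitude: -1))
```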

GLKit also has a nice helper method to construct the rotation matrix about Y (note that it expects theta in radians, not degrees):

GLKMatrix4RotateY(GLKMatrix4Identity, theta)

With this, I was able to add my waypoint marker pointing towards my destination, at a distance of 5m away from the camera!

func getTransformGiven(currentLocation: CLLocation) -> matrix_float4x4 {
    let bearing = bearingBetween(
        startLocation: currentLocation,
        endLocation: location
    )
    // Declared as Float to match the bearing; a bare 5 would be
    // inferred as Int
    let distance: Float = 5
    let originTransform = matrix_identity_float4x4

    // Create a transform with a translation of 5 meters away
    let translationMatrix = MatrixHelper.translate(
        x: 0,
        y: 0,
        z: distance * -1
    )

    // Rotation matrix theta degrees
    let rotationMatrix = MatrixHelper.rotateAboutY(
        degrees: bearing * -1
    )

    let transformMatrix = simd_mul(rotationMatrix, translationMatrix)
    return simd_mul(originTransform, transformMatrix)
}
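MatrixHelper isn’t shown in this post, so the following is only a guess at what such a helper might look like, built on simd rather than GLKit. The two signatures are inferred from the call sites above; the body, and the choice of Float parameters, are assumptions.

```swift
import Foundation
import simd

// Hypothetical reconstruction of the helper used above.
enum MatrixHelper {
    // Translation matrix: the offset lives in the fourth column.
    static func translate(x: Float, y: Float, z: Float) -> simd_float4x4 {
        var matrix = matrix_identity_float4x4
        matrix.columns.3 = SIMD4<Float>(x, y, z, 1)
        return matrix
    }

    // Right-handed rotation about the y axis. Positive angles turn
    // counterclockwise when viewed from above (north towards west).
    static func rotateAboutY(degrees: Float) -> simd_float4x4 {
        let radians = degrees * .pi / 180
        let c = cos(radians)
        let s = sin(radians)
        return simd_float4x4(columns: (
            SIMD4<Float>(c, 0, -s, 0),
            SIMD4<Float>(0, 1, 0, 0),
            SIMD4<Float>(s, 0, c, 0),
            SIMD4<Float>(0, 0, 0, 1)
        ))
    }
}

// A destination due east (bearing 90) should put the marker 5 m along +x.
let bearing: Float = 90
let markerTransform = simd_mul(
    MatrixHelper.rotateAboutY(degrees: -bearing),
    MatrixHelper.translate(x: 0, y: 0, z: -5)
)
let markerPosition = markerTransform.columns.3  // close to (5, 0, 0, 1)
```

Negating the bearing is what converts a clockwise compass angle into the counterclockwise angle this rotation expects.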

Success!

After messing around with SpriteKit a bit (which I won’t go into detail on here, since I’m not very good at it), I ended up with this!

Next Steps

This was a cool accomplishment, but we’re not quite at a working waypoint system yet. I’ll have to spend some time improving on a few things to make it truly functional as envisioned.

For one, it’s not really a waypoint if the marker is just 5 meters away. If the user starts moving in any direction, the waypoint will just refer to the anchor 5m away rather than the actual destination. However, if I set the marker at the actual location (maybe 500m away), ARKit seems to auto-scale objects to be smaller based on the distance, so the Sprite will not be visible. I’ll need to either find a way to turn off this auto-scaling, or scale it back up myself proportional to the distance away.
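The “scale it back up” option could start from something like the sketch below. It assumes the shrinking is plain perspective, so apparent size falls off as 1 / distance. I haven’t verified that this is how ARSKView behaves, and compensatingScale is a hypothetical name.

```swift
import Foundation

// Hypothetical helper: how much to enlarge a sprite placed `distance`
// meters away so it looks as large as it would at `referenceDistance`.
// Assumes apparent size falls off as 1 / distance.
func compensatingScale(distance: Double, referenceDistance: Double = 5) -> Double {
    // Never shrink markers that are closer than the reference distance.
    return max(1, distance / referenceDistance)
}

let nearScale = compensatingScale(distance: 5)    // 1.0
let farScale = compensatingScale(distance: 500)   // 100.0
```

The result could then be fed to the sprite with SKNode’s setScale, assuming the view doesn’t override it.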

Secondly, once the user starts moving, I’d love to show live information about how far away the waypoint is from the current location. However, this may require finer granularity than CLLocationManager provides, so there will likely be more math required to take the motion captured by ARKit and calculate a more fine-grained position!
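A first rough cut of that math might look like the sketch below. It assumes the destination’s offset from the session’s starting point is already known in meters (east and north), and that the camera’s x and z come from the translation column of its transform in a gravityAndHeading session. It ignores altitude and tracking drift, and all the names are hypothetical.

```swift
import Foundation

// Hypothetical helper. `eastOffset` and `northOffset` are the destination's
// position in meters relative to the world origin (where the session
// started). `cameraX` and `cameraZ` are the camera's world position,
// with +x pointing east and +z pointing south.
func remainingDistance(eastOffset: Double, northOffset: Double,
                       cameraX: Double, cameraZ: Double) -> Double {
    let remainingEast = eastOffset - cameraX
    // Walking north decreases z, so northward progress is -cameraZ.
    let remainingNorth = northOffset + cameraZ
    return (remainingEast * remainingEast + remainingNorth * remainingNorth).squareRoot()
}

// Destination 300 m east and 400 m north of the starting point.
let atStart = remainingDistance(eastOffset: 300, northOffset: 400, cameraX: 0, cameraZ: 0)
// After walking halfway there (150 m east, 200 m north).
let partWay = remainingDistance(eastOffset: 300, northOffset: 400, cameraX: 150, cameraZ: -200)
```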

Feedback/Suggestions

If you made it this far, you’re a pretty dedicated reader! Was the technical guide helpful? Was the backstory interesting?

This has been a fun technical exploration, but also my first time blogging about one, so I’d love to hear what you thought! Happy ARing :)