An introduction to ARKit 2 — Image Tracking

Mohammed Ibrahim
5 min read · Jun 20, 2018


At WWDC this year, iOS 12 was announced along with ARKit 2. The original ARKit was released last year with iOS 11 — it gave developers a native iOS framework for adding AR to their apps. ARKit 2 adds new features that take the experience to a whole ‘nother level. Image tracking, world maps, object scanning, and Vision integration are among the most anticipated features of the newest release.

Since this is just part one, I will start with image tracking and cover each of the remaining topics as we progress. This is the list for the collection, which will be updated as we go on:

  1. Image Tracking
  2. World Mapping
  3. Object Scanning
  4. Vision Integration

The Basics

Personally, I think image tracking is the best feature in ARKit 2. It lets developers bundle reference images with their apps — such as the photo inside a picture frame — which the app can then recognize in real time and respond to, for example by placing a 3D model on top of the image.

In the example at WWDC, the presenter used a framed photo of his cat. He added the image to the app as an ARReferenceImage. You then load your reference images like this:

// Define a variable to hold all your reference images
let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: Bundle.main)

This loads all the reference images you put in the "AR Resources" group of your asset catalog.

Next, we have to configure our AR session. We do so by instantiating ARImageTrackingConfiguration. Along with that, you must set its trackingImages property as well as its maximumNumberOfTrackedImages property. That second property tells the app how many images it is ‘allowed’ to track at the same time — this is generally decided based on what your intended use for the app is. Here is an example:

// Create a session configuration
let configuration = ARImageTrackingConfiguration()
configuration.trackingImages = referenceImages
configuration.maximumNumberOfTrackedImages = 1

// Run the view's session
sceneView.session.run(configuration)

Congratulations! Your app will now automatically track the images you give it. From there, we use the renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? delegate method to decide what to do when the app actually recognizes one of the reference images. ARKit calls this method whenever a new anchor — such as a detected image — is added to the session.

We will go over an example of how we can leverage that function and image tracking.

Creating a moving picture frame using ARKit 2 and image tracking

In this example, we will use image tracking to turn a photo in a picture frame into a live video — it’s a very basic example that was even shown at WWDC.

To begin, we need to create an AVPlayer, which simply allows the app to play a video. First, download a video and add it to your project bundle. Then, add this code above your viewDidLoad():

// Load video and create video player
let videoPlayer: AVPlayer = {
    // Load cat video from bundle
    guard let url = Bundle.main.url(forResource: "video", withExtension: "mp4") else {
        print("Could not find video file.")
        return AVPlayer()
    }
    return AVPlayer(url: url)
}()

Make sure to change the forResource name to the name of your video and the withExtension argument to whatever your video’s file extension is.

After that, set up your reference images and configuration as discussed before. In this case, the reference image should be a still from the video itself. If it was a video of a rocket taking off, for example, the image could be a frame of it still docked.
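To recap, the whole setup might look something like this (a sketch assuming an ARSCNView outlet named sceneView, as in the standard Xcode AR template — note that referenceImages(inGroupNamed:bundle:) returns an optional set, so we guard against a missing group):

// Load the reference images from the "AR Resources" group
guard let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: Bundle.main) else {
    fatalError("Missing 'AR Resources' group in the asset catalog.")
}

// Configure image tracking with those images
let configuration = ARImageTrackingConfiguration()
configuration.trackingImages = referenceImages
configuration.maximumNumberOfTrackedImages = 1

// Run the view's session
sceneView.session.run(configuration)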

After you’ve set up all that, you have to set up the renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? function so that the app can react to recognizing an image — this allows you to handle that and do what you wish when that happens.

I’ll walk you through our function and its contents:

Since the function must return a SCNNode , we have to declare an empty node to be returned in the end:

let node = SCNNode()

We then check to see if the anchor that’s passed in is, indeed, an image anchor. We use an if let statement to do that.
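The check itself looks like this (the plane-creation code from the next steps goes inside the braces):

if let imageAnchor = anchor as? ARImageAnchor {
    // Create the video plane and attach it here
}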

In the statement, which will only run if it is confirmed to be an image anchor, we create a SCNPlane with the dimensions of the anchor’s physical size, which basically means the size of the image in real life. That way, the node will take the full size (height and width) of the original image that it identified, covering it completely.

let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width, height: imageAnchor.referenceImage.physicalSize.height)

Then, we set the plane’s material contents to the videoPlayer we set up earlier. At this point, we also start the video player itself.

plane.firstMaterial?.diffuse.contents = videoPlayer
videoPlayer.play()

We then create another SCNNode , called planeNode, with the geometry of the plane that we just created.

let planeNode = SCNNode(geometry: plane)

Lastly, we set the eulerAngles.x property of the planeNode — rotating it so it lies flat against the detected image, since an SCNPlane is vertical by default — and then add it as a child node to the original empty SCNNode that we declared before the if let statement.

planeNode.eulerAngles.x = -.pi / 2
node.addChildNode(planeNode)

The final thing to do is return the node, as the function expects.

return node

Here is the render function all in one go:

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    let node = SCNNode()

    if let imageAnchor = anchor as? ARImageAnchor {
        let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width,
                             height: imageAnchor.referenceImage.physicalSize.height)
        plane.firstMaterial?.diffuse.contents = videoPlayer
        videoPlayer.play()

        let planeNode = SCNNode(geometry: plane)
        planeNode.eulerAngles.x = -.pi / 2
        node.addChildNode(planeNode)
    }

    return node
}
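One thing worth noting: for ARKit to call this delegate method at all, your view controller has to conform to ARSCNViewDelegate and be assigned as the scene view’s delegate. A minimal sketch, again assuming an ARSCNView outlet named sceneView:

class ViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Without this line, renderer(_:nodeFor:) is never called
        sceneView.delegate = self
    }
}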

That’s it! The possibilities are endless!

Have fun with it!


