Building an iPhone AR Museum App in iOS 11 with Apple’s ARKit Image Recognition

Ever since Apple announced ARKit last June at its annual developer conference, WWDC, people have been very excited to build highly interactive mobile apps with the new augmented reality kit.

With the new iOS 11.3 release, Apple made ARKit (version 1.5) even more powerful than before by adding (basic) image recognition, support for irregularly shaped surfaces, autofocus of the viewport, and more.

This post describes how to develop a basic AR-enabled image recognition iPhone app that could, for instance, be used by museums to augment their collections with additional information. Other use cases could be: restaurant menu augmentation, interactive yearbooks, companion apps for cities / graffiti, interactive movie posters, street signs, etc. At code & co., we’ll surely continue to explore other fun use cases.

This tutorial should give everyone who wants to play with AR a head start.

The test code of the final project can be found and downloaded on GitHub.

App Functionality

The scope of the first version of the app is quite simple and straightforward:

  1. User opens the app and sees the back camera full-screen in the iPhone’s viewport.
  2. Whenever the app recognizes an image, a modal with additional information about the piece and the artist is loaded.
  3. The image is marked as ‘visited’ and no longer triggers the modal to reopen.

Cool? Then let’s start!

It’s coding time!

Technical requirements

  • iOS 11.3 or higher (ARKit 1.5+)
  • Xcode 9.3 or higher
  • Swift 4 or higher
  • Physical device (iPhone / iPad) with back camera & iOS 11.3 (or higher) for testing purposes.

Project Setup

First, open Xcode and choose ‘Create a new Xcode project’ to get started with an empty project.

Open Xcode and choose ‘Create a new Xcode project’.

Select ‘Augmented Reality App’ to start with a basic AR setup including a preconfigured ARSKView — the camera view in ARKit on which you can start building your own implementation logic.

Select ‘Augmented Reality App’.

As we are not playing with any 3D graphics, choosing 2D SpriteKit as the content technology is more than enough.

Select the 2D ‘SpriteKit’ as content technology.

This will create a skeleton project for you with the following files:

The files that have been created for you by the AR skeleton.

The image recognition is only available from iOS 11.3 onwards, so let us go to your project overview.

Click on the project overview.

There, select iOS 11.3 (or higher) as the required deployment target.

Select iOS 11.3 as deployment target.

To verify that everything was set up correctly, connect and select your physical device and run the current version of the app.

Select your physical iOS 11.3 device and run the app.

You will see the required prompt for access to your camera, which you need to accept in order to continue.

Accept the prompt for your camera access.

The AR scene will start with a full-screen camera view. The out-of-the-box implementation comes with a nice little feature: on tap, a 2D alien emoji is placed at the location the phone is pointing at.

The active viewport with added 2D markers.

Now your basic AR project is fully set up for you and we can continue with this tutorial.

Adding images to be recognized

One current limitation of ARKit’s image recognition is that all images that should be ‘scannable’ by the app need to be added to the app’s assets and bundled with it (I’ll explain the reason for this in a bit).

To do this, go to your project tree view and open the assets folder by clicking ‘Assets.xcassets’.

Your project tree view, where you find your assets folder.

In the assets folder, click ‘+’ in the lower left corner of the first pane and then add a ‘New AR Resource Group’.

Add new ‘New AR Resource Group’.

Apple recommends adding a new group for each small subset of all scannable images in your application to manage overall computing complexity. For a museum, this could for instance be a single room or wing.

I added the AR Resource Group Mona Lisa Room, which holds one image, the Mona Lisa.

After doing this, you’ll see the following error, stating that the AR reference image needs a non-zero, positive width:

Error showing that the Mona Lisa needs positive dimensions.

Let’s find out why we need to do that and what that means for our application.

Excursion: ARKit Image Recognition

Image recognition and computer vision in general are very complex topics. Ever since the computer was invented, it has been the dream of researchers and computer scientists to make computers appear more human-like.

Part of this endeavor is thinking and acting like a human (Machine Learning — the brain), but in order for that to work, the brain needs data to process. And where does it get that data from? From IO sources such as natural language processing, image recognition and, ultimately, computer vision (the eyes & ears).

(Probable) Heuristics of ARKit’s Image Recognition Algorithm

Complex problems are often solved by applying heuristics: shortcut ways of solving the underlying problem in an imperfect, yet good enough, manner.

Why is that important? Computing intensity directly correlates with tangible things, such as speed and ultimately the user experience, as well as more practical implications, such as the battery life of the user’s iPhone.

I was positively surprised to see that Apple made this early version of image recognition available and released a version of the algorithm with capabilities that are not yet perfect, but good enough for many use-cases already.

Most AR libraries work quite similarly and use the following IO process:

Camera > Frame > Algorithm > Result

Apple imposed the following restrictions for the current iteration of image recognition:

  • Needs to be a semi-square image,
  • that is bundled in the App,
  • and has information about actual physical dimensions.

Furthermore, Apple states that

  • images with high contrast work best for image detection and
  • images should be on flat surfaces for a better detection.

I assume that the algorithm (Camera > Frame > Algorithm > Result) of the image recognition capability in ARKit looks something like that:

  1. Is there a square in the image? (Note: Square discovery is one of the simpler cases in computer vision.)
  2. If a square is discovered, do I know of an image that a) is in my bundled resources, b) has approximately the real physical dimensions of the square that I just scanned and c) has not yet been scanned / marked as scanned?
  3. If so, I’ll take whatever is in the square and use an image comparison algorithm (such as a perceptual hash) to check whether it matches.
  4. If one of the images matches, add a marker to the image and mark it as ‘seen’.
  5. Trigger the action / code that was specified by the developer.
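To make step 3 more concrete, here is a toy sketch of how a perceptual “average hash” comparison can work. This is my own illustration, not Apple’s actual algorithm: each image is reduced to a bit string (is each pixel brighter than the image’s mean?), and two images are considered a match when their bit strings differ in only a few positions.

```swift
import Foundation

/// Computes a simple average hash from grayscale pixel values (0...255):
/// each bit says whether that pixel is brighter than the mean brightness.
func averageHash(_ pixels: [Int]) -> [Bool] {
    let mean = pixels.reduce(0, +) / pixels.count
    return pixels.map { $0 > mean }
}

/// Number of differing bits between two hashes (Hamming distance).
func hammingDistance(_ a: [Bool], _ b: [Bool]) -> Int {
    zip(a, b).filter { $0.0 != $0.1 }.count
}

// A reference "image", a slightly brighter scan of the same scene,
// and an unrelated pattern (tiny 8-pixel examples for illustration):
let reference = [10, 200, 30, 220, 40, 210, 20, 230]
let scanned   = [15, 205, 35, 225, 45, 215, 25, 235]
let unrelated = [200, 10, 220, 30, 210, 40, 230, 20]

let refHash = averageHash(reference)
print(hammingDistance(refHash, averageHash(scanned)))   // 0 -> match
print(hammingDistance(refHash, averageHash(unrelated))) // 8 -> no match
```

Because the hash only encodes relative brightness, a uniformly brighter or darker scan of the same picture still produces the same bits, which is exactly the kind of robustness a detection heuristic needs.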

What does that mean for the museum app?

Scanning an image that is much bigger or smaller than the specified actual dimensions won’t work and will not trigger the specified action.

Therefore, it is not sufficient to scan just any picture of the Mona Lisa; it has to be a Mona Lisa with more or less the specified dimensions.

Adding images to be recognized (continued)

Now that we know why we need the dimensions of the image, go ahead and add them to the image of the Mona Lisa via the ‘Attributes inspector’ in the far right pane (using the actual size of your printed-out test image in centimeters).

Add the actual physical dimensions of your test image.

Adding image trigger

So now that the initial setup is done, we can let the ARKit engine know that our images exist, so that we can act upon a detection.

Open the ViewController.swift file and change the viewWillAppear lifecycle function to the following:
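Here is a sketch of what the changed function can look like. The `sceneView` outlet comes from the AR template, and the resource group name matches the one we created above:

```swift
override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)

    // Create a session configuration
    let configuration = ARWorldTrackingConfiguration()

    // Load the reference images from our AR resource group;
    // fail loudly if the asset catalog does not contain them.
    guard let referenceImages = ARReferenceImage.referenceImages(
        inGroupNamed: "Mona Lisa Room", bundle: nil) else {
        fatalError("Missing expected asset catalog resources.")
    }

    // Tell the configuration which images to look for.
    configuration.detectionImages = referenceImages

    // Run the view's session
    sceneView.session.run(configuration)
}
```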

Here’s what we do:

Get a conditional reference to the images in the AR resource group we defined earlier and handle the case when the bundle can not be found:

guard let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "Mona Lisa Room", bundle: nil) else {
    fatalError("Missing expected asset catalog resources.")
}

Add those images to the ARWorldTrackingConfiguration, so the algorithm knows what images to look for:

configuration.detectionImages = referenceImages

And listen to any new nodes that are added via the following ARSKViewDelegate method — we will come back to this in a second.

func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode?

Open modal

Add a new View Controller to the storyboard as well as a new related ViewController Swift file to your project tree:

Add a new ViewController to your project tree.

Drag a new modal Segue between the main View Controller (AR viewport) and the newly created View Controller.

Add a modal segue between the main AR view and your newly created View Controller.

Give the segue an identifier in the inspector by naming it for instance showImageInformation, so it can be called from the View Controller.

Name your segue so it can be called from the View Controller.

Add the UI elements you want for your detail view, such as

  • a close button
  • the image name
  • the actual image
  • and a more detailed description.
Add all UI elements to your storyboard you want to have in your image detail controller.

Connect the created UI elements with your View Controller through outlets and initialize the elements with their actual values in viewDidLoad.

As a DTO (Data Transfer Object), I created a simple struct called ImageInformation that holds and encapsulates all necessary information in a single object (more about this in a second), and placed it in the ViewController.

struct ImageInformation {
    let name: String
    let description: String
    let image: UIImage
}

The simple ImageInformationViewController should now look something like this:
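A minimal version could look like the sketch below. The outlet and action names are my own choice; yours depend on how you connected the storyboard elements:

```swift
import UIKit

class ImageInformationViewController: UIViewController {

    // Outlets connected in the storyboard (names are illustrative)
    @IBOutlet weak var imageNameLabel: UILabel!
    @IBOutlet weak var imageDescriptionLabel: UILabel!
    @IBOutlet weak var imageView: UIImageView!

    // Set by the presenting controller before the segue is performed
    var imageInformation: ImageInformation?

    override func viewDidLoad() {
        super.viewDidLoad()

        // Initialize the UI elements with the values from the DTO
        if let info = imageInformation {
            imageNameLabel.text = info.name
            imageDescriptionLabel.text = info.description
            imageView.image = info.image
        }
    }

    @IBAction func closeButtonPressed(_ sender: Any) {
        dismiss(animated: true, completion: nil)
    }
}
```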

Going back to our initial ViewController and the delegate method we defined earlier:

func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode?

This delegate method is automatically triggered every time a new node should be added to the view.

We told the AR configuration our detection images via

configuration.detectionImages = referenceImages

and the delegate method will automatically be called each time an image from our reference images has been successfully detected.

Let’s look in detail at what happens inside the method:

  1. Is the anchor this delegate method is called for an ARImageAnchor?
  2. If so, get the name of the reference image.
  3. Do I have an image with that name in my images collection?
  4. If so, set my selected image in the controller and open the detail ViewController by calling the modal segue we created earlier.
  5. Also call the private function imageSeenMarker() that creates, returns and thereby adds an SKLabelNode (a ✅) to the AR view, so the user sees which images have been scanned before.
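The steps above can be sketched as follows (the `images` dictionary and `selectedImage` property live on the controller, and the segue identifier matches the one named earlier):

```swift
func view(_ view: ARSKView, nodeFor anchor: ARAnchor) -> SKNode? {
    // 1. Only react to image anchors
    if let imageAnchor = anchor as? ARImageAnchor,
       // 2./3. Look the detected reference image up in our collection
       let referenceImageName = imageAnchor.referenceImage.name,
       let scannedImage = self.images[referenceImageName] {

        // 4. Remember the selection and present the detail modal
        self.selectedImage = scannedImage
        performSegue(withIdentifier: "showImageInformation", sender: self)

        // 5. Return the ✅ marker so ARKit pins it onto the image
        return imageSeenMarker()
    }
    return nil
}

// Creates the label node that marks a scanned image as 'seen'
private func imageSeenMarker() -> SKLabelNode {
    let label = SKLabelNode(text: "✅")
    label.horizontalAlignmentMode = .center
    label.verticalAlignmentMode = .center
    return label
}
```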

If an image can be successfully cast and found in my controller’s images collection (which I created as a quick solution):

let images = ["monalisa" : ImageInformation(name: "Mona Lisa", description: "The Mona Lisa is a half-length portrait painting by the Italian Renaissance artist Leonardo da Vinci that has been described as 'the best known, the most visited, the most written about, the most sung about, the most parodied work of art in the world'.", image: UIImage(named: "monalisa")!)]

Then the segue will be triggered. If not, nothing happens for that particular anchor.

On success, prepare(for:sender:) is called, where we cast and hand the ImageInformation DTO over to our ImageInformationViewController, and the modal is opened.
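A minimal version of that override could look like this (assuming `selectedImage` holds the detected image’s DTO, as set in the delegate method above):

```swift
override func prepare(for segue: UIStoryboardSegue, sender: Any?) {
    if segue.identifier == "showImageInformation",
       let viewController = segue.destination as? ImageInformationViewController,
       let imageInformation = selectedImage {
        // Hand the DTO over to the modal detail controller
        viewController.imageInformation = imageInformation
    }
}
```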

Try it yourself by running the current state and scanning your test image.

Everything should work and holding the camera over your test image should trigger the modal to open.

All good, right? But wait — why is the image of the Mona Lisa black & white in the modal view?

I was surprised by this at first as well — but then I realized that this is yet another shortcut by Apple’s ARKit developers, adding to the heuristics of the image recognition. By storing the reference image in B&W, the image hashing algorithm is less error-prone and less dependent on things such as lighting.

You could fix this, for instance, by adding a full-color reference yourself to the ImageInformation struct. I left it out for now, as it was not the focus of this tutorial.

Mark as visited bug

If you now close the modal view by clicking the ‘close’ button you added to the view, you will see that every time the camera discovers the reference image, it re-triggers the modal and adds a new ✅. We obviously don’t want this.

But why is this? Shouldn’t the AR session keep track of all previously added anchors itself?

By opening and closing the modal view, we trigger viewWillDisappear and viewWillAppear every time, thus creating a new ARWorldTrackingConfiguration each time.

A (somewhat dirty) quick fix for this: get rid of session.pause() in viewWillDisappear of ViewController and move the code that previously lived in viewWillAppear to the controller’s viewDidLoad method, which is only called once:
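After the move, viewDidLoad could look roughly like this (a sketch; the template’s existing scene setup stays where it is):

```swift
override func viewDidLoad() {
    super.viewDidLoad()

    // ... existing template setup (scene and delegate wiring) ...

    // Moved here from viewWillAppear: viewDidLoad only runs once, so
    // closing the modal no longer resets the tracking configuration
    // and its previously added anchors.
    let configuration = ARWorldTrackingConfiguration()

    guard let referenceImages = ARReferenceImage.referenceImages(
        inGroupNamed: "Mona Lisa Room", bundle: nil) else {
        fatalError("Missing expected asset catalog resources.")
    }
    configuration.detectionImages = referenceImages

    sceneView.session.run(configuration)
}
```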

Now everything should work as initially specified and the final View Controller should look like this:


Finished app

The first version of the app is now done and should look like this:

The test code of the final project can be found and downloaded here (GitHub).

Questions and suggestions in the comments are always welcome and I try to get back to you as fast as possible.

Lukas Ingelheim is co-founder of Berlin-based Tech, UX and Product consultancy & company builder code & co.

Check out code & co. for more of the things we do with cutting-edge technologies, such as Augmented Reality, advanced mobile use cases and Machine Learning.