How ARKit could save you at the airport

Dan Vybíral
Published in code.kiwi.com
May 9, 2018 · 12 min read

Back in the summer of 2017 when ARKit was all the rage, Kiwi.com’s Mobile Team was on board the hype train too. We were trying to think of a cool new feature to create and I was tasked with developing it.

Solving a problem, not creating one

ARKit is very good at two things — placing virtual objects on real surfaces to look as if they’re really there, and measuring distances in the scene. The picture below illustrates both of these things very well.

MeasureKit, a measuring app

Many developers who’ve played with this library have used these features to their advantage and created awesome things along the way, including measuring apps and games. However, these apps usually provide very similar, universal functionality: because they try to do everything at once, they differ only in appearance, not in features.

We wanted something that would fit into the flow of our app and that would be of value to our users. As you probably know, our app is mostly used to sell airline tickets.

In the typical use case, a traveller finds a destination they’d like to visit, chooses a date and a price, and buys a ticket. Then they begin dealing with the usual stuff, like how to get to the airport, what to pack and what to pack it into.

Do you remember the first time you were packing for a flight? In my case, I spent more time Googling the luggage allowance and the airline’s baggage dimensions than on the packing itself.

Wouldn’t it be great if an app could guide you by saying “Hey, you’re flying with Ryanair. Do you want to check if your luggage fits their allowance?”

That’s the main problem we’re trying to solve — telling the user if they need to book checked baggage or not. We already know what their allowance is, so we can provide a simple way for the user to measure their baggage.

The following two things needed to be done:
1. Learn how ARKit works
2. Invent a way of measuring bags

The following section describes both of these things.

Placing virtual objects in real world

If you’re not an iOS developer or you don’t want to know how to start developing ARKit apps, you can skip this section. Otherwise, read on and learn how easy it is to start working with ARKit.

Good, you stayed. So, how do we go about putting virtual objects on the floor? You need two things: a library for tracking the phone’s position in the real world, and a library to render virtual objects on top of the camera’s image.

On iOS, ARKit handles all the tracking and analysing. For rendering, you have a choice. If you’re a masochist, you can use your own rendering engine, or something that wasn’t designed with AR in mind. If you’re adventurous, you can try Unity or Unreal Engine with ARKit plugins. If you want to stay comfy (like us) and want to develop your app quickly, you can use another built-in Apple library — SceneKit.

SceneKit and its delegate

SceneKit is a 3D rendering library optimised to work seamlessly with ARKit. As a developer, you can interact with it by using a simple interface — or in iOS terms, by implementing its delegate.

There are two basic building blocks in SceneKit. The first one is ARSCNView (or SCNView), which represents the user interface object in which the scene is rendered, along with its:
- Session (every time you run the app, a new session is started)
- Configuration (scene, rendering settings, etc.)

The ARSCNView acts as a gateway for the setup of the whole rendering pipeline. After instantiating it, you simply set its variables to the settings you want to use and let the library handle the rest. You don’t have to worry about game loops either; just use the session methods run and pause.

let sceneView = ARSCNView()

init() {
    sceneView.scene = SCNScene()
    sceneView.antialiasingMode = .multisampling4X
    sceneView.autoenablesDefaultLighting = true
    sceneView.scene.rootNode.addChildNode(box)
}

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    sceneView.session.run(sessionConfiguration)
    sceneView.session.delegate = self
}

override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    sceneView.session.pause()
}

The second building block is the SCNNode object. Everything contained in the scene, and even the scene itself, is represented as an instance of this class. Each of these instances can have further properties, including a 3D model (geometry, in SceneKit terms), the object’s position, its children, etc. Describing objects as nodes is the core concept of a scene graph, the data structure behind SceneKit’s logic that is used to easily manage 3D scenes and their content.

Imagine a scene with a racing car. When the car moves, you want the wheels to rotate, but also move with the car’s body. Normally, you would have to apply the move operation to the body and each wheel separately. If you use the scene graph principle and set up the car as one node, having the body and the wheels as child nodes, you can apply the move operation just to the main node representing the car and all of its parts. This will move the whole car, including the wheels. Then, you only need to rotate the wheels according to the car’s speed.
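For illustration, here is a toy version of that car hierarchy in SceneKit (the geometry and offsets are made up, purely to show the parent/child idea):

import SceneKit

// A made-up car hierarchy: the body and wheels are children of one carNode.
let carNode = SCNNode()

let bodyNode = SCNNode(geometry: SCNBox(width: 2, height: 0.5, length: 4, chamferRadius: 0.1))
carNode.addChildNode(bodyNode)

let wheelOffsets = [SCNVector3(-1.0, -0.25, 1.5), SCNVector3(1.0, -0.25, 1.5),
                    SCNVector3(-1.0, -0.25, -1.5), SCNVector3(1.0, -0.25, -1.5)]
for offset in wheelOffsets {
    let wheel = SCNNode(geometry: SCNCylinder(radius: 0.3, height: 0.2))
    wheel.position = offset
    carNode.addChildNode(wheel)
}

// One transform on the parent moves the whole car, wheels included.
carNode.position.z += 10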

You also have to implement the scene delegate to use the augmented reality features of ARKit. This delegate consists of functions which you use to attach virtual objects to anchors. Anchors are special nodes that can be used to position virtual objects in a way that makes them look like they’re sitting in the real world. These anchors usually correspond to horizontal or vertical surfaces, such as floors, tables or walls.

ARKit and SceneKit work together automatically in this configuration. If ARKit finds anchors in the real world, it adds them to the scene. SceneKit then calls the following functions to inform you about this so that you can react accordingly.

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    ...
}

This first function is called when ARKit finds a surface in the scene that can be used as an anchor and added to the scene. You can use these objects to position your virtual content in the real world. You can visualise these anchor nodes by adding geometry to them.
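As a rough sketch, visualising a detected plane inside didAdd might look something like this (the colour and plane setup are illustrative, not our production code):

import ARKit
import SceneKit

// Sketch: show each detected horizontal plane as a semi-transparent quad.
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor else { return }

    // Build a plane matching the anchor's estimated extent.
    let plane = SCNPlane(width: CGFloat(planeAnchor.extent.x),
                         height: CGFloat(planeAnchor.extent.z))
    plane.firstMaterial?.diffuse.contents = UIColor.yellow.withAlphaComponent(0.3)

    let planeNode = SCNNode(geometry: plane)
    planeNode.position = SCNVector3(planeAnchor.center.x, 0, planeAnchor.center.z)
    // SCNPlane is vertical by default; rotate it to lie flat on the floor.
    planeNode.eulerAngles.x = -.pi / 2

    node.addChildNode(planeNode)
}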

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    ...
}

The second function is used to reposition existing anchors. If the device loses its orientation for some reason (usually due to bad lighting or camera obstruction) and then later finds it again, this function is called with the updated positions. You should use these new positions to move the anchors and objects pinned to them.

func renderer(_ renderer: SCNSceneRenderer, didRemove node: SCNNode, for anchor: ARAnchor) {
    ...
}

When you move to a completely different place or the environment changes a lot, this function is called to let you know that an anchor is no longer available and you should start using a different one.

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    ...
}

The last function gives you the possibility to provide your own SCNNode to visualise each anchor found in the scene. I did something similar with the didAdd function. These two functions are exclusive: if you implement both of them, the nodeFor anchor function won’t be called.

Visualising anchors is convenient when the user is supposed to place a virtual object in the real environment manually.

After this required setup, you can start implementing your content and setting up the scene.

ARKit and its delegate

ARKit does much of the hard work automatically, as mentioned before.

Its configuration is set in the ARConfiguration object in which you can decide what kind of surfaces you want to detect, and various other things. This configuration is then sent to the ARSession hidden in the ARSCNView. Everything happens behind the scenes.
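A minimal setup, assuming the sceneView from the earlier snippet, could look like this: a world-tracking configuration with horizontal plane detection, passed to the session’s run method.

import ARKit

// Minimal sketch: world tracking with horizontal plane detection,
// which is what a floor-based measuring box needs.
let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = .horizontal

sceneView.session.run(configuration)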

However, if you want to use more advanced functions, or if you want to access the raw data and play with it yourself, you have to tap into ARKit’s own session delegate. For our purposes, one of its methods is enough.

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    if let features = frame.rawFeaturePoints {
        // work with the point cloud here
    }
}

This function provides you with everything that ARKit can detect in the environment. What you do with the data is up to you. You have access to the captured image from the camera, anchors, and most interestingly, the point cloud.

A point cloud represents objects as a set of points in 3D space. These points are usually called features. ARKit can detect interesting features in the environment and use them to find surfaces. Features can also be used to measure distances and detect objects. This is what I used to analyse the scene and measure bags.

Measuring objects in a more intelligent way

Drawing a box on the floor is not a difficult task with ARKit. But we wanted something more automatic. Something that would guide the user and also report the results of the measurement.

We discussed various ways of providing the user with the tools to measure their bag. The possibilities ranged from a virtual ruler, to letting the user draw a box around the bag, to just displaying the dimensions around the bag.

There are already lots of virtual rulers on the App Store, but drawing a box is not easy on a small screen with a moving picture. Displaying the dimensions automatically would require more advanced computer vision techniques, which would probably lead to more problems than benefits, and it would certainly take longer to implement.

Instead, we decided to simulate a real measuring box for the user to place their bag into — similar to what they have at airports.

The box can adjust in size depending on the trip’s itinerary, or the user can choose the size manually. It can also detect if the user has inserted the bag into it. The main feature is that the box can detect if the bag is sticking out in some way and therefore if the bag is the right size.

So, how did I make the box this smart?
Firstly, after placing down the virtual box with ARKit, I needed to check whether the user had already put something into it, so that I wouldn’t just measure empty space.

“Real” bag in a virtual box

Now, the great thing about using SceneKit and ARKit together is that the scene and ARKit data use the same coordinate system. This is very important if you want to connect the real world with the virtual one. Of course, you can always use some mathematical magic to convert the coordinates yourself. Fortunately, in this case we didn’t need to.

As mentioned in the previous section, ARKit offers a great way of analysing a real-world scene with features from a point-cloud.

You can use these features to check if an object is occupying a spot you’re interested in. In this case, if there are some features found inside the virtual box, it usually means that something is placed inside. If something is inside, the user can start measuring it.

Detected features on a bag
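As a sketch of the idea (not the exact production code), the containment test can convert each feature point into the box node’s local space and compare it against the box’s dimensions. The boxNode name and the helper below are illustrative assumptions; the frame comes from the session delegate shown earlier.

import ARKit
import SceneKit

// Illustrative sketch: count how many of the frame's feature points fall
// inside a virtual box node (assumes boxNode has SCNBox geometry and lives
// in the scene rendered by the same session that produced the frame).
func featureCount(inside boxNode: SCNNode, frame: ARFrame) -> Int {
    guard let points = frame.rawFeaturePoints?.points,
          let box = boxNode.geometry as? SCNBox else { return 0 }

    let halfWidth = Float(box.width) / 2
    let halfHeight = Float(box.height) / 2
    let halfLength = Float(box.length) / 2

    return points.filter { point in
        // Feature points come in world coordinates; convert them into the
        // box's local space so the containment test is axis-aligned.
        let local = boxNode.convertPosition(SCNVector3(point.x, point.y, point.z), from: nil)
        return abs(local.x) <= halfWidth
            && abs(local.y) <= halfHeight
            && abs(local.z) <= halfLength
    }.count
}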

Up to this point, everything was just complementary to provide a better user experience. Now, for the main goal of this endeavour, the measurement itself.

I had to invent a way to check if the bag is sticking out of the measuring box. I had already used the point cloud features once to check whether the bag is in the box at all, so why not use them again to check, at the same time, whether the bag is outside the box?

There’s one problem, though. The point cloud contains all of the features found in the scene, not just the bag’s features. So, how do you distinguish a bag from a nearby plant, or a dog that happens to run past?

Detected features on a bag and the surrounding environment

I solved this issue by assuming that if the features belong to a bag, they will be close to the measuring box; everything else is just noise. If the user placed the bag inside the box before (and this has already been checked), it can be safely assumed that the bag is still there.

The only requirement with this approach is that the user needs some empty space around the measured bag. It can’t be done near a wall or another object.

The measuring process:
After placing the bag inside the visible virtual box, a bigger invisible box is created around it. The visible one is used to guide the user and represents the true dimensions used for measurement. The bigger invisible one is used to filter out the point cloud features.

Consequently, features which do not occupy the space inside this bigger box are thrown away. Features which are inside the smaller, visible box are marked as features of the bag that are safely inside the measuring box.

Detected features between the visible measuring box (yellow) and the invisible surrounding box (teal)

The magic comes from the features which are between these boxes. I assumed that these features are still a part of the bag but they don’t fit inside the measuring box. Therefore, the bag is probably sticking out somewhere.
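Put together, the classification boils down to two containment tests per feature point. The names below are illustrative, not our actual implementation:

import SceneKit

// Illustrative sketch: classify one feature point (in world coordinates)
// against the visible measuring box and the bigger invisible box around it.
enum FeatureClass {
    case fitsInside    // inside the visible measuring box
    case stickingOut   // between the two boxes: part of the bag, over the limit
    case noise         // outside the bigger box: walls, plants, passing dogs
}

func isInside(_ worldPoint: SCNVector3, box node: SCNNode) -> Bool {
    guard let box = node.geometry as? SCNBox else { return false }
    let local = node.convertPosition(worldPoint, from: nil)
    return abs(local.x) <= Float(box.width) / 2
        && abs(local.y) <= Float(box.height) / 2
        && abs(local.z) <= Float(box.length) / 2
}

func classify(_ point: SCNVector3, measuringBox: SCNNode, outerBox: SCNNode) -> FeatureClass {
    if isInside(point, box: measuringBox) { return .fitsInside }
    if isInside(point, box: outerBox) { return .stickingOut }
    return .noise
}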

Now the app knows whether or not the bag fits in the box. However, simply telling the user “Yes, it fits” or “No, it doesn’t fit” is not very user-friendly, because in a lot of cases the user might have placed the bag incorrectly, so repositioning it would help.

Because of the coordinate system used in both ARKit and SceneKit, checking which side is closest to the obstructing feature is a simple task. The sides of the measuring box where the bag sticks out can be highlighted in a different colour like in the diagram below.

Highlighting obstructed sides

Highlighting obstructed sides tells the user where to look for the problem or how to reposition the bag.
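One possible way to pick the side to highlight, sketched with illustrative names: for a point already converted into the box’s local space, find the axis it exceeds the most and tint the corresponding SCNBox face (SceneKit applies a box’s six materials in the order front, right, back, left, top, bottom).

import SceneKit
import UIKit

// Illustrative sketch: find the face a protruding point violates the most
// and colour that side red. Assumes the box geometry has six materials,
// one per face, in SceneKit's order:
// front (+z), right (+x), back (-z), left (-x), top (+y), bottom (-y).
func highlightObstructedSide(for localPoint: SCNVector3, on boxNode: SCNNode) {
    guard let box = boxNode.geometry as? SCNBox, box.materials.count == 6 else { return }

    let overshootX = abs(localPoint.x) - Float(box.width) / 2
    let overshootY = abs(localPoint.y) - Float(box.height) / 2
    let overshootZ = abs(localPoint.z) - Float(box.length) / 2

    // Pick the axis the point exceeds the most, then the sign picks the face.
    let faceIndex: Int
    if overshootX >= overshootY && overshootX >= overshootZ {
        faceIndex = localPoint.x > 0 ? 1 : 3      // right or left
    } else if overshootY >= overshootZ {
        faceIndex = localPoint.y > 0 ? 4 : 5      // top or bottom
    } else {
        faceIndex = localPoint.z > 0 ? 0 : 2      // front or back
    }

    box.materials[faceIndex].diffuse.contents = UIColor.red
}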

Results and future work

Below, you can watch a video of our demo app of this feature:

If you watched the video, you will have noticed that the box changes colour from blue to green or red. Blue means that the side hasn’t been checked yet and the user should walk around it. Red means that something is sticking out, and green means that it fits.

As you can see, it works. However, sometimes, a side flashes from green to red and back. Sometimes, there are false positives, or false negatives. All of this is due to the inaccuracies in the point cloud features. Unfortunately, until ARKit becomes more precise, there is nothing that developers can do about it.

That is the main reason why we didn’t release this feature on our app in the end. It can be used for fun and as a gimmick but if a user wants to truly know if they’ll be let on board with their bag, it’s too inaccurate. We don’t want our app to be misleading and to give false information.

What about Android?
My colleague also tried out Tango on Android. It is a library that uses specialised hardware on certain devices to allow the creation of augmented reality apps. The main advantage of this versus a software solution like ARKit is that it offers more accurate data. It also offers a depth map.

The depth map can be used to create an even better visual impression of virtual objects by mapping them to the Z-buffer. If virtual objects are obstructed by real ones, they can be hidden, which is something ARKit cannot currently do.

Live Tango demo. AR without hiding obstructed virtual objects (left) and with hidden obstructed virtual objects (right)

The main disadvantage is that the phones with this special hardware are not widespread at all and only a few users could enjoy this feature.

Some time ago, Google also released ARCore, which is a library for Android that’s very similar to ARKit but it doesn’t require special hardware. This would be a more viable option for us to explore in the future.

Also, Apple keeps releasing new versions of ARKit. These new versions add the ability to detect vertical planes (walls) and to recognise pre-defined images in the scene. We’ll keep updating our demo app with these new features and maybe, in the future, it will be made available for you all.

Until then, keep enjoying these new technologies, try them for yourself and think of new ways of using them.

If you want to keep up with the Kiwi.com Mobile Team, join our meetup group to find out what we’re up to. You can also check out our open Android and iOS positions.
