# ARKit: Tool or Toy

## Rendering 3D AR Models from Real Objects

ARKit demos are bringing animations into our reality. What if we could do the opposite and map our reality into AR? For ARKit to work, it must scan and understand our surroundings. If we can collect that data, we could create virtual 3D models. Let’s give it a try. Join Mikaela Goldrich, Jeff Wolski, and me on this wild experiment with ARKit.

### Research

We’ll tackle this experiment with a first principles approach. We’ll identify what we know and build from there.

#### 3D models

All 3D models start with vertices. A vertex is a point in space: each one has an x, y, and z coordinate that denotes its location. Connecting two vertices gives you an edge, and connecting edges renders faces. These faces define the shape of your 3D model. However, you must be careful: depending on how you connect your vertices, you’ll get different models.

If we can collect vertices and tell them how to connect to each other, we should be able to generate a 3D model. To connect them, we’ll need an algorithm. The algorithm should be able to take `vertices` and output `faces`. We should then be able to plot these faces into our surroundings.
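To make this concrete, here is a minimal Swift sketch of the two arrays a 3D model boils down to, using a tetrahedron as the example (the `Vertex` type is ours, purely for illustration):

```swift
// A model is just vertices (points in space) and faces
// (triples of indices into the vertex array).
struct Vertex {
    let x: Float, y: Float, z: Float
}

// Four vertices of a tetrahedron.
let vertices: [Vertex] = [
    Vertex(x: 0, y: 0, z: 0),
    Vertex(x: 1, y: 0, z: 0),
    Vertex(x: 0, y: 1, z: 0),
    Vertex(x: 0, y: 0, z: 1)
]

// Each face is three indices into `vertices`. Connecting the same
// points with different faces would yield a different model.
let faces: [[Int]] = [
    [0, 1, 2],
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3]
]

print(vertices.count, faces.count) // 4 4
```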

#### Algorithm time

Let’s take a simple algorithm to connect our points, such as Quickhull. Given a set of points, Quickhull will compute the smallest convex polygon containing the given points. At its core, the algorithm takes a divide and conquer approach similar to quicksort.

Let’s be real: applying a generic algorithm to our points will, at best, render decent results. We’ll want to feed the algorithm vertices that are as accurate as possible. The more accurate the vertices, the better the result.

3D modeling software such as MeshLab uses a combination of algorithms to create models from vertices. However, Quickhull should be able to provide us with a proof of concept for this experiment.
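To illustrate the divide-and-conquer idea without the 3D bookkeeping, here is a sketch of Quickhull in 2D (the 3D version we use later partitions points against planes instead of lines; the function names here are our own):

```swift
typealias Point = (x: Double, y: Double)

// Signed area: positive when p lies to the left of the line a -> b.
func cross(_ a: Point, _ b: Point, _ p: Point) -> Double {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

// "Conquer": keep the point farthest from the line a -> b and recurse;
// everything inside the triangle it forms is discarded automatically.
func hull(_ points: [Point], _ a: Point, _ b: Point) -> [Point] {
    let left = points.filter { cross(a, b, $0) > 0 }
    guard let farthest = left.max(by: { cross(a, b, $0) < cross(a, b, $1) }) else {
        return []
    }
    return hull(left, a, farthest) + [farthest] + hull(left, farthest, b)
}

// "Divide": split the set by the line through the leftmost and
// rightmost points, then build the upper and lower hulls.
func quickHull(_ points: [Point]) -> [Point] {
    guard let a = points.min(by: { $0.x < $1.x }),
          let b = points.max(by: { $0.x < $1.x }) else { return [] }
    return [a] + hull(points, a, b) + [b] + hull(points, b, a)
}

// A square with two interior points; the hull keeps only the corners.
let cloud: [Point] = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
let convexHull = quickHull(cloud)
print(convexHull.count) // 4
```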

#### ARKit

ARKit works with existing Apple hardware going back to the iPhone SE because it uses black magic… eh, it uses existing technology to process your surroundings. Before ARKit, most augmented reality frameworks required multiple cameras to achieve depth perception. ARKit works differently: it uses visual-inertial odometry, which determines position and orientation by combining camera images with motion-sensor data.

To scan your scene, ARKit requires you to point your camera around the room. Meanwhile, ARKit is scanning and capturing frames. By combining the frame data with your device’s motion detection hardware, such as the accelerometer, ARKit starts to identify where in space your frames exist.

For ARKit to interact with your scene, it needs to be able to understand its contents. It does so through hit tests. A hit test analyzes a specific point in space and returns a vector coordinate (x, y, z) of where it believes the point lies in your scene. The hit test may be computed multiple times, each time with a more accurate result. The more hit tests ARKit computes, the more of the scene it understands. Once ARKit starts understanding the contents of your scene, it begins to identify planes with which it can interact and on which it can place objects.

#### The process

If we want ARKit to create a 3d model of an object, we’ll need it to understand the scene first. It can then start performing hit tests to gather our vertices. From there, we’ll be able to run our algorithm on these vertices to create a 3d model. To summarize:

1. Scan the scene
2. Collect vertices
3. Run Quickhull on our vertices
4. Use the output to create our 3d model
5. Profit!

### The code

We’ll be using `Xcode 9.0` together with `Swift 4.0`. It is important to note that ARKit requires iOS 11 and up. We’ll go through a high-level overview of the code.

#### Scan the Scene

SceneKit and ARKit are highly intertwined and work well together. Models created in SceneKit can be rendered in ARKit; these models are `SCNGeometry` objects in SceneKit. The basis for everything ARKit related lies in your `ARSCNView`. We’ll call ours `sceneView`. Through our `sceneView`, we’ll analyze our surroundings, gather hit test results, and place our models back into the scene once they have been generated.

On launch, our `sceneView` will start scanning and processing the frames of our surroundings. It’ll take a moment for SceneKit to get its bearings. Once it does, we can start interacting with our scene and begin our hit tests. Successful hit tests will return feature points.
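A typical session setup looks something like the following sketch, assuming a view controller that owns our `sceneView` (the class name and wiring here are our own; the exact setup depends on your project):

```swift
import ARKit
import UIKit

class ScanViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // World tracking feeds the visual-inertial odometry described above.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        // Pause the session whenever the view goes away.
        sceneView.session.pause()
    }
}
```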

Let’s take a look at the feature points our scene is gathering by activating our `sceneView` debugger.

```swift
// Show feature points
self.sceneView.debugOptions = ARSCNDebugOptions.showFeaturePoints

// Hide feature points, more so reset the debugger options
self.sceneView.debugOptions = []
```

As you move your phone around, you’ll start to see a lot of yellow points being drawn! You’ll also notice that they disappear. These points are drawn for a given `frame`. Once our `sceneView` decides that the new set of frames is no longer applicable to the past ones, it releases them from memory. Also notice that at the bottom of the screen, ARKit provides us with debugging tools. Take note of the frame rate (fps): drawing too many of these feature points will drop your fps.

After playing with the feature points, you’ll start to notice that detection works better on textured surfaces. Overly shiny surfaces will confuse ARKit. One trick is to spray shiny surfaces with water, but that’s cheating :)

#### Collect Vertices

These feature points could do the trick for our collection of vertices. However, we wouldn’t want to collect all of them: there would be too many stray points that do not correctly represent our model. We only want the points that ARKit finds directly on our object.

If we could select which points we wanted, say by swiping over them, we’d be able to be pretty selective about the points we keep. We can do so by overriding `UIResponder`’s `touchesMoved` method. Let’s store the touch location in `currentPoint`. We’ll guard against a bad touch by returning early.

```swift
override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let touch = touches.first else { return }
    let currentPoint = touch.location(in: sceneView)
    ...
}
```

Our `currentPoint` is of type `CGPoint`. Wait, but we need 3d points and `CGPoint` only has `x` and `y`! Don’t fret, Apple really thought this one out. The `z` coordinate gives us depth perception. While we can’t directly compute a `z` coordinate, we can identify where a feature point exists on our screen. If we can identify this 2d location on our screen, we can compare it to feature points that our `sceneView` has already drawn.

Our feature points contain `x`, `y`, and `z` coordinates that correlate to the space in our `sceneView`. These points are of type `SCNVector3`. Our `sceneView` has a method `projectPoint()`, which translates vector coordinates that pertain to our `sceneView` into coordinates that pertain to our device’s screen. Therefore, we’ll be able to check whether any feature point exists where our touch fell on the screen.

```swift
...
// Get all feature points in the current frame
guard let fp = self.sceneView.session.currentFrame?.rawFeaturePoints else { return }

// Create a material
let material = createMaterial()

// Loop over them and check if any exist near our touch location
// If a point exists in our range, let's draw a sphere at that feature point
for index in 0..<fp.count {
    let point = SCNVector3(fp.points[index].x, fp.points[index].y, fp.points[index].z)
    let projection = self.sceneView.projectPoint(point)
    let xRange: ClosedRange<Float> = (Float(currentPoint.x) - 100.0)...(Float(currentPoint.x) + 100.0)
    let yRange: ClosedRange<Float> = (Float(currentPoint.y) - 100.0)...(Float(currentPoint.y) + 100.0)
    if xRange ~= projection.x && yRange ~= projection.y {
        let ballShape = SCNSphere(radius: 0.001)
        ballShape.materials = [material]
        let ballNode = SCNNode(geometry: ballShape)
        ballNode.position = point
        self.sceneView.scene.rootNode.addChildNode(ballNode)

        // We'll also save it for later use in our `[SCNVector3]`
        self.pointCloud.append(point)
    }
}
```

Take a look at how we create an `SCNNode`. `SCNNode` takes a geometry object as input. `SCNSphere` subclasses `SCNGeometry`, which allows us to create a sphere in our scene, like the yellow spheres we saw previously. We’ll also save our point in a `pointCloud` for later use.

To give our geometry a texture, we can add materials. Our `createMaterial()` method returns a blue material with a bit of transparency.

```swift
func createMaterial() -> SCNMaterial {
    let clearMaterial = SCNMaterial()
    clearMaterial.diffuse.contents = UIColor(red: 0.12, green: 0.61, blue: 1.00, alpha: 1.0)
    clearMaterial.locksAmbientWithDiffuse = true
    clearMaterial.transparency = 0.2
    return clearMaterial
}
```

After some good swiping, we’ll have a pretty nice point cloud drawn out. You’ll also notice that the more points you draw, the lower your fps gets.

#### Algorithm

We’ve collected our vertices and are ready to feed our algorithm. We decided to borrow Mauricio Poppe’s version of the algorithm. Yes, it is written in JavaScript. No, it’s not a big deal :) If `react-native` does it, why can’t we?

```swift
public func quickHull3d(vertices: [Array<Float>]) -> [Array<Int32>]? {
    let frameworkBundle = Bundle(identifier: "com.research.arkit")
    if let quickHullModulePath = frameworkBundle?.path(forResource: "quickhull3d", ofType: "js") {
        let quickHullModule = try! String(contentsOfFile: quickHullModulePath)
        let jsSource = "var window = this; \(quickHullModule)"
        let context = JSContext()!
        context.evaluateScript(jsSource)
        let algo = context.objectForKeyedSubscript("quickhull3d")!
        let result = algo.call(withArguments: [vertices])
        return result!.toArray() as? [Array<Int32>]
    }
    return nil
}
```

We used Browserify to bundle the algorithm. Once bundled, we can access its methods globally by calling `objectForKeyedSubscript` on our `JSContext`. This context is a JavaScript environment where our JavaScript will run (very similar to a virtual machine). Notice that we are passing in `[Array<Float>]`, not `[SCNVector3]`. Our JavaScript algorithm doesn’t understand what an `SCNVector3` is, so we’ll need to transform our `pointCloud` before passing it through.

```swift
func reduceVectorToPoints(given vertices: [SCNVector3]) -> [Array<Float>] {
    var points = [Array<Float>]()
    for vertex in vertices {
        var vertexArray = [Float]()
        vertexArray.append(vertex.x)
        vertexArray.append(vertex.y)
        vertexArray.append(vertex.z)
        points.append(vertexArray)
    }
    return points
}
```

We now have an `[Array<Int32>]`, which contains our faces! The output of our algorithm will give us something along the lines of:

```swift
let faces = [[2, 0, 3], [0, 1, 3], [2, 1, 0], [2, 3, 1]]
```

Woah, what do these numbers mean? They are indices into our points. Three points make a face. For example, `faces[0]` tells us to connect `pointCloud[2]` to `pointCloud[0]` to `pointCloud[3]` … and so on.

We’ve done it! We can create our `SCNGeometry` objects from our `pointCloud` and `faces`.
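As a sketch of that final step, assuming the `pointCloud: [SCNVector3]` we collected and the `faces` array returned by Quickhull (`makeGeometry` is a helper name of our own):

```swift
import SceneKit

// Turn the collected points and Quickhull's faces into one SCNGeometry.
func makeGeometry(pointCloud: [SCNVector3], faces: [[Int32]]) -> SCNGeometry {
    // The geometry source holds the vertex positions.
    let vertexSource = SCNGeometrySource(vertices: pointCloud)

    // The geometry element holds the triangle indices, flattened
    // from [[2, 0, 3], ...] into [2, 0, 3, ...].
    let indices = faces.flatMap { $0 }
    let element = SCNGeometryElement(indices: indices, primitiveType: .triangles)

    return SCNGeometry(sources: [vertexSource], elements: [element])
}
```

Wrapping the result in an `SCNNode` and adding it to `sceneView.scene.rootNode`, as we did with the spheres, places the model in the scene.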

### Results

Well, this is where the 💩 hit the ☢ . Unfortunately, our results were less than ideal. But why? Where did we go wrong!?

The short and sweet:

1. Our algorithm wasn’t able to recognize stray points and filter them out.

One solution would be to clean up our point data before running Quickhull against it. Doing so would allow us to remove any outliers.

2. Convex vs. concave objects: understanding holes in our objects

Quickhull can only compute convex hulls; it cannot handle most concave shapes. Therefore, the mostly convex items we scan will produce more accurate models.
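The cleanup from point 1 could be as simple as a centroid-distance filter. Here is a sketch using plain tuples in place of `SCNVector3` so it stays self-contained (the `factor` threshold is an assumption you would tune):

```swift
typealias Point3 = (x: Float, y: Float, z: Float)

// Drop any point whose distance from the centroid is more than
// `factor` times the mean distance of all points.
func removeOutliers(from points: [Point3], factor: Float = 2.0) -> [Point3] {
    guard !points.isEmpty else { return [] }

    // Centroid of the cloud.
    let n = Float(points.count)
    let centroid: Point3 = (
        points.reduce(0, { $0 + $1.x }) / n,
        points.reduce(0, { $0 + $1.y }) / n,
        points.reduce(0, { $0 + $1.z }) / n
    )

    func distance(_ p: Point3) -> Float {
        let dx = p.x - centroid.x, dy = p.y - centroid.y, dz = p.z - centroid.z
        return (dx * dx + dy * dy + dz * dz).squareRoot()
    }

    // Keep only points within `factor` times the mean distance.
    let meanDistance = points.reduce(0, { $0 + distance($1) }) / n
    return points.filter { distance($0) <= factor * meanDistance }
}

// A tight cluster near the origin plus one stray point far away.
let cloud: [Point3] = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0, 0, 0.1), (5, 5, 5)]
print(removeOutliers(from: cloud).count) // 4
```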

#### Back to the Point Cloud

The point cloud by itself ended up rendering nicely. Amazingly, the scale was accurate as well.

#### Exporting & Testing

The point cloud seems pretty accurate; it looks like it was our algorithm that was lacking. Let’s export our data into MeshLab to see if we can analyze this further. For this scan, we’ll analyze a table in our office.
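One way to get the point cloud into MeshLab is the ASCII PLY format, which MeshLab imports directly. A minimal writer, again with plain tuples standing in for `SCNVector3` (`plyString` is our own helper):

```swift
typealias Point3 = (x: Float, y: Float, z: Float)

// Build a minimal ASCII .ply document: a header declaring the
// vertex count, then one "x y z" line per point.
func plyString(for points: [Point3]) -> String {
    var lines = [
        "ply",
        "format ascii 1.0",
        "element vertex \(points.count)",
        "property float x",
        "property float y",
        "property float z",
        "end_header"
    ]
    for p in points {
        lines.append("\(p.x) \(p.y) \(p.z)")
    }
    return lines.joined(separator: "\n")
}

let ply = plyString(for: [(0, 0, 0), (0.5, 0.25, -0.1)])
// Write it somewhere the app can share it from, e.g. the documents
// directory: try ply.write(to: url, atomically: true, encoding: .utf8)
print(ply.hasPrefix("ply")) // true
```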

MeshLab was able to process and triangulate the faces better than we were with Quickhull. However, we are still running into the issue of stray points. If we want to generate serious models from ARKit, we’ll need to clean up our points and use a different polygon triangulation algorithm.

### Use Cases: Tool or Toy?

ARKit definitely packs a punch. It is a new, well thought-out framework that bridges AR and iOS devices. The framework’s API plays well with SceneKit and other standard iOS frameworks. The visual-inertial odometry is accurate enough to produce realistic results.

Since ARKit plays so well with, and uses the same standards as, other 3D modeling libraries, you can easily export data from ARKit and reuse it on other platforms. The ability to capture real objects and render 3D models from the convenience of your mobile device opens up tremendous opportunities.

So can ARKit be used as a tool? Yes it can. As developers, we can now interact with our surroundings like never before!

Like what you read? Give Ulises Giacoman a round of applause.
