ARKit: Tool or Toy

Rendering 3D AR Models from Real Objects

Tool or Toy?

ARKit demos are bringing animations into our reality. What if we could do the opposite and map our reality into AR? For ARKit to work, it must scan and understand our surroundings. If we can collect that data, we could create virtual 3d models. Let’s give it a try. Join Mikaela Goldrich, Jeff Wolski, and me on this wild experiment with ARKit.


We’ll tackle this experiment with a first principles approach. We’ll identify what we know and build from there.

3d models

All 3d models start with vertices. Vertices are points in space. That is to say, each point has an x, y, and z coordinate that denotes its location. Connecting two vertices will give you an edge. Connecting edges into a closed loop will give you a face. These faces define the shape of your 3d model. However, you must be careful. Depending on how you connect your vertices, you’ll get different models.

If we can collect vertices and tell them how to connect to each other, we should be able to generate a 3d model. To connect them we’ll need an algorithm. The algorithm should be able to take vertices and output faces. We should then be able to plot these faces into our surroundings.
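As a rough illustration of the idea above, a model can be described as a list of vertices plus a list of faces, where each face holds indices into the vertex list. The `Vertex` and `Mesh` types below are names made up for this sketch; they are not part of SceneKit or ARKit.

```swift
// Hypothetical types for illustration only; not SceneKit or ARKit API.
struct Vertex: Equatable {
    var x: Float
    var y: Float
    var z: Float
}

struct Mesh {
    var vertices: [Vertex]
    // Each face is a list of three indices into `vertices`.
    var faces: [[Int]]

    // Collect the unique undirected edges implied by the faces.
    func edges() -> Set<[Int]> {
        var result = Set<[Int]>()
        for face in faces {
            for i in 0..<face.count {
                let a = face[i]
                let b = face[(i + 1) % face.count]
                result.insert([min(a, b), max(a, b)])
            }
        }
        return result
    }
}

// A tetrahedron: 4 vertices connected into 4 triangular faces.
let tetrahedron = Mesh(
    vertices: [
        Vertex(x: 0, y: 0, z: 0),
        Vertex(x: 1, y: 0, z: 0),
        Vertex(x: 0, y: 1, z: 0),
        Vertex(x: 0, y: 0, z: 1),
    ],
    faces: [[2, 0, 3], [0, 1, 3], [2, 1, 0], [2, 3, 1]]
)
```

Connecting the same four vertices with a different set of faces would describe a different shape, which is why the connection step matters as much as the points themselves.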

Algorithm time

Let’s take a simple algorithm to connect our points, such as Quickhull. Given a set of points, Quickhull will compute the convex hull: the smallest convex shape containing the given points (a polygon in 2d, a polyhedron in 3d). At its core, the algorithm takes a divide-and-conquer approach similar to quicksort.

By Maonus (Own work) [CC BY-SA 4.0], via Wikimedia Commons
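To make the divide-and-conquer idea concrete, here is a minimal 2d sketch of Quickhull. It is not the 3d implementation we use later in this experiment, and `Point2D`, `cross`, `hullSide`, and `quickHull2D` are names made up for illustration. It also assumes no duplicate points and ignores points that lie exactly on a hull edge.

```swift
struct Point2D: Equatable {
    var x: Double
    var y: Double
}

// Twice the signed area of triangle (a, b, p): positive when p lies to the left of a -> b.
func cross(_ a: Point2D, _ b: Point2D, _ p: Point2D) -> Double {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

// Recursively collect the hull points to the left of the directed line a -> b.
func hullSide(_ points: [Point2D], _ a: Point2D, _ b: Point2D) -> [Point2D] {
    let left = points.filter { cross(a, b, $0) > 0 }
    // The point farthest from the line is guaranteed to be on the hull.
    guard let farthest = left.max(by: { cross(a, b, $0) < cross(a, b, $1) }) else {
        return []
    }
    // Points inside triangle (a, farthest, b) can never be on the hull,
    // so we only recurse on the two outer sides.
    return hullSide(left, a, farthest) + [farthest] + hullSide(left, farthest, b)
}

func quickHull2D(_ points: [Point2D]) -> [Point2D] {
    guard points.count > 2,
        let leftmost = points.min(by: { ($0.x, $0.y) < ($1.x, $1.y) }),
        let rightmost = points.max(by: { ($0.x, $0.y) < ($1.x, $1.y) }) else {
        return points
    }
    // Split the points along the line between the two extremes and solve each side.
    return [leftmost] + hullSide(points, leftmost, rightmost)
        + [rightmost] + hullSide(points, rightmost, leftmost)
}

// Four corners of a square plus one interior point.
let hull = quickHull2D([
    Point2D(x: 0, y: 0), Point2D(x: 2, y: 0), Point2D(x: 2, y: 2),
    Point2D(x: 0, y: 2), Point2D(x: 1, y: 1),
])
// `hull` holds the four corners; the interior point (1, 1) is discarded.
```

The 3d version follows the same pattern, but it splits the points with planes and grows triangular faces instead of line segments.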

Let’s be real, applying a generic algorithm to our points will most likely render only decent results. We’ll want to feed the algorithm the most accurate vertices we can. The more accurate the vertices, the better the result.

3d modeling software such as MeshLab uses a combination of algorithms to create models from vertices. However, Quickhull should be able to provide us with a proof of concept for this experiment.


ARKit works with existing Apple hardware going back to the iPhone SE because it uses black magic… eh, it uses existing technology to process your surroundings. Before ARKit, most augmented reality frameworks required multiple cameras to perceive depth. ARKit works differently. It uses visual-inertial odometry, which determines the device’s position and orientation by combining the analysis of camera images with data from the motion sensors.

To scan your scene, ARKit requires you to point your camera around the room. Meanwhile, ARKit is scanning and capturing frames. By combining the frame data with your device’s motion detection hardware, such as the accelerometer, ARKit starts to identify where in space your frames exist.

For ARKit to interact with your scene, it needs to be able to understand its contents. It does so through hit tests. A hit test will analyze a specific point in space and return a vector coordinate (x, y, z) of where it believes the point lies in your scene. The hit test may be computed multiple times, each time with a more accurate result. The more hit tests ARKit computes, the more of the scene it understands. Once ARKit starts understanding the contents of your scene, it will begin to identify planes that it can interact with and place objects on.

The process

If we want ARKit to create a 3d model of an object, we’ll need it to understand the scene first. It can then start performing hit tests to gather our vertices. From there, we’ll be able to run our algorithm on these vertices to create a 3d model. To summarize:

1. Scan the scene
2. Collect vertices
3. Run Quickhull on our vertices
4. Use the output to create our 3d model
5. Profit!

The code

We’ll be using `Xcode 9.0` together with `Swift 4.0`. It is important to note that ARKit works with iOS 11 and up. We’ll go through a high-level overview of the code.

Scan the Scene

SceneKit and ARKit are highly intertwined and work well together. Models created in SceneKit can be rendered in ARKit. These models are SCNGeometry objects in SceneKit. The basis for everything ARKit related lies in your ARSCNView. We’ll call ours sceneView. Through our sceneView, we’ll analyze our surroundings, gather hitTest results, and place our models within after they have been generated.

On launch, our sceneView will start scanning and processing the frames of our surroundings. It’ll take a moment for ARKit to get its bearings. Once it does, we can start interacting with our scene and begin our hit tests. Successful hit tests will return feature points.

Let’s take a look at the feature points our scene is gathering by activating our sceneView debugger.

// Show feature points
self.sceneView.debugOptions = ARSCNDebugOptions.showFeaturePoints
// Hide feature points, more so reset the debugger options
self.sceneView.debugOptions = []

As you move your phone around, you’ll start to see a lot of yellow points being drawn! You’ll also notice that they disappear. These points are drawn for a given frame. Once our sceneView decides that the points from past frames no longer apply to the new set of frames, it releases them from memory. Also, notice that ARKit provides us with debugging tools at the bottom of the screen. Take note of the frame rate (fps). Drawing too many of these feature points will drop your fps.

After playing with the feature points, you’ll start to notice that the detection works better on textured surfaces. Surfaces that are too shiny will confuse ARKit. One trick is to spray shiny surfaces with water, but this is cheating :)

Collect Vertices

These feature points could do the trick for our collection of vertices. However, we wouldn’t want to collect all of them. There would be too many stray points that do not correctly represent our model. We only want the points that ARKit finds directly on our object.

Stray points off the wazoo!

If we could select which points we wanted, say by swiping over them, we’d be able to be pretty selective with the points we collect. We can do so by overriding touchesMoved, a UIResponder method from UIKit. Let’s store the touch location in currentPoint. We’ll guard against a bad touch by returning early.

override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
    guard let touch = touches.first else { return }
    let currentPoint = touch.location(in: sceneView)

Our currentPoint is of type CGPoint. Wait, but we need 3d points and CGPoint only has x and y! Don’t fret, Apple really thought this one out. The z coordinate gives us depth perception. While we can’t directly compute a z coordinate, we can identify where a feature point exists on our screen. If we can identify this 2d location on our screen, we can compare it to feature points that our sceneView has already drawn.

Our feature points contain x, y, and z coordinates that correlate to the space in our sceneView. We’ll convert each one into an SCNVector3. Our sceneView has a method, projectPoint(), which translates vector coordinates that pertain to our sceneView into coordinates that pertain to our device’s screen. Therefore, we’ll be able to check whether any feature point exists where our touch fell on the screen.

    // Get all feature points in the current frame
    guard let fp = self.sceneView.session.currentFrame?.rawFeaturePoints else { return }
    // Create a material
    let material = createMaterial()
    // Accept any feature point that projects within 100 points of our touch location
    let xRange: ClosedRange<Float> = Float(currentPoint.x) - 100.0...Float(currentPoint.x) + 100.0
    let yRange: ClosedRange<Float> = Float(currentPoint.y) - 100.0...Float(currentPoint.y) + 100.0
    // Loop over them and check if any exist near our touch location
    // If a point exists in our range, let's draw a sphere at that feature point
    for index in 0..<fp.count {
        let point = SCNVector3(fp.points[index].x, fp.points[index].y, fp.points[index].z)
        // Translate the 3d point into 2d screen coordinates
        let projection = self.sceneView.projectPoint(point)
        if xRange ~= projection.x && yRange ~= projection.y {
            let ballShape = SCNSphere(radius: 0.001)
            ballShape.materials = [material]
            let ballNode = SCNNode(geometry: ballShape)
            ballNode.position = point
            self.sceneView.scene.rootNode.addChildNode(ballNode)
            // We'll also save it for later use in our pointCloud, a [SCNVector3]
            pointCloud.append(point)
        }
    }
}

Take a look at how we are creating an SCNNode. SCNNode takes in a geometry object as input. SCNSphere subclasses SCNGeometry, which allows us to create a sphere in our scene, such as the yellow spheres that we have previously seen. We’ll also save our point in a pointCloud for later use.

To give our geometry a texture, we can add materials. Our createMaterial() method returns a blue material that is mostly transparent (in SceneKit, a transparency of 1.0 is fully opaque and 0.0 is fully transparent).

func createMaterial() -> SCNMaterial {
    let clearMaterial = SCNMaterial()
    clearMaterial.diffuse.contents = UIColor(red:0.12, green:0.61, blue:1.00, alpha:1.0)
    clearMaterial.locksAmbientWithDiffuse = true
    clearMaterial.transparency = 0.2
    return clearMaterial
}

After some good swiping, we’ll have a pretty nice point cloud drawn out. You’ll also notice, the more points you draw, the lower your fps gets.


We’ve collected our vertices and are ready to feed our algorithm. We decided to borrow Mauricio Poppe’s version of the algorithm. Yes, it is written in JavaScript. No, it’s not a big deal :) If React Native does it, why can’t we?

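import JavaScriptCore

public func quickHull3d(vertices: [Array<Float>]) -> [Array<Int32>]? {
    let frameworkBundle = Bundle(identifier: "com.research.arkit")
    guard let quickHullModulePath = frameworkBundle?.path(forResource: "quickhull3d", ofType: "js"),
        let quickHullModule = try? String(contentsOfFile: quickHullModulePath, encoding: .utf8),
        let context = JSContext() else { return nil }
    // The bundle expects a global `window`, so we alias it to the global object
    let jsSource = "var window = this; \(quickHullModule)"
    // Evaluate the bundle so that quickhull3d is defined in our context
    context.evaluateScript(jsSource)
    // Look up the quickhull3d function and call it with our vertices
    guard let algo = context.objectForKeyedSubscript("quickhull3d"),
        let result = algo.call(withArguments: [vertices]) else { return nil }
    return result.toArray() as? [Array<Int32>]
}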

We used browserify to bundle the algorithm. Once bundled, we can access its methods globally by calling objectForKeyedSubscript on our JSContext. This context is a JavaScript environment where our JavaScript will run (very similar to a virtual machine). Notice that we are passing in [Array<Float>], not [SCNVector3]. Our JavaScript algorithm doesn’t understand what an SCNVector3 is, so we’ll need to transform our pointCloud before passing it through.

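func reduceVectorToPoints(given vertices: [SCNVector3]) -> [Array<Float>] {
    var points = [Array<Float>]()
    for vertex in vertices {
        // Flatten each vector into a plain [x, y, z] array
        let vertexArray: [Float] = [vertex.x, vertex.y, vertex.z]
        points.append(vertexArray)
    }
    return points
}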

We now have an [Array<Int32>] which contains our faces! The output of our algorithm will give us something along the lines of:
let faces = [ [ 2, 0, 3 ], [ 0, 1, 3 ], [ 2, 1, 0 ], [ 2, 3, 1 ] ]

Woah, what do these numbers mean? They are the indices of our points. Three points make a face. For example, faces[0] tells us to connect pointCloud[2] to pointCloud[0] to pointCloud[3], and so on.

We’ve done it! We can create our SCNGeometry objects from our pointCloud and faces.
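A rough sketch of that last step, assuming `pointCloud` is the `[SCNVector3]` we collected and `faces` is the output of Quickhull. The helper names `flattenFaces` and `makeGeometry` are made up for this sketch. SceneKit expects triangle indices as one flat list, so the faces are flattened first.

```swift
#if canImport(SceneKit)
import SceneKit
#endif

// Flatten [[2, 0, 3], [0, 1, 3]] into [2, 0, 3, 0, 1, 3],
// skipping anything that is not a triangle.
func flattenFaces(_ faces: [[Int32]]) -> [Int32] {
    return faces.filter { $0.count == 3 }.flatMap { $0 }
}

#if canImport(SceneKit)
// Hypothetical helper: build a triangle mesh from the collected points
// and the face indices returned by Quickhull.
func makeGeometry(pointCloud: [SCNVector3], faces: [[Int32]]) -> SCNGeometry {
    // The vertex source holds the positions; the element says how to connect them.
    let source = SCNGeometrySource(vertices: pointCloud)
    let element = SCNGeometryElement(indices: flattenFaces(faces), primitiveType: .triangles)
    return SCNGeometry(sources: [source], elements: [element])
}
#endif
```

The resulting geometry can be wrapped in an SCNNode and added to the scene's root node, the same way we added the spheres earlier.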

The blue surfaces are the faces that were rendered from the point cloud. Stray points confused Quickhull.


Well, this is where the 💩 hit the ☢ . Unfortunately, our results were less than ideal. But why? Where did we go wrong!?

The short and sweet:

1. Our algorithm wasn’t able to recognize stray points and filter them out.

One solution would be to clean up our point data before running Quickhull against it. Doing so would allow us to remove any outliers.
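One possible cleanup pass, sketched under the assumption that stray points sit much farther from the center of the cloud than the rest. This is not something we implemented in the experiment. `Point3D`, `removeOutliers`, and the cutoff factor `k` are hypothetical choices for illustration.

```swift
struct Point3D: Equatable {
    var x: Float
    var y: Float
    var z: Float
}

// Drop points whose distance from the centroid exceeds
// mean distance + k * standard deviation.
func removeOutliers(from points: [Point3D], k: Float = 2.0) -> [Point3D] {
    guard points.count > 1 else { return points }
    let n = Float(points.count)
    // Centroid of the cloud.
    let cx = points.reduce(Float(0)) { $0 + $1.x } / n
    let cy = points.reduce(Float(0)) { $0 + $1.y } / n
    let cz = points.reduce(Float(0)) { $0 + $1.z } / n
    // Distance of every point from the centroid.
    let distances = points.map { p -> Float in
        let dx = p.x - cx
        let dy = p.y - cy
        let dz = p.z - cz
        return (dx * dx + dy * dy + dz * dz).squareRoot()
    }
    let mean = distances.reduce(Float(0), +) / n
    let variance = distances.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) } / n
    let threshold = mean + k * variance.squareRoot()
    return zip(points, distances).filter { $0.1 <= threshold }.map { $0.0 }
}

// The eight corners of a unit cube plus one stray point far away.
let scanned = [
    Point3D(x: 0, y: 0, z: 0), Point3D(x: 1, y: 0, z: 0),
    Point3D(x: 0, y: 1, z: 0), Point3D(x: 1, y: 1, z: 0),
    Point3D(x: 0, y: 0, z: 1), Point3D(x: 1, y: 0, z: 1),
    Point3D(x: 0, y: 1, z: 1), Point3D(x: 1, y: 1, z: 1),
    Point3D(x: 100, y: 100, z: 100),
]
let cleaned = removeOutliers(from: scanned)
```

A single global cutoff like this is crude. Strays that land close to the object would survive it, so a neighbor-based filter may be a better fit for real scans.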

2. Convex vs Concave objects, understanding holes in our objects

Quickhull only computes convex hulls. It cannot represent concave features such as dents or holes; it simply wraps over them. Therefore, the more convex the item we scan, the more accurate the result.
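To illustrate the difference in 2d: an outline is convex when every turn along it goes in the same direction. The check below is a small sketch with made-up names (`Point2D`, `isConvex`), and it assumes the outline is a simple polygon that does not cross itself.

```swift
struct Point2D { var x: Double; var y: Double }

// A simple polygon is convex when every turn along its outline
// has the same direction (all left turns or all right turns).
func isConvex(_ polygon: [Point2D]) -> Bool {
    guard polygon.count > 3 else { return true }
    var sign = 0.0
    for i in 0..<polygon.count {
        let a = polygon[i]
        let b = polygon[(i + 1) % polygon.count]
        let c = polygon[(i + 2) % polygon.count]
        // Positive for a left turn at b, negative for a right turn.
        let cross = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x)
        if cross == 0 { continue }
        if sign != 0 && (cross > 0) != (sign > 0) { return false }
        sign = cross
    }
    return true
}

let square = [
    Point2D(x: 0, y: 0), Point2D(x: 2, y: 0),
    Point2D(x: 2, y: 2), Point2D(x: 0, y: 2),
]
// An L-shaped outline: the inner corner at (1, 1) turns the other way.
let lShape = [
    Point2D(x: 0, y: 0), Point2D(x: 2, y: 0), Point2D(x: 2, y: 1),
    Point2D(x: 1, y: 1), Point2D(x: 1, y: 2), Point2D(x: 0, y: 2),
]
```

A convex hull of the L-shape would skip the inner corner and cut straight across the notch, which is the kind of detail Quickhull loses on concave objects.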

Back to the Point Cloud

The point cloud by itself ended up rendering nicely. Amazingly, the scale was accurate as well.

Yes, it’s 2:40 am 😑

Exporting & Testing

The point cloud seems pretty accurate, so it looks like our algorithm was the weak link. Let’s export our data into MeshLab to see if we can analyze this further. In this scan, we’ll analyze a table in our office.
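One way to get the data out is to serialize it as Wavefront OBJ text, which MeshLab can import. The sketch below assumes that format; `Point3D` and `makeOBJ` are names made up for illustration.

```swift
struct Point3D {
    var x: Float
    var y: Float
    var z: Float
}

// Serialize points (and optional triangle faces) as Wavefront OBJ text.
func makeOBJ(points: [Point3D], faces: [[Int32]] = []) -> String {
    var lines = [String]()
    for p in points {
        // One "v x y z" line per vertex.
        lines.append("v \(p.x) \(p.y) \(p.z)")
    }
    for face in faces {
        // OBJ indices are 1-based, while our face indices are 0-based.
        lines.append("f " + face.map { String($0 + 1) }.joined(separator: " "))
    }
    return lines.joined(separator: "\n") + "\n"
}

let obj = makeOBJ(
    points: [
        Point3D(x: 0, y: 0, z: 0),
        Point3D(x: 1, y: 0, z: 0),
        Point3D(x: 0, y: 1, z: 0),
    ],
    faces: [[0, 1, 2]]
)
```

Leaving `faces` empty exports the bare point cloud, which lets MeshLab do its own triangulation. The string can then be written to a file and shared off the device.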

White Polygon triangulation of yellow points.

MeshLab was able to process and triangulate the faces better than we were with Quickhull. However, we are still running into the issue of stray points. If we want to generate serious models from ARKit, we’ll need to clean up our points and use a different polygon triangulation algorithm.

Use Cases: Tool or Toy?

ARKit definitely packs a punch. It is a new, well thought-out framework that bridges AR and iOS devices. The framework’s API plays well with SceneKit and other standard Swift Frameworks. The visual-inertial odometry is accurate enough to produce realistic results.

Since ARKit plays so well with others and uses the same standards as other 3d modeling libraries, you can export data from ARKit easily and reuse it on other platforms. The ability to capture real objects and render 3d models right from your mobile device opens up tremendous opportunities.

So can ARKit be used as a tool? Yes it can. As developers, we can now interact with our surroundings like never before!