Using Machine Learning and CoreML to control ARKit

Dan Wyszynski
S23NYC: Engineering
14 min read · Jan 9, 2019


Combining image classification and augmented reality to create new experiences

So far in our AR journey we’ve covered the basics of placing objects in AR, controlling them with animations, detecting planes and placing items with hit detection, and exploring various ways of visualizing algorithms in 3D space. Let’s now delve (albeit lightly) into the large and complex world of machine learning.

A popular way to experiment with image classification has been recognizing hand gestures. We’re going to take that concept a bit further today. We’ll begin by training a simple model, importing it into CoreML and using the Vision framework to classify a couple of hand poses, and then using those classifications to control a 3D model in ARKit.
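To make that plan concrete, here’s a rough sketch of the ARKit/Vision glue we’re building toward. It assumes a hypothetical HandPoseClassifier class (the class Xcode generates from whatever .mlmodel you end up exporting) and hypothetical tag names like “open” and “fist”:

```swift
import ARKit
import Vision

final class HandGestureController: NSObject, ARSessionDelegate {

    /// Vision wrapper around the exported model. `HandPoseClassifier` is a
    /// placeholder for the class Xcode generates from your .mlmodel file.
    private lazy var classificationRequest: VNCoreMLRequest? = {
        guard let visionModel = try? VNCoreMLModel(for: HandPoseClassifier().model) else {
            return nil
        }
        let request = VNCoreMLRequest(model: visionModel) { [weak self] request, _ in
            // Take the top-ranked label, and only act on confident predictions.
            guard let best = (request.results as? [VNClassificationObservation])?.first,
                  best.confidence > 0.8 else { return }
            DispatchQueue.main.async { self?.handle(prediction: best.identifier) }
        }
        request.imageCropAndScaleOption = .centerCrop
        return request
    }()

    /// ARSessionDelegate: classify the camera feed. In a real app you’d
    /// throttle this rather than run Vision on every single frame.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        guard let request = classificationRequest else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                            orientation: .right, // portrait device
                                            options: [:])
        try? handler.perform([request])
    }

    /// Map a predicted tag to an action on the 3D model. The tag names
    /// ("open", "fist") are whatever you choose in the Custom Vision dashboard.
    private func handle(prediction: String) {
        switch prediction {
        case "open": break // e.g. resume the model’s animation
        case "fist": break // e.g. pause the animation
        default:     break // the “no hand” images: do nothing
        }
    }
}
```

The key idea is that Vision hands each camera frame to the CoreML model and gives us back a ranked list of VNClassificationObservations; we act on the top label only when its confidence clears a threshold.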

We’re going to touch on a single area of machine learning when building our model, namely image classification. While there are several tools for training models, such as TensorFlow, Keras, and even Xcode’s own training interface, we’re going to use a free online service from Microsoft called Custom Vision. It’s easy to get started with and great for prototyping ideas quickly without much prep work or code. Go ahead and create an account at https://customvision.ai and explore the interface.

What we’ll need to do before we create our project in the Custom Vision dashboard is begin taking pictures of our hand. We’ll need to take three sets of images: one for an open hand, one for a closed fist, and a last set with no hand in the frame at all.
