iOS Motion Data + Machine Learning with Swift and R

Over the past few months, I’ve been working steadily through Andrew Ng’s fantastic Machine Learning course on Coursera as well as Machine Learning with R: Second Edition (a great book for learning both R and various useful machine learning techniques). With some free time over the holidays, I wanted to put some of what I’ve been learning to the test: I thought it would be fun to play around with classifying motion/sensor data from an iOS device.

I decided to try something that I hoped would be fairly straightforward: predicting if an iPhone was being held “vertically” or “not vertically” based on sensor data alone — and overall, it worked pretty well. I used a fairly standard feedforward neural network (an MLP) to classify sensor inputs as either “vertical” or “not vertical”. In this post, I’ll go through what I built — but before that, here’s a video of the app in action: when the phone is being held vertically (and the neural network is predicting the same), the background color is set to blue; otherwise, it’s set to orange to signify “non-verticalness”.

As an amusing aside — it turns out that one of the more difficult challenges in this exercise was actually shooting a half-decent video of me holding my iPhone, filmed on my Nexus 9 tablet. I thought this part was going to be super easy, but it was deceptively hard. I’ll sum the experience up like this: I don’t often take videos, but when I do, they aren’t great.

So, before we start looking at code (which is all available here on GitHub, MIT-licensed), it’s worth breaking the project down into a few high-level pieces to get a sense of what was built:

  1. Collecting Training Data
  2. Training a Neural Network to Predict Orientation
  3. Evaluating Performance
  4. Predicting Orientation in an iOS app

Collecting Training Data

The first order of business — collecting some data to train the neural network with! Looking at the Core Motion API, and knowing all I really needed was enough information to determine orientation, I selected the following features to train with, all available via the attitude property (of type CMAttitude) on CMMotionManager’s deviceMotion: roll, yaw, pitch, and the four components of the quaternion Apple makes available: w, x, y, z.

Since I trained the neural network using the R neuralnet package, the best format for getting training data into R for processing was just basic CSV — so in a test harness iOS app, I wired up “vertical” and “not vertical” buttons to simply print motion data to the Xcode console, with a 1 to signify a vertical training example, and a 0 to signify a non-vertical training example:
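(A minimal sketch of that harness; the startLogging(label:) helper and the 0.1-second update interval are my own illustration, and the repo’s actual test harness may differ.)

```swift
import CoreMotion

let motionManager = CMMotionManager()

// Print one CSV row per sample: roll, yaw, pitch, qw, qx, qy, qz, label
// (label: 1 = vertical, 0 = not vertical, depending on which button was tapped).
func startLogging(label: Int) {
    motionManager.deviceMotionUpdateInterval = 0.1
    motionManager.startDeviceMotionUpdates(to: .main) { motion, _ in
        guard let attitude = motion?.attitude else { return }
        let q = attitude.quaternion
        print("\(attitude.roll),\(attitude.yaw),\(attitude.pitch),\(q.w),\(q.x),\(q.y),\(q.z),\(label)")
    }
}
```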

Copying and pasting the data into a CSV file was a breeze.

Training a Neural Network to Predict Orientation

Armed with a few hundred training examples in a CSV file, the next step was to train the neural network — but first, a few quick preparatory steps were in order:

I divided the data set of 752 total observations into 60% training data and 40% test data (the latter used to evaluate the performance of the neural net). The two sets are roughly comparable in terms of vertical/non-vertical examples — in the training set, ~51% are non-vertical and ~48% are vertical; in the test set, ~48% are non-vertical and ~51% are vertical — a pretty decent distribution.

Training the neural network was pretty straightforward, though you may be interested in why I chose this particular architecture — 4 hidden layers of 28, 21, 14, and 7 nodes, respectively — which I’ll speak to in the next section. The function used to calculate error is just the sum of squared errors, and the activation function is the standard sigmoid used in many feedforward networks.

Note that I did not normalize the training & test data before training the neural network — I originally started off doing that, but ultimately dropped it for 2 reasons: 1) skipping normalization made predicting easier in Swift, as we’ll see in the last section, and 2) the range of each feature was very similar, rarely falling outside -2 <= x <= 2. A more thorough approach to this problem would likely include normalization, though.

Visually plotting the network is just a simple call to plot(mmodel).

Evaluating Performance

So that diagram looks neat, but how well does this neural network actually do when it comes to the data it was trained on (the training subset), and how well does it generalize against data it hasn’t encountered yet (the test subset)? Turns out, pretty decently!

There are 3 things to look for in the evaluation results. The first is the error on the neural network model itself: 0.004136, or 0.41%, on the training data (assuming I’m interpreting the error correctly, which I believe I am) — very low, so we did well against the training data.

Next, there is a strong correlation between the predicted orientations and the actual orientations in our test set — 0.9585, very close to 1 — suggesting that we generalized well to novel test examples.

Finally, looking at a confusion matrix comparison (using the R caret package) between predicted and actual orientations in the test set, the neural network misclassified only 6 out of 301 total examples — an error rate of roughly 2% — for an accuracy of ~98%. Not too bad — though I suspect that with some work, the accuracy could be even higher. But for the purposes of this blog post & demo, 98% suits my needs just fine!
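Spelling that arithmetic out:

$$\text{error rate} = \frac{6}{301} \approx 2.0\%, \qquad \text{accuracy} = \frac{295}{301} \approx 98.0\%$$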

So now to answer the question I posed above — why choose a 4-hidden-layer architecture, with 28/21/14/7 nodes? Well, to be honest, I mostly came to that architecture by trial and error, playing around with different architectures and looking at the correlation/confusion matrix results for each. Architectures with fewer hidden layers had trouble hitting 90% accuracy, and I found that accuracy increased substantially with more hidden layers. I suspect that I could get away with fewer layers with a more configurable neural network package (for example, one that supports regularization). At some point, I plan to convert my regularized neural network Octave code from Andrew Ng’s Machine Learning course to R, and that should give me a good proving ground — particularly as I’d like to be able to plot the kinds of learning curves Andrew Ng describes in the course.

Predicting Orientation in an iOS app

Now that we’ve got a trained neural network, how do we actually use it in an iOS app? The demo iOS app I built takes the input sensor data (roll, yaw, pitch, qw, qx, qy, qz) and passes it through the 4 hidden layers of the neural network — using the weights trained by the neuralnet package and the same feedforward logic as the R call to compute (i.e. predict) above — except it does it in Swift. That means that, before anything else, the Swift code needs a way to load in those weights, which in turn means we need a way to get them out of mmodel, the trained neural network.

That’s the job of a small R helper, print_nnweights(), which simply writes out a weight vector for each layer in the neural network; I’ll explain what those vectors are momentarily.

This is what the output looks like: motion_nn.dat, which contains the copied & pasted output of print_nnweights(). It’s a flat file with one line per layer, each line holding that layer’s weight vector as space-delimited doubles.

All the Swift app needs to do is load that flat file and parse each line, as shown in the full version of NeuralNetworkWeightFileProcessor.swift.
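Boiled down, that parsing amounts to something like this (the loadWeights(from:) helper here is illustrative, not the repo’s exact API):

```swift
import Foundation

// motion_nn.dat: one line per layer, each line a space-delimited list of doubles
// (that layer's unrolled weight vector).
func loadWeights(from fileURL: URL) throws -> [[Double]] {
    let contents = try String(contentsOf: fileURL, encoding: .utf8)
    return contents
        .split(separator: "\n")                                     // one line per layer
        .map { line in
            line.split(separator: " ").compactMap { Double(String($0)) }
        }
}
```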

So what do I mean by “weight vectors”? Essentially, just a matrix unrolled into a vector. For example, consider the input layer of the above network (7 nodes + 1 bias node, for 8 total nodes) mapped to the 28 nodes in the first hidden layer:

The purple nodes represent the weights from each input node to the 28 nodes in the 1st hidden layer, and unrolled, the weight vector looks something like this:

Except there are 224 purple dots in reality (8 input nodes * 28 nodes in the first hidden layer), and I just didn’t feel like copying and pasting that purple dot any more than I needed to. And that is all that motion_nn.dat contains! This is useful for two reasons — it’s easy to read from a flat file in Swift, but more importantly, the Swift library I’ll use for linear algebra (Apple’s Accelerate framework) works in terms of matrices unrolled into vectors.

We’re almost to the end! Let’s quickly look at the math needed to predict something with this neural network in Swift. All we need is the math for a feedforward pass through the network, and for this particular implementation, it works out to the following:

Feedforward Pass:
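In the layer-by-layer notation from Andrew Ng’s course (with a^{(l)} denoting the activations of layer l), the pass works out to:

$$
\begin{aligned}
a^{(1)} &= X\\
z^{(l+1)} &= \Theta_l\, a^{(l)}, \qquad a^{(l+1)} = g\!\left(z^{(l+1)}\right), \qquad l = 1, \dots, 5\\
h_\Theta(X) &= a^{(6)}
\end{aligned}
$$

where a bias unit of 1 is prepended to each layer’s activations before the next multiplication (X already carries its bias unit at the top, per below).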

Where X is the vector of input data — roll, yaw, pitch, qw, qx, qy, qz — with a bias unit set to 1 at the top.

Where ϴ1, ϴ2, ϴ3, ϴ4, ϴ5 are the weight matrices mapping each layer to the next.

Where g(x) is the sigmoid (logistic) activation function:
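$$g(x) = \frac{1}{1 + e^{-x}}$$

applied element-wise to each node’s input.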

Here’s a quick excerpt of what this looks like in Swift:
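(A sketch rather than a verbatim excerpt: it assumes the weight matrices are unrolled row by row, and the helper names are my own.)

```swift
import Accelerate
import Foundation

// Apply the sigmoid element-wise to a vector.
func sigmoid(_ z: [Double]) -> [Double] {
    return z.map { 1.0 / (1.0 + exp(-$0)) }
}

// One feedforward step: z = Theta * a, then a' = g(z).
// weights is a rows x cols matrix unrolled row by row into a single vector;
// input is the previous layer's activations with the bias unit already prepended.
func feedForwardLayer(weights: [Double], rows: Int, cols: Int, input: [Double]) -> [Double] {
    var z = [Double](repeating: 0.0, count: rows)
    // vDSP_mmulD computes C = A * B, where A is rows x cols, B is cols x 1, C is rows x 1.
    vDSP_mmulD(weights, 1, input, 1, &z, 1,
               vDSP_Length(rows), 1, vDSP_Length(cols))
    return sigmoid(z)
}
```

The full pass is just five of these steps chained together, prepending a 1 (the bias unit) to each layer’s activations before the next multiplication.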

It’s certainly not the most beautiful or elegant code I’ve ever written (I’m sure there is a far cleaner way to vectorize the activation function), but it gets the job done for this demonstration :) For the full feedforward pass calculation, check out the full version of NeuralNetwork.swift.

And with that, we can finally — finally! — do something useful in Swift — what we set out to do at the very beginning of this article — namely, predicting orientation based on motion data. Here’s the code for the view controller that predicts the device’s orientation and sets the background color of the screen to blue for vertical, and orange for non-vertical:
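(Again, a simplified sketch rather than the verbatim code; NeuralNetwork and its predict(_:) method are stand-ins for the actual classes in the repo.)

```swift
import UIKit
import CoreMotion

class ViewController: UIViewController {
    let motionManager = CMMotionManager()
    // Stand-in for the real class that loads motion_nn.dat and runs the feedforward pass.
    let neuralNetwork = NeuralNetwork(weightsFile: "motion_nn")
    var timer: Timer?

    override func viewDidLoad() {
        super.viewDidLoad()
        motionManager.startDeviceMotionUpdates()
        // Re-predict ten times a second using the latest Core Motion sample.
        timer = Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { [weak self] _ in
            self?.updateBackground()
        }
    }

    func updateBackground() {
        guard let attitude = motionManager.deviceMotion?.attitude else { return }
        let q = attitude.quaternion
        let features = [attitude.roll, attitude.yaw, attitude.pitch, q.w, q.x, q.y, q.z]
        // Assumed to return the network's output in 0...1 (1 = vertical).
        let prediction = neuralNetwork.predict(features)
        view.backgroundColor = prediction > 0.5 ? .blue : .orange
    }
}
```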

The view controller manages a timer that fires every 1/10th of a second; each timer tick, the neural network is used to predict the device’s orientation based on the latest data coming from Core Motion.

The full code for this view controller can be found here — ViewController.swift.

Wrap Up & Final Thoughts

While this was a bit longer than my typical post, I hope it gave an interesting look at how machine learning can be used with mobile devices. Granted, predicting device orientation isn’t the most exciting thing in the world (and let’s face it, there are much simpler ways to detect if a device is being held vertically), but I think this was at least an interesting exercise, if for no other reason than to get a chance to play around with some machine learning concepts.

A few final thoughts come to mind:

  • The configurability of a neural network library goes a long way toward helping tune the performance of a machine learning implementation — the R neuralnet package, while useful, was not nearly as configurable as I would have liked (e.g., tweaking a regularization parameter), and while it worked for the purposes of a demo, I would likely find something more configurable for a more serious implementation of this.
  • The accuracy of a supervised machine learning implementation is only as good as the training data fed into the algorithm. That set of 752 training observations? Yeah, I ended up collecting a few dozen sets of similar size before I finally got a training/test set that covered the various vertical device angles to an acceptable degree — and even then, it’s not quite perfect.

And that’s it! Hopefully this was interesting — I certainly had fun with it. Let me know your thoughts, and please feel free to play with / fork the code on GitHub!
