ML Compute

Hacking the new Apple ML Compute framework to accelerate training of neural networks across CPU and GPU

Jacopo Mangiavacchi
Jul 16, 2020 · 5 min read

With the new wave of operating system versions (Big Sur, iOS 14, etc.) announced at the recent WWDC, Apple quietly introduced a new ML framework to accelerate the training of neural networks across the CPU and one or more available GPUs.

ML Compute is not exactly a new ML framework, but rather a new API that leverages the high-performance BNNS primitives made available by the Accelerate framework on the CPU, and Metal Performance Shaders on the GPU.

After looking at the documentation and starting to use it in an iOS/macOS application, I understood that this is not really a simple, high-level framework, but something most likely targeting the acceleration of existing third-party ML libraries, such as the ONNX Runtime or TensorFlow Lite, on Apple platforms.

Even though Apple's documentation is pretty good, I would say these APIs are not really developer friendly or swifty for doing general ML on iOS/macOS. The tensor API, for example, is quite rough, and it requires dealing with unmanaged pointers in Swift: you are responsible for managing the ownership and lifetime of the memory backing objects such as tensors, nodes, and graphs that you pass to these APIs.
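
To give a feel for it, here is a minimal sketch of pairing an MLCTensor (shape metadata) with an MLCTensorData wrapping raw bytes. The initializer names follow the ML Compute documentation, but treat the exact signatures as assumptions to verify:

```swift
import MLCompute

// A 2x3 Float32 tensor: MLCTensor holds only shape/type metadata.
let tensor = MLCTensor(shape: [2, 3], dataType: .float32)

// We allocate and own the backing memory ourselves; ML Compute will not
// copy or free it, so its lifetime must cover every graph execution.
let count = 2 * 3
let bytes = UnsafeMutableRawPointer.allocate(
    byteCount: count * MemoryLayout<Float>.stride,
    alignment: MemoryLayout<Float>.alignment)
defer { bytes.deallocate() }  // our responsibility, not the framework's

// Fill the buffer with some values.
let floats = bytes.bindMemory(to: Float.self, capacity: count)
for i in 0..<count { floats[i] = Float(i) }

// Wrap the raw pointer without copying.
let data = MLCTensorData(immutableBytesNoCopy: bytes,
                         length: count * MemoryLayout<Float>.stride)
```

Nothing here is reference counted on your behalf: if the buffer is deallocated while the graph still references `data`, you get undefined behavior, which is exactly the kind of bookkeeping higher-level frameworks hide from you.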

More generally, ML Compute by design does not provide high-level ML APIs like Keras, PyTorch, or Swift for TensorFlow that simplify building and training ML models; it offers low-level APIs for building a compute graph and managing a low-level training loop.

For general ML coding on iOS/macOS, I would suggest continuing to use Core ML with tools like CoreMLTools to import models from other frameworks (TensorFlow, PyTorch, etc.), or eventually giving a try to the SwiftCoreMLTools library I developed if you want to build and/or train models entirely on device, avoiding any Python code.

Anyway, my personal opinion after playing with it is that ML Compute could potentially become really powerful, even for regular Swift ML developers, by adding on top of it, for example, a Swift function builder (DSL) high-level API, like the one I developed for SwiftCoreMLTools, and a high-level Swift tensor API, hopefully integrated with Swift Numerics multi-dimensional arrays.

To quickly test the capabilities of these APIs, I decided to develop, and illustrate here, a PoC app that trains, and runs inference with, a simple shallow model for the MNIST dataset using ML Compute on both iOS and macOS.

Demo ML Compute Playground

Before getting into the details of this MNIST ML Compute app, let me share a quick-and-dirty Swift playground I initially used to get familiar with the ML Compute tensor and graph APIs.

As you can see, I am not yet building a real ML model here; I am simply using an ML Compute graph to run some basic arithmetic operations on tensors.

I thought it would be very helpful to get familiar with this first.
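
Such a playground graph might look like the following sketch, which adds two tensors on the CPU (the initializer names come from the ML Compute documentation; treat the exact signatures as assumptions to verify):

```swift
import MLCompute

// A graph that computes z = x + y on two 1x3 tensors — no ML yet,
// just the tensor/graph plumbing.
let device = MLCDevice(type: .cpu)!

let x = MLCTensor(shape: [1, 3], dataType: .float32)
let y = MLCTensor(shape: [1, 3], dataType: .float32)

let graph = MLCGraph()
// z is the graph's output node.
let z = graph.node(with: MLCArithmeticLayer(operation: .add), sources: [x, y])

let inference = MLCInferenceGraph(graphObjects: [graph])
inference.addInputs(["x": x, "y": y])
inference.compile(options: [], device: device)

// The backing arrays must outlive the (synchronous) execution.
var xBuf: [Float] = [1, 2, 3]
var yBuf: [Float] = [10, 20, 30]
let length = 3 * MemoryLayout<Float>.stride

inference.execute(inputsData: ["x": MLCTensorData(immutableBytesNoCopy: &xBuf, length: length),
                               "y": MLCTensorData(immutableBytesNoCopy: &yBuf, length: length)],
                  batchSize: 1,
                  options: [.synchronous]) { result, error, _ in
    // result should hold [11, 22, 33]; copy it out with
    // result?.copyDataFromDeviceMemory(toBytes:length:synchronizeWithDevice:)
}
```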

Prepare the MNIST dataset tensors

Now that we are familiar with the basics of the ML Compute API for building tensors and graphs, let's see how to build a first sample shallow model to train on the famous MNIST dataset.

We will first import, efficiently, the CSV files containing the MNIST training and test datasets, and later see how to transform them into ML Compute tensors.

As you can see, the code below reads the training and test CSV files in parallel, applies some normalization, and converts the images and labels into Swift Float arrays.
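
Per row, the transformation is simple; here is a self-contained sketch, assuming the common MNIST CSV layout of a label followed by 784 pixel values in 0...255 (the function name is mine, for illustration):

```swift
// Parse one MNIST CSV row of the form "label,p0,p1,...,p783",
// normalizing each pixel from 0...255 to a Float in 0...1.
func parseRow(_ row: String) -> (label: Int, pixels: [Float]) {
    let fields = row.split(separator: ",")
    let label = Int(fields[0])!
    let pixels = fields.dropFirst().map { Float(String($0))! / 255.0 }
    return (label, pixels)
}

let (label, pixels) = parseRow("7,0,128,255")  // shortened row for illustration
// label == 7; pixels ≈ [0.0, 0.502, 1.0]
```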

A quick note here about concurrency and the Grand Central Dispatch (GCD) API: in order to quickly process batches of data in parallel, I used the convenient DispatchQueue.concurrentPerform API, which makes it easy to manage a parallel for loop in Swift.
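
The pattern looks like this; a self-contained sketch where the chunk counts are illustrative, not the article's actual numbers:

```swift
import Dispatch

// Process 60_000 samples in 8 disjoint chunks, one GCD iteration each.
// concurrentPerform runs iterations in parallel and returns when all are done.
let sampleCount = 60_000
let chunks = 8
let chunkSize = sampleCount / chunks

var results = [Int](repeating: 0, count: chunks)
results.withUnsafeMutableBufferPointer { buffer in
    DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
        // Each iteration writes only its own slot, so no locking is needed.
        var processed = 0
        for _ in 0..<chunkSize { processed += 1 }  // stand-in for real per-sample work
        buffer[chunk] = processed
    }
}

let total = results.reduce(0, +)  // 60_000
```

The unsafe buffer pointer is the idiomatic way to let concurrent iterations write to disjoint slices of one array without tripping Swift's exclusivity checks.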

Utility functions

Unfortunately, reading the CSV file buffers into a String and using the split() method to process the rows is not really performant, so I had to develop an alternative method that scans the CSV file line by line using the more efficient C runtime getline() function.
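
A sketch of that approach, bridging to the C runtime from Swift (the function name is mine; the repo's version differs in details):

```swift
import Foundation

// Iterate over a file line by line using the C runtime getline(),
// avoiding one huge String allocation followed by split().
func forEachLine(path: String, _ body: (String) -> Void) {
    guard let file = fopen(path, "r") else { return }
    defer { fclose(file) }

    var line: UnsafeMutablePointer<CChar>? = nil
    var capacity = 0
    defer { free(line) }  // getline() allocates/reallocates this buffer

    while getline(&line, &capacity, file) > 0 {
        if let line = line {
            body(String(cString: line).trimmingCharacters(in: .newlines))
        }
    }
}
```

getline() reuses one growable buffer across calls, so the per-line cost is a copy into a Swift String rather than a fresh allocation for every row.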

The other functions below will become useful later, when dealing with labels and the loss function cost: they encode and decode the labels with the one-hot encoding technique, as required by the loss function (ML Compute does not provide a sparse categorical cross-entropy).
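
For example, a minimal version of such helpers (the function names are mine, for illustration):

```swift
// One-hot encode a digit label into a 10-element Float vector,
// as the softmax cross-entropy loss expects.
func oneHotEncode(_ label: Int, classes: Int = 10) -> [Float] {
    var encoded = [Float](repeating: 0, count: classes)
    encoded[label] = 1
    return encoded
}

// Decode a prediction back to its class: the index of the largest value.
func oneHotDecode(_ vector: [Float]) -> Int {
    vector.indices.max(by: { vector[$0] < vector[$1] })!
}

let encoded = oneHotEncode(3)                // [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
let decoded = oneHotDecode([0.1, 0.7, 0.2])  // 1
```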

Build computational graph

Now that we have the training and test images and labels loaded into Swift Float arrays, we can finally start building the ML Compute graph that we will use to train and test the model.

The ML model used in this particular sample application is a very minimal shallow neural network: one dense layer with ReLU activation, followed by a final dense layer with softmax activation for the ten digit classes of the MNIST dataset.

As you can see, an ML Compute graph is built by simply adding nodes for layers and activation functions, passing each of them the classic parameters: the shapes of the input/output tensors, the initial weight/bias matrices, and so on.

It is important to pay attention here to the shapes of all the tensors used for the weights and biases, as well as for the input and output of the model. In particular, note that when building the graph we have to choose the batch size we are going to use later for training and inference, and create all the tensor shapes according to that batch size.
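
As a rough sketch of what that looks like (layer and descriptor initializers abbreviated from the ML Compute documentation; treat the exact signatures, and the [1, in*out] weight-shape convention, as assumptions to verify against the docs):

```swift
import MLCompute

let batchSize = 128
let hiddenUnits = 128

let graph = MLCGraph()

// Input: a batch of flattened 28x28 images.
let input = MLCTensor(shape: [batchSize, 784], dataType: .float32)

// Weights and biases are tensors too; here the weights are initialized randomly.
let w1 = MLCTensor(shape: [1, 784 * hiddenUnits], randomInitializerType: .glorotUniform)
let b1 = MLCTensor(shape: [1, hiddenUnits], dataType: .float32)

// Dense (fully connected) layer 784 -> 128, described via a 1x1 "convolution".
let dense1 = graph.node(
    with: MLCFullyConnectedLayer(
        weights: w1,
        biases: b1,
        descriptor: MLCConvolutionDescriptor(kernelSizes: (height: 1, width: 1),
                                             inputFeatureChannelCount: 784,
                                             outputFeatureChannelCount: hiddenUnits))!,
    sources: [input])!

// ReLU activation on top of the first dense layer.
let relu1 = graph.node(
    with: MLCActivationLayer(descriptor: MLCActivationDescriptor(type: .relu)!),
    sources: [dense1])!

// The second dense layer (128 -> 10) follows the same pattern, capped by a
// softmax node: graph.node(with: MLCSoftmaxLayer(operation: .softmax), sources: ...)
```

Note how `batchSize` is baked into the input tensor's shape: every data tensor fed to this graph later has to match it.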

Build training graph

Now that we have an ML Compute graph, we can build an MLCTrainingGraph, passing it the compute graph and the training parameters.

In this case we will specify softmax cross-entropy as the loss function (ML Compute does not provide a sparse categorical cross-entropy) and Adam as the optimizer, with standard default parameters.
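
In code, this looks roughly as follows (a sketch assuming the compute graph built in the previous section; the parameter values shown are Adam's usual defaults):

```swift
import MLCompute

let graph = MLCGraph()  // stands in for the compute graph built above

// Softmax cross-entropy loss, averaged over the batch.
let lossLayer = MLCLossLayer(
    descriptor: MLCLossDescriptor(type: .softmaxCrossEntropy, reductionType: .mean))

// Adam optimizer with its standard defaults.
let optimizer = MLCAdamOptimizer(
    descriptor: MLCOptimizerDescriptor(learningRate: 0.001,
                                       gradientRescale: 1.0,
                                       regularizationType: .none,
                                       regularizationScale: 0.0),
    beta1: 0.9,
    beta2: 0.999,
    epsilon: 1e-7,
    timeStep: 1)

// Training graph = compute graph + loss + optimizer.
let trainingGraph = MLCTrainingGraph(graphObjects: [graph],
                                     lossLayer: lossLayer,
                                     optimizer: optimizer)
```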

Training the graph

In order to train the MLCTrainingGraph, we need to create a full training loop that iterates, in each epoch, over all the batches, whose number is obtained simply by dividing the total number of samples in the training dataset by the batch size we used above when creating the tensors and the graph.

In particular, in the batch loop we will take an unsafe Swift buffer over the slice of image and label data from the Swift Float arrays, construct an MLCTensorData from the unsafe pointer to this slice, and finally pass this tensor data to the training graph's execute method to train the model on the batch data.
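
A sketch of that loop, assuming the `trainingGraph` from the previous step with its input tensors registered under the names "image" and "label" (both names, and the placeholder data, are illustrative):

```swift
import MLCompute

// Placeholder data: 784 Floats per image, 10 per one-hot label.
let batchSize = 128
let epochs = 5
let trainImages = [Float](repeating: 0, count: 784 * batchSize * 4)
let trainLabels = [Float](repeating: 0, count: 10 * batchSize * 4)
let batches = trainImages.count / (784 * batchSize)

for _ in 0..<epochs {
    for batch in 0..<batches {
        trainImages.withUnsafeBufferPointer { img in
            trainLabels.withUnsafeBufferPointer { lbl in
                // Wrap the batch slices without copying; the arrays stay
                // alive for the (synchronous) execution, so this is safe.
                let imgData = MLCTensorData(
                    immutableBytesNoCopy: UnsafeMutableRawPointer(
                        mutating: img.baseAddress! + batch * batchSize * 784),
                    length: batchSize * 784 * MemoryLayout<Float>.stride)
                let lblData = MLCTensorData(
                    immutableBytesNoCopy: UnsafeMutableRawPointer(
                        mutating: lbl.baseAddress! + batch * batchSize * 10),
                    length: batchSize * 10 * MemoryLayout<Float>.stride)

                trainingGraph.execute(inputsData: ["image": imgData],
                                      lossLabelsData: ["label": lblData],
                                      lossLabelWeightsData: nil,
                                      batchSize: batchSize,
                                      options: [.synchronous]) { _, _, _ in
                    // The loss tensor for the batch is available here.
                }
            }
        }
    }
}
```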

Testing the graph with validation data

Once the model has been trained for sufficient epochs, we will validate it by building a similar MLCInferenceGraph and feeding it all the test data, one batch at a time.

In the inference graph's execute closure callback, we will finally calculate the accuracy of the model by simply counting how many times the prediction corresponds to the test label.
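
That accuracy computation boils down to an arg-max comparison; a self-contained sketch (the function name is mine):

```swift
// Accuracy: the fraction of predictions whose arg-max class equals the label.
func accuracy(predictions: [[Float]], labels: [Int]) -> Float {
    let correct = zip(predictions, labels).filter { (prediction, label) in
        // Predicted class = index of the largest softmax output.
        let predicted = prediction.indices.max(by: { prediction[$0] < prediction[$1] })!
        return predicted == label
    }.count
    return Float(correct) / Float(labels.count)
}

let acc = accuracy(predictions: [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]],
                   labels: [1, 1, 1])
// acc == 2/3: the second prediction's arg max is 0, not 1
```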

The full code

As always, the code for this story is completely open source and available on my personal GitHub account:

Special thanks

Finally, I want to thank the Apple ML Compute team here for their very quick and fully detailed help in providing me with suggestions and insights on the new ML Compute framework.


Jacopo Mangiavacchi

Microsoft Principal Data Scientist — Google Machine Learning Developer Expert (ML GDE) — Former  + IBM Senior Architect and Engineer