MNIST CNN Core ML Training

Jacopo Mangiavacchi
8 min read · Apr 26, 2020


Did you know you can fully train a LeNet Convolutional Neural Network model with the MNIST dataset directly on iOS devices? And that the performance is not bad at all?

Training MNIST CNN on iOS devices with Core ML

In the previous article I focused on transfer learning scenarios with Core ML: in particular, we saw how to create a new model on an iOS device, import embedding weights from a previously trained model, and train the remaining layers locally, on device, using private, local data (see https://heartbeat.fritz.ai/core-ml-on-device-training-with-transfer-learning-from-swift-for-tensorflow-models-1264b444e18d).

Moving forward in my long journey towards a Swift federated learning infrastructure, this time I've investigated how to train, from scratch on iOS devices, a slightly more complex model architecture, a CNN, and how to create this model locally, directly in Swift, using the SwiftCoreMLTools library I introduced in my previous stories and available on this GitHub repo:

The MNIST dataset

In this article we cover how to implement, with Core ML, directly on iOS devices and without prior training on other ML frameworks, an image classification model using the standard MNIST dataset.

The MNIST database of handwritten digits is such a well-known dataset that I don't think it needs any introduction for readers of these stories.

Just to recap very quickly: this dataset is commonly used to introduce a specific neural network architecture, the Convolutional Neural Network (CNN), frequently used in the image recognition and object detection domains. It provides 60,000 training and 10,000 test black-and-white images, 28x28 pixels each, of handwritten digits from 0 to 9.

MNIST dataset

In the sample project linked at the end of this story I've created an iOS/macOS SwiftUI application that prepares this dataset in pure Swift, creates a CNN model directly in the app using the SwiftCoreMLTools library mentioned above, and trains this model with Core ML, feeding it the local batches of prepared data.

LeNet CNN Architecture

The LeNet architecture is an excellent starting point for understanding the details and benefits of Convolutional Neural Networks, and the combination of a LeNet CNN with the MNIST dataset is such a standard in machine learning training that it is usually considered the "Hello, World" of deep learning for image classification.

It basically consists of two sets of convolutional layers, with ReLU activation, and max pooling layers, followed by a fully-connected hidden dense layer, again usually with ReLU activation, and finally another fully-connected dense layer with a softmax activation for the classification result.

LeNet CNN Network

In this story we will focus on how to build and train a LeNet CNN model for the MNIST dataset directly in Swift on an iOS device, and we will compare it with a classic Python approach based on a well-known ML framework such as TensorFlow.

Preparing data in Swift for Core ML training

Before entering the details of how to create and train the LeNet CNN network in Core ML, let's first see how to prepare the MNIST training data for batching into the Core ML runtime.

In previous articles of this series on Core ML training I've already covered how to use the Core ML MLBatchProvider and other APIs to create batches of data.

In the following Swift snippet I share how a batch of training data is prepared specifically for the MNIST dataset, simply by normalizing the pixel values of each image from the original range of 0 to 255 to a more "understandable" range between 0 and 1.
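The sketch below is a condensed version of that preparation code. It assumes the raw MNIST images have already been loaded as arrays of bytes (the full loading code is in the repo linked at the end), and the feature names "image" and "output_true" are illustrative here; they must match the names declared in the model built later in this story.

import CoreML

// Condensed sketch: wrap each 28x28 image in an MLMultiArray, normalizing
// pixel values from [0, 255] to [0, 1], and collect everything into an
// MLArrayBatchProvider for Core ML training.
func prepareBatchProvider(images: [[UInt8]], labels: [Int]) throws -> MLArrayBatchProvider {
    var featureProviders = [MLFeatureProvider]()

    for (pixels, label) in zip(images, labels) {
        let imageArray = try MLMultiArray(shape: [1, 28, 28], dataType: .float32)
        for (i, pixel) in pixels.enumerated() {
            imageArray[i] = NSNumber(value: Float(pixel) / 255.0) // normalize to [0, 1]
        }

        featureProviders.append(try MLDictionaryFeatureProvider(dictionary: [
            "image": MLFeatureValue(multiArray: imageArray),
            "output_true": MLFeatureValue(int64: Int64(label))
        ]))
    }

    return MLArrayBatchProvider(array: featureProviders)
}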

Preparing the CNN Core ML model for Training

Once we have prepared and normalized the batches for our training data, we can prepare the CNN Core ML model locally, in Swift, using the SwiftCoreMLTools library.

In the following snippet you can see the architecture of the LeNet CNN model and how layers such as convolution, max pooling, flatten, and the hidden and final dense layers are sequentially declared using the SwiftCoreMLTools DSL builder.

In the SwiftCoreMLTools DSL function builder code below you can also see how to pass essential training information, together with its hyperparameters, to the Core ML model in the same context: loss function, optimizer, learning rate, number of epochs, batch size, and others.
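The following is a condensed sketch of that DSL code. The SwiftCoreMLTools API was evolving quickly at the time of writing, so treat the parameter lists below as abbreviated approximations based on the library's README of that period (the second convolution block is elided); the GitHub repo has the exact, current signatures.

import SwiftCoreMLTools

let model = Model(version: 4,
                  shortDescription: "MNIST-Trainable",
                  author: "Jacopo Mangiavacchi",
                  license: "MIT",
                  userDefined: [:]) {
    Input(name: "image", shape: [1, 28, 28])
    Output(name: "output", shape: [10], featureType: .float)
    TrainingInput(name: "image", shape: [1, 28, 28])
    TrainingInput(name: "output_true", shape: [1], featureType: .int)
    NeuralNetwork(losses: [CategoricalCrossEntropy(name: "lossLayer",
                                                   input: "output",
                                                   target: "output_true")],
                  optimizer: Adam(learningRateDefault: 0.0001,
                                  learningRateMax: 0.3,
                                  miniBatchSizeDefault: 128,
                                  miniBatchSizeRange: [128]),
                  epochDefault: 10,
                  epochSet: [10],
                  shuffle: true) {
        Convolution(name: "conv1", input: ["image"], output: ["outConv1"],
                    outputChannels: 32, kernelChannels: 1, nGroups: 1,
                    kernelSize: [3, 3], stride: [1, 1], dilationFactor: [1, 1],
                    paddingType: .valid(borderAmounts: []), outputShape: [],
                    deconvolution: false, updatable: true)
        ReLu(name: "relu1", input: ["outConv1"], output: ["outRelu1"])
        Pooling(name: "pooling1", input: ["outRelu1"], output: ["outPooling1"],
                poolingType: .max, kernelSize: [2, 2], stride: [2, 2])
        // ... a second Convolution / ReLu / Pooling block (32 filters, 2x2 kernel)
        //     producing "outPooling2" is elided here ...
        Flatten(name: "flatten", input: ["outPooling2"], output: ["outFlatten"])
        InnerProduct(name: "hidden", input: ["outFlatten"], output: ["outHidden"],
                     inputChannels: 1152, outputChannels: 500, updatable: true)
        ReLu(name: "relu3", input: ["outHidden"], output: ["outRelu3"])
        InnerProduct(name: "dense", input: ["outRelu3"], output: ["outDense"],
                     inputChannels: 500, outputChannels: 10, updatable: true)
        Softmax(name: "softmax", input: ["outDense"], output: ["output"])
    }
}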

Resulting CNN model

As you may have noticed in the SwiftCoreMLTools DSL builder code above, the Core ML model we've just built has a couple of nested convolution plus max pooling layers, and then, after flattening everything, a hidden dense layer and a final dense layer with softmax activation for the final classification.

Below is a graph representation of the generated Core ML model (from Netron):

Core ML Trainable CNN model

Compile and Train the model

Again, in previous articles of this series on Core ML training and federated learning, I've already covered how to use the Core ML API to retrain/personalize an existing model on device, whether downloaded from the cloud or, as in this case, generated locally on the device itself with the SwiftCoreMLTools library.

If needed, I suggest looking at those previous articles for snippets of the Swift code for compiling and starting a training task with Core ML.
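For completeness, here is a minimal sketch of those two steps using the standard Core ML APIs (MLModel.compileModel and MLUpdateTask); modelUrl and batchProvider are assumed to come from the previous sections.

import CoreML

// Minimal sketch: compile the generated .mlmodel into the runnable .mlmodelc
// form, then launch an on-device training task against the local batches.
// `modelUrl` points to the model exported by SwiftCoreMLTools, and
// `batchProvider` is the MLArrayBatchProvider prepared earlier.
func train(modelUrl: URL, batchProvider: MLBatchProvider) throws {
    let compiledUrl = try MLModel.compileModel(at: modelUrl)

    // MLUpdateTask runs the training loop embedded in the model
    // (loss, optimizer, epochs) and calls back when it is done.
    let task = try MLUpdateTask(forModelAt: compiledUrl,
                                trainingData: batchProvider,
                                configuration: MLModelConfiguration()) { context in
        // context.model is the freshly trained model, ready to be saved
        // with write(to:) or used immediately for predictions.
        print("Training finished, loss: \(context.metrics[.lossValue] ?? "n/a")")
    }
    task.resume()
}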

Baseline TensorFlow 2.0 model

In order to benchmark the results, and in particular the training performance in terms of execution time, I've also recreated an exact replica of the same CNN model in TensorFlow 2.0.

The Python snippet below illustrates the same model architecture in TF, along with a summary of the output shape of each layer.

You can notice here that the layers, layer shapes, convolution filters, and pooling sizes are exactly the same as in the Core ML model created directly with the SwiftCoreMLTools library.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

in_shape = (28, 28, 1)  # 28x28 grayscale MNIST images
n_classes = 10          # digits 0-9

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=in_shape))
model.add(MaxPool2D((2, 2), strides=(2, 2)))
model.add(Conv2D(32, (2, 2), activation='relu', kernel_initializer='he_uniform'))
model.add(MaxPool2D((2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(n_classes, activation='softmax'))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 12, 12, 32)        4128
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 32)          0
_________________________________________________________________
flatten (Flatten)            (None, 1152)              0
_________________________________________________________________
dense (Dense)                (None, 500)               576500
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5010
=================================================================
Total params: 585,958
Trainable params: 585,958
Non-trainable params: 0

NB: The SwiftCoreMLTools library also provides a programmatic API, similar to Keras, for adding layers to a model at runtime, but in the source code shared with this story I'm using the DSL / function builder approach. Please refer to the SwiftCoreMLTools GitHub project for further documentation.

Comparing results

Before looking at training execution time, let me first say that both the Core ML and the TensorFlow models were trained for the same number of epochs (10), with the same hyperparameters, and obtained very similar accuracy metrics on the same 10,000 test images.

You can see in particular from the Python snippet below that the TensorFlow model trains with the same Adam optimizer and categorical cross-entropy loss function, with a final accuracy on the test set greater than 0.98.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(trainX, trainy, epochs=10, batch_size=128, verbose=1)
loss, acc = model.evaluate(testX, testy, verbose=0)
print('Accuracy: %.3f' % acc)

Train on 60000 samples
Epoch 1/10
60000/60000 [==============================] - 16s 266us/sample - loss: 0.1441 - accuracy: 0.9563
...
Accuracy: 0.989

For the Core ML model, you can see from the iPhone app screenshot below that, training and testing with the same optimizer, loss function, and of course the same train and test datasets, it also obtains a final accuracy greater than 0.98.

MNIST LeNet CNN training result with Core ML on iPhone App

Training performance benchmark: Core ML vs TensorFlow

For the Core ML model training, as before, I ran tests on macOS and on both the iOS simulator and real Apple devices, noticing once again that Core ML training performance on modern iPhone/iPad devices is much better optimized than on a MacBook Pro with an i7 CPU, a Radeon GPU, and plenty of memory.

To give you some real numbers about how good and promising training on an iPhone is, using once again a fully comparable model architecture and the same training parameters: I was able to train on the 60,000 MNIST samples for 10 epochs in about 248 seconds on an iPhone 11 with the Core ML model, versus 158 seconds using TensorFlow 2.0 on an i7 MacBook Pro (using the CPU only, of course).

Of course there is a sizable gap between 248 seconds and 158 seconds: the Core ML run on the phone takes roughly 57% longer, and the Mac wasn't even using its GPU. But the real point here is not to compare apples with oranges; it is to get a glimpse of what mobile and wearable devices can do in the context of training locally, on device, on very sensitive and personal data.

In particular, I think it is important to note that training a single epoch of this 585,958-parameter model on 60,000 data points took roughly 25 seconds (248 seconds over 10 epochs) on a mobile device.

Considering scenarios such as distributed training, and in particular federated learning, I really think these are very promising numbers.

I'll keep testing on my long journey towards this federated learning platform. By the way, if you want to contribute in any way, for example by testing or implementing missing functionality in the SwiftCoreMLTools library, please be my guest.

Final touch: Core ML + SwiftUI

A final consideration here is how easy it is to integrate Core ML training with a powerful user interface toolkit such as SwiftUI + Combine.

Jupyter notebooks, and even other tools such as TensorFlow.js, are very good for building real-time experiments, but I have to say that the opportunity Core ML + SwiftUI offers for real on-device experimentation is amazing.

In my very simple use case for this story, training on the MNIST dataset on an iPhone, it was very easy to add a minimal touch interface that lets the user draw a new digit directly on the screen and test it live.
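As an illustration (the names and layout below are mine, not the exact code from the repo), the drawing surface can be as small as a SwiftUI view that collects drag-gesture points into strokes; the strokes can then be rasterized into a 28x28 grayscale MLMultiArray and fed to the trained model for a live prediction.

import SwiftUI

// Illustrative sketch of a minimal touch canvas: drag gestures append points
// to the current stroke, and finished strokes are kept for rendering.
// Rasterizing the strokes to 28x28 pixels for prediction is omitted here.
struct DrawDigitView: View {
    @State private var strokes: [[CGPoint]] = []
    @State private var currentStroke: [CGPoint] = []

    var body: some View {
        Path { path in
            for stroke in strokes + [currentStroke] where stroke.count > 1 {
                path.move(to: stroke[0])
                path.addLines(stroke)
            }
        }
        .stroke(Color.primary, lineWidth: 12)
        .frame(width: 280, height: 280)
        .contentShape(Rectangle()) // make the whole frame hit-testable
        .gesture(
            DragGesture(minimumDistance: 0)
                .onChanged { currentStroke.append($0.location) }
                .onEnded { _ in
                    strokes.append(currentStroke)
                    currentStroke = []
                }
        )
    }
}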

The SwiftCoreMLTools library in particular, offering a Swift DSL implemented with the same Swift function builder feature used by SwiftUI, allows a really coherent and consistent approach for building both the model and the UI for experimenting with the model in real-time scenarios.

The code

As always, the code for this story is completely open source and available on my personal GitHub account:

Special thanks

Finally, I want to thank the Apple Core ML team on GitHub and on the Apple Feedback developer tool for their very quick and fully detailed help, providing suggestions and insights on both the Core ML protobuf file format and the Core ML runtime.
