Bring machine learning into your iOS apps (Part 1)

by MWM · Jul 26, 2021

A journey from PyTorch to CoreML

Every field is impacted: from Spotify’s Discover Weekly to Snapchat filters to Facebook’s photo auto-tagging, these kinds of “wow” features are ubiquitous nowadays and greatly enhance the user experience.

Machine learning is the name of the research field that gathers all the technologies running under the hood of those features. It is based on automatic data processing and decision-making and can be applied to any data modality, such as text, images or audio. In recent years, a lot of research effort has gone into making models smaller and faster while keeping them accurate and robust. In the meantime, on the production side, several frameworks have emerged to facilitate mobile ML development. In today’s article we’ll focus on CoreML.

CoreML is a machine learning framework introduced by Apple in 2017. It is optimized for on-device performance of a broad variety of model types by leveraging Apple hardware and minimizing memory footprint and power consumption. Models run strictly on the user’s device and remove any need for a network connection, keeping the app responsive and users’ data private.

  • No more hassle with back-end server setup
  • No need for heavy third-party libraries
  • No worries about user data privacy.

Plus, CoreML has already undergone multiple major updates: cutting-edge neural networks can now be applied to rich media like sound or video in no time thanks to Neural Engine support. Each new generation of Apple devices brings a drastic speed-up for machine learning inference pipelines.

  • Considerable speed-up thanks to Neural Engine chips
  • Latest model designs supported
  • Ease of integration into Swift code and Vision framework

A large number of apps already make use of CoreML technology, like Siri, QuickType or Camera. Apple has made available some models that solve tasks like object classification, person detection, etc. However… how do you build a custom CoreML model in the first place? Apple has developed CreateML to let developers train custom models as easily as a few drag-and-drops. Let’s be fair though: the vast majority of ML pipelines require somewhat more complicated code. Two main libraries allow for training machine learning models, TensorFlow and PyTorch, and there’s a good chance you already use one of them: we’ll focus on PyTorch. Once the model is ready for the production phase, there’s one seemingly anecdotal step left: its conversion into a CoreML model. Let’s make ours with the open-source library provided by Apple: coremltools 🤖

What exactly is a CoreML model?

Shedding light on the black box

Before diving into how to get a CoreML model from your favorite PyTorch model, let’s take a look at what it actually consists of.

A CoreML model is simply a file in the .mlmodel format, an open standard based on protobuf (Protocol Buffers), a method of serializing and structuring data much like JSON or XML (but faster). It falls into three parts (see the sketch after this list):

  • a specification version: boils down to an integer
  • a model description: information about its inputs, outputs and additional metadata
  • a model type: a neural network, a tree ensemble regressor, a support vector machine?
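As a minimal sketch, here is how to peek at those three parts with coremltools (the file name is a placeholder, not a model from this article):

```python
import coremltools as ct

# Load the protobuf specification of an existing model
# ("MyModel.mlmodel" is a placeholder path).
spec = ct.utils.load_spec("MyModel.mlmodel")

print(spec.specificationVersion)   # the specification version (an integer)
print(spec.description)            # inputs, outputs and additional metadata
print(spec.WhichOneof("Type"))     # the model type, e.g. "neuralNetwork"
```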

PyTorch to CoreML in a nutshell

What happens in theory…

There are two ways to convert a PyTorch model to a CoreML model: either direct conversion, or a little detour through ONNX land.

TorchScript to CoreML (v4)

We’ll start with the most straightforward one, made possible by the latest update, CoreML 4. In theory, a few lines of code are all you need. The recipe is in essence pretty simple, as the sketch below shows:

  • Get a scripted version of your PyTorch model
  • Use coremltools to convert the TorchScript model into a CoreML one
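A minimal sketch of the recipe, assuming a stock torchvision model as a stand-in for yours:

```python
import torch
import torchvision
import coremltools as ct

# Step 1: get a TorchScript version of the model (tracing here; scripting also works).
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)

# Step 2: convert the TorchScript model into a CoreML one with coremltools.
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input", shape=example_input.shape)],
)
mlmodel.save("MobileNetV2.mlmodel")
```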

coremltools — the black sheep of CoreML?

What happens in practice…

However great CoreML appears to be, lots of mobile developers hit a brick wall on the conversion task, often because we don’t dig deep enough into the convoluted documentation (no pun intended), sometimes because the errors raised just look confusing. Here’s a step-by-step, non-exhaustive guide on how to finally get your .mlmodel safe and sound.

  • First, put your model in a base configuration: tensor types as inputs, no flexible shapes, no classifier parameterization. You’ll focus on the processing steps afterwards.
  • Try to stick with the latest version of CoreML 4 and the latest supported version of PyTorch.
  • Sometimes it is worth a shot downgrading torch to see if the error remains (like the type_as not implemented error, which disappears when downgrading from torch 1.8).
  • ⚠️ In some weird cases, prefer the CoreML 3 option. After converting to ONNX, use onnx-simplifier before converting to CoreML. It may solve your problem in cases where the graph visualization in Netron looks way overcomplicated for a few operations.

You stumble upon an unsupported or unimplemented operation. What to do?

New operations and layers are regularly added to the PyTorch framework but remain unsupported in CoreML. To cope with this pace gap, coremltools allows you to create composite or custom operators every time you encounter an unsupported operation.

First, do not forget to check the model specification details to see whether the operation really is unsupported.

Once you have made sure the operation cannot be handled by CoreML, two options are available when converting to a CoreML model: either you rewrite the operation in PyTorch before the conversion, or you write it from scratch in Swift after the conversion.

  • Composite operators: use the register_torch_op decorator

In many cases the operation can be decomposed into smaller bricks that are already registered in CoreML. CoreML 4 provides MIL (Model Intermediate Language) programming for this. It is the bridge in the Unified Conversion API between models written in PyTorch or TensorFlow and the CoreML protobuf representation.

CoreML does not register all the fancy new activations out there in the deep learning field, so it is on you to add them to the records. Example below for the silu activation:
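A minimal sketch of such a composite operator, writing silu(x) = x * sigmoid(x) out of MIL bricks that CoreML already knows (newer coremltools versions may ship silu natively):

```python
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs

@register_torch_op
def silu(context, node):
    # silu(x) = x * sigmoid(x), built from already-registered MIL operations.
    x = _get_inputs(context, node, expected=1)[0]
    y = mb.sigmoid(x=x)
    z = mb.mul(x=x, y=y, name=node.name)
    context.add(z)
```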

  • Custom operators

Sometimes the operation cannot be decomposed, and thus cannot be represented as a composite operator. The issue is shifted to the iOS side by implementing the Swift class that defines the operator’s computational behaviour. To make the CoreML model understand how to use its custom layer, there is still some work on the PyTorch side: you’ll need to register the operation and provide details about its inputs, outputs and parameters.
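As a hedged sketch (coremltools 4/5-era API; the names my_custom_op and MyCustomLayer are placeholders), the PyTorch-side registration looks roughly like this, with the class_name binding pointing to the Swift class that implements MLCustomLayer:

```python
from coremltools.converters.mil.mil import Operation, types
from coremltools.converters.mil.mil.input_type import InputSpec, TensorInputType
from coremltools.converters.mil.mil.ops.defs._op_reqs import register_op

@register_op(doc_str="Custom operator backed by a Swift MLCustomLayer", is_custom_op=True)
class my_custom_op(Operation):
    # Bindings describe how the layer is exposed in the .mlmodel and to Swift.
    bindings = {
        "class_name": "MyCustomLayer",   # must match the Swift MLCustomLayer class
        "input_order": ["x"],
        "parameters": [],
        "description": "Placeholder custom layer",
    }

    input_spec = InputSpec(x=TensorInputType())

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def type_inference(self):
        # Assume the output keeps the input's dtype and shape.
        return types.tensor(self.x.dtype, self.x.shape)
```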

See https://machinethink.net/blog/coreml-custom-layers for a deeper dive into the implementation of custom layers on the Swift side.

💡 If there is any doubt, visualize your model with Netron after a conversion with custom layers enabled. It bypasses unsupported operations so you know whether your error is isolated.

Some random tips that might help:

  • Do not toy too much with “original” shapes; stick with BxCxN(HxW) as much as possible. For example, the Error computing NN outputs message usually means your model has a shape issue at some node.
  • Beware of implicit output casting when you re-paste parts of a model (like an integer tensor silently becoming a float one). Sometimes you need to manually change the metadata to make it work in Xcode (a manual downgrade of the specification version, on many occasions).
  • Beware that spec modifications are always done in-place, so save and visualize regularly to avoid reasoning from assumptions that are already false.
  • Remove dynamic slicing or indexing as much as possible.

Four practical conversion use cases 👀

1. PyTorch’s grid sampler operation is now quite famous, as its CoreML conversion is not straightforward…

The grid sampler can be approximated with the MIL programming language.
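The original MIL listing is not reproduced here; as a hedged illustration, with coremltools 5+ and an iOS 15 deployment target, the TorchScript grid_sampler node can be mapped onto MIL’s resample operation (the interpolation and padding modes are hardcoded below instead of being read from the node, which is an assumption):

```python
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs

@register_torch_op
def grid_sampler(context, node):
    inputs = _get_inputs(context, node)
    x, grid = inputs[0], inputs[1]   # x: (N, C, H, W), grid: (N, Hout, Wout, 2) in [-1, 1]

    # Hardcoded bilinear sampling with zero padding, mirroring F.grid_sample defaults.
    y = mb.resample(
        x=x,
        coordinates=grid,
        sampling_mode="bilinear",
        padding_mode="constant",
        padding_value=0.0,
        coordinates_mode="normalized_minus_one_to_one",
        align_corners=True,
        name=node.name,
    )
    context.add(y)
```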

One can also use Metal and implement a custom version of this particular operation in order to leverage the GPU of the device (though not the ANE).

2. Loop over a flexible-length input, in the example of autoregressive decoding

Control flow and flexible input shapes are not directly supported in the PyTorch to CoreML conversion framework, due to the dynamic graph nature of PyTorch models.

Combining the tracing and scripting operations of the JIT library (the PyTorch to TorchScript converter) makes it possible to include control flow in the TorchScript graph definition of the model, which then gets easily converted to CoreML using coremltools.

Flexible input shapes are then enabled during the conversion to CoreML using the RangeDim shape specification of the coremltools library.
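A hedged sketch, with a toy module standing in for a real decoder: the loop is preserved by torch.jit.script, and RangeDim lets the sequence length vary at prediction time.

```python
import torch
import coremltools as ct

class ToyDecoder(torch.nn.Module):
    """Toy stand-in for an autoregressive decoder: loops over the sequence axis."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.step = torch.nn.Linear(dim, dim)

    def forward(self, x):
        # x: (seq_len, dim). The loop length depends on the input shape,
        # which is why scripting (and not just tracing) is needed.
        state = torch.zeros_like(x[0])
        for t in range(x.shape[0]):
            state = torch.tanh(self.step(x[t]) + state)
        return state

scripted = torch.jit.script(ToyDecoder().eval())

# RangeDim allows the sequence length to vary between 1 and 128 at prediction time.
flexible_input = ct.TensorType(
    name="x",
    shape=ct.Shape(shape=(ct.RangeDim(1, 128), 64)),
)
mlmodel = ct.convert(scripted, inputs=[flexible_input])
mlmodel.save("ToyDecoder.mlmodel")
```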

3. Depthwise cross-correlation layer

For specific custom layers, the problem can be avoided by decomposing the operation into a sequence of simple ones that are registered in the coremltools library.

For instance, SiamRPN++ [1], a deep neural network architecture for object tracking, includes in its forward pass a “depthwise cross-correlation” between two feature maps coming from its two branches. The values of these two tensors are dynamic, as they depend on the two inputs of the network: one is the target image that represents the object we are looking for, and the other is the search region, a wider image in which we are searching for the object. The depthwise cross-correlation is carried out by performing, for each of the tensor channels, the convolution between the search feature map and the target feature map. The latter in fact plays the role of a 2D convolution kernel for each channel.

[Figure: the depthwise cross-correlation layer]

For a trained SiamRPN++ model, the dimensions are fixed and can thus be made explicit. In our use case, let’s say that the tensors of the search region and the target kernel are respectively of size `(1, 256, 24, 24)` and `(1, 256, 4, 4)`. With these assumptions, we can decompose the operation as follows:

  • Unfold the search region tensor (C, H, W) into columns (C, nb_elements_in_kernel, nb_kernel_sliding_blocks). Each column corresponds to a sliding block of the kernel.
  • Flatten the kernel tensor and perform the matrix multiplication with the unfolded columns.
  • Resize and rearrange the output tensor into the format that would have been obtained if we had performed the convolution the traditional way.

Note that even with this decomposition, the unfold part isn’t registered as-is in the CoreML conversion library. However, it can be expressed as a composition of two valid sliding_windows operations along the H and W axes, with some rearrangement of the dimensions.
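The decomposition itself can be checked in plain PyTorch before worrying about the CoreML side. A minimal sketch with the shapes above, compared against the grouped-convolution reference:

```python
import torch
import torch.nn.functional as F

search = torch.randn(1, 256, 24, 24)   # search region feature map
kernel = torch.randn(1, 256, 4, 4)     # target feature map, used as a per-channel kernel

# Reference: one 4x4 convolution per channel (groups = number of channels).
reference = F.conv2d(search, kernel.view(256, 1, 4, 4), groups=256)

# 1) Unfold the search region into columns: (1, C * 4 * 4, 21 * 21).
cols = F.unfold(search, kernel_size=4)
cols = cols.view(1, 256, 16, -1)              # (1, C, elements_in_kernel, sliding_blocks)

# 2) Flatten the kernel and perform the per-channel matrix multiplication.
flat_kernel = kernel.view(1, 256, 1, 16)
out = torch.matmul(flat_kernel, cols)         # (1, C, 1, sliding_blocks)

# 3) Rearrange the output into the spatial map a convolution would have produced.
out = out.view(1, 256, 21, 21)

print(torch.allclose(out, reference, atol=1e-4))  # True
```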

By doing so, a trained SiamRPN++ becomes convertible into an interpretable .mlmodel format, even though the model involves a non-standard operation in its graph! 🙌🏼

4. The inverse layer

The inverse operation is not implemented either. If it happens to be a 2x2 matrix inversion, you can easily get away with your high-school maths. I’ll let you create a MIL program and register the operation 🙂 (one possible sketch is given below).
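A hedged sketch of one possible composite operator, assuming the incoming tensor is a single 2x2 matrix (for A = [[a, b], [c, d]], the inverse is [[d, -b], [-c, a]] / (ad - bc)):

```python
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs

@register_torch_op
def inverse(context, node):
    # Closed-form inverse of a single 2x2 matrix: adj(A) / det(A).
    x = _get_inputs(context, node, expected=1)[0]   # assumed shape: (2, 2)

    a = mb.slice_by_index(x=x, begin=[0, 0], end=[1, 1])
    b = mb.slice_by_index(x=x, begin=[0, 1], end=[1, 2])
    c = mb.slice_by_index(x=x, begin=[1, 0], end=[2, 1])
    d = mb.slice_by_index(x=x, begin=[1, 1], end=[2, 2])

    det = mb.sub(x=mb.mul(x=a, y=d), y=mb.mul(x=b, y=c))          # shape (1, 1)

    top = mb.concat(values=[d, mb.mul(x=b, y=-1.0)], axis=1)       # [d, -b]
    bottom = mb.concat(values=[mb.mul(x=c, y=-1.0), a], axis=1)    # [-c, a]
    adj = mb.concat(values=[top, bottom], axis=0)                  # (2, 2)

    inv = mb.real_div(x=adj, y=det, name=node.name)                # broadcast divide
    context.add(inv)
```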

Conclusion

Though the CoreML ecosystem is constantly gaining in maturity, converting custom model architectures from PyTorch can still be a hassle for ML researchers and developers. We have given an overview of the Apple ML framework and shared some tips and tricks for particular use cases.

CoreML is a strong toolbox that enables deep learning models to be accelerated on iOS devices through dedicated chips like the Apple Neural Engine. In the second part of this article, we will cover how to fully exploit the possibilities of CoreML + ANE on modern iOS devices.

Interested in mobile apps and music + creativity? Click here to see open positions at MWM, the global leading music app publisher 🚀

References

[1] Li, Bo, et al. “SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
