Building a Neural Style Transfer app on iOS with PyTorch and CoreML
In June 2017 Apple introduced CoreML, a framework designed to integrate Machine Learning models into iOS apps.
This opened up a wide range of possibilities for developers, from image analysis to Natural Language Processing (NLP), decision tree learning and more.
In this article, I will explain the basic building blocks required to create this kind of app, and hopefully pass on a few of the things I learned along the way.
Training an NST model
In a nutshell, ML models are similar to functions: they take one or more inputs and return one or more outputs. One big difference, though, is that as a developer you don’t write a model, you train it. The trained model can then be fed inputs, which are processed through its learned weights (stored as tensors) to produce a result, or prediction.
Our work therefore shifts to a higher-level task: implementing the training algorithm.
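To make the “trained, not written” idea concrete, here is a deliberately tiny pure-Python sketch that has nothing to do with style transfer: a two-parameter model y = w·x + b whose parameters are learned from example data by gradient descent on the mean squared error, rather than typed in by hand. The data, learning rate and step count are arbitrary choices for illustration.

```python
# Training data produced by a rule the "model" is never told: y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def train(xs, ys, lr=0.02, steps=5000):
    """Fit y = w*x + b by plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # parameters start out arbitrary
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(xs, ys)


def predict(x):
    # Once trained, the model is used like any other function.
    return w * x + b


print(round(predict(10.0), 2))  # 21.0
```

Nobody wrote “multiply by 2 and add 1” anywhere: the training loop recovered those values from examples, which is exactly the shift of effort described above.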
Multiple frameworks are available to achieve this: TensorFlow, Keras, Caffe, PyTorch… In this article we will focus on the last one, PyTorch, as it provides strong GPU acceleration, which will prove to be an important feature in the next steps.
To get started, I recommend using Anaconda, which makes it very easy to manage your environments if you have several conflicting dependencies on your workstation. If you have an NVIDIA graphics card, make sure you also install the matching CUDA toolkit so that GPU acceleration is enabled during training.
Once your environment is set up, head to PyTorch’s examples repository on GitHub, which lists multiple usage examples, one of them being a Fast Neural Style sample. Clone this repo onto your workstation and activate your environment.
You should now be able to train your first model using the default parameters. This starts a training loop that saves a checkpoint every 2,000 iterations.
These checkpoints are nothing more than snapshots of the model after a given number of iterations. Usually, the more iterations the better, but in our case we are aiming for beauty, which is subjective and hard to measure without actually looking at the results.
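As a rough illustration of what “saving a checkpoint” means, the hypothetical loop below (not code from neural_style.py) stores a frozen copy of a toy parameter dictionary every `checkpoint_interval` iterations; `params` and the fake update step are stand-ins for the real network and optimizer. If your copy of neural_style.py matches the upstream example, the interval should correspond to its `--checkpoint-interval` argument, with checkpoints only written when `--checkpoint-model-dir` is set; running the script with `--help` will confirm.

```python
import copy


def train_with_checkpoints(total_iterations, checkpoint_interval=2000):
    """Toy training loop that snapshots the model every `checkpoint_interval` iterations.

    `params` is a hypothetical stand-in for the network's weights.
    """
    params = {"weight": 0.0}
    checkpoints = {}
    for iteration in range(1, total_iterations + 1):
        params["weight"] += 0.001  # stand-in for one optimizer step
        if iteration % checkpoint_interval == 0:
            # A checkpoint is just a frozen copy of the model at this iteration.
            checkpoints[iteration] = copy.deepcopy(params)
    return checkpoints


ckpts = train_with_checkpoints(7000)
print(sorted(ckpts))  # [2000, 4000, 6000]
```

Copying the parameters matters: storing a reference to `params` instead of a copy would make every “checkpoint” point at the final weights, and you could no longer compare the results of different iterations side by side.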
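Have a look at how neural_style.py implements the actual training and what makes each iteration “improve” on the previous one: the loss it minimizes combines a content loss and a style loss, both mean squared errors computed on features from a pretrained VGG network and scaled by the content-weight and style-weight parameters.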
When trying other style images, you’ll find that tweaking style-weight and content-weight is a good place to start to improve the model’s predictions.
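To see why those two weights matter, here is a simplified pure-Python sketch of the kind of objective described above, under stated assumptions: a single small feature map stands in for the VGG activations (the real script uses several layers and PyTorch tensors), content loss is the mean squared error between feature maps, and style loss is the mean squared error between their Gram matrices, normalised by channels × positions. The toy values are made up.

```python
def gram_matrix(features):
    """Channel-by-channel inner products of a feature map.

    `features` is a list of `ch` channels, each a flat list of h*w activations.
    The result is normalised by ch*h*w.
    """
    ch, hw = len(features), len(features[0])
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) / (ch * hw) for j in range(ch)]
        for i in range(ch)
    ]


def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def total_loss(generated, content, style, content_weight, style_weight):
    # Content loss: keep the generated image's features close to the photo's.
    content_loss = content_weight * mse(generated, content)
    # Style loss: match the feature correlations (Gram matrices) of the style image.
    style_loss = style_weight * mse(gram_matrix(generated), gram_matrix(style))
    return content_loss + style_loss


generated = [[1.0, 0.0], [0.0, 1.0]]  # 2 channels, 2 spatial positions each
content = [[1.0, 0.0], [0.0, 0.0]]
style = [[1.0, 1.0], [1.0, 1.0]]

print(total_loss(generated, content, style, content_weight=1.0, style_weight=10.0))  # 1.8125
print(total_loss(generated, content, style, content_weight=1.0, style_weight=100.0))  # 15.875
```

With the same images, multiplying `style_weight` by ten makes the style term dominate the total, so minimizing it tends to favour outputs that match the style image’s textures over ones that preserve the photo’s content.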
Exporting your model to CoreML
Here comes one of the tricky parts. PyTorch doesn’t natively support exporting to CoreML (yet). Fortunately, you might have heard of an initiative aiming to make Machine Learning models interoperable: the Open Neural Network Exchange (ONNX).
ONNX provides tools for exporting models from many popular frameworks to the .onnx format and importing them into others (mandatory xkcd).
Hence, we can export our model to CoreML in two steps:
PyTorch → ONNX → CoreML
neural_style.py already has an ONNX export option, so we really only need to implement the second step. The onnx-coreml GitHub repo has everything we need to bridge that gap.
Now that we have our ONNX to CoreML converter, we can convert a checkpoint with a simple bash command.
Using a CoreML NST model on iOS
You should feel more at home for this part if, like me, you are an iOS developer.
First, we create a simple project with a UIImageView to hold the original and output images, as well as a UIButton to launch the process.
Next, we add a UIBarButtonItem to pick an image from the device’s photo library and set it on the UIImageView.
Alternatively, we can just provide a default image and use that one.
Once the UI is implemented, we can import our model into the app (download this pre-trained model if you don’t have one). This is done simply by dragging and dropping the .mlmodel file into Xcode’s Project navigator.
Selecting the model shows some important details about it:
- Specifications (Name, Type, Size, Author, Description and License)
- Evaluation parameters (Inputs and Outputs, with their respective names and expected types)
You can also see a Model Class that Xcode generated automatically. Click the little arrow next to the class name to jump to the generated file and see the API that lets us use our NST model. The method we will call here is prediction(inputImage: CVPixelBuffer).
Yep, it might seem counter-intuitive, but in CoreML images are passed as CVPixelBuffer, not UIImage. Fortunately, Matthijs Hollemans has an awesome library that makes this conversion easy (and does many more things).
So here are the steps once we have picked the original image:
- Save the input image size
- Convert the input UIImage to a 720 x 720 CVPixelBuffer, as specified in the model view.
- Feed it to our model
- Convert the CVPixelBuffer output to a UIImage
- Resize it to the original size saved in the first step
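The app implements these steps in Swift. Purely to illustrate the data flow, here is a language-agnostic sketch in Python under simplifying assumptions: an image is a list of rows of grayscale values, resizing uses nearest-neighbour sampling, and `fake_style_model` is a hypothetical stand-in for the CoreML `prediction(inputImage:)` call (it just inverts pixels). The 720-pixel side matches the model described above; other models may expect a different size.

```python
MODEL_SIDE = 720  # square input/output size shown in Xcode's model view


def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of an image stored as rows of pixel values."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def fake_style_model(pixels):
    """Hypothetical stand-in for the CoreML prediction: inverts each pixel."""
    return [[255 - p for p in row] for row in pixels]


def stylize(image, model=fake_style_model, side=MODEL_SIDE):
    # 1. Save the input image size.
    height, width = len(image), len(image[0])
    # 2. Scale the input to the square size the model expects.
    model_input = resize_nearest(image, side, side)
    # 3. Feed it to the model (3./4. in the app also convert to and from CVPixelBuffer).
    model_output = model(model_input)
    # 5. Scale the output back to the size saved in step 1.
    return resize_nearest(model_output, width, height)


image = [[0, 50, 100, 150], [10, 60, 110, 160], [20, 70, 120, 170]]  # 4 x 3 image
result = stylize(image)
print(result)  # [[255, 205, 155, 105], [245, 195, 145, 95], [235, 185, 135, 85]]
```

Squeezing a non-square photo into 720 × 720 distorts the intermediate image, but restoring the saved size in the last step means the user gets a result with the original proportions.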
Here is a Swift 4 excerpt of the implementation of those steps.
We’re done! You will find that it helps to do all this on a background thread, as it is pretty CPU-intensive. It’s also a real memory hog, so you might have to make some optimisations to get your code working on an actual device!
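On iOS, this is typically done with Grand Central Dispatch: run the prediction on a global queue (for example `DispatchQueue.global(qos: .userInitiated).async`) and hop back with `DispatchQueue.main.async` to update the UIImageView, since UIKit should only be touched from the main thread. The Python sketch below only mirrors that pattern: a worker thread runs a hypothetical `stylize` stand-in and posts its result to a queue that plays the role of the main thread.

```python
import queue
import threading


def stylize(image):
    """Hypothetical stand-in for the expensive CoreML prediction."""
    return [[255 - p for p in row] for row in image]


def stylize_in_background(image, main_queue):
    """Run stylize() on a worker thread and post the result to main_queue,
    the way an iOS app hops back to the main queue to update the UI."""
    def work():
        result = stylize(image)
        main_queue.put(result)

    worker = threading.Thread(target=work, daemon=True)
    worker.start()
    return worker


main_queue = queue.Queue()
stylize_in_background([[0, 128], [255, 10]], main_queue)
# The "main thread" stays free for other work until it picks up the result.
styled = main_queue.get(timeout=5)
print(styled)  # [[255, 127], [0, 245]]
```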
If you wish to get your hands dirty, you can find the sources for this project on GitHub.
I hope you have learned a few things about CoreML, or even Machine Learning, from this article. If you want to go further, have a look at how I managed to reduce my model size by 4x using quantization for iOS 12.