Emotion Detection with Apple technologies

How you can embed machine learning models in an iOS app

Giovanni Prisco

Published in

Apple Developer Academy | Federico II

12 min readMay 20, 2020

Introduction

Before we get our hands dirty, let’s prepare ourselves for what’s coming next.

First things first

Artificial Intelligence can be defined as an area of computer science that has an emphasis on the creation of intelligent machines that can work and react like humans.

Machine Learning can be defined as a subset of AI, in which machines can learn on their own without being explicitly programmed: they can think and perform actions based on their past experiences.
In this way, they can change their algorithm based on the data sets on which they are operating.

Machine Learning’s popularity is growing day after day and so are the possible use cases, also thanks to the huge amount of data produced by applications.

Machine Learning is used anywhere, from automating daily tasks to offering intelligent insights for basically every industry.

ML is used for prediction, image recognition, or speech recognition. It is trained to recognize cancerous tissues, frauds, or to optimize businesses.

Machine learning can be classified into 3 types of algorithms.

Supervised Learning: we give labeled data to the AI system. This means that each data is tagged with the correct label.
Unsupervised Learning: we give unlabeled, uncategorized data to the AI system and it acts on the data without any prior training, so the output is dependent upon the coded algorithms.
Reinforcement Learning: the system learns with no human intervention: given an environment, it will receive rewards for performing correct actions and penalties for the incorrect ones.

A machine learning model can be a mathematical representation of a real-world process.

To understand this, we must first know how we come to this point, for the scope of this article, we will talk more specifically about training a classification model.

Training

Training a model simply means learning good values

A neural network, at first, will try to guess the output value randomly, then, it will gradually learn from its errors and adjust its values (weights) based on these.

There are many types of classification problems:

Binary Classification: predict a binary possibility (one of two possible classes).
Multiclass Classification: allow you to generate predictions for multiple classes (predict one of more than two outcomes).

For iOS developers, Apple provides machine learning tools like Core ML, Vision, and NLP. iOS developers have different choices for accessing trained models to provide inference:

Use Core ML to access a local on-device pre-trained model.
Host a Machine Learning Model in the cloud and send data from the device to the hosted endpoint to provide predictions.
Call third-party API-Driven Machine Learning cloud managed services where the service hosts and manages a pre-defined trained model. User data is passed through an API call from the device and the service returns the predicted values.

What is Create ML?

Focused at present on vision and natural language data, developers can use Create ML with Swift to create machine learning models, models which are then trained to handle tasks such as understanding text, recognizing photos, or finding relationships between numbers.

It lets developers build machine learning models on their Macs that they can then deploy across Apple’s platforms using Swift.

Apple’s decision to commoditize its machine learning tech means developers can build natural language and image classification models much faster than the task takes if built from scratch.

It also makes it possible to create these models without the use of third-party AI training systems, such as IBM Watson or TensorFlow (though Create ML supports only very specific models).

What is Core ML?

Core ML is the machine learning framework used across Apple products (macOS, iOS, watchOS, and tvOS) for performing fast prediction or inference with easy integration of pre-trained machine learning models on the edge, which allows you to perform real-time predictions of live images or video on the device.

Advantages of ML on the edge

Low Latency and Near Real-Time Results: You don’t need to make a network API call by sending the data and then waiting for a response. This can be critical for applications such as video processing of successive frames from the on-device camera.

Availability (Offline), Privacy, and Compelling Cost as the application runs without network connection, no API calls, and the data never leaves the device. Imagine using your mobile device to identify historic tiles while in the subway, catalog private vacation photos while in airplane mode, or detect poisonous plants while in the wilderness.

Disadvantages of ML on the edge

Application Size: By adding the model to the device, you’re increasing the size of the app and some accurate models can be quite large.
System Utilization: Prediction and inference on the mobile device involves lots of computation, which increases battery drain. Older devices may struggle to provide real-time predictions.
Model Training: In most cases, the model on the device must be continually trained outside of the device with new user data. Once the model is retrained, the app will need to be updated with the new model, and depending on the size of the model, this could strain network transfer for the user. Refer back to the application size challenge listed above, and now we have a potential user experience problem.

Getting your hands dirty

As we dive deeper into the core of this article, we are assuming that you are quite familiar with an iOS Development environment and you have some basic knowledge about python.

Converting a model with python

Now let’s say that we found an interesting model on the web, unfortunately, we notice it’s not in CoreML format, but we absolutely want to use it in our iOS app and there’s no other way to obtain it, we can just try to convert it.

Apple has a specific tool to accomplish this task, a python module called coremltools that can be found at this link.

The interesting model is built with keras (tensorflow as backend) and it’s about emotion detection, you can download it from here. Let’s now convert it, first of all we’ll install the required packages. For compatibility reasons, please use python 2.7 and packages’ specified versions, as coremltools relies on these.

One final note, since we are using a deprecated version of python, we create a virtual environment to run our code.

Final final note, your path to Python 2.7 might be different, if you’re using Mac OS or Linux, check your /usr/bin/ directory. If you’re using Windows, check the path in which you decided to install python.

pip3 install virtualenvvirtualenv -p /usr/bin/python2.7 venv

Now we activate the virtual environment we just created.

source venv/bin/activate

And finally, we install our dependencies.

pip install coremltools keras==2.2.4 tensorflow==1.14.0

After this, we can start writing our script. 🚀

Create a file named converter.py, the first step will be to import coremltools.

import coremltools

Last but not least, we convert our model into a .mlmodel one.

output_labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']ml_model = coremltools.converters.keras.convert(
    './model_v6.h5', 
    input_names=['image'], 
    output_names=['output'], 
    class_labels=output_labels, 
    image_input_names='image'
)ml_model.save('./model_v6.mlmodel')

As you can see, the first line is about output labels, this is the most important thing we need to know before converting a model, otherwise, the results will be useless for us since we will not be able to know what the output is about.

The second line is the main instruction of the script, it calls the Keras converter from the coremltools converters and converts our model based on our specifications about input and output (in this case we are specifying that we need an image as input and output_labels as output).

Finally, we save the converted model that is ready to use in our app.

Machine Learning with Apple technologies

This is what we expect our final result to be.

A quick tour of the application — A quick tour of the finished app

The first thing to do is to get a good model for our scope.

Creation of a Model via CreateML app

CreateML app is an application presented in the WWDC 2019 for Xcode 11.0 and Swift 5. This application allows everyone to create an ML Model without having a very big knowledge about training an ML model. It’s only necessary to find the information we want to use to train the model, label them (because the base of CreateML is the Supervised Training) and import everything in the application.

Now we will use the CreateML app to train our model which will recognize our emotions.

First of all, you have to find the images. I suggest to find a very rich dataset of images because the precision is very important, but the images mustn’t have a very high resolution. After that, you have to divide your images into several categories you decide, in the base of emotions you want to recognize and create a folder for each emotion. Then you have to create two super folders: train and test. The first folder is richer of images and it’s the folder where you put the images for training the model, the second folder is used to test the model just trained.

If this operation could be annoying, don’t worry: lots of datasets have a .csv file where there are the classification already done! You have only to write a simple script and solve the problem! Here is an example in Python 3.8:

import csv
import os
import sysdef main():
    input_path = “PATH_OF_YOUR_FOLDER”
    file_name = “FILE_NAME.csv”
    file = open(file_name,”r”)
    data = csv.reader(file)
    next(data) #this is used to avoid the first line
    for info in data:
        label_path = os.path.join(input_path, info[-1])
        #here you have to consider the structure of .csv filedestination_path = “mv “ + info[1] + “ “ + label_path
        os.system(destination_path)
        print(“images moved”)
    file.close()if name == ‘main’ : main()

After that, press and hold “control” button on the keyboard (if you are using macOS Catalina and Xcode 11) and click on the dock icon of Xcode. You should see a menu called “Open Developer Tool” and after, CreateML; then you click on “New Document”. You should see an interface where you can choose the right template of our MLModel: we will classify images, so we choose “Image Classifier”, that is the first template. Next, we will give the name of your project, the name of License, a short description, and where to save the project file.

Now we have to select the images for training and testing, and the only thing to do is to drag the “train” folder in the “Train” section and “test” folder in the “Test” section. Validation must be set in “Auto”. Then we have to choose a maximum of iterations (600 should be good) and then press “Start” at the top of the interface. The other commands in the bottom part are only to edit images for being more useful in the training process, but in this situation, we haven’t needed it.

After a long time, we have our characteristics about our model and, in the top right, our model. You have only to drag this model out of CreateML windows and drop it in an external folder or Desktop.

Using CoreML with Vision

For creating our software we need two frameworks and an MLModel based on image classification (created before or converted from other models): these frameworks are Vision and CoreML.

We already talked about CoreML, but what’s Vision about?

Vision permits us to manipulate images and videos using Computer Vision Algorithms for lots of operations like face and face landmark detection, text detection, barcode recognition, image registration, and general feature tracking. We will use the CoreML interfacing for classifying images.

To start with this tutorial, first of all, clone this repository: in it, there is a simple application (written in Swift 5.2 and compatible for iOS 13.2) with a simple ViewController where there is an UIImageView and a Label. The first is used to show the images we choose to verify emotions, the second is used to identify the emotion and indicate the precision of our image classification.
There is also a Greyscale converted because lots of datasets give us grayscale images and, for this reason, image classification is more correct.

Now let’s start:

Create a new file called “PredictionManager.swift” where we can implement our classification function.
Save it in the folder of your app.
Import UIKit, CoreML and Vision in your project
Add your model to the project. To do this just drag and drop the .mlmodel file in the project folder opened in Xcode navigator, then select “Copy as Group”.

Now let’s start to write code! 🥳

First, we create a class, called PredictionManager, with two variables:

var emotionModel: MLModel
var visionModel: VNCoreMLModel

The first variable is the MLModel we consider for our project, instead, the second variable is a Vision Container where we put our MLModel (trained for images) and make operations on it (called “VNCoreMLRequest”)

After declaration, let’s create the constructor:

init() {
    self.emotionModel = EmotionClassificator().model
    do{
        self.visionModel = try VNCoreMLModel(for: self.emotionModel)
    }catch{
        fatalError(“Unable to create Vision Model…”)
    }
}

Firstly we assigned to the MLModel variable our model (in this case called “EmotionClassificator”, but in general the name of this class is equal to the .mlmodel file name) because every .mlmodel file creates a class called like the model, and this class is usable for every operation with CoreML, but to access to its implementation, you have to open the .mlmodel file and click on the arrow on the right of the Model name.

Then we assign to visionModel the MLModel if this model is compatible with Vision.

Now we can start with our function:

func classification(for image: UIImage, complete: @escaping (String) -> Void)

For classifying our image we have to use an image (UIImage) and we will have to output a String (here we can have a string with @escaping closure, that prevents us to delete information, here in type “String” when function variables are de-allocated).

Now, the first thing we have to do is the VNCoreMLRequest, to create the request to our MLModel:

func classification(for image: UIImage, complete: @escaping (String) -> Void){
    let request = VNCoreMLRequest(model: self.visionModel {(request,error) inguard error == nil else {complete(“Error”); 
        return
    }
        guard let results = request.results as [VNClassificationObservation], let firstResult = results.firstelse { 
            complete(“No Results”); 
            return
        }
        complete(String(format: “%@ %.1f%%”, firstResult.identifier, firstResult.confidence * 100)) 
    }
}

Our VNCoreMLRequest needs the VNCoreMLModel to operate the requests and then we consider three situations:

the model isn’t useful for our purpose;
the entire request (where the results are represented like VNClassificationObservation) doesn’t give any result;
choosing the first result (the most precise), we will print our information (the classification and the confidence).

To have more precision, we will crop the images to the center:

request.imageCropAndScaleOption = .centerCrop

Now we need a handler to handle every request to the VNCoreMLModel, but first, we have to give it an image filtered and optimized for our process: for this reason, we create a CIImage (Core Image) and we will give it a prefixed orientation with CGImagePropertyOprientation:

guard let ciImage = CIImage(image: image) else { complete(“error creating image”); return}let orientation = CGImagePropertyOrientation(rawValue: UInt32(image.imageOrientation.rawValue))

And now it’s time to build the request:

DispatchQueue.global(qos: .userInitiated).async {
    let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation!)do {
        try handler.perform([request])
    } catch {
        complete(“Failed to perform classification.”)
    }}

To help our application, we will permit the handler to go in the global area and be activated only on the request of the user (when the user chooses the image). This operation will be asynchronous, so it will be executed independently from the rest of the app.

In the end, we will build our handler (using the CIImage created before and the orientation) and try to perform requests created before.

Now we complete our function of classification. Let’s go to call it in the ViewController.

In the Extension of our ViewController, after the dismiss, let’s write this:

let monoImage = image.mono

Here we convert our image in mono-image, after we will

predictionManager.classification(for: monoImage) { (result) inDispatchQueue.main.async { [weak self] in
        self?.predictionLabel.text = result
    }}

After the classification, the result will be processed in the main thread (DispatchQueue.main.async), and, with a weak self we will give the result of our classification.

Now you can classify emotions! 🤩 What are you waiting for? Try in on your iPhone!

For the complete project, check out our repository: