A disassembly of machine learning and Core ML (Part 1)

Zoran Todorović
Undabot
Apr 20, 2018

iOS 11 introduced a new feature: a machine learning package built around the Core ML and Vision frameworks. You have probably already heard of it and its cool features such as image analysis, text detection, object tracking, etc. Core ML makes it easy to integrate machine learning models into your applications. Many developers know how to use a given machine learning model, but far fewer understand the principles behind those models and what is going on under the hood.

In this post, I will try to explain some basic concepts of machine learning, give you an overview of its paradigms and reveal what is hiding under the hood: how exactly those machines learn. In part 2, we’ll go through Core ML and connect the concepts from this part with the corresponding Core ML features.

All the buzzwords

You have probably heard of all the buzzwords: artificial intelligence, machine learning, deep learning, neural networks… But how are these terms related?

  • Artificial intelligence (AI) is the broadest term. It is a field that studies intelligent behaviour, often taking human cognition as its reference point, and uses the results as a basis for building intelligent systems. It draws not only on computer science, but also on mathematics, psychology, philosophy and many other disciplines.
  • Machine learning (ML) is a subset of AI whose main goal is to develop predictive algorithms that solve a certain group of problems. The basic idea of machine learning is that the machine (computer) can receive a set of data and “learn” from it.
  • Deep learning (DL) is a machine learning discipline. It refers to neural network algorithms with many layers that learn from raw data and build up many levels of abstraction.
  • Artificial neural network (ANN) is a system that loosely mimics biological neural networks and solves problems in some fields of artificial intelligence. It borrows the structure of the human brain to develop an appropriate data analysis strategy.
  • Natural Language Processing (NLP) is the ability of a computer to “understand” and generate human language. It is an AI discipline that also overlaps with other fields, such as linguistics. Machine learning algorithms are often used to solve NLP tasks.
Buzzword fields intersection

Neural networks

Neural networks take a different approach to solving problems than conventional computer algorithms. Conventional algorithms follow a set of instructions to solve a given problem. If there are no specific steps to follow, the computer cannot solve the problem. Computers would be much more useful if they could handle problems that people don’t know how to solve step by step. This idea motivated the emergence of artificial neural networks.

Artificial neural networks are basically a family of algorithms built around a model of artificial neurons and their connections. Some machine learning algorithms rely on neural networks. ANNs handle information in a way that is loosely similar to the human brain. The network consists of a large number of connected processing elements (neurons) that work together to solve a particular problem. Neurons are typically grouped into layers (see Artificial neural network image): an input layer, one or more hidden layers and an output layer. The input layer receives the data and the output layer emits the values the network has calculated.

ANNs solve problems with the following approach:

  1. The input data is described by numerical values at the network input.
  2. Each value is multiplied by a weight factor that describes the strength of the synapse (connection).
  3. The multiplied signals are summed up, analogously to the summing of potentials in the body of a cell.
  4. If the resulting sum is above a defined threshold, the neuron gives an output signal.

Instead of a hard threshold, a neuron may also use some other transfer (activation) function, such as a sigmoid.
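To make the four steps above more concrete, here is a minimal sketch of a single artificial neuron in Python. The function names, weights and threshold are made up for illustration; they do not come from Core ML or any other framework.

```python
import math


def weighted_sum(inputs, weights):
    """Steps 2 and 3: multiply every input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))


def step_neuron(inputs, weights, threshold):
    """Step 4: fire (return 1) only if the weighted sum is above the threshold."""
    return 1 if weighted_sum(inputs, weights) > threshold else 0


def sigmoid_neuron(inputs, weights):
    """Same neuron, but with a smooth sigmoid instead of a hard threshold."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(inputs, weights)))


# Hypothetical example values: three inputs and three connection weights.
weights = [0.5, 0.9, 0.25]
fires = step_neuron([1.0, 0.0, 1.0], weights, threshold=0.7)
stays_silent = step_neuron([0.0, 0.0, 1.0], weights, threshold=0.7)
```

The first call sums to 0.75, which is above the threshold of 0.7, so the neuron fires; the second sums to 0.25, so it stays silent. A real network simply wires many such neurons together in layers.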

Artificial neural network

Neural networks are used for classification, regression, prediction and, more generally, any problem where there is a relationship between the input and output variables.

Classification is the process of solving a problem where the output variable is a category, e.g. “dry” or “wet”, “black” or “white”.

Regression is the process of estimating relationships among variables. It solves problems where we predict continuous output values, e.g. predicting the value of a house from historical data on house sizes and values.
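The difference is easiest to see in the type of the output. The sketch below is only an illustration: the house data is invented, the regression is a plain least-squares line rather than a neural network, and the “dry”/“wet” classifier is a hand-written rule with a hypothetical moisture threshold.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_house_value(size, slope, intercept):
    """Regression: the output is a continuous number."""
    return slope * size + intercept


def classify_surface(moisture, threshold=0.5):
    """Classification: the output is one of a fixed set of categories."""
    return "wet" if moisture >= threshold else "dry"


# Invented historical data: house sizes (square meters) and their values.
sizes = [50.0, 80.0, 100.0, 120.0]
values = [70000.0, 100000.0, 120000.0, 140000.0]

slope, intercept = fit_line(sizes, values)
estimate = predict_house_value(90.0, slope, intercept)  # a number
label = classify_surface(0.8)                            # a category
```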

Learning paradigms

There are three main paradigms for training artificial neural networks:

  • supervised learning
  • unsupervised learning
  • reinforcement learning

The learning method is not strictly tied to the architecture of the neural network, but certain types of network are commonly trained in certain ways. Each learning paradigm has its own training algorithms.

Supervised learning is the paradigm that sets the parameters of an artificial neural network from a training data set. Training data consists of input and output pairs, and the objective of training is to set the network parameters so that every valid input produces the appropriate output. It is called “supervised” because each training input comes with its expected output, so the network’s answers can be checked and corrected during training.

Unsupervised learning is where you only have input variables and no corresponding output variables. The parameters of the network are set based on the input data and a cost function. The cost function can take many forms and is chosen depending on the task; the aim of training is to minimize it.
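As a rough illustration of learning from inputs alone, the sketch below fits a single parameter (a “center”) to unlabeled numbers by minimizing a cost function with gradient descent. The data, the choice of mean squared distance as the cost, the learning rate and the function names are all assumptions made for this example.

```python
def cost(center, points):
    """Mean squared distance between the center and every input point."""
    return sum((p - center) ** 2 for p in points) / len(points)


def fit_center(points, learning_rate=0.1, steps=200):
    """Minimize the cost with plain gradient descent, starting from 0."""
    center = 0.0
    for _ in range(steps):
        # Derivative of the cost with respect to the center.
        gradient = sum(2 * (center - p) for p in points) / len(points)
        center -= learning_rate * gradient
    return center


# Unlabeled inputs: there are no expected outputs anywhere.
points = [1.0, 2.0, 3.0, 6.0]
center = fit_center(points)
```

For this particular cost the minimum is simply the average of the inputs, so `center` ends up very close to 3.0. No one told the algorithm the answer; it only followed the cost function downhill.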

Reinforcement learning differs from supervised learning in that the correct input-output pairs are never shown to the learner. Instead of being told the right answer, the learner receives a reward (prize) value for what it does and adjusts its behaviour to collect the highest reward. In the simplest case, a brute-force algorithm calculates the reward for each input and selects the one with the highest value.

We are going to concentrate on supervised learning because most machine learning problems on mobile platforms are solved that way.

To solve a supervised learning problem, you need to go through the following steps:

  1. Define the type of training samples (inputs and outputs)
  2. Collect training samples that will properly describe the problem
  3. Describe the collected training samples in a form the artificial neural network understands
  4. Train the network with the collected samples
  5. Validate the trained network with a test data set (test pairs consist of data that didn’t take part in the training process)
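Step 5 depends on keeping some samples away from training. A minimal way to do that is to shuffle the collected pairs and set a fraction aside, as sketched below. The helper name `split_samples`, the 20% test fraction and the toy data are assumptions for illustration only.

```python
import random


def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle (input, output) pairs and split them into train and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = max(1, int(len(shuffled) * test_fraction))
    test_set = shuffled[:test_size]
    train_set = shuffled[test_size:]
    return train_set, test_set


# Toy (input, output) pairs standing in for real collected samples.
samples = [(i, 2 * i) for i in range(10)]
train_set, test_set = split_samples(samples)
```

The network is trained only on `train_set`; `test_set` is used afterwards to check how well it handles data it has never seen.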

So far so good… But how is the network really trained?

The backpropagation algorithm

The objective of supervised learning is to find the function that maps the input set to its output set as precisely as possible. That was the motivation for developing the backpropagation algorithm.

Backpropagation is an algorithm that calculates the gradient of the loss function with respect to the weights of the connections in the neural network. Sounds pretty intimidating, but let’s try to simplify it.

Training of the neural network consists of two steps:

  1. Calculating the network response and its error for every input sample
  2. Propagating the error backward to the previous layers in order to update the network

Backpropagation starts by calculating the error of the output layer. Then, for each previous layer, it calculates how much each neuron contributed to the errors in the next layer, and the connection weights are adjusted to reduce that error.

If you would like to export your trained neural network, you would essentially write out its connection weights as a matrix. That is what the famous word MODEL stands for. You have probably heard that you can import a model into your iOS app.
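The two training steps can be sketched for a tiny network with two inputs, two hidden neurons and one output. This is a simplified illustration, not how a production framework implements it: it assumes sigmoid neurons, a squared-error loss, no bias terms, and invented starting weights and learning rate.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Step 1: calculate the hidden activations and the network response."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(x, target, w_hidden, w_out):
    """Squared error between the network response and the expected output."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


def backprop(x, target, w_hidden, w_out):
    """Step 2: gradient of the loss with respect to every connection weight."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error of the output neuron (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # How much each hidden neuron contributed to the output error.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


def train_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """Move every weight a small step against its gradient."""
    grad_hidden, grad_out = backprop(x, target, w_hidden, w_out)
    new_hidden = [[w - learning_rate * g for w, g in zip(w_row, g_row)]
                  for w_row, g_row in zip(w_hidden, grad_hidden)]
    new_out = [w - learning_rate * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out


# Invented sample and starting weights; the weight lists are the "model".
sample, expected = [1.0, 0.5], 1.0
model_hidden = [[0.1, -0.2], [0.4, 0.3]]
model_out = [0.5, -0.5]
for _ in range(100):
    model_hidden, model_out = train_step(sample, expected, model_hidden, model_out)
```

After the loop, `model_hidden` and `model_out` hold the updated connection weights. Writing those numbers to a file is, in essence, what exporting a model means.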

Ok, but which machine learning algorithm and approach should you use for solving your task? Well, it depends. The hardest part of creating a solution with machine learning is data representation. You need to collect data sets for training the neural network, but you also need to translate that data into a language the neural network understands. There are many tasks for which it is hard to determine good features for a machine learning algorithm.

For example, say we want to develop a program for detecting cars in photos. We know that cars have wheels, so we can use the presence of wheels in a photo as a feature. However, it is very difficult to describe precisely what a wheel looks like in pixel values. The wheel has a simple geometric shape, but in an image it may be covered by shadows or show glare from its metal parts, which makes it difficult to recognize in terms of pixels. Many factors like these change how the same object appears; humans cope with them easily, machines do not. This is one of the biggest problems in applying machine learning to the real world.
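To show what “a language the neural network understands” can look like for images, here is a small sketch that turns a grayscale picture into a flat list of numbers between 0 and 1. The tiny 2×2 image and the helper name `image_to_features` are made up, and real pipelines usually do far more preprocessing than this.

```python
def image_to_features(pixels, max_value=255):
    """Flatten a 2D grid of grayscale pixels into a list of values in [0, 1]."""
    return [value / max_value for row in pixels for value in row]


# A made-up 2x2 grayscale image: 0 is black, 255 is white.
image = [
    [0, 255],
    [51, 102],
]
features = image_to_features(image)
```

Every pixel becomes one input value for the network. Notice that nothing in this list says “wheel”: a shadow or a reflection changes many of the numbers at once, which is exactly why hand-describing such features is so hard.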

If you want to build and train an artificial neural network, you should do some research on neuron activation functions and on neural network architectures and types (feedforward, recurrent, convolutional…). You can use one of the existing machine learning frameworks like TensorFlow or PyTorch. If you want a deeper understanding of machine learning, I recommend reading Machine Learning for Humans.

The next part of this series will cover more practical experience with machine learning on iOS, Core ML features and how to use a model in your iOS app. Stay tuned.

Thanks to Sinisa Cvahte for the design.

Thank you for reading. Please comment, like or share it with your friends and we hope to see you soon.
