Machine Learning and Object Detection Application — Part 1
Nowadays, machine learning [1] is increasingly popular across a wide range of applications [2].
In this article, we explain how to implement an object recognizer with object detection techniques in a simple way, through a study based on a project for the Apple Developer Academy. The primary goal of the project is the development of a machine learning-based iOS application.
The beginning
In 2017, Apple introduced Core ML [3], a breakthrough framework for developing machine learning-based applications. We will see how to integrate machine learning into our iOS app with just a few lines of code, in the “Swift-est” way possible.
From image classification to object detection
The first step in this process was to create an image classifier model [4] to recognize different kinds of fruit, so that when our app receives an input image it can determine whether the image shows a fruit and, if so, which kind. To make the implementation smoother and more interesting, the image classifier was then replaced with an object detection model [5] that recognizes an object (a fruit, in our case) in a live camera stream [8] rather than waiting for a user-supplied image [6]. To accomplish this, we used two main frameworks: Core ML and Vision [7]; the latter applies computer vision algorithms to perform tasks on input images and videos.
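To give a rough idea of how little code is involved, the sketch below wires a Core ML object detection model into Vision and reads back the recognized objects. The FruitDetector class name is a placeholder for the class Xcode generates from the trained .mlmodel file; everything else uses standard Vision APIs (VNCoreMLModel, VNCoreMLRequest, VNRecognizedObjectObservation). Reference [8] walks through the full live-capture pipeline in more detail.

import CoreML
import Vision

// Placeholder: "FruitDetector" stands for the model class that Xcode
// auto-generates from the trained .mlmodel file.
func makeDetectionRequest() throws -> VNCoreMLRequest {
    let coreMLModel = try FruitDetector(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
        for observation in results {
            // Each observation carries a normalized bounding box and ranked labels.
            let best = observation.labels.first
            print(best?.identifier ?? "unknown", best?.confidence ?? 0, observation.boundingBox)
        }
    }
    request.imageCropAndScaleOption = .scaleFill
    return request
}

// For live streaming, each camera frame (a CVPixelBuffer from AVFoundation)
// is handed to Vision through an image request handler:
// try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
//     .perform([detectionRequest])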
Image dataset
Originally, we created an image dataset divided into folders, each one representing a single object (fruit) and containing a set of images of it. To keep recognition reasonably accurate, each folder contained at least 20–30 images of the fruit, taken from different angles, in JPEG format and no larger than 1024x768 pixels. Then, 80% of the images were used to train the model and the remaining 20% to test it; it is crucial to keep training images and testing images in two separate folders. This scheme is the one used to create an image classifier, but since we chose to build an object detection model, we needed to create a JSON file describing every object in each image, so it was no longer necessary to divide the images into per-object subfolders.
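As an illustration of the split described above, here is a minimal Foundation sketch that shuffles the images and copies 80% of them into a training folder and 20% into a testing folder. The directory paths are placeholders, not the actual paths used in the project.

import Foundation

// Placeholder paths for illustration only.
let sourceDir = URL(fileURLWithPath: "/path/to/dataset/all")
let trainDir  = URL(fileURLWithPath: "/path/to/dataset/train")
let testDir   = URL(fileURLWithPath: "/path/to/dataset/test")

let fileManager = FileManager.default
try fileManager.createDirectory(at: trainDir, withIntermediateDirectories: true)
try fileManager.createDirectory(at: testDir, withIntermediateDirectories: true)

// Shuffle so the split is not biased by file ordering, then send the first
// 80% of the JPEG images to the training folder and the rest to testing.
let images = try fileManager.contentsOfDirectory(at: sourceDir, includingPropertiesForKeys: nil)
    .filter { $0.pathExtension.lowercased() == "jpg" }
    .shuffled()

let splitIndex = Int(Double(images.count) * 0.8)
for (index, image) in images.enumerated() {
    let destination = (index < splitIndex ? trainDir : testDir)
        .appendingPathComponent(image.lastPathComponent)
    try fileManager.copyItem(at: image, to: destination)
}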
JSON file structure
The annotation file is a JSON array with two properties for each element: the image name and the image annotations. The annotations property is in turn another JSON array listing all identifiable objects inside the image; for each object it specifies the type (label) and the coordinates inside the image, i.e. the position, width, and height of its bounding box.
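Here is a hedged sketch of what such a file can look like, together with matching Swift Codable types for parsing it. The field names follow the annotation format used by Create ML's object detection template [5]; the concrete file name, label, and pixel values are purely illustrative.

import Foundation

struct ImageAnnotation: Codable {
    let image: String             // file name of the image
    let annotations: [Annotation] // every labeled object in that image
}

struct Annotation: Codable {
    let label: String             // object type, e.g. "banana"
    let coordinates: Coordinates  // bounding box inside the image
}

struct Coordinates: Codable {
    let x: Double                 // in Create ML's format, x and y are
    let y: Double                 // typically the box center, in pixels
    let width: Double
    let height: Double
}

// Illustrative annotations for a single image containing one banana.
let json = """
[
  {
    "image": "banana_01.jpg",
    "annotations": [
      { "label": "banana",
        "coordinates": { "x": 160, "y": 120, "width": 80, "height": 60 } }
    ]
  }
]
"""

let dataset = try JSONDecoder().decode([ImageAnnotation].self, from: Data(json.utf8))
print(dataset.first?.annotations.first?.label ?? "none")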
To Be Continued…
You can find the second part of this article here.
A big greeting from Andrea Capone and Gianluca De Lucia, two computer science grads.
Thank you all for reading.
For more information, you can contact us at our e-mail addresses:
Andrea: capone.andrea195@gmail.com
Gianluca: gianluca.delucia.94@gmail.com
GitHub profiles:
Andrea: https://github.com/One195/
Gianluca: https://github.com/gigernau
Fruitable application link:
https://github.com/One195/Fruitable
Resources
[1] Machine Learning: Definition and Application Examples, https://www.spotlightmetal.com/machine-learning--definition-and-application-examples-a-746226/?cmp=go-aw-art-trf-SLM_DSA-20180820&gclid=Cj0KCQjw1Iv0BRDaARIsAGTWD1swNAKBOUREuv86sCaU1osO-hJXIyPDkLbzGAUnCfHErd3vnbeh5nwaAt6uEALw_wcB
[2] Machine Learning: Applications, https://www.geeksforgeeks.org/machine-learning-introduction/
[3] Core ML Documentation, https://developer.apple.com/documentation/coreml
[4] Creating an Image Classifier Model, https://developer.apple.com/documentation/createml/creating_an_image_classifier_model
[5] Training Object Detection Models in Create ML, https://developer.apple.com/videos/play/wwdc2019/424/
[6] Classifying Images with Vision and Core ML, https://developer.apple.com/documentation/vision/classifying_images_with_vision_and_core_ml
[7] Vision Documentation, https://developer.apple.com/documentation/vision
[8] Recognizing Objects in Live Capture, https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture