More about Core ML and Introduction to Text Recognition in iOS.

Naresh · Engineering Jio · Sep 7, 2020

This time, we will take a broader look at machine learning in iOS: Core ML, and how to accomplish tasks such as image and text recognition.

In this article, we will go step by step through an introduction to Core ML, the tasks it can do, implementation, advantages, limitations, optimising app size, and more, in a very simple way.

At WWDC 2017, Apple released a lot of exciting frameworks, including Core ML.

When we talk about machine learning in iOS, we need an ML model, and Core ML is Apple's unified model format for accomplishing ML tasks on-device. It involves no network calls and uses the system's resources, which makes it favourable for user privacy and app responsiveness.

Fewer network calls, more responsive app.

Possibilities with Core ML:

Real-time image recognition, face detection, text prediction, and speaker identification are some of the many possibilities with Core ML.

Can we answer questions using ML?

Yes, BERT can be used to answer questions based on paragraphs.

Apple provides a lot of flexibility when it comes to ML in iOS.

We need a model in the .mlmodel format to use it in our iOS project. There are a few ways to get one:

  1. We can create it using Create ML (a minimal sketch follows below).
  2. If we have a pre-trained model in Keras, TensorFlow, Caffe, etc., we can convert it using Core ML converters with a small Python script.
  3. We can create a custom model using Turi Create, which was explained in a previous article.
  4. We can also use the ready-made models provided by Apple:

https://developer.apple.com/machine-learning/models/
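Here is a minimal Create ML sketch for the first option (it runs on macOS, for example in a Swift playground). The folder paths and model name are hypothetical, and the training directory's subfolders are assumed to be the class labels:

import CreateML
import Foundation

// Hypothetical training folder whose subfolders name the classes.
let trainingDir = URL(fileURLWithPath: "/path/to/TrainingData")

// Train an image classifier from the labeled directories.
let classifier = try MLImageClassifier(trainingData: .labeledDirectories(at: trainingDir))

// Export the trained model as an .mlmodel file for the iOS project.
try classifier.write(to: URL(fileURLWithPath: "/path/to/GroceryClassifier.mlmodel"))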

With all these options available, it's really a great time to learn about machine learning in iOS.

How to integrate a Core ML model?

  • Simply drag and drop your Core ML model into your Xcode project.
  • Instantiate the model in your view controller and make predictions.

let model = NMIndianGroceryModel()

Make a prediction:

// The model takes its input image in pixel-buffer format
if let output = try? model.prediction(image: buffer) {
    let objectName: String = output.label
    categoryText = objectName
}

An image is a collection of pixels. Prefer to provide images in pixel-buffer format (CVPixelBuffer), resized to the input size your ML model expects.
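Here is a rough sketch of such a conversion (the helper name and target size are our own; check the model's input description for the real dimensions):

import UIKit
import CoreVideo

// Draw a UIImage into a CVPixelBuffer of the size the model expects.
func pixelBuffer(from image: UIImage, targetSize: CGSize) -> CVPixelBuffer? {
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     Int(targetSize.width), Int(targetSize.height),
                                     kCVPixelFormatType_32ARGB, attrs, &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                  width: Int(targetSize.width),
                                  height: Int(targetSize.height),
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else { return nil }

    // Flip the coordinate system so UIKit drawing lands right side up.
    UIGraphicsPushContext(context)
    context.translateBy(x: 0, y: targetSize.height)
    context.scaleBy(x: 1, y: -1)
    image.draw(in: CGRect(origin: .zero, size: targetSize))
    UIGraphicsPopContext()

    return pixelBuffer
}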

We can view the model's description in Xcode to see its inputs and outputs along with other basic details.
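The generated class also exposes the underlying MLModel, so the same details can be inspected in code (a tiny sketch, reusing the model instance from above):

import CoreML

// The generated class wraps an MLModel; its description lists the
// expected inputs (e.g. image size) and outputs (e.g. the label).
let description = model.model.modelDescription
print(description.inputDescriptionsByName)
print(description.outputDescriptionsByName)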

Insight into some optimisations:

The two challenges to work on are app size and system utilization.

How can we reduce our app size?

  • To reduce or optimise our app size, we can download the model definition file (ending in .mlmodel) using URLSession, CloudKit, or another networking toolkit, and then compile it on-device:

let compiledModelURL = try MLModel.compileModel(at: modelDescriptionURL)

  • This creates a new, compiled model file with the same name as the model description, but ending in .mlmodelc. We can then load it:

let model = try MLModel(contentsOf: compiledModelURL)
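Putting those pieces together, here is a minimal sketch, assuming the model is hosted at some remote URL (hypothetical). Compiling is blocking work, so it should stay off the main thread, and in a real app the compiled .mlmodelc should be moved to a permanent location for reuse:

import CoreML
import Foundation

// Download the .mlmodel definition, compile it on-device, and load it.
func downloadAndLoadModel(from remoteURL: URL,
                          completion: @escaping (MLModel?) -> Void) {
    URLSession.shared.downloadTask(with: remoteURL) { tempURL, _, error in
        guard let tempURL = tempURL, error == nil else {
            completion(nil)
            return
        }
        do {
            // The download callback already runs on a background queue.
            let compiledModelURL = try MLModel.compileModel(at: tempURL)
            let model = try MLModel(contentsOf: compiledModelURL)
            completion(model)
        } catch {
            completion(nil)
        }
    }.resume()
}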

We can provide images to our model, and it can predict the type of image and classify the objects in it. We already covered image recognition in the previous article.

Let’s talk about text recognition now.

Text Recognition:

When we talk about text recognition, it's important to understand how we humans read text.

Just as we first need to know the alphabet to recognise letters and characters, our model needs a character dataset.

For handwriting recognition, the dataset needs to be more advanced, covering more fonts and writing styles.

We frame words from characters, sentences from words, and paragraphs from sentences. Similar processing is done with the help of a neural network in computer science.

Reading flow

In a similar way, we provide a character dataset to our ML model along with training algorithms to classify character images, detect characters from the dataset, and then assemble the complete text with the help of neural networks. This is also called the Accurate path approach to reading text.

The Fast path uses the framework’s character-detection to find individual characters, and then uses a machine learning model to recognize individual characters and words.
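Here is a minimal sketch of both paths using Vision's VNRecognizeTextRequest (iOS 13+, discussed further below); the function name and custom word are our own illustration:

import UIKit
import Vision

func recognizeText(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            // Each observation carries ranked candidate strings.
            if let best = observation.topCandidates(1).first {
                print(best.string)
            }
        }
    }
    request.recognitionLevel = .accurate   // or .fast for the Fast path
    request.usesLanguageCorrection = true
    request.customWords = ["MyBrandName"]  // hypothetical non-dictionary words

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}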

Text recognition reads the complete text from an image, but what if we only need important or particular text?

The answer is that we need our own approach or algorithm for this. We can use text metrics such as width, height, and area to pick out the text we care about.

For example: the boundingBox property on text observations in Vision.

Note: The bounding box is also used to draw rectangles around text while scanning an image from the camera.
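As a small sketch of that idea, assuming the observations came from a VNRecognizeTextRequest as above (the threshold is an assumption to tune for your images):

import Vision

// boundingBox is in normalized coordinates (0...1, origin at bottom-left).
func prominentText(from observations: [VNRecognizedTextObservation]) -> [String] {
    observations
        .filter { $0.boundingBox.height > 0.05 }   // keep only larger text, e.g. brand names
        .compactMap { $0.topCandidates(1).first?.string }
}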

Other approaches to OCR and text recognition in iOS include the Tesseract framework, ML Kit, and similar tools.

In 2019, with iOS 13, Apple introduced on-device text recognition in the Vision framework (with VisionKit providing the document camera), supporting both Fast and Accurate modes as well as custom words to handle non-dictionary terms.

Here is a glimpse of what we achieved in our SDK using VisionKit in iOS.

Here the image contains much more text, but in our case we used only the brand name to detect packaged items. Similar computations are possible depending on what we intend to read and which area of the recognised text is useful to us.
