Vision Framework for iOS: The Basics

Ilana Concilio
Academy@EldoradoCPS
5 min read · Jun 9, 2020

Shall we talk about computer vision for iOS?

This post is a continuation of two previous posts, in which I give an introduction to image processing and computer vision for iOS and explain how to use image filters with Core Image. The purpose of this article is to present Vision, Apple’s framework for building computer vision applications on iOS.

Vision Framework offers a number of image analysis and computer vision capabilities. With it, you can perform:

  • Face and face landmark detection;
  • Text detection;
  • Barcode recognition;
  • Image registration;
  • Feature tracking;
  • Classification and detection of objects using CoreML models.

In Vision, there are three roles:

Request — what you create when you ask the framework to detect something; it describes your request for analysis. There are different types of request:

  • VNDetectFaceRectanglesRequest — to detect faces in an image.
  • VNDetectBarcodesRequest — for barcode detection.
  • VNDetectTextRectanglesRequest — for visible text region detection in an image.
  • VNCoreMLRequest — for image analysis that uses a Core ML model to process images.
  • VNClassifyImageRequest — request to classify an image.
  • VNDetectFaceLandmarksRequest — request to find facial features in an image, such as eyes and mouth.
  • VNTrackObjectRequest — request that tracks the movement of an object in several images or video.

Request Handler — executes your request; it is used when you want the framework to actually do something, processing one or more analysis requests. There are two types:

  • VNImageRequestHandler — to analyze an image.
  • VNSequenceRequestHandler — to analyze a sequence of images, used mainly for object tracking (see the short tracking sketch after the observation types below).

Observations — the results of requests sent to Vision are wrapped in observations. An observation carries the analysis result (for example a bounding box or a classification) and can be of the following types:

  • VNClassificationObservation — classification information resulting from image analysis with a Core ML model
  • VNFaceObservation — for face detection.
  • VNDetectedObjectObservation — for object detection.
  • VNCoreMLFeatureValueObservation — a collection of key-value information resulting from prediction of image analysis with a Core ML model.
  • VNHorizonObservation — determines the horizon angle in an image.
  • VNImageAlignmentObservation — detects the transforms needed to align the content of two images.
  • VNPixelBufferObservation — an output image resulting from image-to-image processing analysis with a Core ML model.
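To make these three roles concrete before the main examples, here is a minimal sketch of object tracking with VNTrackObjectRequest and VNSequenceRequestHandler. It assumes you receive CVPixelBuffer frames from a camera feed (for instance via AVCaptureSession) and already have an initial bounding box for the object; the names track(in:) and lastObservation are hypothetical, not part of Vision.

import Vision
import CoreGraphics   // CGRect
import CoreVideo      // CVPixelBuffer

// A minimal sketch: tracking one object across video frames.
// Assumption: "lastObservation" starts from a detection (or a user-drawn box).
let sequenceHandler = VNSequenceRequestHandler()
var lastObservation = VNDetectedObjectObservation(
    boundingBox: CGRect(x: 0.4, y: 0.4, width: 0.2, height: 0.2))

func track(in pixelBuffer: CVPixelBuffer) {
    // Each frame, ask Vision where the object from the previous frame moved to.
    let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
    request.trackingLevel = .accurate
    do {
        try sequenceHandler.perform([request], on: pixelBuffer)
        if let result = request.results?.first as? VNDetectedObjectObservation {
            lastObservation = result   // feed the new position into the next frame
        }
    } catch {
        print(error)
    }
}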

I will show the use of Vision Framework through two examples! Let’s code!

# 1 — Classifying images using Core ML models

Core ML offers several pre-trained models that can be used for classification, but it is also possible to train other models using Apple’s Create ML framework. You can learn more about Core ML and Create ML with Apple's Documentation.

In this case, we will use a pre-trained model. A static image is given as input, it is processed and analyzed according to the specified model, and the output is an array of classification results, each containing an Identifier (the predicted class) and a Confidence (how certain the model is about that identifier).

The first thing we need to do is import CoreML and Vision.

import CoreML
import Vision

In our example, we will use the MobileNetV2.mlmodel provided by Apple:

[Image: the MobileNetV2.mlmodel file, a pre-trained model provided by Apple.]

As stated earlier, the handler needs an input image to perform the classification. The Vision request handler works with images of type CIImage. A CIImage instance is an immutable object that represents an image but does not directly hold its bitmap data.
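For example, if your image arrives as a UIImage (from the camera or the photo library), a minimal sketch of the conversion could look like this, calling the detect(image:) method we are about to write; the analyze(_:) name is hypothetical:

import UIKit
import CoreImage

// A minimal sketch, assuming the image comes in as a UIImage
// (for example from UIImagePickerController or the photo library).
func analyze(_ uiImage: UIImage) {
    guard let ciImage = CIImage(image: uiImage) else {
        fatalError("Could not convert UIImage to CIImage")
    }
    // "ciImage" is what we will hand to the Vision request handler below.
    detect(image: ciImage)
}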

To work with a Core ML model, we first need to instantiate it and wrap it in a VNCoreMLModel so that Vision can access it.

guard let model = try? VNCoreMLModel(for: MobileNetV2().model) else {
    fatalError("Error accessing the model")
}

We now need to instantiate the request and, since we are classifying an image with the Core ML model above, treat the results as an array of VNClassificationObservation.

let request = VNCoreMLRequest(model: model) { (request, error) in
    guard let results = request.results as? [VNClassificationObservation] else {
        fatalError()
    }
}

As stated earlier, we then need to create a handler and send it the request by calling perform(_:):

let handler = VNImageRequestHandler(ciImage: image)
do {
    try handler.perform([request])
} catch {
    print(error)
}

In summary, assuming we have a detect method responsible for analysis and classification, it should look like this:

func detect(image: CIImage) {
    guard let model = try? VNCoreMLModel(for: MobileNetV2().model) else {
        fatalError("Error accessing the model")
    }

    let request = VNCoreMLRequest(model: model) { (request, error) in
        guard let results = request.results as? [VNClassificationObservation] else {
            fatalError()
        }
    }

    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}

You may now create a label to show the classification result. The results come back as an array of observations sorted by confidence, so the first element should be the best match 😉.
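As an illustration, you could call a small helper like the one below from the request's completion handler; the show(_:in:) name and the resultLabel outlet are hypothetical, not part of the code above:

import UIKit
import Vision

// A minimal sketch: showing the best classification match in a label.
// Call this from the VNCoreMLRequest completion handler with the
// [VNClassificationObservation] results; "resultLabel" is a hypothetical UILabel.
func show(_ results: [VNClassificationObservation], in resultLabel: UILabel) {
    guard let topResult = results.first else { return }
    let confidence = Int(topResult.confidence * 100)
    DispatchQueue.main.async {
        // Vision completion handlers may run off the main thread, so hop back.
        resultLabel.text = "\(topResult.identifier) (\(confidence)%)"
    }
}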

Pretty easy right? Now let’s see another example!

# 2 — Detecting Faces within an image

As in the example above, we will implement a detect method, but this time its job is to detect faces within an image. Remember that Vision works with CIImage.

First, we need to instantiate the request as VNDetectFaceRectanglesRequest and treat the results as VNFaceObservation. The number of faces in the image is simply the number of elements in the observations array, and each observation carries a bounding box that you can use to highlight the detected face, for example by drawing a yellow rectangle around it (see the sketch after the complete detect method below).

let request = VNDetectFaceRectanglesRequest { (request, error) in
    guard let observations = request.results as? [VNFaceObservation] else {
        return
    }
}

As in example #1 above, we perform the request with a handler of type VNImageRequestHandler, and the detect method looks like this:

func detect(image: CIImage) {
    let request = VNDetectFaceRectanglesRequest { (request, error) in
        guard let observations = request.results as? [VNFaceObservation] else {
            return
        }
    }

    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
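If you want to actually highlight the faces, one option (a sketch, not part of the code above) is to convert each observation's normalized boundingBox into view coordinates and draw a layer over the image view. It assumes a hypothetical imageView whose image fills its bounds:

import UIKit
import Vision

// A minimal sketch: drawing a yellow rectangle around each detected face.
// "observations" is the [VNFaceObservation] array from the request above;
// "imageView" is a hypothetical UIImageView whose image fills its bounds.
func highlightFaces(_ observations: [VNFaceObservation], in imageView: UIImageView) {
    let size = imageView.bounds.size
    for face in observations {
        // Vision's boundingBox is normalized (0...1) with the origin at the
        // bottom-left, so flip the y-axis for UIKit's top-left coordinate space.
        let box = face.boundingBox
        let rect = CGRect(x: box.origin.x * size.width,
                          y: (1 - box.origin.y - box.height) * size.height,
                          width: box.width * size.width,
                          height: box.height * size.height)

        let outline = CAShapeLayer()
        outline.path = UIBezierPath(rect: rect).cgPath
        outline.strokeColor = UIColor.yellow.cgColor
        outline.fillColor = UIColor.clear.cgColor
        outline.lineWidth = 2
        imageView.layer.addSublayer(outline)
    }
}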

That's it! Now you can start using Vision Framework in your iOS applications!

If you want to learn more about Text Recognition, read this article about Natural Language Processing in iOS.

Enjoy! 😘
