Swift World: What’s new in iOS 11 — Vision

In my previous article, I introduced Core ML, a general-purpose machine learning framework. Apple also provides frameworks for specific areas. In this article, I’ll dive into the Vision framework for computer vision, which is built on top of Core ML.

[Image from the Core ML documentation]

Vision gives us several tools for analyzing images and video: detecting and recognizing faces, detecting barcodes, detecting text, detecting and tracking objects, and more. I will explain each tool with an example. The examples have been uploaded to GitHub: NilStack/HelloVision.

Simply put, there are three roles in using Vision: the request, the request handler, and the observation. Each image analysis task has its own request type; for example, we create a VNDetectFaceRectanglesRequest to detect faces in an image. There are only two kinds of request handler: VNImageRequestHandler, for a single image, and VNSequenceRequestHandler, for “a sequence of multiple images”. The results come back wrapped in observations, which carry information such as the bounding box of whatever was detected.

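A simple template for using Vision follows the same three steps every time: create a request, pass it to a request handler, and read the observations from the request's results.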
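Here is a minimal sketch of that template, using face detection as the request. The function name faceBoundingBoxes is hypothetical and the input is assumed to be a CGImage; the complete project may structure this differently.

```swift
import CoreGraphics
import Vision

/// Hypothetical helper showing the three roles: request, handler, observations.
func faceBoundingBoxes(in image: CGImage) throws -> [CGRect] {
    // 1. Request: describes what to look for.
    let request = VNDetectFaceRectanglesRequest()

    // 2. Request handler: bound to one image, performs one or more requests.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])  // synchronous; throws if the request fails

    // 3. Observations: the results, each with a normalized bounding box.
    let observations = request.results as? [VNFaceObservation] ?? []
    return observations.map { $0.boundingBox }
}
```

Because perform(_:) runs synchronously, it is usually worth calling a helper like this from a background queue rather than the main thread.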

For each feature, I show only part of the code. Please refer to the complete project on GitHub.

1. Machine Learning Image Analysis

This feature analyzes an image with a Core ML model. The corresponding request is VNCoreMLRequest. I will use MobileNets, a new model from Google that is “for mobile and embedded vision applications”. You can download the model file, which Matthijs Hollemans has converted to Core ML format, from awesome-CoreML-models.
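A minimal sketch of how a VNCoreMLRequest can be wired up is shown below. The helper names classify and summary are hypothetical, and the MLModel is assumed to come from the class Xcode generates for the downloaded model file (for example MobileNet().model, if the file is named MobileNet.mlmodel).

```swift
import CoreGraphics
import CoreML
import Vision

/// Hypothetical helper: classifies an image with any Core ML image classifier.
/// `model` is assumed to come from an Xcode-generated model class.
func classify(_ image: CGImage,
              with model: MLModel,
              maxResults: Int = 3) throws -> [(label: String, confidence: Float)] {
    // Wrap the Core ML model so Vision can scale and crop the input for it.
    let visionModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: visionModel)
    request.imageCropAndScaleOption = .centerCrop

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // A classifier model produces VNClassificationObservation results.
    let observations = request.results as? [VNClassificationObservation] ?? []
    return observations
        .sorted { $0.confidence > $1.confidence }
        .prefix(maxResults)
        .map { (label: $0.identifier, confidence: $0.confidence) }
}

/// Formats results for display, e.g. in a label under the image.
func summary(of results: [(label: String, confidence: Float)]) -> String {
    return results
        .map { "\($0.label) (\(Int(($0.confidence * 100).rounded()))%)" }
        .joined(separator: ", ")
}
```

Passing the MLModel in as a parameter keeps the sketch independent of the generated class name, so the same helper works with other classifiers from the same collection.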

[Screenshot: the classification result]

2. Face Detection

Face detection finds faces in an image. The corresponding request is VNDetectFaceRectanglesRequest. The bounding boxes of the detected faces are returned as VNFaceObservation objects. In the example, rectangles are drawn around the faces.

In the example project, handleFaces is the request's completion handler.
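As a rough sketch of what such a completion handler has to do: Vision reports bounding boxes in normalized coordinates (0 to 1) with the origin at the lower left, so they need to be scaled and flipped before drawing in a top-left-origin view. The names pixelRect and makeFaceRequest are hypothetical stand-ins, not the project's handleFaces.

```swift
import CoreGraphics
import Vision

/// Converts a normalized, lower-left-origin rectangle from Vision into
/// pixel coordinates with an upper-left origin.
func pixelRect(forNormalized box: CGRect, imageSize: CGSize) -> CGRect {
    let width = box.width * imageSize.width
    let height = box.height * imageSize.height
    let x = box.minX * imageSize.width
    let y = (1 - box.maxY) * imageSize.height  // flip the y-axis
    return CGRect(x: x, y: y, width: width, height: height)
}

/// Builds a face request whose completion handler reports drawable rectangles.
/// `onFaces` is a hypothetical callback; it receives an empty array on failure.
func makeFaceRequest(imageSize: CGSize,
                     onFaces: @escaping ([CGRect]) -> Void) -> VNDetectFaceRectanglesRequest {
    return VNDetectFaceRectanglesRequest { request, error in
        guard error == nil,
              let faces = request.results as? [VNFaceObservation] else {
            onFaces([])
            return
        }
        onFaces(faces.map { pixelRect(forNormalized: $0.boundingBox, imageSize: imageSize) })
    }
}
```

The returned request is then passed to VNImageRequestHandler.perform(_:) as usual; if the callback touches the UI, it should hop to the main queue first.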

[Screenshot: the face detection result]

3. Face Landmarks Detection

Face landmark detection finds facial features in an image. The corresponding request is VNDetectFaceLandmarksRequest. The results contain a region for each landmark, and the points in a region outline features such as the eyes, nose, and mouth.
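The sketch below shows one way to read those regions. Landmark points are normalized to the face's bounding box, so they are mapped back into image-normalized coordinates before use. The helper names and the choice of four regions are my own assumptions for illustration.

```swift
import CoreGraphics
import Vision

/// Maps a landmark point (normalized to the face bounding box) into
/// image-normalized coordinates. The origin stays at the lower left.
func imagePoint(for landmark: CGPoint, in faceBox: CGRect) -> CGPoint {
    return CGPoint(x: faceBox.minX + landmark.x * faceBox.width,
                   y: faceBox.minY + landmark.y * faceBox.height)
}

/// Hypothetical helper: returns, for each detected face, a dictionary from
/// region name to that region's points in image-normalized coordinates.
func faceLandmarks(in image: CGImage) throws -> [[String: [CGPoint]]] {
    let request = VNDetectFaceLandmarksRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let faces = request.results as? [VNFaceObservation] ?? []
    return faces.map { face -> [String: [CGPoint]] in
        var result: [String: [CGPoint]] = [:]
        guard let landmarks = face.landmarks else { return result }

        // Only a few of the available regions are read here.
        let regions: [(String, VNFaceLandmarkRegion2D?)] = [
            ("leftEye", landmarks.leftEye),
            ("rightEye", landmarks.rightEye),
            ("nose", landmarks.nose),
            ("outerLips", landmarks.outerLips),
        ]
        for (name, region) in regions {
            guard let region = region else { continue }
            result[name] = region.normalizedPoints.map {
                imagePoint(for: $0, in: face.boundingBox)
            }
        }
        return result
    }
}
```

To draw the points, scale them by the image size and flip the y-axis, the same way as for face bounding boxes.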

[Screenshot: the face landmark detection result]

4. Text Detection

Text detection finds the areas of an image that contain text. The request is VNDetectTextRectanglesRequest.
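A minimal sketch of a text detection call, assuming a CGImage input and a hypothetical helper name textBoxes. Note that this request only locates text; it does not read the characters.

```swift
import CoreGraphics
import Vision

/// Hypothetical helper: returns the normalized bounding box of each detected
/// text region, plus the boxes of the individual characters inside it.
func textBoxes(in image: CGImage) throws -> [(region: CGRect, characters: [CGRect])] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true  // also report per-character boxes

    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let observations = request.results as? [VNTextObservation] ?? []
    return observations.map { observation in
        let characters = (observation.characterBoxes ?? []).map { $0.boundingBox }
        return (region: observation.boundingBox, characters: characters)
    }
}
```

As with faces, the boxes are normalized with a lower-left origin and need to be converted before drawing.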

[Screenshot: the text detection result]

5. Barcode Detection

Barcode detection finds barcodes in an image. However, I always get nil from VNDetectBarcodesRequest and can’t find documentation or a sample to use as a reference. Please let me know if you get correct results with barcode detection.
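For reference, this is roughly how I would expect the request to be used, following the same pattern as the other detectors. It is only a sketch with a hypothetical helper name, and it has not been verified to avoid the nil result described above.

```swift
import CoreGraphics
import Vision

/// Hypothetical helper: returns the symbology and decoded payload of each
/// barcode Vision reports. An empty array means nothing was detected.
func barcodePayloads(in image: CGImage) throws -> [(symbology: String, payload: String?)] {
    // With no symbologies specified, the request uses its default set.
    let request = VNDetectBarcodesRequest()

    // perform(_:) throws on failure, which helps separate "the request
    // failed" from "the request ran but found no barcodes".
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let barcodes = request.results as? [VNBarcodeObservation] ?? []
    return barcodes.map {
        (symbology: $0.symbology.rawValue, payload: $0.payloadStringValue)
    }
}
```

Catching the error thrown by perform(_:) and logging it is one way to find out why no results come back.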

6. Object Tracking

I used VNImageRequestHandler for the previous requests. Object tracking works on video, so it’s time to switch to VNSequenceRequestHandler, which is for “a sequence of multiple images”.

This example is from jeffreybergier’s blog post “Getting Started with Vision on iOS 11”.
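Independently of that example, the core of a tracking loop can be sketched as follows. The ObjectTracker class is hypothetical; it assumes each video frame arrives as a CVPixelBuffer (for instance from a capture session) and that the initial bounding box is already in Vision's normalized coordinates.

```swift
import CoreGraphics
import CoreVideo
import Vision

/// Hypothetical wrapper around one tracked object.
final class ObjectTracker {
    // A single sequence handler is reused so Vision can keep state across frames.
    private let handler = VNSequenceRequestHandler()
    private var lastObservation: VNDetectedObjectObservation

    /// The most recent known position, in normalized coordinates.
    var boundingBox: CGRect { return lastObservation.boundingBox }

    init(initialBoundingBox: CGRect) {
        lastObservation = VNDetectedObjectObservation(boundingBox: initialBoundingBox)
    }

    /// Feeds the next frame to Vision and returns the updated bounding box,
    /// or nil if the request produced no observation.
    func track(in frame: CVPixelBuffer) throws -> CGRect? {
        // Each request starts from the previous frame's observation.
        let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
        request.trackingLevel = .accurate

        try handler.perform([request], on: frame)

        guard let observation = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        lastObservation = observation
        return observation.boundingBox
    }
}
```

The key difference from the earlier features is that the handler is created once and the image is passed to perform(_:on:) for every frame, instead of creating a new handler per image.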

[Screenshot: the object tracking result]

Get the complete project with all the examples from GitHub: NilStack/HelloVision.

The next article about machine learning in iOS 11 is:

Swift World: What’s new in iOS 11 — Natural Language Processing

Finally, here are the official resources from Apple’s documentation and WWDC sessions.

Vision documentation

WWDC 2017 Session Vision Framework: Building on Core ML

I will keep updating this article and the example project. Thanks for your time.