Swift World: What’s new in iOS 11 — Vision

Peng
Jun 12, 2017 · 4 min read

In my previous article, I introduced Core ML, a general-purpose machine learning framework. Apple also provides frameworks for specific domains. In this article, I’ll dive into Vision, the framework for computer vision, which is built on Core ML.

Image for post
from Core ML Document

Vision gives us several tools for analyzing images and video: face detection and recognition, barcode detection, text detection, object detection and tracking, and so on. I will explain each tool with an example. The example project has been uploaded to GitHub: NilStack/HelloVision.

Simply put, using Vision involves three roles: the request, the request handler, and the observation. Each analysis tool has its own request type; for example, we create a VNDetectFaceRectanglesRequest to detect faces in an image. There are only two kinds of request handler: VNImageRequestHandler, for a single image, and VNSequenceRequestHandler, for “a sequence of multiple images”. The results are wrapped in “observations”, which carry information such as the bounding box of whatever was detected.

A simple template to use Vision is like the following code block.
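The gist below is a minimal sketch of that request / handler / observation flow, not the exact code from the example project. The function name `detectFaces(in:)` is a hypothetical name chosen for illustration, and the sketch assumes the image is already available as a `CGImage`.

```swift
import Vision
import CoreGraphics

// Hypothetical helper (not from the example project): runs one Vision
// request on a single image and prints the observations it produces.
func detectFaces(in image: CGImage) {
    // 1. Request: describes the analysis. The completion handler is called
    //    when the handler has finished performing the request.
    let request = VNDetectFaceRectanglesRequest { request, error in
        if let error = error {
            print("Vision request failed: \(error)")
            return
        }
        // 3. Observations: the results, cast to the type this request produces.
        let faces = request.results as? [VNFaceObservation] ?? []
        for face in faces {
            // boundingBox is normalized to 0...1 with the origin at the bottom-left.
            print("Face at \(face.boundingBox)")
        }
    }

    // 2. Request handler: binds the request(s) to one image and performs them.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Could not perform request: \(error)")
    }
}
```

The same shape applies to the other features: only the request type and the observation type in the cast change.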

For each feature, I only show part of the code. Please refer to the complete project on GitHub.

1. Machine Learning Image Analysis

This feature analyzes an image with a Core ML model. The corresponding request is VNCoreMLRequest. I will use MobileNets, a new model by Google that is “for mobile and embedded vision applications”. You can download the model file, which Matthijs Hollemans has converted to Core ML format, from awesome-CoreML-models.
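A rough sketch of how such a request might be set up follows. It is not the project's exact code: the function name `classify(_:with:)` is hypothetical, and the model is passed in as a plain `MLModel` so the sketch does not depend on the name of the class Xcode generates for the downloaded model file.

```swift
import CoreML
import Vision
import CoreGraphics

// Hypothetical helper: classifies an image with any Core ML image classifier.
func classify(_ image: CGImage, with mlModel: MLModel) throws {
    // Wrap the Core ML model so that Vision can drive it.
    let visionModel = try VNCoreMLModel(for: mlModel)

    let request = VNCoreMLRequest(model: visionModel) { request, error in
        // A classifier model yields VNClassificationObservation results.
        guard let results = request.results as? [VNClassificationObservation] else {
            print("No classification results: \(String(describing: error))")
            return
        }
        // Print up to three of the returned labels with their confidence.
        for classification in results.prefix(3) {
            print("\(classification.identifier): \(classification.confidence)")
        }
    }
    // Vision scales the image to the model's input size; this option controls how.
    request.imageCropAndScaleOption = .centerCrop

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```

Assuming Xcode generates a class named `MobileNet` for the model file (the actual name depends on the file), the call would look like `try classify(cgImage, with: MobileNet().model)`.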

Here is the result

Image for post

2. Face Detection

Face detection finds faces in an image. The corresponding request is VNDetectFaceRectanglesRequest. The bounding boxes of the detected faces are returned as VNFaceObservation results. In the example, rectangles are drawn around the faces.

handleFaces is the completion handler.
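Here is a sketch of what such a handler could look like; the project's own version draws the rectangles, while this one only computes and prints them. The `FaceDetector` class and the `imageRect(forNormalizedRect:imageSize:)` helper are hypothetical names. The helper is needed because Vision reports bounding boxes normalized to 0...1 with the origin at the bottom-left, whereas UIKit drawing uses a top-left origin.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: converts a normalized Vision bounding box (origin at
// bottom-left) into a rect in image coordinates with a top-left origin.
func imageRect(forNormalizedRect box: CGRect, imageSize: CGSize) -> CGRect {
    let width = box.width * imageSize.width
    let height = box.height * imageSize.height
    let x = box.minX * imageSize.width
    // Flip vertically: the top edge of the box becomes the rect's y origin.
    let y = (1 - box.maxY) * imageSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

// Hypothetical wrapper class so the completion handler can see the image size.
final class FaceDetector {
    private var imageSize = CGSize.zero

    func detectFaces(in image: CGImage) throws {
        imageSize = CGSize(width: image.width, height: image.height)
        let request = VNDetectFaceRectanglesRequest(completionHandler: self.handleFaces)
        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        try handler.perform([request])
    }

    // Completion handler: called by Vision once the request has finished.
    func handleFaces(request: VNRequest, error: Error?) {
        guard let faces = request.results as? [VNFaceObservation] else {
            print("Face detection failed: \(String(describing: error))")
            return
        }
        for face in faces {
            let rect = imageRect(forNormalizedRect: face.boundingBox, imageSize: imageSize)
            print("Draw a rectangle at \(rect)")
        }
    }
}
```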

The result is

Image for post
Photo in this example is by Smart Photography Learning

3. Face Landmarks Detection

Face landmarks detection finds individual facial features in an image. The corresponding request is VNDetectFaceLandmarksRequest. Each result contains a region for each landmark, and the points in a region outline features such as the eyes, nose, and mouth.
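A sketch of reading those regions is shown here. It assumes the released iOS 11 SDK, where a region exposes its points as `normalizedPoints` (the early betas used a different accessor); the function name `detectLandmarks(in:)` is hypothetical.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: prints a few landmark regions for every detected face.
func detectLandmarks(in image: CGImage) throws {
    let request = VNDetectFaceLandmarksRequest { request, error in
        guard let faces = request.results as? [VNFaceObservation] else {
            print("Landmark detection failed: \(String(describing: error))")
            return
        }
        for face in faces {
            // landmarks is nil when no landmarks were found for this face.
            guard let landmarks = face.landmarks else { continue }

            // Each region is optional; only a few of the available ones are listed.
            let regions: [(name: String, region: VNFaceLandmarkRegion2D?)] = [
                ("left eye", landmarks.leftEye),
                ("right eye", landmarks.rightEye),
                ("nose", landmarks.nose),
                ("outer lips", landmarks.outerLips)
            ]
            for (name, region) in regions {
                guard let region = region else { continue }
                // normalizedPoints are relative to the face's bounding box.
                print("\(name): \(region.pointCount) points, \(region.normalizedPoints)")
            }
        }
    }

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```

To draw the points on the full image, `region.pointsInImage(imageSize:)` returns them in image coordinates instead.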

The result is below.

Image for post

4. Text Detection

Text detection finds the areas of an image that contain text. The request is VNDetectTextRectanglesRequest.
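The sketch below shows one way to use this request; the function name `detectText(in:)` is hypothetical and the code is not copied from the project. Note that this request only reports where text is located, not what the text says.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: prints the bounding boxes of detected text regions.
func detectText(in image: CGImage) throws {
    let request = VNDetectTextRectanglesRequest { request, error in
        guard let regions = request.results as? [VNTextObservation] else {
            print("Text detection failed: \(String(describing: error))")
            return
        }
        for region in regions {
            // Bounding box of a whole text region, in normalized coordinates.
            print("Text region: \(region.boundingBox)")
            // Per-character boxes are only filled in when requested below.
            for box in region.characterBoxes ?? [] {
                print("  character: \(box.boundingBox)")
            }
        }
    }
    // Ask Vision to also report a box for each individual character.
    request.reportCharacterBoxes = true

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```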

The result is

Image for post

5. Barcode Detection

Barcode detection finds barcodes in an image. However, I always get nil with VNDetectBarcodesRequest and can’t find any documentation or sample code for reference. Please let me know if you get correct results with barcode detection.
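For reference, this is a minimal sketch of how the request is typically wired up against the released iOS 11 SDK. It is not taken from the example project, and it is not confirmed to resolve the nil result described above. The function name `detectBarcodes(in:)` is hypothetical.

```swift
import Vision
import CoreGraphics

// Hypothetical helper: prints the symbology and payload of detected barcodes.
func detectBarcodes(in image: CGImage) throws {
    let request = VNDetectBarcodesRequest { request, error in
        if let error = error {
            print("Barcode detection failed: \(error)")
            return
        }
        guard let barcodes = request.results as? [VNBarcodeObservation] else {
            print("No barcode results")
            return
        }
        for barcode in barcodes {
            // payloadStringValue is nil when the payload is not a string.
            let payload = barcode.payloadStringValue ?? "<no string payload>"
            print("\(barcode.symbology.rawValue): \(payload) at \(barcode.boundingBox)")
        }
    }

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```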

6. Object Tracking

I used VNImageRequestHandler for the previous requests. Object tracking works on video, so it’s time to switch to VNSequenceRequestHandler, which is for “a sequence of multiple images”.

This example is from jeffreybergier’s blog Getting Started with Vision on iOS 11.
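Separately from that example, the core tracking loop can be sketched roughly as follows. The `ObjectTracker` class is a hypothetical wrapper, and the sketch assumes the frames arrive as `CVPixelBuffer` values (for instance from a camera capture callback) and that the initial bounding box is supplied in normalized coordinates.

```swift
import Vision
import CoreVideo
import CoreGraphics

// Hypothetical wrapper: tracks one object across successive video frames.
final class ObjectTracker {
    // One sequence handler is reused for all frames so Vision can keep state.
    private let sequenceHandler = VNSequenceRequestHandler()
    private var lastObservation: VNDetectedObjectObservation

    // initialBoundingBox: normalized rect (0...1) around the object to track.
    init(initialBoundingBox: CGRect) {
        lastObservation = VNDetectedObjectObservation(boundingBox: initialBoundingBox)
    }

    // Call once per frame; returns the object's new normalized bounding box.
    func track(in frame: CVPixelBuffer) throws -> CGRect? {
        // Each request starts from the observation of the previous frame.
        let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
        request.trackingLevel = .accurate

        try sequenceHandler.perform([request], on: frame)

        guard let observation = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        // Feed the new observation into the next frame's request.
        lastObservation = observation
        return observation.boundingBox
    }
}
```

The important design point is the feedback loop: the observation returned for one frame becomes the input observation for the next.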

Let’s see the result.

Get complete project for all examples from GitHub — NilStack/HelloVision.

The next article about machine learning in iOS 11 is

Swift World: What’s new in iOS 11 — Natural Language Processing

Finally, here are the official resources: Apple’s documentation and the WWDC session.

Vision Document

WWDC 2017 Session Vision Framework: Building on Core ML

I will keep updating this article and the example. Thanks for your time. Please click the ❤ button to help more people see this article. Contact Peng on Twitter: nilstack | GitHub: nilstack | LinkedIn: Peng | Email: guoleii@gmail.com

Note: Swift World is a new publication of mine that collects excellent articles, tutorials, and code on Swift. Please follow it if you are interested.

