In my previous article, I introduced Core ML, a general machine learning framework. Apple also provides frameworks for specific domains. In this article, I’ll dive into the Vision framework for computer vision, which is built on top of Core ML.
Vision gives us several tools for analyzing images and video: detecting and recognizing faces, detecting barcodes, detecting text, detecting and tracking objects, and so on. I will explain each tool with an example. The examples have been uploaded to GitHub: NilStack/HelloVision.
Simply put, there are three roles in using Vision: the request, the request handler, and the observation. Each Vision tool has its own image analysis request type. For example, we create a VNDetectFaceRectanglesRequest to detect faces in an image. There are only two kinds of request handlers: VNImageRequestHandler and VNSequenceRequestHandler. The first is for a single image and the second is for “a sequence of multiple images”. The results are wrapped in “observations”, which carry information such as the bounding box of a detected item.
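A simple template for using Vision follows the same three steps every time: create a request with a completion handler, create a request handler for the image, and ask the handler to perform the request.
For each feature, I only show part of the code. Please refer to the complete project on GitHub.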
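A minimal sketch of the request, request handler, and observation pattern, assuming the input is a CGImage. The function name faceBoundingBoxes(in:) is hypothetical and is not taken from the HelloVision project.

```swift
import CoreGraphics
import Vision

// Hypothetical helper (not from HelloVision): returns the normalized
// bounding boxes of all faces Vision finds in the image.
func faceBoundingBoxes(in cgImage: CGImage) throws -> [CGRect] {
    var boxes: [CGRect] = []

    // 1. Request: describes what to look for; the closure receives the results.
    let request = VNDetectFaceRectanglesRequest { request, error in
        // 3. Observations: each one carries a bounding box in normalized
        //    coordinates (0...1, origin at the lower-left corner).
        guard error == nil,
              let faces = request.results as? [VNFaceObservation] else { return }
        boxes = faces.map { $0.boundingBox }
    }

    // 2. Request handler: performs one or more requests on a single image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request]) // synchronous: the closure has run by now
    return boxes
}
```

Because perform(_:) blocks until the requests finish, an app would usually call a function like this from a background queue and hop back to the main queue before drawing.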
1. Machine Learning Image Analysis
This analyzes an image with a Core ML model. The corresponding request is VNCoreMLRequest. I will use MobileNets, a recent model from Google that is “for mobile and embedded vision applications”. You can download the model file, which Matthijs Hollemans has converted to Core ML format, from awesome-CoreML-models.
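A rough sketch of how a VNCoreMLRequest can be wired up. To stay self-contained it takes any MLModel as a parameter; with the converted MobileNet file added to an Xcode project, the caller would presumably pass something like MobileNet().model, where MobileNet is the class Xcode generates from the model file (an assumption here). The names classify(_:using:) and describe(_:_:) are hypothetical.

```swift
import CoreGraphics
import CoreML
import Vision

// Hypothetical formatter: "tabby (50%)" style label for display.
func describe(_ identifier: String, _ confidence: Float) -> String {
    return "\(identifier) (\(Int((confidence * 100).rounded()))%)"
}

// Hypothetical helper: runs an image classifier through Vision and returns
// the three most confident labels.
func classify(_ cgImage: CGImage, using mlModel: MLModel) throws -> [String] {
    // Wrap the Core ML model so Vision can scale and convert the image for it.
    let visionModel = try VNCoreMLModel(for: mlModel)
    var labels: [String] = []

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Classifier models report VNClassificationObservation results.
        let results = request.results as? [VNClassificationObservation] ?? []
        labels = results.prefix(3).map { describe($0.identifier, $0.confidence) }
    }

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    return labels
}
```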
Here is the result
2. Face detection
Face detection finds faces in an image. The corresponding request is VNDetectFaceRectanglesRequest. The bounding boxes of the detected faces are wrapped in the resulting VNFaceObservation objects. In the example, rectangles are drawn around the faces.
handleFaces is the completion handler passed to the request.
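A sketch of what such a completion handler might look like, assuming the goal is to get rectangles that can be drawn over the image. The FaceDetector class and the imageRect(forNormalized:imageSize:) helper are hypothetical; only the handleFaces name comes from the text above. Vision reports bounding boxes in normalized coordinates with the origin at the lower-left corner, so the helper scales them to pixels and flips the y axis for top-left-origin drawing.

```swift
import CoreGraphics
import Vision

// Converts a normalized Vision rectangle (origin at lower-left) into pixel
// coordinates with a top-left origin, as UIKit drawing expects.
func imageRect(forNormalized box: CGRect, imageSize: CGSize) -> CGRect {
    let width = box.width * imageSize.width
    let height = box.height * imageSize.height
    let x = box.minX * imageSize.width
    let y = (1 - box.maxY) * imageSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

final class FaceDetector {
    // Face rectangles in pixel coordinates, filled in by handleFaces.
    private(set) var faceRects: [CGRect] = []
    private var imageSize = CGSize.zero

    func detectFaces(in cgImage: CGImage) throws {
        imageSize = CGSize(width: cgImage.width, height: cgImage.height)
        let request = VNDetectFaceRectanglesRequest(completionHandler: self.handleFaces)
        try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    }

    // Completion handler: Vision calls this once the request has finished.
    func handleFaces(request: VNRequest, error: Error?) {
        let faces = request.results as? [VNFaceObservation] ?? []
        faceRects = faces.map { imageRect(forNormalized: $0.boundingBox, imageSize: imageSize) }
    }
}
```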
The result is
3. Face Landmarks Detection
Face landmarks detection finds individual facial features in an image. The corresponding request is VNDetectFaceLandmarksRequest. The regions for the different landmarks are wrapped in the results. Within a region, points mark landmarks such as the eyes, nose, and mouth.
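A minimal sketch of reading landmark points, assuming the released iOS 11 API in which each region exposes pointsInImage(imageSize:). The function names and the choice of regions (eyes, nose, outer lips) are illustrative only.

```swift
import CoreGraphics
import Vision

// Flips a point from Vision's lower-left origin to a top-left origin.
func flippedForTopLeftOrigin(_ point: CGPoint, imageHeight: CGFloat) -> CGPoint {
    return CGPoint(x: point.x, y: imageHeight - point.y)
}

// Hypothetical helper: for each detected face, returns the pixel positions
// of the points that outline the eyes, nose, and outer lips.
func landmarkPoints(in cgImage: CGImage) throws -> [[CGPoint]] {
    let imageSize = CGSize(width: cgImage.width, height: cgImage.height)
    var pointsPerFace: [[CGPoint]] = []

    let request = VNDetectFaceLandmarksRequest { request, _ in
        let faces = request.results as? [VNFaceObservation] ?? []
        for face in faces {
            guard let landmarks = face.landmarks else { continue }
            // Each region is optional; keep only the ones Vision found.
            let regions = [landmarks.leftEye, landmarks.rightEye,
                           landmarks.nose, landmarks.outerLips].compactMap { $0 }
            let points = regions.flatMap { $0.pointsInImage(imageSize: imageSize) }
            pointsPerFace.append(points.map {
                flippedForTopLeftOrigin($0, imageHeight: imageSize.height)
            })
        }
    }

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    return pointsPerFace
}
```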
The result is below.
4. Text Detection
Text detection finds areas of text in an image. The request is VNDetectTextRectanglesRequest.
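A short sketch of a text detection request, assuming we also want a box per character via reportCharacterBoxes. The function name textRegions(in:) and its tuple return type are hypothetical. The request only locates text; it does not read it.

```swift
import CoreGraphics
import Vision

// Hypothetical helper: returns each text region with its character boxes,
// all in normalized coordinates (origin at the lower-left corner).
func textRegions(in cgImage: CGImage) throws -> [(region: CGRect, characters: [CGRect])] {
    var regions: [(region: CGRect, characters: [CGRect])] = []

    let request = VNDetectTextRectanglesRequest { request, _ in
        let observations = request.results as? [VNTextObservation] ?? []
        regions = observations.map { observation in
            // characterBoxes is nil unless reportCharacterBoxes is enabled.
            let characters = (observation.characterBoxes ?? []).map { $0.boundingBox }
            return (region: observation.boundingBox, characters: characters)
        }
    }
    request.reportCharacterBoxes = true

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    return regions
}
```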
The result is
5. Barcode Detection
Barcode detection finds barcodes in an image. However, I always get nil with VNDetectBarcodesRequest and can’t find documentation or a sample for reference. Please let me know if you get correct results with barcode detection.
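For reference, a sketch of how the request is typically set up, offered as a starting point rather than a confirmed fix for the nil results described above. The function name barcodePayloads(in:) is hypothetical; it prints any error Vision reports, which may help narrow down why no results come back.

```swift
import CoreGraphics
import Vision

// Hypothetical helper: returns the decoded string payload of every barcode
// Vision reports for the image.
func barcodePayloads(in cgImage: CGImage) throws -> [String] {
    var payloads: [String] = []

    let request = VNDetectBarcodesRequest { request, error in
        if let error = error {
            print("Barcode detection failed: \(error)")
            return
        }
        let barcodes = request.results as? [VNBarcodeObservation] ?? []
        // payloadStringValue is nil when the payload is not representable as text.
        payloads = barcodes.compactMap { $0.payloadStringValue }
    }

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    return payloads
}
```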
6. Object Tracking
I used VNImageRequestHandler in the previous requests. Object tracking needs to handle video, so it’s time to switch to VNSequenceRequestHandler, which is for “a sequence of multiple images”.
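A minimal sketch of frame-by-frame tracking, assuming frames arrive as CVPixelBuffer values (for example from a capture session) and that the caller supplies the initial bounding box in normalized coordinates. The ObjectTracker class is hypothetical. The key idea is that one VNSequenceRequestHandler is kept alive across frames and each frame's observation seeds the request for the next one.

```swift
import CoreGraphics
import CoreVideo
import Vision

final class ObjectTracker {
    // One handler for the whole sequence, so Vision can carry state across frames.
    private let sequenceHandler = VNSequenceRequestHandler()
    private var lastObservation: VNDetectedObjectObservation

    // Most recent bounding box, in normalized coordinates.
    var currentBoundingBox: CGRect { return lastObservation.boundingBox }

    init(initialBoundingBox: CGRect) {
        lastObservation = VNDetectedObjectObservation(boundingBox: initialBoundingBox)
    }

    // Call once per video frame; returns the updated box, or nil if the
    // request produced no observation.
    func track(in pixelBuffer: CVPixelBuffer) throws -> CGRect? {
        let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
        request.trackingLevel = .accurate
        try sequenceHandler.perform([request], on: pixelBuffer)

        guard let observation = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        lastObservation = observation // seed for the next frame
        return observation.boundingBox
    }
}
```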
Let’s see the result.
Get complete project for all examples from GitHub — NilStack/HelloVision.
Next article about machine learning in iOS 11 is
Finally, I will list official resources from Apple’s documentation and WWDC sessions.
I will keep updating this article and the examples. Thanks for your time. Please click the ❤ button to get this article seen by more people. Talk to Peng on Twitter: nilstack | GitHub: nilstack | LinkedIn: Peng | Email: firstname.lastname@example.org
Note: Swift World is a new publication of mine that collects excellent articles, tutorials, and code on Swift. Please follow it if you are interested.