Santosh Reddy Vuppala
Published in Analytics Vidhya · 5 min read · Apr 2, 2020


Apple's Vision framework to recognise vehicle number plates

This is a short post on using the Vision framework to detect text in images in an iOS app. I previously built a similar application using the Firebase ML Vision framework. The downside of using a framework from Firebase is that you have to install the external Firebase ML Vision dependency, whereas with Apple's Vision framework everything is ready to use in the SDK. Personally, I found Apple's Vision framework results superior to those of the Firebase ML Vision framework.

Here is a link to my GitHub repo where you can find the source code for this project.

I divided the post into the following:
1. Setting up project in Xcode
2. A word about the Vision framework
3. Detecting text in an image using the Vision framework

Setting up project in Xcode

Open Xcode and click Create a new Xcode project. On the next screen, under iOS, select Single View App and click Next.

Create new Single View App project in Xcode

Choose options for your new project; you can leave the Core Data, UI tests and Unit tests checkboxes unchecked. I am using Storyboard as the user interface; you can also use SwiftUI (more on it here). Click Next, then save the project on your local computer to start working.

Your project workspace should look like this

Now, let's add some user interface elements to our view controller. I have added a 'Take photo' button to load an image from the Photo Library (in my case, since I am using a simulator) or from the camera if you are running the app on a real iOS device, a UIImageView to display the image on the view controller, and a couple of labels to display the text recognised in the image. (Please note the user interface is basic; you can play with different styles to make the app look more interesting. You can access the entire source code here.)

A word about the Vision framework

The Vision framework applies computer vision algorithms to perform a variety of tasks on input images and video.

A list of tasks that the framework can handle are:

  1. Face detection
  2. Landmark detection
  3. Text detection
  4. Barcode recognition
  5. Image registration
  6. General feature tracking
  7. Animal detection
  8. Object detection
  9. Text recognition (we are interested in this for our project)

Vision also allows the use of custom CoreML models for tasks such as classification or object detection.
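As a rough sketch of that capability (the PlateClassifier model class here is hypothetical, standing in for whatever class Xcode generates from your .mlmodel file):

```swift
import Vision
import CoreML

// Sketch: running a custom Core ML model through Vision.
// `PlateClassifier` is a hypothetical Xcode-generated model class.
func classify(_ cgImage: CGImage) throws {
    let coreMLModel = try VNCoreMLModel(for: PlateClassifier().model)
    let request = VNCoreMLRequest(model: coreMLModel) { request, _ in
        guard let results = request.results as? [VNClassificationObservation] else { return }
        // Print the top classification result, if any
        if let best = results.first {
            print("\(best.identifier): \(best.confidence)")
        }
    }
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}
```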

There are basically 3 major steps to perform a Vision task:

  1. Create a request
    Use VNRequest and its derived classes to raise a request. For example, to detect text in an image we use VNRecognizeTextRequest
  2. Create a handler to process one or more requests by calling the .perform() method on it
    VNImageRequestHandler in our case
  3. Analyse the results
    Use VNObservation and its derived classes to work with the result of the request being performed.
    VNRecognizedTextObservation in our case
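The three steps above can be sketched roughly like this (error handling kept minimal for brevity):

```swift
import Vision
import UIKit

// Sketch of the request -> handler -> observation flow described above.
func recognizeText(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    // 1. Create a request, with a completion handler for the results
    let request = VNRecognizeTextRequest { request, _ in
        // 3. Analyse the results as VNRecognizedTextObservation
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            if let candidate = observation.topCandidates(1).first {
                print(candidate.string)
            }
        }
    }

    // 2. Create a handler and perform the request on it
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```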

This is a good source to read about the framework

Detecting Text in an image using the Vision framework

To start working with the framework, import Vision into your project.
Initialise and set up a UIImagePickerController to pick or capture the image we need to run text recognition on, and prepare the delegate method that handles the moment picking is done.
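A minimal sketch of that setup (class, outlet and action names here are illustrative, not taken from the repo):

```swift
import UIKit

// Sketch: presenting a UIImagePickerController when 'Take photo' is tapped.
class PhotoViewController: UIViewController,
                           UIImagePickerControllerDelegate,
                           UINavigationControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!

    @IBAction func takePhoto(_ sender: UIButton) {
        let picker = UIImagePickerController()
        picker.delegate = self            // don't forget the delegate
        picker.sourceType = .photoLibrary // use .camera on a real device
        picker.allowsEditing = false      // we will read the originalImage later
        present(picker, animated: true)
    }
}
```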

Use the UIImagePickerController delegate method to handle working with the image. If you want to use the camera of your iOS device as the image source, set the sourceType to .camera instead of .photoLibrary. Don't forget to set the delegate to this class and conform to UIImagePickerControllerDelegate and UINavigationControllerDelegate.
The didFinishPickingMediaWithInfo delegate method is where we capture the image and perform the text recognition Vision tasks. In this method, use UIImagePickerController.InfoKey to get the originalImage (since we have set allowsEditing to false above).
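A rough sketch of that delegate method, assuming it lives in the view controller that presented the picker (imageView and textRecognition(on:) are illustrative names):

```swift
// Inside the view controller that presents the UIImagePickerController:
func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
    picker.dismiss(animated: true)
    // allowsEditing is false, so read the original (unedited) image
    guard let image = info[.originalImage] as? UIImage else { return }
    imageView.image = image      // display it in the UIImageView
    textRecognition(on: image)   // run the Vision text recognition task
}
```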

The textRecognition method handles recognising text in our image. The first step is to create a VNRecognizeTextRequest with a completion handler that is called on successful execution of the request. Additional properties can be set to fine-tune the request: recognitionLevel takes one of two choices, .accurate or .fast; recognitionLanguages takes a list of languages you need the text to be detected in; usesLanguageCorrection is a Boolean indicating whether you want language correction applied; and finally you can also supply a list of custom words as a reference for the request. Then a request handler, VNImageRequestHandler, is created and .perform() is run on the request to get the results.
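A sketch of such a method with those properties set (the language and custom-word values here are illustrative assumptions):

```swift
import Vision
import UIKit

// Sketch of the textRecognition method described above.
func textRecognition(on image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        // Take the top candidate string from each observation
        let strings = observations.compactMap { $0.topCandidates(1).first?.string }
        print(strings)
    }

    // Fine-tune the request
    request.recognitionLevel = .accurate       // .accurate or .fast
    request.recognitionLanguages = ["en-US"]   // languages to detect (assumed value)
    request.usesLanguageCorrection = true      // apply language correction
    request.customWords = ["KA", "MH", "TN"]   // illustrative custom words

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Text recognition failed: \(error)")
    }
}
```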

The results property of the request contains the observations, which in our case are VNRecognizedTextObservations. Since more than one blob of text may be detected, we need to loop over all observations. Each observation has a boundingBox property, a CGRect in normalised coordinates describing the rectangle that bounds the detected text. To display the bounding box, you first need to apply a CGAffineTransform to the boundingBox to scale it into the image's coordinate space.
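One way to sketch that conversion, assuming you want the box in UIKit's top-left-origin coordinate space:

```swift
import UIKit

// Sketch: converting a normalised Vision boundingBox (origin at the
// bottom-left, values in 0...1) into UIKit image coordinates
// (origin at the top-left, values in points).
func imageRect(for boundingBox: CGRect, imageSize: CGSize) -> CGRect {
    // Flip vertically, then scale up to the image size
    let flip = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
    let scale = CGAffineTransform(scaleX: imageSize.width, y: imageSize.height)
    return boundingBox.applying(flip).applying(scale)
}
```

Vision also ships a helper, VNImageRectForNormalizedRect, that handles the scaling part, though it keeps the lower-left origin, so you still need the vertical flip for UIKit drawing.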

To recognise a number plate in the image, I have used a regular expression pattern that matches the number plate format in India; you can use a pattern of your choice here.
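As a sketch, one pattern for the Indian format (two state letters, district digits, series letters, four digits, e.g. "KA01AB1234"; the exact pattern below is my assumption, so tune it to your needs):

```swift
import Foundation

// Sketch: checking whether a recognised string looks like an Indian
// number plate, e.g. "KA01AB1234" or "MH 12 DE 1433".
// The pattern is an illustrative assumption, not from the post's repo.
func isNumberPlate(_ text: String) -> Bool {
    let pattern = "^[A-Z]{2}\\s?\\d{1,2}\\s?[A-Z]{1,2}\\s?\\d{4}$"
    return text.range(of: pattern, options: .regularExpression) != nil
}
```

You would call this on each recognised string from the observations and surface only the matches in the label.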

Here is the result of the code.
(Note: this is a sample app to explore how text recognition works using the Vision framework; I have used explicit unwrapping in places, and the error handling is not as thorough as it should be, so please bear with these.)

If you want to try the same number plate recognition app using Firebase ML Vision, follow this.
