Creating a License Plate Reading iOS Application Using OCR Technologies and CoreData
Full tutorial using different libraries — TesseractOCRiOS, SwiftOCR, and Google MLVision | Can we beat Google?
It’s interesting to see how far we’ve come when it comes to character recognition technology. Reading and identifying text inside a clean image of a printed page has become easy, but what about real-life scenarios where the lighting is bad and the image is crooked?
Well, it’s still surprisingly hard to read and recognize characters, because computer vision in general is complicated and computationally expensive. There are also many parameters that go into pre-processing an image before you can extract any useful information from it.
Processing an image on a phone is even harder, because you can’t leverage the same GPU acceleration to speed up the process.
Let’s dive into the world of vision on iOS devices. For the purposes of this tutorial, we will read license plates and identify the city each is registered in. Our application will be called “Where Are You From?”
Create the App Skeleton
Basic setup
Let’s start by creating an iOS project with a single view app. Make sure to check “Use Core Data”:
Create View Controllers
We need four View Controllers:
- ViewController:
This is where we will set up our CollectionView with all the license plates (Moroccan license plates in this case) and their corresponding cities.
- OcrViewController:
In this controller, we will add two buttons: one for accessing the photo library and the other one for accessing the camera.
- WhatCityViewController:
Here we will display the actual image with the corresponding city.
- MainTabViewController:
This is our main tab bar controller, where we tie together the navigation for the whole application.
Create a navigation
Navigation in iOS is pretty straightforward and easy to implement. I’ve changed some things, like the font for the navigation bar, so it looks nicer. Here’s the full source code for the navigation; you can find all this material in the GitHub repository:
Make sure to change AppDelegate.swift accordingly; we are using this controller as our application’s entry point:
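A minimal sketch of what that entry point might look like, assuming the MainTabViewController from the list above and a pre-SceneDelegate project (the repository may differ in the details):

```swift
import UIKit

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Create the window manually and make the tab controller the entry point
        window = UIWindow(frame: UIScreen.main.bounds)
        window?.rootViewController = MainTabViewController()
        window?.makeKeyAndVisible()
        return true
    }
}
```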
Setup CoreData
If you checked “Use Core Data” when creating the project, Xcode adds a .xcdatamodeld file that we will use to create our model.
You can create your own model. Here I chose a very simple structure, since my objective is just to get the data and focus on OCR, but you can definitely adapt it to your country’s license plate structure.
Here I will only use the City class as the main class for storing my license plate data: id will stand for the suffix number and name for the city name.
Add a model
Core Data will generate the model with CRUD methods associated with the class. You just need to create an NSManagedObject subclass and follow the steps. It’s pretty simple and straightforward:
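The generated subclass should look roughly like this sketch, assuming the two attributes described above (the entity name and attribute types are what the steps produce for this model):

```swift
import CoreData

@objc(City)
public class City: NSManagedObject {}

extension City {
    @nonobjc public class func fetchRequest() -> NSFetchRequest<City> {
        return NSFetchRequest<City>(entityName: "City")
    }

    // The plate's numeric suffix and the corresponding city name
    @NSManaged public var id: Int64
    @NSManaged public var name: String?
}
```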
Populate the Model with Data Helpers
I have created a file with an extension to my ViewController class so that I can populate the database with all the license plates (if you are Moroccan, you can use the one I’ve made; otherwise, create one and fill it with the necessary information).
You only need to run this function once; otherwise, you’ll end up with a lot of duplicates.
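A sketch of such a seeding helper; the function name and the two sample suffix/city entries here are hypothetical placeholders, so replace the dictionary with the full list for your country:

```swift
import CoreData

extension ViewController {
    // Run this once to seed the database; running it again creates duplicates.
    // The mapping below is an abridged, hypothetical sample.
    func populateCities(context: NSManagedObjectContext) {
        let cities: [Int64: String] = [1: "Rabat", 6: "Casablanca"]  // sample entries only
        for (suffix, name) in cities {
            let city = City(context: context)
            city.id = suffix
            city.name = name
        }
        do {
            try context.save()
        } catch {
            print("Failed to seed cities: \(error)")
        }
    }
}
```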
Setup the context
I removed the context from AppDelegate.swift, where Xcode put it in the first place, and created a class called PersistenceManager. I’m not going to go deep into the details (you can definitely find a YouTube tutorial on how to do it), but what you need to know is that the context helps us fetch data from the database so we can manipulate it as full objects. Here’s my PersistenceManager class:
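A common shape for such a manager is sketched below; the container name is an assumption and must match your .xcdatamodeld file:

```swift
import CoreData

final class PersistenceManager {

    static let shared = PersistenceManager()
    private init() {}

    // Loads the .xcdatamodeld created with the project;
    // "WhereAreYouFrom" is assumed to be the model's name
    lazy var persistentContainer: NSPersistentContainer = {
        let container = NSPersistentContainer(name: "WhereAreYouFrom")
        container.loadPersistentStores { _, error in
            if let error = error {
                fatalError("Unresolved Core Data error: \(error)")
            }
        }
        return container
    }()

    // The context we use everywhere to fetch and save objects
    var context: NSManagedObjectContext {
        return persistentContainer.viewContext
    }

    func save() {
        guard context.hasChanges else { return }
        do {
            try context.save()
        } catch {
            print("Failed to save context: \(error)")
        }
    }
}
```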
Set Up a UICollectionView — ViewController.swift
Create and setup the collection’s layout
First, instantiate a UICollectionView object and a cellId string:
Then set up the layout and add delegates:
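Putting those two steps together, a minimal version could look like this; the CityCell class name is assumed (it’s the custom cell from the repository):

```swift
import UIKit

class ViewController: UIViewController {

    // Reuse identifier for our custom cell
    let cellId = "cellId"

    lazy var collectionView: UICollectionView = {
        let layout = UICollectionViewFlowLayout()
        layout.scrollDirection = .vertical
        let cv = UICollectionView(frame: .zero, collectionViewLayout: layout)
        cv.translatesAutoresizingMaskIntoConstraints = false
        cv.backgroundColor = .white
        return cv
    }()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.addSubview(collectionView)

        // Pin the collection view to the edges of the screen
        NSLayoutConstraint.activate([
            collectionView.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor),
            collectionView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
            collectionView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
            collectionView.bottomAnchor.constraint(equalTo: view.bottomAnchor)
        ])

        // Register the custom cell and wire up the delegates
        collectionView.register(CityCell.self, forCellWithReuseIdentifier: cellId)
        collectionView.dataSource = self
        collectionView.delegate = self
    }
}
```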
We also need a UICollectionViewCell; I’ve created a custom one that you can find in the GitHub repository.
Populate the collection view
I prefer to use extensions here, so I’ve created a separate file:
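The extension might be sketched as below; the `cities` array (the [City] objects fetched from Core Data), the CityCell class, and its `city` property are assumptions about the repository code:

```swift
import UIKit

extension ViewController: UICollectionViewDataSource, UICollectionViewDelegateFlowLayout {

    func collectionView(_ collectionView: UICollectionView,
                        numberOfItemsInSection section: Int) -> Int {
        // `cities` is assumed to be the [City] array fetched via the context
        return cities.count
    }

    func collectionView(_ collectionView: UICollectionView,
                        cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
        let cell = collectionView.dequeueReusableCell(withReuseIdentifier: cellId,
                                                      for: indexPath) as! CityCell
        cell.city = cities[indexPath.item]  // hypothetical property on the custom cell
        return cell
    }

    func collectionView(_ collectionView: UICollectionView,
                        layout collectionViewLayout: UICollectionViewLayout,
                        sizeForItemAt indexPath: IndexPath) -> CGSize {
        return CGSize(width: collectionView.bounds.width - 32, height: 80)
    }
}
```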
Set Up a UIImagePickerController — OcrViewController.swift
Get images from the image library
We need a button and some logic to trigger it:
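One way to sketch this controller, with two hypothetical buttons sharing a presentation helper (lay them out however you like; the actual delegate methods come later):

```swift
import UIKit

class OcrViewController: UIViewController,
                         UIImagePickerControllerDelegate,
                         UINavigationControllerDelegate {

    lazy var libraryButton: UIButton = {
        let button = UIButton(type: .system)
        button.setTitle("Photo Library", for: .normal)
        button.addTarget(self, action: #selector(handleLibrary), for: .touchUpInside)
        return button
    }()

    lazy var cameraButton: UIButton = {
        let button = UIButton(type: .system)
        button.setTitle("Camera", for: .normal)
        button.addTarget(self, action: #selector(handleCamera), for: .touchUpInside)
        return button
    }()

    @objc func handleLibrary() {
        presentPicker(source: .photoLibrary)
    }

    @objc func handleCamera() {
        // The camera is unavailable in the simulator, so check first
        guard UIImagePickerController.isSourceTypeAvailable(.camera) else { return }
        presentPicker(source: .camera)
    }

    private func presentPicker(source: UIImagePickerController.SourceType) {
        let picker = UIImagePickerController()
        picker.sourceType = source
        picker.delegate = self
        present(picker, animated: true)
    }
}
```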
Trigger the camera
We now need to set up some logic. It’s very important to change the Info.plist file and add properties that explain to the user why we need access to the camera and the photo library. Add some text to Privacy — Camera Usage Description and Privacy — Photo Library Usage Description:
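In the raw plist source, those “Privacy” display names correspond to the following keys (the description strings are just examples; write whatever fits your app):

```xml
<key>NSCameraUsageDescription</key>
<string>We need the camera to take a picture of the license plate.</string>
<key>NSPhotoLibraryUsageDescription</key>
<string>We need your photo library so you can pick a picture of a license plate.</string>
```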
Set up the delegate
Now we can access the image from the UIImagePickerController:
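A sketch of the delegate methods; `processImage` is a hypothetical hook where you would hand the picked image to the OCR code and push WhatCityViewController (OcrViewController is assumed to already adopt UIImagePickerControllerDelegate and UINavigationControllerDelegate):

```swift
import UIKit

extension OcrViewController {

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        picker.dismiss(animated: true)
        guard let image = info[.originalImage] as? UIImage else { return }
        // Hypothetical hook: run OCR on the image, then show the result screen
        processImage(image)
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true)
    }
}
```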
Testing Three OCR Libraries
Now let’s get to the fun part, OCR!
When it comes to image processing, it’s important to find an image that is easy to process and, more importantly, focused on the subject. Let’s assume that the images are of good quality and already cropped to our area of interest.
Since we are making an iOS application, there’s no point in using my old friend OpenCV, because it isn’t meant for ARM architectures and won’t be optimized to process complex series of convolutions and transformations (I’m sure that if we could have leveraged OpenCV, the results would have been way better).
Here are the images I used to compare the three libraries:
TesseractOCRiOS
This library is interesting because it uses the legacy Tesseract library, which is written in C++ and wrapped to work in Swift. I was confident that it would give me a decent result; well, I was wrong!
Let’s dive in. First, you need to set up your Podfile and get your workspace up and running; I’m going to assume that you know how to use pods (there are a lot of YouTube tutorials that can help you with CocoaPods).
There’s a GitHub repository if you want to check it out: TesseractOCRiOS
I’ve created a function with a completion block that returns a string:
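A minimal sketch of that function, assuming the TesseractOCRiOS pod with its English traineddata bundled in the project (the function name is mine, not the repository’s):

```swift
import UIKit
import TesseractOCR

// Runs Tesseract on the given image and hands the raw text
// to the completion block (empty string on failure)
func tesseractOCR(on image: UIImage, completion: @escaping (String) -> Void) {
    guard let tesseract = G8Tesseract(language: "eng") else {
        completion("")
        return
    }
    tesseract.pageSegmentationMode = .auto
    tesseract.image = image
    tesseract.recognize()
    completion(tesseract.recognizedText ?? "")
}
```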
Well, let me be very clear: this one is the worst of the three. I tried pretty much everything, even taking the images and performing some convolutions on my computer to improve them, but nothing! It’s just not good.
This library will only perform well if the pictures contain plain black text in a fairly simple font on a white background. I also want to point out that Tesseract is known to be bad with images that contain large text.
SwiftOCR
SwiftOCR is a fast and simple OCR library written in Swift. It uses a neural network for image recognition. As of now, SwiftOCR is optimized for recognizing short, one line long alphanumeric codes (e.g. DI4C9CM). Source: GitHub
That’s interesting because it’s a different approach than Tesseract’s, where you basically leverage classified, trained data to do the recognition.
SwiftOCR uses neural networks, so it should be way better. Let’s try it!
So I’ve created a function with a completion block that returns a string:
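Something along these lines, assuming the SwiftOCR pod (again, the function name is just mine):

```swift
import UIKit
import SwiftOCR

// Wraps SwiftOCR's asynchronous recognize call in the same
// completion-block shape as the other libraries
func swiftOCR(on image: UIImage, completion: @escaping (String) -> Void) {
    let ocr = SwiftOCR()
    ocr.recognize(image) { recognizedString in
        completion(recognizedString)
    }
}
```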
It can’t get easier than that: just four lines of code.
I can 100% confirm that it’s way better than Tesseract, especially on a small block of text. I would say it gave me 20% of the characters, but it isn’t consistent at all; there’s a huge variance between iterations. I’m not sure if I did something wrong or if it’s just unstable.
So, to conclude, SwiftOCR is better than Tesseract for this use case, but it’s nowhere near the expected result. At this point, I am totally depressed (kidding). The results are far from what I need in order to process them and compare them with my database.
SwiftOCR is better but still not good enough!
Google MLVision — Firebase ML Kit
When you go to Google, that means you have reached your limits.
Vision is hard; my little experience with image processing taught me this. You need a huge amount of resources and very specialized crafting to achieve something decent.
You need to download the on-device text recognition model. Apparently, it’s not as powerful as the cloud-based one, but I can tell you that it smashed the other libraries (like, really smashed).
So, I’ve created a function with a completion block that returns a string:
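A sketch using Firebase ML Kit’s on-device text recognizer, assuming the Firebase/MLVision pods are installed and FirebaseApp.configure() is called in the AppDelegate:

```swift
import UIKit
import Firebase

// Runs the on-device recognizer and returns the full recognized
// text, or an empty string if recognition fails
func googleOCR(on image: UIImage, completion: @escaping (String) -> Void) {
    let vision = Vision.vision()
    let textRecognizer = vision.onDeviceTextRecognizer()
    let visionImage = VisionImage(image: image)
    textRecognizer.process(visionImage) { result, error in
        guard error == nil, let result = result else {
            completion("")
            return
        }
        completion(result.text)
    }
}
```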
I’m going to be very straightforward: Google is great, period. 95% of the full text, 98% accuracy.
Final result
Google always wins
You can’t be mad at the other libraries, because it’s just not possible to achieve what Google can do. They have been in the game for a long time and have invested a huge amount of money and resources into this.
Everyone knows that vision is one of the most challenging fields of artificial intelligence, and Google is doing its part.
If only other companies like Facebook, Microsoft, and Amazon would share a small part of their knowledge by open-sourcing some of their tools.
Anyway, here’s the final result:
If you liked this tutorial, please clap and share it with your friends. If you have any questions don’t hesitate to send me an email at omarmhaimdat@gmail.com.
This project is available to download from my GitHub account.