Full tutorial using different libraries — TesseractOCRiOS, SwiftOCR, and Google MLVision | Can we beat Google?

Omar M’Haimdat
Jul 21 · 7 min read
Final Result, Github

It’s interesting to see how far character recognition technology has come. Reading and identifying text in a clean image of a printed page has become easy, but what about real-life scenarios where the lighting is bad and the image is crooked?

Well, in those conditions it’s still surprisingly hard to read and recognize characters, because computer vision in general is complicated and computationally expensive. There are also many parameters involved in pre-processing an image before you can extract any useful information from it.

Processing an image on a phone is harder still, because you have far less computing power than on a desktop machine and can’t rely on the kind of GPU acceleration you would use there to speed up the process.

Let’s dive into the world of vision on iOS devices. For the purpose of this tutorial, we will read license plates and identify the city each one is registered in. Our application will be called “Where Are You From?”


Create the App Skeleton

Basic setup

Let’s start by creating an iOS project from the Single View App template, and make sure to check “Use Core Data”:

Create a new project with core data

Create View Controllers

We need four View Controllers:

  • ViewController:

This is where we will set up our UICollectionView listing all the license plates (Moroccan license plates in this case), each with its corresponding city.

  • OcrViewController:

In this controller, we will add two buttons: one for accessing the photo library and the other one for accessing the camera.

  • WhatCityViewController:

Here we will display the actual image with the corresponding city.

  • MainTabViewController:

This is our main navigation controller, a tab bar controller that ties the whole application together.

Create the navigation

Navigation in iOS is straightforward to implement. I’ve customized a few things, such as the navigation bar font, so that it looks nice. Here’s the full source code for the navigation; you can also find all of this material in the GitHub repository:

MainTabViewController.swift
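Below is a minimal sketch of what such a tab bar controller could look like. It is not the author's exact code. It assumes UIKit and gives a tab only to the plates list and the OCR screen, each wrapped in a UINavigationController. The tab titles, the `makeTab(root:title:tag:)` helper, and the bold system font are hypothetical placeholders:

```swift
import UIKit

// Placeholder stand-ins so this sketch compiles on its own;
// in the project these are the real ViewController and OcrViewController.
class ViewController: UIViewController {}
class OcrViewController: UIViewController {}

final class MainTabViewController: UITabBarController {

    override func viewDidLoad() {
        super.viewDidLoad()

        // Shared navigation bar styling; the font is an arbitrary example.
        UINavigationBar.appearance().titleTextAttributes = [
            .font: UIFont.boldSystemFont(ofSize: 18)
        ]

        // One tab per screen, each wrapped in its own navigation controller.
        viewControllers = [
            makeTab(root: ViewController(), title: "Plates", tag: 0),
            makeTab(root: OcrViewController(), title: "Scan", tag: 1)
        ]
    }

    /// Hypothetical helper: wraps `root` in a UINavigationController and gives it a tab bar item.
    private func makeTab(root: UIViewController, title: String, tag: Int) -> UINavigationController {
        root.title = title  // shown in the navigation bar
        let navigationController = UINavigationController(rootViewController: root)
        navigationController.tabBarItem = UITabBarItem(title: title, image: nil, tag: tag)
        return navigationController
    }
}
```

Wrapping each tab in its own navigation controller lets every screen push detail views (such as WhatCityViewController) without affecting the other tab.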

Make sure to update AppDelegate.swift accordingly, since we are using this controller as our application’s entry point:

AppDelegate.swift
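Here is a sketch of the launch method, assuming an Xcode 10-era template (no SceneDelegate) and that Main.storyboard has been cleared from the “Main Interface” setting, so the window is built in code. Only the relevant part of the class is shown:

```swift
import UIKit

// Placeholder so this sketch compiles on its own; use the real class in the project.
class MainTabViewController: UITabBarController {}

// In the project, this class keeps the @UIApplicationMain attribute from Xcode's template.
class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Create the window manually and make the tab bar controller the root.
        window = UIWindow(frame: UIScreen.main.bounds)
        window?.rootViewController = MainTabViewController()
        window?.makeKeyAndVisible()
        return true
    }
}
```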

Set up Core Data

If you checked “Use Core Data” when creating the project, Xcode adds a .xcdatamodeld file that we will use to create our model.

.xcdatamodeld file

You can create your own model. I chose a very simple structure, since my objective is just to get the data and focus on OCR, but you can definitely adapt it to your country’s license plate structure.

I only use a single City entity to store the license plate data: id holds the plate’s suffix number and name holds the city name.

Add a model

Xcode can generate the model class for you, including its properties and a typed fetch request for the entity. You just need to create an NSManagedObject subclass (Editor > Create NSManagedObject Subclass…) and follow the steps. It’s simple and straightforward:

Create NSManagedObject Subclass
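For an entity named City, the generated files look roughly like the listing below (merged into one file here). The attribute types are assumptions: id as Integer 16 and name as an optional String. Adjust them to whatever you chose in the model editor:

```swift
import CoreData

@objc(City)
public class City: NSManagedObject {}

extension City {
    /// Typed fetch request for every City row.
    @nonobjc public class func fetchRequest() -> NSFetchRequest<City> {
        return NSFetchRequest<City>(entityName: "City")
    }

    /// Plate suffix number (assumed Integer 16 in the model).
    @NSManaged public var id: Int16
    /// City name.
    @NSManaged public var name: String?
}
```

Because NSManagedObject also declares an untyped `fetchRequest()`, give the result an explicit type when calling it, e.g. `let request: NSFetchRequest<City> = City.fetchRequest()`.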

Populate the Model with Data Helpers

I created a file with an extension to my ViewController class that populates the database with all the license plates (if you are Moroccan, you can use the one I’ve made; otherwise, create one and fill it with the necessary information).

Example of how I instantiated my City class; I did this for 87 cities in the CreateCities.swift file.

You only need to run this function once; otherwise, you’ll end up with a lot of duplicates.
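If you would rather not rely on remembering to run it only once, a guard on the row count makes seeding safe to repeat. The sketch below assumes the City entity described above. The function name `seedCitiesIfNeeded` and the tuple input are hypothetical. It uses key-value coding, so it does not depend on the generated class:

```swift
import CoreData

/// Hypothetical seeding helper: inserts one "City" row per (id, name) pair,
/// but only while the store is still empty, so calling it twice is harmless.
func seedCitiesIfNeeded(_ cities: [(id: Int16, name: String)],
                        in context: NSManagedObjectContext) throws {
    let request = NSFetchRequest<NSManagedObject>(entityName: "City")
    guard try context.count(for: request) == 0 else { return }  // already seeded

    for city in cities {
        let object = NSEntityDescription.insertNewObject(forEntityName: "City", into: context)
        object.setValue(city.id, forKey: "id")
        object.setValue(city.name, forKey: "name")
    }
    try context.save()
}
```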

Set up the context

I moved the context out of AppDelegate.swift, where Xcode puts it by default, into a class of its own called PersistenceManager. I won’t go into the details (there are plenty of YouTube tutorials on how to do it), but what you need to know is that the context lets us fetch data from the database and manipulate it as full objects. Here’s my PersistenceManager class:

Persistence Manager Class
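Here is a minimal sketch of such a class, built on NSPersistentContainer. It is not the author's exact code. The container name "WhereAreYouFrom" is an assumption and must match your .xcdatamodeld file name. The generic `fetch` helper is hypothetical and assumes each entity is named after its class:

```swift
import CoreData

/// Sketch of a singleton that owns the Core Data stack instead of AppDelegate.
final class PersistenceManager {
    static let shared = PersistenceManager()

    let persistentContainer: NSPersistentContainer

    /// Main-queue context used by the UI to fetch and edit objects.
    var context: NSManagedObjectContext {
        return persistentContainer.viewContext
    }

    private init() {
        // Assumed model name: must match the .xcdatamodeld file.
        persistentContainer = NSPersistentContainer(name: "WhereAreYouFrom")
        persistentContainer.loadPersistentStores { _, error in
            if let error = error {
                fatalError("Unable to load persistent stores: \(error)")
            }
        }
    }

    /// Saves only when there are pending changes.
    func save() {
        guard context.hasChanges else { return }
        do {
            try context.save()
        } catch {
            print("Failed to save context: \(error)")
        }
    }

    /// Fetches every object of the given subclass (entity name == class name).
    func fetch<T: NSManagedObject>(_ type: T.Type) -> [T] {
        let request = NSFetchRequest<T>(entityName: String(describing: type))
        do {
            return try context.fetch(request)
        } catch {
            print("Fetch failed: \(error)")
            return []
        }
    }
}
```

With the City class in place, the collection view could then load its rows with `PersistenceManager.shared.fetch(City.self)`.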

Set Up a UICollectionView — ViewController.swift

Create and set up the collection view’s layout

First, instantiate a UICollectionView object and a cellId string:

Our Collection View

Then set up the layout and add delegates:

We also need a CollectionViewCell, I’ve created a custom one that you can find in the Github repository.

Populate the collection view

I prefer to use extensions here, so I’ve created a separate file:
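Putting these steps together, here is one self-contained sketch. It covers the lazily created collection view, the flow layout, the delegate wiring, and the data source in its own extension. `PlateCell` is a placeholder for the custom cell in the repository. The `cities` array of tuples is a hypothetical stand-in for the fetched City objects:

```swift
import UIKit

/// Placeholder for the repository's custom cell: just a centered label.
final class PlateCell: UICollectionViewCell {
    let label = UILabel()

    override init(frame: CGRect) {
        super.init(frame: frame)
        label.frame = contentView.bounds
        label.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        label.textAlignment = .center
        contentView.addSubview(label)
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
}

class ViewController: UIViewController {

    /// Reuse identifier shared by registration and dequeueing.
    let cellId = "cellId"

    /// Rows to display; in the app these would come from Core Data.
    var cities: [(id: Int16, name: String)] = []

    /// Vertical flow layout, created lazily so `view` exists when it is used.
    lazy var collectionView: UICollectionView = {
        let layout = UICollectionViewFlowLayout()
        layout.scrollDirection = .vertical
        layout.minimumLineSpacing = 8
        let collectionView = UICollectionView(frame: self.view.bounds, collectionViewLayout: layout)
        collectionView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        collectionView.backgroundColor = .white
        return collectionView
    }()

    override func viewDidLoad() {
        super.viewDidLoad()
        collectionView.register(PlateCell.self, forCellWithReuseIdentifier: cellId)
        collectionView.dataSource = self
        collectionView.delegate = self
        view.addSubview(collectionView)
    }
}

// Data source kept in its own extension (the article puts it in a separate file).
extension ViewController: UICollectionViewDataSource {
    func collectionView(_ collectionView: UICollectionView, numberOfItemsInSection section: Int) -> Int {
        return cities.count
    }

    func collectionView(_ collectionView: UICollectionView, cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
        let cell = collectionView.dequeueReusableCell(withReuseIdentifier: cellId, for: indexPath) as! PlateCell
        let city = cities[indexPath.item]
        cell.label.text = "\(city.id) - \(city.name)"
        return cell
    }
}

// Flow-layout delegate: one full-width row per license plate.
extension ViewController: UICollectionViewDelegateFlowLayout {
    func collectionView(_ collectionView: UICollectionView,
                        layout collectionViewLayout: UICollectionViewLayout,
                        sizeForItemAt indexPath: IndexPath) -> CGSize {
        return CGSize(width: max(collectionView.bounds.width - 16, 0), height: 60)
    }
}
```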


Set Up a UIImagePickerController — OcrViewController.swift

Get images from the image library

We need a button and some logic to trigger it:

Trigger the camera

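Now we need to add some logic. It’s very important to update the Info.plist file with text explaining to the user why we need access to the camera and the photo library. Add a description under Privacy - Camera Usage Description (NSCameraUsageDescription) and, for the library, under Privacy - Photo Library Usage Description (NSPhotoLibraryUsageDescription). iOS terminates an app that tries to use the camera without the camera description.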
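Here is a minimal sketch of the two buttons and the logic that presents the picker. It assumes a programmatic layout with a centered UIStackView. The button titles and the `presentPicker(sourceType:)` helper are hypothetical. The availability check matters because the Simulator has no camera:

```swift
import UIKit

/// Sketch of the OCR screen: one button opens the photo library, the other the camera.
class OcrViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .white

        let libraryButton = UIButton(type: .system)
        libraryButton.setTitle("Photo Library", for: .normal)
        libraryButton.addTarget(self, action: #selector(openLibrary), for: .touchUpInside)

        let cameraButton = UIButton(type: .system)
        cameraButton.setTitle("Camera", for: .normal)
        cameraButton.addTarget(self, action: #selector(openCamera), for: .touchUpInside)

        // Stack the two buttons vertically in the middle of the screen.
        let stack = UIStackView(arrangedSubviews: [libraryButton, cameraButton])
        stack.axis = .vertical
        stack.spacing = 16
        stack.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(stack)
        NSLayoutConstraint.activate([
            stack.centerXAnchor.constraint(equalTo: view.centerXAnchor),
            stack.centerYAnchor.constraint(equalTo: view.centerYAnchor)
        ])
    }

    @objc func openLibrary() { presentPicker(sourceType: .photoLibrary) }
    @objc func openCamera() { presentPicker(sourceType: .camera) }

    /// Presents a UIImagePickerController only if the source is available on this device.
    private func presentPicker(sourceType: UIImagePickerController.SourceType) {
        guard UIImagePickerController.isSourceTypeAvailable(sourceType) else { return }
        let picker = UIImagePickerController()
        picker.sourceType = sourceType
        picker.delegate = self
        present(picker, animated: true)
    }
}
```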

Set up the delegate

Now we can access the picked image from the UIImagePickerController:

Image Picker Delegate
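Here is a sketch of the delegate methods as an extension. The `onImagePicked` closure is a hypothetical hook for handing the image to whichever OCR function you use. The stand-in class at the top only exists so the snippet compiles on its own:

```swift
import UIKit

// Stand-in so this sketch compiles on its own; in the project the
// extension is added to the real OcrViewController.
class OcrViewController: UIViewController {
    /// Hypothetical hook: receives the picked image for the OCR step.
    var onImagePicked: ((UIImage) -> Void)?
}

extension OcrViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        // .editedImage is only present when allowsEditing is true; fall back to the original.
        let image = (info[.editedImage] ?? info[.originalImage]) as? UIImage
        picker.dismiss(animated: true) {
            if let image = image {
                self.onImagePicked?(image)
            }
        }
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true)
    }
}
```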

Testing Three OCR Libraries

Now let’s get to the fun part, OCR!

When it comes to image processing, it’s important to start with an image that is easy to process and, more importantly, focused on the subject. Let’s assume that the images are of good quality and already cropped to the area of interest.

Since we are making an iOS application, I left out my old friend OpenCV. It does ship an iOS framework, but it has to be bridged from C++ into Swift, and running long series of convolutions and transformations on a phone is costly (I’m sure that with a proper OpenCV pre-processing step the results would have been much better).

Here are the images I used to compare the three libraries:

Three images of Moroccan license plates. Disclaimer: I found these images through Google Images and used them only for testing purposes; I don’t know whether they are real.

TesseractOCRiOS

This library is interesting because it wraps Tesseract, a long-established OCR engine written in C++, so that it can be used from Swift. I was confident that it would give me a decent result. Well, I was wrong!

Let’s dive in. First, you need to set up your Podfile and get your workspace up and running. I’m going to assume that you know how to use CocoaPods (there are plenty of YouTube tutorials that can help you with it).

There’s a GitHub repository if you want to check it out: TesseractOCRiOS

I’ve created a function with a completion block that returns a string:

Tesseract OCR for iOS
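Below is a minimal sketch of such a function, modeled on the usage shown in the TesseractOCRiOS README. It assumes the pod (module `TesseractOCR`) and an English `tessdata` folder bundled with the app. The function name is hypothetical. Recognition runs off the main thread, and the completion is called back on the main queue:

```swift
import UIKit
import TesseractOCR

/// Hypothetical wrapper: runs Tesseract on `image` and returns the recognized text.
func recognizeWithTesseract(_ image: UIImage, completion: @escaping (String) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Fails if the "eng" traineddata is missing from the bundle.
        guard let tesseract = G8Tesseract(language: "eng") else {
            DispatchQueue.main.async { completion("") }
            return
        }
        tesseract.pageSegmentationMode = .auto
        tesseract.image = image
        tesseract.recognize()
        let text = tesseract.recognizedText ?? ""
        DispatchQueue.main.async { completion(text) }
    }
}
```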

Let me be very clear: this one is the worst of the three. I tried pretty much everything, even pre-processing the images on my computer with some convolutions to improve them, but nothing worked. It’s just not good.

This library only performs well when the pictures contain solid black text in a simple font on a white background. I also want to point out that Tesseract is known to struggle with images that contain very large text.
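If you want to feed Tesseract something closer to that ideal, one option (not something tested in this article) is to strip color and boost contrast on-device with Core Image's built-in CIColorControls filter before recognition. A minimal sketch; the function name and the default contrast of 1.5 are arbitrary assumptions, not tuned values:

```swift
import CoreImage
import UIKit

/// Hypothetical pre-processing step: high-contrast grayscale via CIColorControls.
func highContrastGrayscale(_ image: UIImage, contrast: Double = 1.5) -> UIImage? {
    guard let input = CIImage(image: image),
          let filter = CIFilter(name: "CIColorControls") else { return nil }
    filter.setValue(input, forKey: kCIInputImageKey)
    filter.setValue(0.0, forKey: kCIInputSaturationKey)    // remove color
    filter.setValue(contrast, forKey: kCIInputContrastKey) // push text toward black and white
    guard let output = filter.outputImage,
          let cgImage = CIContext().createCGImage(output, from: output.extent) else { return nil }
    // Keep the original scale and orientation so the result lines up with the input.
    return UIImage(cgImage: cgImage, scale: image.scale, orientation: image.imageOrientation)
}
```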

SwiftOCR

SwiftOCR is a fast and simple OCR library written in Swift. It uses a neural network for image recognition. As of now, SwiftOCR is optimized for recognizing short, one line long alphanumeric codes (e.g. DI4C9CM). Source: Github

That’s interesting because it’s a different approach from Tesseract, which basically matches characters against classified, trained data.

SwiftOCR uses a neural network, so it should do much better. Let’s try it!

So I’ve created a function with a completion block that returns a string:
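Below is a minimal sketch of such a wrapper, assuming the SwiftOCR pod. SwiftOCR already reports its result through a closure (`recognize(_:_:)`), so the wrapper only hops back to the main queue before calling the completion. The function name is hypothetical:

```swift
import UIKit
import SwiftOCR

// Keep one instance around rather than creating a new one per image.
let swiftOCRInstance = SwiftOCR()

/// Hypothetical wrapper: recognizes `image` and delivers the text on the main queue.
func recognizeWithSwiftOCR(_ image: UIImage, completion: @escaping (String) -> Void) {
    swiftOCRInstance.recognize(image) { recognizedString in
        DispatchQueue.main.async {
            completion(recognizedString)
        }
    }
}
```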

It can’t get much easier than that: just four lines of code.

I can confirm that it’s much better than Tesseract, especially on small blocks of text. I’d say it gave me about 20% of the characters, but it isn’t consistent at all: the output varies hugely from one run to the next. I’m not sure whether I did something wrong or whether it’s just unstable.

So, to conclude, SwiftOCR is better than Tesseract for this use case, but it’s nowhere near the expected result. At this point, I am totally depressed (kidding). The results are far from what I need in order to process the text and compare it with my database.

SwiftOCR is better but still not good enough!

Google MLVision — Firebase ML Kit

When you turn to Google, it means that you have reached your limits.

Computer vision is hard; my limited experience with image processing taught me this. You need a huge amount of resources and very specialized work to achieve something decent.

You need to add the on-device text recognition model (the Firebase/MLVision and Firebase/MLVisionTextModel pods). Apparently, it’s not as powerful as the cloud-based recognizer, but I can tell you that it smashed the other libraries (like, really smashed them).

So, I’ve created a function with a completion block that returns a string:
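Below is a minimal sketch using the Firebase ML Kit on-device text recognizer of that era (`Vision.vision().onDeviceTextRecognizer()`). It assumes the pods above, a GoogleService-Info.plist in the app, and `FirebaseApp.configure()` called at launch. The function name is hypothetical:

```swift
import UIKit
import Firebase

// Created lazily on first use, i.e. after FirebaseApp.configure() has run.
let textRecognizer = Vision.vision().onDeviceTextRecognizer()

/// Hypothetical wrapper: runs on-device text recognition and returns the full text.
func recognizeWithMLKit(_ image: UIImage, completion: @escaping (String) -> Void) {
    let visionImage = VisionImage(image: image)
    textRecognizer.process(visionImage) { result, error in
        guard error == nil, let result = result else {
            completion("")
            return
        }
        // result.text is the whole string; result.blocks gives blocks, lines, and elements.
        completion(result.text)
    }
}
```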

I’m going to be very straightforward: Google is great, period.

95% of the full text, 98% accuracy.
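With text this accurate, what remains is matching it against the City table. The article doesn't show that step. One possible approach assumes the plate's suffix number (the value stored in id) is the last group of ASCII digits in the recognized string. Both function names are hypothetical. The id-to-name table is passed in and could be built from the fetched City rows:

```swift
/// Hypothetical helper: returns the last run of ASCII digits in the OCR text.
func regionSuffix(in text: String) -> Int? {
    var runs: [String] = []
    var current = ""
    for character in text {
        if character.isASCII && character.isNumber {
            current.append(character)
        } else if !current.isEmpty {
            runs.append(current)   // a digit run just ended
            current = ""
        }
    }
    if !current.isEmpty { runs.append(current) }
    return runs.last.flatMap { Int($0) }
}

/// Hypothetical lookup: maps the suffix to a city name using an id -> name table.
func cityName(forOCRText text: String, in citiesById: [Int: String]) -> String? {
    guard let suffix = regionSuffix(in: text) else { return nil }
    return citiesById[suffix]
}
```

Usage: `cityName(forOCRText: "4567-A-40", in: [40: "Sample City"])` returns `"Sample City"`. The table here is placeholder data, not real plate codes.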


Final result

Google always wins

You can’t be mad at the other libraries, because it’s just not realistic for them to match what Google can do. Google has been in the game for a long time and has invested a huge amount of money and resources into it.

Computer vision is one of the most challenging fields of artificial intelligence, and Google is doing its part.

If only other companies like Facebook, Microsoft, and Amazon would share a small part of their knowledge by open sourcing some of their tools.

Anyway, here’s the final result:

Final result

If you liked this tutorial, please clap and share it with your friends. If you have any questions don’t hesitate to send me an email at omarmhaimdat@gmail.com.

This project is available to download from my Github account

Download the project

Better Programming

Advice for programmers.

Written by Omar M’Haimdat, Software Engineering Student.
