What Is Optical Character Recognition (OCR)?

Atiar Rahman
Published in Analytics Vidhya
Sep 24, 2019


Have you ever deposited a check electronically through your bank’s app? You are usually asked to hover your phone over the front and back of the check until the app finds a good angle and takes a picture. From these pictures, the bank recognizes how much money you are depositing, which account the money is coming from, and who it is coming from. On top of all this, you have saved yourself a trip to the bank. But what kind of sorcery is this?

This is done through OCR, or Optical Character Recognition. OCR is what allows computers to recognize text within images and extract it. Depositing checks electronically is just one of its many uses; others include hotel self check-in, utility meter readings, and turning scanned documents into text that can be edited.

But how exactly does OCR work? There are various OCR models, some more advanced than others, but in principle each character is compared to a database of characters, and the character in the database that most resembles the one being read is selected. More advanced models look not only at each character as a whole but also at specific attributes such as curvature, which lets them distinguish characters that look similar, like “1” (the number one) and “l” (lower-case L). Finally, the model may use a spell checker or a dictionary to check whether the returned words make sense.
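To make the “compare against a database, then check with a dictionary” idea concrete, here is a minimal sketch. It assumes tiny hypothetical 5×3 binary glyph templates (1 = ink) and scores similarity by simply counting matching pixels; real engines use far richer features and many templates per character. The dictionary step uses Python’s standard `difflib.get_close_matches`. The names `TEMPLATES`, `match_character`, and `dictionary_check` are illustrative only.

```python
import difflib

import numpy as np

# Hypothetical 5x3 binary templates (1 = ink). A real engine would store
# many glyphs per character, across fonts and sizes.
TEMPLATES = {
    "1": np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]]),
    "l": np.array([[1, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]),
    "o": np.array([[0, 0, 0], [1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]]),
}


def match_character(glyph: np.ndarray) -> str:
    """Return the template character whose pixels agree most with `glyph`."""
    scores = {ch: int((glyph == tpl).sum()) for ch, tpl in TEMPLATES.items()}
    return max(scores, key=scores.get)


def dictionary_check(word: str, vocabulary: list[str]) -> str:
    """Swap `word` for its closest dictionary entry, if one is close enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.75)
    return matches[0] if matches else word


if __name__ == "__main__":
    noisy_o = TEMPLATES["o"].copy()
    noisy_o[0, 0] = 1  # one stray ink pixel
    print(match_character(noisy_o))                           # o
    print(dictionary_check("c1ock", ["clock", "cloak", "check"]))  # clock
```

The dictionary step is what rescues a misread like “c1ock”: the single wrong character still leaves the word closest to “clock”.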

Now, let’s get into the nuts and bolts of this process. The first thing an OCR model does with an image is pre-processing. What this involves depends on the model, but essentially the image is altered to make it as easy to read as possible. Examples of pre-processing include rotating and straightening the text, removing background images, making the background as white as possible, and making the text dark if it isn’t already.
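Two of those steps, making the background white and the text dark, can be sketched in a few lines of NumPy. This sketch assumes a grayscale image as a `uint8` array and uses Otsu’s method (a standard automatic thresholding technique, not one the article names) to split pixels into ink and background. It also assumes text covers less of the page than background, so if “ink” turns out to be the majority the image is treated as light-on-dark and inverted. Rotation and straightening are not shown.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Pick the 0-255 threshold that best separates dark and light pixels."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0.0, 0.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: large when the two groups are well separated.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray: np.ndarray) -> np.ndarray:
    """Return a 0/1 matrix where 1 = ink (dark text) and 0 = white background."""
    gray = np.asarray(gray, dtype=np.uint8)
    ink = (gray <= otsu_threshold(gray)).astype(np.uint8)
    # Assumption: background outnumbers text. If not, the image was light
    # text on a dark background, so flip it.
    if ink.mean() > 0.5:
        ink = 1 - ink
    return ink
```

Either polarity comes out the same way: dark text on a light page and light text on a dark page both yield 1s where the letters are.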

After pre-processing, the OCR engine begins recognizing each character, and there are different approaches to doing this. One approach is pattern recognition. It first finds lines of text: rows containing black pixels, separated by rows of white pixels. Individual characters are then found the same way, as groups of black pixels separated by columns of white pixels.
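The row-then-column splitting described above is often called projection-profile segmentation. A minimal sketch, assuming the page is already a 0/1 NumPy matrix with 1 = black; the helper names `runs` and `segment` are hypothetical. Each character crop keeps the full height of its line, and touching characters would not be separated.

```python
import numpy as np


def runs(mask: np.ndarray) -> list[tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))


def segment(binary: np.ndarray) -> list[list[np.ndarray]]:
    """Split a 0/1 page (1 = black) into text lines, then into characters."""
    lines = []
    # Rows that contain any black pixel belong to a text line.
    for top, bottom in runs(binary.any(axis=1)):
        line = binary[top:bottom]
        # Within a line, columns with any black pixel belong to a character.
        chars = [line[:, left:right] for left, right in runs(line.any(axis=0))]
        lines.append(chars)
    return lines


if __name__ == "__main__":
    page = np.zeros((7, 7), dtype=np.uint8)
    page[1:3, 1] = 1    # line 1, character 1
    page[1:3, 3:5] = 1  # line 1, character 2
    page[5, 2] = 1      # line 2, character 1
    print([len(chars) for chars in segment(page)])  # [2, 1]
```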

Each character is converted into a binary matrix in which white pixels are 0s and black pixels are 1s. Using the distance formula, the distance from the center of the matrix to the furthest black pixel is measured. This distance is used as the radius of a circle, which is then divided into subsections; these subsections are compared to a database of characters to find the character that matches best.
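One plausible reading of this circle-based comparison is a “zoning” feature: split the circle into angular sectors and concentric rings, record what share of the character’s black pixels falls in each zone, and pick the database character with the nearest feature vector. The sketch below assumes that interpretation; the zone counts (`sectors=8`, `rings=2`) and the function names are hypothetical choices, not taken from any particular engine.

```python
import numpy as np


def zone_features(glyph: np.ndarray, sectors: int = 8, rings: int = 2) -> np.ndarray:
    """Share of a 0/1 glyph's black pixels that falls in each zone of a circle."""
    ys, xs = np.nonzero(glyph)
    if len(ys) == 0:
        return np.zeros(sectors * rings)
    cy, cx = (glyph.shape[0] - 1) / 2, (glyph.shape[1] - 1) / 2  # matrix center
    dy, dx = ys - cy, xs - cx
    dist = np.hypot(dy, dx)          # distance formula
    radius = dist.max() or 1.0       # furthest black pixel sets the radius
    ring = np.minimum((dist / radius * rings).astype(int), rings - 1)
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    sector = np.minimum((angle / (2 * np.pi) * sectors).astype(int), sectors - 1)
    counts = np.bincount(ring * sectors + sector, minlength=sectors * rings)
    return counts / counts.sum()


def classify(glyph: np.ndarray, database: dict[str, np.ndarray]) -> str:
    """Return the database character whose zone features are closest to glyph's."""
    feats = zone_features(glyph)
    return min(database, key=lambda ch: np.linalg.norm(feats - zone_features(database[ch])))


if __name__ == "__main__":
    vbar = np.zeros((5, 5), dtype=int)
    vbar[:, 2] = 1
    hbar = np.zeros((5, 5), dtype=int)
    hbar[2, :] = 1
    short_vbar = np.zeros((5, 5), dtype=int)
    short_vbar[0:4, 2] = 1  # an imperfect vertical stroke
    print(classify(short_vbar, {"|": vbar, "-": hbar}))  # |
```

Because each zone holds a share rather than a raw count, a slightly shorter or thinner stroke still lands near the right database entry.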

Python Libraries

Tesseract is an open-source OCR engine, developed for many years with Google’s sponsorship, that can recognize more than 100 languages and can be trained to recognize others. It can be used from Python through python-tesseract (the pytesseract package), a wrapper for the Tesseract engine. More information on this library can be found here:
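A minimal usage sketch, assuming `pytesseract` and Pillow are installed (`pip install pytesseract pillow`) and that the Tesseract program itself, plus data for each requested language, is installed on the system. `pytesseract.image_to_string` is the library’s basic text-extraction call; Tesseract accepts several languages joined with `+` (for example `eng+fra`). The helper names are hypothetical.

```python
from collections.abc import Sequence


def tesseract_lang(languages: Sequence[str]) -> str:
    """Join Tesseract language codes, e.g. ['eng', 'fra'] -> 'eng+fra'."""
    return "+".join(languages)


def ocr_image(path: str, languages: Sequence[str] = ("eng",)) -> str:
    """Extract text from an image file with Tesseract via pytesseract."""
    # Imported here so the helper above still works where these are missing.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=tesseract_lang(languages))


if __name__ == "__main__":
    import sys

    # Usage: python ocr.py check.png
    print(ocr_image(sys.argv[1], languages=("eng",)))
```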

OCR and Deep Learning

The methods above are the traditional ways of performing OCR. With the progress of computer science and machine learning, OCR can now also be done with deep learning, where models are trained on examples to recognize characters. Even so, there are challenges, because characters vary widely across fonts, languages, and, of course, poor handwriting. Deep learning approaches to this problem include EAST, a detector that locates text in an image, and CRNN, a convolutional recurrent neural network that reads the located text. These methods can be implemented with tools like TensorFlow and Keras. The following picture is a high-level example of OCR done with deep learning.

LSTM (Long Short-Term Memory) is a type of RNN (recurrent neural network)

The example above uses an LSTM (Long Short-Term Memory) network, a recurrent neural network with feedback connections that can process not only single data points but entire sequences of data, such as speech or video. The following charts give an overview of how it may be used in an OCR model.
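To show what “feedback connections” means in practice, here is a minimal NumPy sketch of a single-layer LSTM forward pass using the standard gate equations. It assumes each pixel column of a text-line image is one timestep, which loosely mirrors how a CRNN reads a line from left to right; in a real CRNN the columns come from CNN feature maps and the weights are learned, whereas here they are random. `lstm_forward` and the weight layout are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(sequence: np.ndarray, W: np.ndarray, U: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Run one LSTM layer over `sequence` (timesteps x features).

    W: (4H, F) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked as input gate, forget gate, output gate, candidate values.
    Returns the hidden state at every timestep (timesteps x H).
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in sequence:
        z = W @ x + U @ h + b                 # feedback: previous h re-enters here
        i = sigmoid(z[:hidden])               # how much new information to write
        f = sigmoid(z[hidden:2 * hidden])     # how much old memory to keep
        o = sigmoid(z[2 * hidden:3 * hidden]) # how much memory to expose
        g = np.tanh(z[3 * hidden:])           # candidate values
        c = f * c + i * g                     # long-term cell state
        h = o * np.tanh(c)                    # short-term hidden state
        outputs.append(h)
    return np.stack(outputs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, F = 16, 8
    line_image = rng.integers(0, 2, size=(F, 20))  # stand-in 8x20 binary text line
    W = rng.normal(0, 0.1, (4 * H, F))
    U = rng.normal(0, 0.1, (4 * H, H))
    b = np.zeros(4 * H)
    states = lstm_forward(line_image.T.astype(float), W, U, b)
    print(states.shape)  # (20, 16): one hidden vector per column
```

In a full model, each hidden vector would then pass through a layer that turns it into a probability for every character.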

OCR Deep Learning Overview
Each bar is a probability vector, with the most probable character being selected
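The article does not name the decoding step, but CRNN models are commonly trained with CTC (Connectionist Temporal Classification), and the simplest way to turn those per-bar probability vectors into text is greedy, or “best-path”, decoding: take the most probable symbol in each bar, merge consecutive repeats, then drop the special “blank” symbol. A minimal sketch, assuming a hypothetical lowercase alphabet with the blank at index 0:

```python
import numpy as np

ALPHABET = "-abcdefghijklmnopqrstuvwxyz"  # index 0 ('-') is the CTC blank


def greedy_decode(probs: np.ndarray, alphabet: str = ALPHABET) -> str:
    """Best-path decoding: most probable symbol per timestep, merge repeats, drop blanks."""
    best = probs.argmax(axis=1)
    chars = []
    previous = None
    for index in best:
        if index != previous and index != 0:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


if __name__ == "__main__":
    # Made-up probabilities whose most likely path is "hh-ell-loo".
    path = "hh-ell-loo"
    probs = np.full((len(path), len(ALPHABET)), 0.01)
    probs[np.arange(len(path)), [ALPHABET.index(ch) for ch in path]] = 0.9
    print(greedy_decode(probs))  # hello
```

The blank is what lets the model spell double letters: without the blank between them, the two “l” bars would merge into one.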

Summary

In short, OCR is another very cool application of computer science and machine learning that makes daily life easier. It has been around for quite some time, but recent advances in machine learning have made it even more accurate.

