Augment your AI application with OCR (Optical Character Recognition) for improved functionality
Are you a data scientist? Do you develop computer vision applications? If so, you probably build deep-learning-based models. Sometimes an application also needs to read text from images, and that calls for some augmented functionality. Reading text makes the application more capable and more user-oriented, and integrating OCR into an application is simple and effective.
In this post, we use OpenCV's Python bindings together with a deep-learning text detection algorithm. The notebook implementation is also available.
What is OCR?
I can hear you saying, “What is OCR?” Optical Character Recognition (OCR) is a technique for reading text from images.
Why use OCR in AI applications?
Let’s consider an example. Suppose we are designing an application to identify the brand of a rum bottle. Any deep learning classifier will be able to detect the brand from the image. But if we then have to identify the flavor (usually written below the brand name), that may be a daunting task for the classifier alone. Don’t worry, brace yourselves to use OCR.
Let’s take another example. We all know the famous scene from the movie Wall-E where our robot meets another, more advanced robot. He asks her name, and she replies, “My name is EVE.” He struggles a few times to understand her name. That scene conveyed cuteness even through an extraterrestrial robot.
Whenever we introduce ourselves, people try to read our ID badge if we are wearing one. Now, imagine our protagonist robot recognizing Eve’s name by reading her ID card. Interesting, isn’t it? What is even more interesting is that we can teach a robot to read text from an image in three simple steps. Without further ado, let’s get started.
3 Simple steps:
Task: Let’s help our robot read text from the ID card image. We can do so in 3 simple steps:
- Text localization
- Post processing
- Text recognition
The first step (text localization) is to find where text is present in the image. In this case, the main text we are trying to read is the name “EVE”. There are many text localization algorithms; we will use one called the EAST detector.
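As a rough sketch of how this step looks in code: when run through OpenCV’s `cv2.dnn` module, EAST produces a score map (text confidence per cell) and a geometry map (distances from each cell to the box edges), which we decode into bounding boxes. The decoder below is a minimal, rotation-free illustration; the model file name and output layer names in the comments follow common OpenCV tutorials and are assumptions, not part of this post’s notebook.

```python
import numpy as np

def decode_east(scores, geometry, score_thresh=0.5):
    """Decode EAST score/geometry maps into axis-aligned boxes.

    scores:   (1, 1, H, W) text/no-text confidence map
    geometry: (1, 5, H, W) distances to top/right/bottom/left edges (+ angle,
              which this simplified sketch ignores)
    """
    boxes, confidences = [], []
    num_rows, num_cols = scores.shape[2:4]
    for y in range(num_rows):
        for x in range(num_cols):
            score = scores[0, 0, y, x]
            if score < score_thresh:
                continue
            # Each output cell corresponds to a 4x4 pixel region of the input.
            offset_x, offset_y = x * 4.0, y * 4.0
            d_top, d_right, d_bottom, d_left = geometry[0, 0:4, y, x]
            start_x = int(offset_x - d_left)
            start_y = int(offset_y - d_top)
            end_x = int(offset_x + d_right)
            end_y = int(offset_y + d_bottom)
            boxes.append((start_x, start_y, end_x, end_y))
            confidences.append(float(score))
    return boxes, confidences

# In a full pipeline, the two maps would come from the network, e.g.:
# net = cv2.dnn.readNet("frozen_east_text_detection.pb")  # assumed file name
# net.setInput(cv2.dnn.blobFromImage(image, 1.0, (320, 320),
#                                    (123.68, 116.78, 103.94), True, False))
# scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
#                                 "feature_fusion/concat_3"])
```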
Using this algorithm, we get several bounding boxes, as shown in the picture. The second step is to post-process these bounding boxes and keep only the few most probable ones that actually contain text. Post-processing has two steps:
- Non-Maxima Suppression (NMS)
- False positives removal
NMS is a technique for reducing a list of overlapping box proposals to the most probable ones.
For example, in the output image we get several overlapping rectangles around the name “EVE”. NMS suppresses these down to a single, most probable rectangle. Simple but effective, right?
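As an illustration, here is a minimal greedy NMS in NumPy (a sketch, not the exact implementation from the notebook): keep the highest-scoring box, discard every remaining box that overlaps it beyond an IoU threshold, and repeat.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Greedy NMS. boxes is (N, 4) as [x1, y1, x2, y2]; scores is (N,).

    Returns the indices of the boxes to keep.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]     # drop heavily overlapping boxes
    return keep
```

OpenCV also ships a ready-made version of this, `cv2.dnn.NMSBoxes`, which the sketch above mirrors in spirit.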
The next step is to remove the false detections: bounding boxes in which there is no text. We can remove these using some simple heuristics.
The simplest heuristic used here is the rule that, for a text bounding box, width > height (i.e., we assume the text is horizontally aligned). The remaining bounding boxes are the ones where some meaningful text exists. Our final output image has a few bounding boxes around the text. Now we are ready to recognize the text and find out whether it contains any names.
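The width-greater-than-height rule can be sketched as a small filter (the function name and the tunable aspect-ratio parameter are illustrative, not from the original code):

```python
def filter_false_positives(boxes, min_aspect=1.0):
    """Keep boxes that look like horizontal text: width > height.

    boxes: list of (x1, y1, x2, y2) tuples.
    min_aspect: required width/height ratio (an assumed, tunable parameter;
    1.0 encodes the simple width > height rule from the text).
    """
    kept = []
    for (x1, y1, x2, y2) in boxes:
        width, height = x2 - x1, y2 - y1
        if height > 0 and width / height > min_aspect:
            kept.append((x1, y1, x2, y2))
    return kept
```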
The third step is to recognize the text. We do this using Tesseract OCR, maintained by Google. More details here. We can further process the text output that we get from Tesseract.
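The recognition step is typically a single call to Tesseract through its Python wrapper, and the further processing can be plain Python. In the sketch below, the `pytesseract` call is shown as a comment (it assumes the `pytesseract` package and the Tesseract binary are installed), and the cleanup helpers are illustrative, not the post’s exact code:

```python
import re

# Recognition on a cropped bounding-box image (assumed setup):
# import pytesseract
# raw_text = pytesseract.image_to_string(cropped_box_image)

def extract_words(raw_text):
    """Clean raw OCR output: keep alphabetic tokens, drop stray symbols."""
    tokens = re.findall(r"[A-Za-z]+", raw_text)
    return [t.upper() for t in tokens]

def find_name(raw_text, known_names):
    """Return the first recognized token that matches a known name, if any."""
    for word in extract_words(raw_text):
        if word in known_names:
            return word
    return None
```

For example, Tesseract often returns stray punctuation around short text like an ID badge; `find_name('|) EVE <<', {'EVE'})` would still recover the name.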
If you are interested in the Python implementation of the above steps, please check here.
We have so far seen 3 simple steps to integrate OCR into your AI application. The integration of OCR in commercial products has seen a dramatic rise in recent years.
If you enjoyed this article, please connect with me via LinkedIn. Please leave a clap. It motivates me to use my spare time to spread the knowledge.