OCR — Optical Character Recognition

Karteek Menda
4 min read · Jul 10, 2021

@source: Wikipedia

The technology to convert images of typed, handwritten or printed text into machine-encoded text

Hello Aliens.

The mission of artificial intelligence (AI) is to help humans process large amounts of data and to automate routine tasks. This data is powerful: it can put a business strategy on the right track. But before that, someone needs to collect it from multiple sources. Managing documents involves many repetitive tasks and a lot of human effort, so AI can automate this flow, reduce processing time and save resources.

OCR

Optical Character Recognition (OCR) is the technology for automatic text recognition. It translates printed, handwritten and scanned documents into a machine-readable format. This relieves employees of manual data entry and reduces human error.

Information in documents is usually a combination of natural language and semi-structured data in the form of tables, diagrams, symbols and so on. A human can read and understand text regardless of its structure and the way it is represented. And thanks to natural language processing (NLP), computers can work with written (as well as spoken) forms of human language.

In document recognition, NLP coupled with OCR finds applications in data retrieval and information extraction. We can extract everything of value to a company (information about consumers, payment data from invoices) from top to bottom in a document. It saves a lot of time when you need to convert many printed files or hard copies into Word documents: typing them out manually takes far longer than running them through OCR.

OCR is a set of very different, compute-intensive processes that involve a lot of mathematics, statistics and linguistics.

Let's take an example and look at how powerful this is.

Here we will take an image of Hindi text and try to extract the text from it.

The image I have chosen is

Source: omniglot.com

So, for this we need to install the packages below.

!pip install easyocr
!pip install googletrans
!pip install gTTS

Then we can create objects for easyocr and the translator, as in the code below.

import easyocr
from googletrans import Translator

reader = easyocr.Reader(['hi'])  # load the Hindi recognition model
translator = Translator()

We can also check the bounding boxes by adding a small function, shown below, and then tweak the parameters to get the best result.

from PIL import Image, ImageDraw

# Run detection and recognition on the image
bounds = reader.readtext('download.png', add_margin=0.60, width_ths=0.50, link_threshold=0.60, decoder='beamsearch', blocklist='=-')

def draw_boxes(image, bounds, color='yellow', width=1):
    # Outline every detected text region on the image
    draw = ImageDraw.Draw(image)
    for bound in bounds:
        p0, p1, p2, p3 = bound[0]
        draw.line([*p0, *p1, *p2, *p3, *p0], fill=color, width=width)
    return image

im = Image.open('download.png')
draw_boxes(im, bounds)
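
Each entry that readtext returns is a (bounding box, text, confidence) tuple, so the recognized sentence can be put together from the same bounds list that we used for drawing. Here is a minimal sketch, assuming that tuple layout. The helper name extract_text and the min_confidence cutoff are my own illustrations, not part of EasyOCR, and the sample data is hand-made so the snippet runs without the OCR model.

```python
def extract_text(bounds, min_confidence=0.0):
    """Join the recognized strings from EasyOCR-style results.

    Each item in `bounds` is assumed to be (box, text, confidence).
    """
    pieces = [text for _box, text, conf in bounds if conf >= min_confidence]
    return " ".join(pieces)


# Hand-made results in the same shape, standing in for real OCR output
sample = [
    ([[0, 0], [50, 0], [50, 20], [0, 20]], "Hello", 0.93),
    ([[60, 0], [120, 0], [120, 20], [60, 20]], "Aliens", 0.41),
]
print(extract_text(sample))                      # Hello Aliens
print(extract_text(sample, min_confidence=0.5))  # Hello
```

With the real results, extract_text(bounds) should give the full line of recognized text, and raising min_confidence drops the pieces the model is least sure about.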

So, we can tweak parameters like add_margin, width_ths, link_threshold, decoder, blocklist and so on. There are various other parameters that can help get the job done efficiently, so I would strongly recommend that aliens visit the JaidedAI GitHub for further details.
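
Tweaking by hand gets tedious, so one option is to loop over a small grid of values and rank the combinations. The sketch below is only one way to do it: the function name sweep, the grid values, and the idea of scoring by mean confidence are all my assumptions, and mean confidence is just a rough proxy for how good the text really is. The reader is passed in as a function so the loop itself can be tried without loading any model.

```python
from itertools import product


def sweep(read_fn, grid):
    """Try every parameter combination and rank them by mean confidence.

    read_fn(**params) must return EasyOCR-style (box, text, confidence) items.
    Mean confidence is only a rough proxy for accuracy (an assumption here).
    """
    names = list(grid)
    scored = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results = read_fn(**params)
        confs = [conf for _box, _text, conf in results]
        mean_conf = sum(confs) / len(confs) if confs else 0.0
        scored.append((mean_conf, params))
    # Highest mean confidence first
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Illustrative grid; the values are guesses, not recommended settings
grid = {"add_margin": [0.1, 0.6], "width_ths": [0.5, 0.7]}

# With a real reader this would be:
# ranked = sweep(lambda **p: reader.readtext('download.png', **p), grid)
```

The first entry of the returned list holds the best-scoring parameters, but it is still worth reading the extracted text yourself before trusting the score.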

We can also use Google Translate to translate the text obtained from the image into a language we understand. I also used Google's text-to-speech converter, which reads the text aloud in the language we specify. So this can serve multiple purposes.
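
The translation and speech steps can be sketched roughly as below. This is not the exact code behind this article: the function names tidy and translate_and_speak and the output file name translation.mp3 are made up for illustration, and the snippet assumes a googletrans release in which translate() is a normal synchronous call. Both services need an internet connection, so only the small clean-up helper is run here.

```python
def tidy(text):
    """Collapse the stray runs of whitespace that OCR output often contains."""
    return " ".join(text.split())


def translate_and_speak(hindi_text, out_path="translation.mp3"):
    """Translate Hindi text to English and save the English audio as an MP3."""
    # Imported lazily so the helper above works without these packages
    from googletrans import Translator
    from gtts import gTTS

    # Assumes a googletrans release whose translate() is synchronous
    english = Translator().translate(tidy(hindi_text), src="hi", dest="en").text
    gTTS(text=english, lang="en").save(out_path)
    return english


print(tidy("OCR   output  with   gaps"))  # OCR output with gaps
```

Calling translate_and_speak on the recognized Hindi text should return the English sentence and write the audio file next to the notebook.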

सभी मनुष्य को गौरव और अधिकारों के  मामले में जन्मजात स्वतन्तता और समानता प्राप्त है उन्हेंबुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे वै पे बर्ताव  भाव र करना चाहिए।

The output we obtained is almost okay, but the parameters still need a little tweaking, which I have intentionally left for the aliens to sort out. Let's have a look at what the English translation of this Hindi text is.

All human beings have inherent freedom and equality in terms of pride and rights, they have the wisdom and wisdom of the soul and they should treat each other with brotherhood.

We have also made the translation available in MP3 format, so that it can serve someone who is visually impaired.

Besides this, there are many other OCR tools, not all of them open source. Some of them are Tesseract OCR, CuneiForm OCR, Python pyocr, Yunmai OCR technology and ABBYY.

Google Tesseract OCR is super cool at handling these types of tasks. I would also recommend that aliens go through it. This was beautifully explained over here.

I will be coming up with “Sentimental Classification using BERT” in my upcoming article.

Please do follow me on Medium so that you can receive the updates.

Follow me on LinkedIn: www.linkedin.com/in/karteek-menda

Happy Learning…….

Bye Aliens.

This is Karteek Menda.

Signing Off

Karteek Menda

Robotics GRAD Student at ASU, Project Engineer in Dynamic Systems and Control Lab at ASU, Ex - Machine Learning Engineer, Machine Learning Blogger.