Unleashing the Kraken for OCR

Published in

Analytics Vidhya

3 min read · Apr 6, 2020

An OCR Engine for Most Languages

Nowadays machines are trained to understand images, video, voice and more, which in turn has accelerated progress on problems like object detection, facial recognition, autonomous driving, surveillance, activity recognition and many others.

Handwritten Text Documents (HTD) have been around for ages, and we have a plethora of them: ancient scriptures, bank records, medical documents and so on. How would a machine understand them? Cutting the story short: how do we digitize them? Say hi to “Kraken”.

Source: https://ya-webdesign.com/imgdownload.html (official Logo: http://kraken.re/_static/kraken.png)

Kraken is a turn-key OCR system forked from ocropus. It is intended to rectify a number of issues while preserving (mostly) functional equivalence. It is available under the Apache 2.0 license.

Its Features:

Script detection and multi-script recognition support (I will show some examples and share the code)
Right-to-Left, BiDi, and Top-to-Bottom script support
ALTO, abbyXML, and hOCR output
Word bounding boxes and character cuts
Public repository of model files
Lightweight model files
Variable recognition network architectures

Installation:

kraken requires some external libraries to run. On Debian/Ubuntu they may be installed using:

# apt install libpangocairo-1.0 libxml2 libblas3 liblapack3 python3-dev python3-pip

The simple way (plain pip also works):

pip3 install kraken

After installation it looks like this:

Now import it and see what it has:
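One way to check the install and see what the package contains is sketched below. It uses only the Python standard library; the helper name `list_submodules` is made up for this illustration and is not part of kraken.

```python
import importlib.util
import pkgutil


def list_submodules(package_name):
    """Return sorted top-level submodule names, or None if the package is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    # Plain modules have no search locations and therefore no submodules.
    if spec.submodule_search_locations is None:
        return []
    return sorted(m.name for m in pkgutil.iter_modules(spec.submodule_search_locations))


if __name__ == "__main__":
    modules = list_submodules("kraken")
    if modules is None:
        print("kraken is not installed; run: pip3 install kraken")
    else:
        print("kraken submodules:", ", ".join(modules))
```

If the install worked, the printed list should include the modules for the usual pipeline steps (binarization, page segmentation, recognition), though the exact names depend on the installed version.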

So now you are ready to fire up Kraken on a handwritten text document.

QuickStart:
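A minimal sketch of a quick start, assuming the command-line form shown in the kraken documentation (`kraken -i <input> <output> binarize segment ocr`). The file names `page.png`, `page.txt` and `my_model.mlmodel` are placeholders, and the helper `build_kraken_command` is hypothetical, written only for this example.

```python
import os
import shlex
import shutil
import subprocess


def build_kraken_command(image_path, output_path, model=None):
    """Assemble the kraken CLI call: binarize, segment, then recognize."""
    command = ["kraken", "-i", image_path, output_path, "binarize", "segment", "ocr"]
    if model is not None:
        # -m selects the recognition model file for the ocr step.
        command += ["-m", model]
    return command


if __name__ == "__main__":
    # Placeholder names: replace them with your own scan and model.
    command = build_kraken_command("page.png", "page.txt", model="my_model.mlmodel")
    print(" ".join(shlex.quote(part) for part in command))
    # Only run the command when both the executable and the input image exist.
    if shutil.which("kraken") and os.path.exists("page.png"):
        subprocess.run(command, check=True)
    else:
        print("kraken or page.png not found; command not run")
```

The three subcommands run in order on the same image: binarization turns the scan into black and white, segmentation finds the text lines, and the OCR step writes the recognized text to the output file.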

Output:

India has many (22+) vernacular languages, so I thought, why not test some of them?

Here we go:

Hindi:

Telugu:

Sample Code:

SSusantAchary/kraken_ocr_engine

Fork (or watch) the GitHub repo so that you are notified of new updates!
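
Separately from the repository, here is a rough sketch of the same pipeline through kraken's Python API. It assumes the module layout described in the 2.0.x documentation (`binarization.nlbin`, `pageseg.segment`, `models.load_any`, `rpred.rpred`); newer releases may differ. It also assumes you already have a recognition model trained for the script in question, such as Hindi or Telugu. The function names `recognize_lines` and `main` are made up for this example.

```python
import sys


def recognize_lines(image_path, model_path):
    """Return the recognized text of each segmented line in the image."""
    # Imported lazily so this file still loads where kraken or Pillow is missing.
    from PIL import Image
    from kraken import binarization, pageseg, rpred
    from kraken.lib import models

    image = Image.open(image_path)
    bw_image = binarization.nlbin(image)      # convert the scan to black and white
    segmentation = pageseg.segment(bw_image)  # locate the text lines
    network = models.load_any(model_path)     # load the trained recognition model
    records = rpred.rpred(network, bw_image, segmentation)
    return [record.prediction for record in records]


def main(argv):
    """Return the OCR text, or a usage message when arguments are missing."""
    if len(argv) != 3:
        return "usage: ocr_sketch.py IMAGE MODEL"
    return "\n".join(recognize_lines(argv[1], argv[2]))


if __name__ == "__main__":
    print(main(sys.argv))
```

Switching languages in this sketch only means passing a different model file; the binarization and segmentation steps stay the same.
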
Having seen the above, we can say that the large volumes of documents kept in libraries and elsewhere can definitely be digitized.

Keep Learning and Exploring

Source :

kraken - kraken 2.0.5-4-gbb42ba5 documentation

kraken.re

Images from Internet