EasyOCR: A Comprehensive Guide

Aditya Mahajan
11 min read · Oct 28, 2023

Table of Contents

  1. Introduction
  2. Components
  3. Installation
  4. Basic Usage
  5. Advanced Usage
  6. Real-world Applications
  7. Conclusion

Introduction

EasyOCR is a Python library for Optical Character Recognition (OCR) that is both flexible and easy to use. OCR enables computers to identify and extract text from photographs or scanned documents, which is useful for a variety of tasks, including data entry automation and image analysis.

EasyOCR stands out for making OCR easier for developers to implement. It is designed to be approachable even for people with no background in OCR or computer vision. The library provides support for multiple languages, pre-trained text detection and recognition models, and a focus on fast, efficient recognition of words inside images.

EasyOCR is a dependable option for Python developers because of its versatility in handling typefaces and text layouts, as well as its focus on accuracy and speed. EasyOCR simplifies the process of extracting text from photos for use in various Python projects, including desktop software, online applications, and others. This frees up your time to concentrate on the unique requirements of your product.

EasyOCR Framework

You may easily automate text-related operations, enhance data extraction from scanned documents, and leverage text recognition capabilities in your image analysis projects by incorporating EasyOCR into your Python programs. It’s a useful tool for computer vision applications and an approachable way to use OCR in your Python projects.

Components

EasyOCR's recognition model has three main components: feature extraction, sequence labeling, and decoding. Feature extraction uses deep learning models such as ResNet and VGG to extract useful features from the input image; these features are essential for recognizing text in pictures. Sequence labeling, the next step, uses Long Short-Term Memory (LSTM) networks to interpret the sequential context of the extracted features. Finally, decoding transcribes the labeled sequences into the recognized text using the Connectionist Temporal Classification (CTC) algorithm. These three components work together to let EasyOCR extract text from images reliably and efficiently. The training pipeline is based on the deep-text-recognition-benchmark framework, which provides a solid basis for training text recognition models.

1. Feature Extraction (Resnet and VGG):

Feature extraction is the first step of the recognition model. It converts the input image into a set of features that the later stages can analyze. EasyOCR uses VGG and ResNet for this.

ResNet, short for Residual Network, is a kind of convolutional neural network (CNN) that lets the signal bypass certain layers through shortcuts, or skip connections. This mitigates the vanishing gradient problem, so the network can be made deeper and still be trainable. The core idea of the ResNet architecture is to learn residual functions relative to a layer's input rather than learning the full mapping directly, which makes the network easier to optimize.
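The skip connection can be illustrated with a toy sketch in plain Python. The `layer` argument and `zero_layer` below are made-up stand-ins for a learned transformation, not code from EasyOCR or ResNet; the point is only that the block outputs F(x) + x rather than F(x) alone.

```python
def residual_block(x, layer):
    """Apply `layer` to x and add the input back (the skip connection)."""
    fx = layer(x)
    return [f + xi for f, xi in zip(fx, x)]


# Hypothetical "layer" that has learned nothing yet: it outputs zeros.
def zero_layer(x):
    return [0.0 for _ in x]


features = [0.5, -1.0, 2.0]
# With a zero residual, the input passes through the block unchanged.
print(residual_block(features, zero_layer))
```

Because the input is always added back, a block that has nothing useful to contribute can fall back to passing its input through, which is one intuition for why very deep stacks of such blocks remain trainable.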

VGG is another kind of CNN, named after the Visual Geometry Group at the University of Oxford, where it was developed. It is well known for its simple, uniform architecture: very small (3x3) convolution filters stacked deeper and deeper on top of one another. Stacking small filters needs fewer parameters than a single larger filter with the same receptive field, and the uniform design makes the network easy to understand and modify.
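The parameter saving from small filters can be checked with a little arithmetic. The sketch below counts weights only (biases ignored) and assumes the same number of input and output channels in every layer; the channel count of 64 is an arbitrary illustrative value. Three stacked 3x3 convolutions cover the same 7x7 receptive field as one 7x7 convolution.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in `num_layers` stacked k x k convolutions (biases ignored)."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64  # illustrative channel count, same in and out
stacked_3x3 = conv_weights(3, channels, num_layers=3)  # receptive field 7x7
single_7x7 = conv_weights(7, channels)                 # receptive field 7x7
print(stacked_3x3, single_7x7)
```

The stacked version uses 27 weights per channel pair against 49 for the single large filter, and it also applies a non-linearity after each of the three layers.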

2. Sequence Labeling (LSTM):

Sequence labeling comes next after feature extraction. Long Short-Term Memory (LSTM) networks are used for this.

LSTMs are a type of recurrent neural network (RNN) designed to learn and remember over long sequences, so they can model dependencies that span many time steps. Unlike conventional RNNs, the LSTM design helps prevent the vanishing gradient problem. It does this with a memory cell and three kinds of gates: input, forget, and output. These gates let the LSTM add information to the cell state or remove it over long sequences, which makes it effective for sequence labeling tasks.
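To make the gates concrete, here is a minimal sketch of one LSTM time step using scalars instead of vectors. It follows the standard LSTM equations; the `params` layout and the weight values are assumptions made for illustration and have nothing to do with EasyOCR's trained weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and scalar state.

    `params` maps each gate name to (w_x, w_h, b); values are illustrative.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value to write

    c = f * c_prev + i * g            # updated cell state (the memory)
    h = o * math.tanh(c)              # updated hidden state (the output)
    return h, c


# With all-zero weights every gate is 0.5 and the candidate is 0,
# so the step simply halves the previous cell state.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

The line `c = f * c_prev + i * g` is where information is removed (via `f`) and added (via `i`); because the cell state is updated additively, gradients can flow across many time steps.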

3. Decoding (CTC):

Decoding is the last step in the recognition model, and Connectionist Temporal Classification (CTC) is used for it.

CTC is a loss function for sequence problems in which the timing of the output is uncertain. It is applied when we do not know the alignment between the labels and the input data, which frequently occurs in speech and handwriting recognition. To allow predictions of variable length, CTC adds a blank label to the existing labels. It then computes the loss by summing the probabilities of all possible alignments of the input to the target sequence, which makes it well suited to OCR.
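At inference time, the simplest way to turn CTC outputs into text is greedy decoding: take the most likely label at each time step, merge consecutive repeats, and drop the blanks. The sketch below shows that collapse step in plain Python. It is an illustration of the technique, not EasyOCR's implementation, and it assumes label index 0 is the blank and that indices 1 and up map to characters in `charset`.

```python
def ctc_greedy_decode(best_path, charset):
    """Collapse a per-time-step label sequence into text.

    best_path: most likely label index at each time step (0 is the blank)
    charset:   characters for label indices 1..len(charset)
    """
    blank = 0
    chars = []
    previous = blank
    for label in best_path:
        # Emit a character only when it is not blank and not a repeat
        # of the label at the previous time step.
        if label != blank and label != previous:
            chars.append(charset[label - 1])
        previous = label
    return "".join(chars)


print(ctc_greedy_decode([3, 3, 0, 1, 1, 0, 0, 2, 2], "atc"))  # prints "cat"
```

The blank is what lets the model output genuine double letters: `[1, 0, 1]` decodes to "aa", while `[1, 1]` decodes to "a".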

A modified version of the deep-text-recognition-benchmark framework serves as the training pipeline for EasyOCR's recognition model. With this framework, text recognition models can be trained on a variety of datasets, which makes EasyOCR versatile.

Installation

Make sure you have PyTorch installed as it’s a core dependency for EasyOCR.

pip3 install torch torchvision torchaudio

If installing PyTorch with the above command doesn't work, check the official PyTorch site, which provides install commands for different operating systems and hardware.

EasyOCR can be installed using pip:

pip install easyocr
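After installing, a quick way to confirm that the packages are visible to your Python environment is to look them up without importing them. This is a small convenience sketch using only the standard library; `missing_packages` is a hypothetical helper name, not part of EasyOCR or PyTorch.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["torch", "torchvision", "easyocr"])
print("Missing packages:", missing or "none")
```

If anything is reported as missing, the usual cause is that pip installed into a different interpreter or virtual environment than the one running the script.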

EasyOCR supports multiple languages. There is no separate package to install for each language: you specify the language codes in your code when initializing the EasyOCR reader, and the matching models are downloaded automatically the first time they are needed.

For example, to use Simplified Chinese (ch_sim) together with English, you can write:

import easyocr
reader = easyocr.Reader(['ch_sim', 'en'])

Supported Languages By EasyOCR

EasyOCR currently supports 80+ languages, with more in development. The supported languages and their codes are listed below:

Abaza (abq), Adyghe (ady), Afrikaans (af), Angika (ang), Arabic (ar), Assamese (as), Avar (ava), Azerbaijani (az), Belarusian (be), Bulgarian (bg), Bihari (bh), Bhojpuri (bho), Bengali (bn), Bosnian (bs), Simplified Chinese (ch_sim), Traditional Chinese (ch_tra), Chechen (che), Czech (cs), Welsh (cy), Danish (da), German (de), English (en), Spanish (es), Estonian (et), Persian (fa), Finnish (fi), French (fr), Irish (ga), Goan Konkani (gom), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Ingush (inh), Icelandic (is), Italian (it), Japanese (ja), Kabardian (kbd), Kannada (kn), Korean (ko), Kurdish (ku), Latin (la), Lak (lbe), Lezghian (lez), Lithuanian (lt), Latvian (lv), Magahi (mah), Maithili (mai), Maori (mi), Mongolian (mn), Marathi (mr), Malay (ms), Maltese (mt), Nepali (ne), Newari (new), Dutch (nl), Norwegian (no), Occitan (oc), Pali (pi), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (Cyrillic) (rs_cyrillic), Serbian (Latin) (rs_latin), Nagpuri (sck), Slovak (sk), Slovenian (sl), Albanian (sq), Swedish (sv), Swahili (sw), Tamil (ta), Tabassaran (tab), Telugu (te), Thai (th), Tajik (tjk), Tagalog (tl), Turkish (tr), Uyghur (ug), Ukrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi)

To use a specific language with EasyOCR, pass its language code in the list given to the easyocr.Reader object. For example, if you want to use French, you can write:

import easyocr
reader = easyocr.Reader(['fr'])

You can also use multiple languages at once, as long as they are compatible with each other. For example, if you want to use English and Spanish, you can write:

import easyocr
reader = easyocr.Reader(['en', 'es'])
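Since the reader expects the short codes from the list above rather than language names, it can help to check the codes before creating the reader. The sketch below is a hypothetical helper, not part of EasyOCR; `KNOWN_CODES` holds only a small subset of the codes listed above and would need extending for real use.

```python
# A subset of the codes from the list above; extend as needed.
KNOWN_CODES = {"en", "es", "fr", "de", "hi", "ja", "ko", "ch_sim", "ch_tra"}


def check_language_codes(codes, known=KNOWN_CODES):
    """Return the codes that are not in `known`, preserving their order."""
    return [code for code in codes if code not in known]


# 'chinese' is a language name, not a code; the codes are ch_sim and ch_tra.
unknown = check_language_codes(["en", "chinese"])
if unknown:
    print(f"Unrecognized language codes: {unknown}")
```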

Basic Usage

Here’s a simple example of how to use EasyOCR to read an image and extract text:

  1. Import EasyOCR and create a reader object. You need to specify the languages you want to use as a list of language codes.
  2. Load an image file and perform OCR on it. You can use the readtext method of the reader object to get the text and its coordinates from the image.
  3. The readtext method returns a list of results, where each result contains the bounding box, the recognized text, and a confidence score. You can iterate through the results and access this information:

import easyocr

reader = easyocr.Reader(['en'])  # specify the language
result = reader.readtext('image.jpg')

# Each result is a (bounding box, text, confidence) tuple
for (bbox, text, prob) in result:
    print(f'Text: {text}, Probability: {prob}')

This script will print out the text detected in the image.

[Figure: example input image and the corresponding printed output]
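Beyond printing, the (bounding box, text, confidence) tuples can be post-processed like any other Python list. A common first step is to drop low-confidence results. The sketch below runs on made-up sample data in the same shape that readtext returns, so it works without an image; the threshold of 0.5 is an arbitrary starting point, not a recommended value.

```python
def filter_by_confidence(results, min_prob=0.5):
    """Keep only the text of results whose confidence is at least `min_prob`."""
    return [text for (_bbox, text, prob) in results if prob >= min_prob]


# Made-up sample in the shape readtext returns: (bounding box, text, confidence).
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "INVOICE", 0.98),
    ([[10, 60], [90, 60], [90, 85], [10, 85]], "T0tal", 0.31),
    ([[100, 60], [160, 60], [160, 85], [100, 85]], "42.50", 0.87),
]
print(filter_by_confidence(sample))  # ['INVOICE', '42.50']
```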

Advanced Usage

EasyOCR supports several more advanced uses, such as recognizing multiple languages, detecting text in videos, automated form filling, and more. Here are some of them:

Multi-Language Recognition: EasyOCR supports more than 80 languages. You can specify the languages while creating the reader object. For example, to recognize both English and Hindi text from an image:

import easyocr

reader = easyocr.Reader(['en', 'hi'])  # English and Hindi
result = reader.readtext('image.jpg')
for (bbox, text, prob) in result:
    print(f'Text: {text}, Probability: {prob}')

License Plate Recognition: Reading text off license plates is crucial for many applications, including security and traffic control, and EasyOCR's pre-trained models and multi-language support make it a reasonable option for the task. Once a clear picture of the plate has been taken with a camera or smartphone, OCR can be applied to the image using EasyOCR. Because the API is simple, developers can add this feature to their apps with little code.

import easyocr
reader = easyocr.Reader(['en'])
result = reader.readtext('image.jpg')

for (bbox, text, prob) in result:
    # The bounding box is four corner points, clockwise from the top left
    (top_left, top_right, bottom_right, bottom_left) = bbox
    print(f'Text: {text}, Probability: {prob}')
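Raw OCR output for a plate usually needs some cleanup, because spaces, dashes, and nearby text (a country code, a dealer name) are picked up as well. The sketch below is one possible post-processing step, not part of EasyOCR: it normalizes each result to uppercase letters and digits and keeps the most confident one that is long enough to be a plate. The function name, the minimum length of 4, and the sample data are all assumptions for illustration.

```python
import re


def best_plate_candidate(results, min_length=4):
    """Pick the most confident result that looks like a plate and normalize it."""
    candidates = []
    for _bbox, text, prob in results:
        # Keep only uppercase letters and digits.
        cleaned = re.sub(r"[^A-Z0-9]", "", text.upper())
        if len(cleaned) >= min_length:
            candidates.append((prob, cleaned))
    if not candidates:
        return None
    return max(candidates)[1]


# Made-up readtext-style results; bounding boxes are omitted for brevity.
sample = [
    (None, "IND", 0.95),
    (None, "KA 01-ab 1234", 0.82),
]
print(best_plate_candidate(sample))  # KA01AB1234
```

readtext also accepts an `allowlist` argument that restricts which characters the recognizer may output, which can be combined with this kind of cleanup.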

Reading Text from Grayscale Images: EasyOCR can read text from grayscale images as well. You just need to convert your image to grayscale before passing it to the readtext method.

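from PIL import Image
import numpy as np
import easyocr

# Convert to grayscale, then to a NumPy array:
# readtext accepts a file path, URL, bytes, or NumPy array
img = np.array(Image.open('image.jpg').convert('L'))
reader = easyocr.Reader(['en'])
result = reader.readtext(img)
for (bbox, text, prob) in result:
    print(f'Text: {text}, Probability: {prob}')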

This shows that EasyOCR can also read grayscale images: a simple text image is first converted to grayscale and then passed to the EasyOCR reader for recognition.

Handling Noisy Images: If your image has a lot of noise, you can use image processing techniques to reduce the noise before passing the image to EasyOCR. Here the cv2 (OpenCV) library is used to reduce the noise, and the resulting image is then passed to the EasyOCR reader.

import cv2
import easyocr

img = cv2.imread('noisy_image.jpg', 0)  # 0 loads the image as grayscale
blur = cv2.GaussianBlur(img, (5, 5), 0)  # smooth out noise with a 5x5 kernel
reader = easyocr.Reader(['en'])
result = reader.readtext(blur)
for (bbox, text, prob) in result:
    print(f'Text: {text}, Probability: {prob}')

Batch Processing: If you have multiple images and you want to process them all at once, you can use the readtext method in a loop.

import easyocr

reader = easyocr.Reader(['en'])  # create the reader once and reuse it
images = ['image1.jpg', 'image2.jpg', 'image3.jpg']
for img in images:
    result = reader.readtext(img)
    print(result)
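For larger batches it is often more convenient to collect the results per file and to keep going when one image fails. The sketch below is one way to do that. `ocr_many` is a hypothetical helper, and `reader` is assumed to behave like an easyocr.Reader; passing `detail=0` asks readtext to return only the recognized strings.

```python
def ocr_many(reader, image_paths):
    """Run OCR on each path and return {path: list of recognized strings}.

    `reader` is expected to behave like easyocr.Reader. An image that
    fails to load or process is reported and mapped to an empty list.
    """
    texts_by_path = {}
    for path in image_paths:
        try:
            # detail=0 returns the text only, without boxes or scores
            texts_by_path[path] = reader.readtext(path, detail=0)
        except Exception as error:
            print(f"Skipping {path}: {error}")
            texts_by_path[path] = []
    return texts_by_path
```

With EasyOCR installed, it could be called as `ocr_many(easyocr.Reader(['en']), ['image1.jpg', 'image2.jpg'])`.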

Getting Bounding Boxes: The readtext method returns the recognized text along with the bounding box coordinates. You can use these coordinates for further processing.

import easyocr

reader = easyocr.Reader(['en'])
result = reader.readtext('image.jpg')
for res in result:
    # res[0] is the bounding box, res[1] is the recognized text
    print(f"Text: {res[1]}, Coordinates: {res[0]}")
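As an example of further processing, the four corner points can be reduced to an axis-aligned rectangle, which is easier to use for cropping or drawing, and the results can be sorted into a rough reading order. The helpers below are a sketch with hypothetical names, run on made-up data; sorting strictly by the top edge is a simplification that can misorder words on a slightly tilted line.

```python
def to_rect(bbox):
    """Convert four (x, y) corner points to (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


def sort_reading_order(results):
    """Sort results top to bottom, then left to right, by the box's top-left."""
    return sorted(results, key=lambda r: (to_rect(r[0])[1], to_rect(r[0])[0]))


# Made-up readtext-style results: (bounding box, text, confidence).
sample = [
    ([[200, 10], [300, 10], [300, 40], [200, 40]], "world", 0.90),
    ([[10, 80], [150, 80], [150, 110], [10, 110]], "second line", 0.85),
    ([[10, 10], [100, 10], [100, 40], [10, 40]], "Hello", 0.95),
]
print([text for _bbox, text, _prob in sort_reading_order(sample)])
```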

Real-world Applications

EasyOCR can be used in a variety of real-world applications. Below are some rough approaches I can think of:

  • Document Digitization: With EasyOCR, text can be extracted from scanned or photographed pages to convert physical documents into digital format. This makes digitization easier, improves accessibility, and enables efficient storage and search. It is useful for document management and for preserving historical documents, among other things. If a document is several pages long, batch processing applies here too: run a for loop over the page images and all the physical pages can be converted to digital text in very little time.
  • Receipt and Invoice Data Extraction: EasyOCR is a good fit for automating the extraction of invoice and receipt data. Once the text has been recognized, vendor names, dates, amounts, and other important details can be pulled out of it. This reduces manual data entry and the errors that come with it, which is useful for accounting and finance applications.
  • License Plate Recognition: As described under Advanced Usage, EasyOCR can read the text off license plates, a task that is crucial for many applications, including security and traffic control.
  • Image Search: EasyOCR can be used to search for images based on their text content, similar to one of the features of Google Lens. Given some specific text, we can extract the text from a collection of images and compare it to find the images that contain it. In the other direction, we can extract the text from a particular image and use that text to search the web or to find the image again.
  • Machine Translation: EasyOCR makes it easier to translate text that appears in images, and its multi-language support helps with overcoming language barriers. We can use EasyOCR to extract the text and then another Python library, such as googletrans or TextBlob, to translate it into the required language.
  • Text Detection in Videos: EasyOCR works with dynamic media in addition to still images. By running it on individual frames, it can recognize and extract the text contained in a video, whether that is news broadcast text, movie subtitles, or signs in traffic footage. This creates opportunities for accessibility features, video content analysis, and much more.
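To give one of these ideas some shape, here is a rough sketch of the receipt case: once EasyOCR has produced the text lines, simple regular expressions can pick out fields such as the date and the total. The patterns, the function name, and the sample lines are all assumptions made for illustration; real receipts vary widely and would need more robust parsing.

```python
import re

# Dates like 28/10/2023 or 28-10-23 (illustrative pattern only).
DATE_PATTERN = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
# The word "total" followed by an amount; "Subtotal" is not matched.
TOTAL_PATTERN = re.compile(r"\btotal\b\D*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)


def extract_receipt_fields(lines):
    """Pull the first date and the first 'total' amount out of OCR'd text lines."""
    fields = {"date": None, "total": None}
    for line in lines:
        if fields["date"] is None:
            match = DATE_PATTERN.search(line)
            if match:
                fields["date"] = match.group(1)
        if fields["total"] is None:
            match = TOTAL_PATTERN.search(line)
            if match:
                fields["total"] = match.group(1)
    return fields


# Made-up lines, as they might come back from readtext(..., detail=0).
receipt_lines = ["ACME STORE", "Date: 28/10/2023", "Subtotal 40.00", "Total: $42.50"]
print(extract_receipt_fields(receipt_lines))
```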

Conclusion

This blog explains how to use EasyOCR, an approachable OCR library that can recognize and extract text from a variety of images. We've seen how EasyOCR can process multiple images in a batch, handle noisy and grayscale images, support over 80 languages, and return bounding boxes and confidence scores for the text it recognizes. With its straightforward method for extracting text from photographs, EasyOCR is a useful tool for text recognition. Its pre-trained models and user-friendly interface make it a good candidate for companies and organizations that need to process large volumes of documents and images quickly.

If you want to learn more about EasyOCR, you can check out its GitHub repository or its documentation page. You can also look at the tutorials and examples that the community and the developers have shared. EasyOCR is an open-source project, so you are welcome to contribute to its development.

Happy coding! 😊
