Optical Character Recognition (OCR) with EasyOCR | PyTorch

Düzgün İlaslan
6 min readAug 20, 2023


The technique in this article is broadly useful. Often, when we read a book or see a long passage in a photo, we wish we could turn it into editable text. Imagine receiving a photo containing a long passage and being asked to transcribe it into an e-mail: tedious, slow work, right? As a solution to cases like this, and to the thousands of similar ones we never even think about, I am going to talk about OCR (Optical Character Recognition).

The idea of converting any image containing numbers or text into machine-readable text sounds great. In this blog, we will extract the text from our own images and develop end-to-end code for it. Now let’s try to understand the topic.

The history of optical character recognition

In 1974, Ray Kurzweil started Kurzweil Computer Products, Inc., whose omni-font optical character recognition (OCR) product could recognize text printed in virtually any font. He decided that the best application of this technology would be a machine-learning device for the blind, so he created a reading machine that could read text aloud in a text-to-speech format. In 1980, Kurzweil sold his company to Xerox, which was interested in further commercializing paper-to-computer text conversion. (IBM)

OCR technology became popular in the early 1990s while digitizing historical newspapers. Since then, the technology has undergone several improvements. Today’s solutions have the ability to deliver near-perfect OCR accuracy, and advanced methods are used to automate complex document-processing workflows. Before OCR technology was available, the only option to digitally format documents was to manually retype the text. Not only was this time-consuming, but it also came with inevitable inaccuracies and typing errors. Today, OCR services are widely available to the public. For example, Google Cloud Vision OCR is used to scan and store documents on your smartphone.

Now let’s get into practice and see how this exciting thing works.

We will implement this application using the EasyOCR library.

The EasyOCR package is created and maintained by Jaided AI, a company that specializes in Optical Character Recognition services.

EasyOCR is implemented in Python on top of the PyTorch library. If you have a CUDA-capable GPU, the underlying PyTorch deep learning library can speed up text detection and recognition tremendously.

As of this writing, EasyOCR can OCR text in more than 80 languages, including English, German, Hindi, Russian, and more, and the maintainers plan to add additional languages in the future.

EasyOCR is primarily designed for typed (printed) text; the maintainers have also announced plans for a handwriting recognition model.

Okay, let’s code.

My versions are:

python==3.9.12
matplotlib==3.7.1
numpy==1.25.0
opencv-python==4.8.0

First, install PyTorch:

conda install pytorch torchvision torchaudio -c pytorch

Then install EasyOCR:

pip install easyocr


Let’s import the libraries and dependencies:

import cv2
import easyocr
from matplotlib import pyplot as plt
import numpy as np

declare image path

IMAGE_PATH = "images/coffee.jpeg"

Create an EasyOCR reader with English as the language.

EasyOCR currently supports 80+ languages and is still expanding.

reader = easyocr.Reader(['en'], gpu=True)

Now, detect the text:

result = reader.readtext(IMAGE_PATH)
result

As you can see, it found the text “COFFEE” in the picture with a confidence of roughly 99%.

Each entry in result has several parts. The first, [[214, 1136], [878, 1136], [878, 1330], [214, 1330]], gives the corner points of the bounding box around the detected text. The ‘COFFEE’ part of the output is the text read from the image, and 0.9996 is the confidence score of the prediction.
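The structure of a detection can be sketched in plain Python using the sample values above (the pixel coordinates are the ones from this coffee example):

```python
# Each EasyOCR detection is a (bounding_box, text, confidence) tuple.
# The bounding box lists four corner points in order:
# top-left, top-right, bottom-right, bottom-left.
sample_detection = (
    [[214, 1136], [878, 1136], [878, 1330], [214, 1330]],  # corner points
    "COFFEE",                                              # recognized text
    0.9996,                                                # confidence score
)

bbox, text, confidence = sample_detection
top_left, top_right, bottom_right, bottom_left = bbox

print(text)        # COFFEE
print(confidence)  # 0.9996
print(top_left)    # [214, 1136]
```

Unpacking the tuple like this is all the post-processing needed to feed the drawing code below.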

Now get the min/max x, y coordinates and the text so we can draw the detection:

# transpose the four corner points into x and y sequences, then take min/max
x_min, y_min = [int(min(cord)) for cord in zip(*result[0][0])]
x_max, y_max = [int(max(cord)) for cord in zip(*result[0][0])]
text = result[0][1]
font = cv2.FONT_HERSHEY_SIMPLEX
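The zip(*…) trick above transposes the four corner points into one sequence of x values and one of y values, which is why min and max give the enclosing rectangle. A small standalone illustration with the same coordinates:

```python
# Four corner points of a detection (top-left, top-right, bottom-right, bottom-left)
points = [[214, 1136], [878, 1136], [878, 1330], [214, 1330]]

# zip(*points) transposes the list: first all x values, then all y values
xs, ys = zip(*points)
print(xs)  # (214, 878, 878, 214)
print(ys)  # (1136, 1136, 1330, 1330)

x_min, y_min = min(xs), min(ys)
x_max, y_max = max(xs), max(ys)
print((x_min, y_min), (x_max, y_max))  # (214, 1136) (878, 1330)
```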

Show the detected text:

image = cv2.imread("images/coffee.jpeg")
image = cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 3)
image = cv2.putText(image, text, (x_min, y_min), font, 2, (255, 255, 255), 3, cv2.LINE_AA)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(image)
ax.axis('off')
plt.show()

Here we first loaded the image, then drew a rectangle using the min/max x, y values we got from result. We set the rectangle color to green, (0, 255, 0), and the thickness to 3. For the label we used the (x_min, y_min) corner of the box as the position and white, (255, 255, 255), as the color.
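One small caveat: cv2.putText treats its (x, y) argument as the bottom-left of the text baseline, so a label drawn at the top-left corner of a box near the image’s top edge can be clipped. A minimal helper, assuming a rough text height in pixels (the function name and padding value are my own, not part of EasyOCR or OpenCV), flips the label below the corner when there is no room above:

```python
def label_origin(x_min, y_min, text_height, pad=5):
    """Return an (x, y) baseline for a box label, kept inside the image.

    The label needs roughly text_height pixels of space above its baseline;
    if the box corner is too close to the top edge, draw the label inside
    the box instead of above it.
    """
    if y_min - text_height - pad < 0:
        # not enough room above the box: place the label just inside it
        return (x_min, y_min + text_height + pad)
    return (x_min, y_min - pad)

print(label_origin(214, 1136, 40))  # (214, 1131) - plenty of room above
print(label_origin(214, 20, 40))    # (214, 65)   - flipped below the corner
```

In a real script you could estimate text_height with cv2.getTextSize for the chosen font and scale.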

Well, what do we do with images that contain multiple pieces of text?

Let’s deal with that now.

#declare image path
IMAGE_PATH = ("images/wineCoffeeImage.png")
reader = easyocr.Reader(['en'], gpu=True)
result = reader.readtext(IMAGE_PATH)

As you can see, it detected four different words: “wine”, “coffee”, “O’clock”, and “time”. Each of them also comes with its own bounding-box corner points and confidence score.
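Because every detection carries a confidence score, it is often worth dropping low-confidence hits before drawing them. A small sketch in the same (bounding_box, text, confidence) format; the threshold and the sample values here are illustrative, not the actual output for this image:

```python
def filter_detections(result, min_confidence=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [det for det in result if det[2] >= min_confidence]

# illustrative detections in EasyOCR's (bbox, text, confidence) format
sample_result = [
    ([[10, 10], [90, 10], [90, 40], [10, 40]], "wine", 0.98),
    ([[10, 50], [90, 50], [90, 80], [10, 80]], "coffee", 0.95),
    ([[10, 90], [90, 90], [90, 120], [10, 120]], "smudge", 0.21),
]

kept = filter_detections(sample_result)
print([det[1] for det in kept])  # ['wine', 'coffee']
```

The loop below would then iterate over the filtered list instead of the raw result.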

Now, plot all the text in the image:

# load the image
image = cv2.imread("images/wineCoffeeImage.png")
# declare the font
font = cv2.FONT_HERSHEY_SIMPLEX
for detection in result:
    # top-left and bottom-right corners of the bounding box
    x_min, y_min = [int(cord) for cord in detection[0][0]]
    x_max, y_max = [int(cord) for cord in detection[0][2]]
    # get the text
    text = detection[1]
    # draw the rectangle
    image = cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
    # put the text
    image = cv2.putText(image, text, (x_min, y_min), font, 1, (255, 25, 200), 1, cv2.LINE_AA)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows true colors
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# plot the image
fig, ax = plt.subplots(figsize=(15, 15))
ax.imshow(image)
ax.axis('off')
plt.show()

That’s it

As you can see, it is not that difficult; on the contrary, it is quite fun. Where you use what we have built here is up to your imagination: you could turn it into a web app or a mobile app if you wanted.
