Automating OCR Processes for Translation

Luis Carlos Manrique Ruiz
The Zeals Tech Blog
Feb 27, 2022

Background

As a foreigner learning the Japanese language, I sometimes can’t read or understand certain documents because of my limited grasp of grammar and Chinese characters (kanji). Occasionally, digital documents are copyright-protected, which makes it even harder to extract or translate that information.

So, I created a system that allows me to take a screenshot, visualize the information in a browser and translate it automatically.

The system involves three basic steps:

  1. Recognize the characters (OCR).
  2. Translate the document or documents into English.
  3. Export the OCR and translation results to a specific folder.

The difference with Google Translate

Google Translate on a smartphone is intended to provide fast and reliable information in different languages, so people can use it on the go.

However, if it is a long document (an important one with several pages), it is better to take the time and read it carefully (even in translation). By documents, I mean ones from the city hall, nurseries, companies, and so on; in other words, documents that contain relevant information.

This app allows you to compare and translate such long documents easily from your computer.

Process

Figure 1. General description of this project

Previously, a Zeals member introduced Tesseract [Link] (Figure 1, green area). For our study, however, we will integrate Tesseract with Python and create an API + GUI to interact with this system.

Let’s go into details:

Hardware setup

The photos are taken with a Xiaomi Mi 9 smartphone.

This project was written on a laptop running macOS Monterey v12.2.1.

Image preprocessing

Loading libraries

There are multiple libraries that can read images, such as OpenCV (cv2) and Pillow (PIL), among others. For now, we will load images using the Pillow library.

import matplotlib.pyplot as plt
from PIL import Image

We will also use matplotlib to visualize intermediate results.

Reading image

Let’s read the image:

# Load the image and display it with matplotlib
image = Image.open(filename)

fig, ax = plt.subplots(1)
fig.set_figwidth(15)
ax.imshow(image)

Image preprocessing

According to various sources of documentation, it is recommended to convert the image to grayscale in order to get more accurate results from Tesseract. The argument "L" (luminance) converts the original image into grayscale.

# Converting into grayscale
gray_image = image.convert("L")
gray_image.show()

Figure 2 shows the comparison between the color image and its grayscale version.

Figure 2. Converting document into grayscale

Character recognition

Before running Tesseract, we need to make sure that the trained data file for the Japanese language is inside Tesseract’s tessdata folder.

This file can be downloaded from here: [Link]
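As a quick sanity check, assuming pytesseract is already installed, you can ask Tesseract which language models it can find; "jpn" should appear once the file is in place:

import pytesseract

# Lists the languages Tesseract can find in its tessdata folder;
# "jpn" should appear after jpn.traineddata has been copied there.
print(pytesseract.get_languages(config=""))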

To follow along with this exercise, some pages were downloaded from different Japanese sources and printed out.

The following code allows the application to recognize the Japanese characters from the gray-scaled image.

import pytesseract

# Run OCR with the Japanese model ("jpn"), then strip the whitespace
# Tesseract inserts between Japanese characters
result_pre_g = pytesseract.image_to_string(gray_image, lang="jpn")
result_image_g = result_pre_g.replace(" ", "")
print(result_image_g)
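Tesseract’s page segmentation modes can also affect accuracy on complex layouts. As an optional tweak, not used in the run above, a mode can be passed through pytesseract’s config parameter (the value below is an assumption to tune per document):

# --psm 6 treats the page as a single uniform block of text;
# other modes suit multi-column or sparse layouts better.
result_psm = pytesseract.image_to_string(gray_image, lang="jpn", config="--psm 6")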

Results

The following table contains the results for different files.

Table 1. Comparison with and without image preprocessing.

Further improvements to image preprocessing are possible, such as correcting orientation, intensity, and so forth. Even so, the results show that image preprocessing already improves accuracy.
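One such refinement, shown here as a hypothetical addition rather than part of the original pipeline, is a fixed-threshold binarization with Pillow (the threshold of 128 is an arbitrary starting point):

# Binarize the grayscale image: pixels above the threshold become white,
# everything else black. Tune the threshold (128 here) per document.
binary_image = gray_image.point(lambda p: 255 if p > 128 else 0)
binary_image.show()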

Translating information

To translate the digitized text from the grayscale image, we will use another library called TextBlob, as follows. Please note that the argument of the translate function is set to English; however, it may be set to any language supported by the Google Translate API.

from textblob import TextBlob

# Translate the OCR output to English
tb = TextBlob(result_image_g)
translated = tb.translate(to="en")

Figure 3. Results from translation

In Figure 3 we show some results from the translation.
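This also covers step 3 of the overview: exporting the results to a specific folder. A minimal sketch, in which the folder and file names are assumptions, looks like this:

from pathlib import Path

# Write the OCR output and its translation to an output folder
output_dir = Path("output")
output_dir.mkdir(exist_ok=True)
(output_dir / "ocr_result.txt").write_text(result_image_g, encoding="utf-8")
(output_dir / "translation.txt").write_text(str(translated), encoding="utf-8")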

Creating GUI + API

For the following part, we will use a library called Streamlit.

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps.

This description was taken from the Streamlit website.

Let’s update our code to add some features for displaying the information and downloading results.
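The app itself is only sketched below: the widget labels, layout, and download mechanism are assumptions rather than the original code, but they show how the OCR and translation steps from above plug into Streamlit.

import streamlit as st
import pytesseract
from PIL import Image
from textblob import TextBlob

st.title("OCR + Translation")

# Let the user upload a photographed or scanned document
uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])

if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Uploaded document")

    # Grayscale conversion, as in the preprocessing step above
    gray_image = image.convert("L")

    # OCR with the Japanese language model
    ocr_text = pytesseract.image_to_string(gray_image, lang="jpn").replace(" ", "")
    st.subheader("Recognized text")
    st.text_area("OCR result", ocr_text, height=200)

    # Translate to English, as in the TextBlob step above
    translated = str(TextBlob(ocr_text).translate(to="en"))
    st.subheader("Translation")
    st.text_area("Translated text", translated, height=200)

    # Offer the translation as a downloadable file
    st.download_button("Download translation", translated, file_name="translation.txt")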

You can execute the code as follows:
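Assuming the sketch above is saved as app.py (a hypothetical file name), the command is streamlit run app.py.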

Figure 4. Executing code using Streamlit

The final results will look like this:

Figure 5. Results from OCR, Translations and GUI

Please note that at the end of the translated file, a download link is generated so you can save the translation. In Figure 6 you can see the generated file.

Figure 6. Exporting file

Future

And that’s all, folks! I hope you enjoyed learning how I built this tool, and I hope it will be useful in your activities. I’d like to thank Prof. X for introducing me to Streamlit and Bismo for his time and comments while I was writing this article.
