Automating OCR for document translation
Background
As a foreigner learning the Japanese language, I sometimes can't read or understand certain documents because of my limited grammar and knowledge of Chinese characters (kanji). Occasionally, digital documents are copy-protected, which makes it even harder to extract or translate that information.
So I created a system that lets me take a screenshot, view the information in a browser, and translate it automatically.
So there are three basic steps:
- Recognize the characters (OCR).
- Translate the document or documents into English.
- Export the OCR results and the translation to a specific folder.
How this differs from Google Translate
Google Translate on a smartphone is intended to provide fast, reliable translations in different languages, so people can use it on the go.
However, if a document is long (an important one spanning several pages), it is better to take the time and read it carefully, even in translation. By documents, I mean ones from the city hall, nurseries, companies, and so on; in other words, documents that contain relevant information.
This app lets you compare and translate such long documents easily on your computer.
Process
Previously, a Zeals member introduced Tesseract [Link] (Figure 1, green area). In this article, however, we will integrate Tesseract with Python and create an API + GUI to interact with the system.
Let’s go into details:
Hardware setup
The photos are taken with a Xiaomi Mi 9 smartphone.
This project was written on a laptop running macOS Monterey 12.2.1.
Image preprocessing
Loading libraries
There are multiple libraries for reading images, such as OpenCV (cv2) and Pillow (PIL), among others. For now, we will load images using the Pillow library.
import matplotlib.pyplot as plt
from PIL import Image
And we will use matplotlib for visualizing intermediate results too.
Reading image
Let’s read the image:
image = Image.open(filename)
fig, ax = plt.subplots(1)
fig.set_figwidth(15)
ax.imshow(image)
Image preprocessing
According to various sources, it is recommended to convert the image to grayscale in order to get more accurate results from Tesseract. The argument 'L' converts the original image to grayscale.
# Converting into grayscale
gray_image = image.convert("L")
gray_image.show()
In Figure 2 we show the comparison between the color image and the grayscale image.
Character recognition
Before running Tesseract, we need to make sure the trained-data file for the Japanese language (jpn.traineddata) is inside Tesseract's tessdata folder.
This file can be downloaded from here: [Link]
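It can also help to verify programmatically that the file is in place before running OCR. Below is a minimal sketch of such a check; the helper name `has_traineddata` is my own and not part of Tesseract's or pytesseract's API:

```python
from pathlib import Path

def has_traineddata(tessdata_dir: str, lang: str = "jpn") -> bool:
    """Check whether the trained-data file for `lang` exists in `tessdata_dir`."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()
```

The exact tessdata location depends on the installation; on macOS with Homebrew it is typically something like /usr/local/share/tessdata (or /opt/homebrew/share/tessdata on Apple Silicon).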
So that the reader can follow along, some pages were downloaded from different Japanese sources and printed out.
The following code recognizes the Japanese characters in the grayscale image.
import pytesseract

result_pre_g = pytesseract.image_to_string(gray_image, lang='jpn')
result_image_g = result_pre_g.replace(' ', '')
print(result_image_g)
Results
The following table contains the results for the different files.
Further improvements to the image preprocessing are possible, such as correcting orientation and intensity. Even so, the results show that preprocessing the image improves accuracy.
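As a sketch of what such further preprocessing could look like with Pillow, the following illustrative helper fixes EXIF orientation, stretches the intensity range, and binarizes the grayscale image; the function name and the threshold of 160 are assumptions for demonstration, not values from the original experiment:

```python
from PIL import Image, ImageOps

def preprocess(image: Image.Image, threshold: int = 160) -> Image.Image:
    """Illustrative preprocessing before OCR:
    orientation fix, contrast stretch, grayscale, binarization."""
    image = ImageOps.exif_transpose(image)            # honor smartphone EXIF rotation
    gray = ImageOps.autocontrast(image.convert("L"))  # stretch the intensity range
    # Binarize: pixels above the (assumed) threshold become white, the rest black
    return gray.point(lambda p: 255 if p > threshold else 0)
```

In practice the threshold would need tuning per document, or could be replaced by an adaptive method.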
Translating information
To translate the digitized text from the grayscale image, we will use another library called textblob, as follows. Note that the argument to the translate function is set to English, but it may be set to any language supported by the Google Translate API.
from textblob import TextBlob
tb = TextBlob(result_image_g)
translated = tb.translate(to="en")
In Figure 3 we show some results from the translation.
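The remaining step from the overview, exporting the OCR text and its translation to a specific folder, can be sketched as follows; the function name, the output file names, and the default "results" folder are assumptions for illustration:

```python
from pathlib import Path

def export_results(ocr_text: str, translation: str, out_dir: str = "results") -> Path:
    """Write the recognized Japanese text and its English translation
    into `out_dir`, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "ocr_jpn.txt").write_text(ocr_text, encoding="utf-8")
    (folder / "translation_en.txt").write_text(translation, encoding="utf-8")
    return folder
```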
Creating GUI + API
For the following part, we will use a library called Streamlit.
Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps.
Information was taken from this website.
Let’s update our code to add some features for displaying the information and downloading results.
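A minimal sketch of such a Streamlit app, combining the OCR and translation steps above, might look like the following; the widget layout, labels, and file names are my assumptions rather than the original code:

```python
import streamlit as st
from PIL import Image
import pytesseract
from textblob import TextBlob

st.title("Japanese OCR + Translation")

uploaded = st.file_uploader("Upload a screenshot", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Input image")

    # OCR on the grayscale version, as in the earlier steps
    gray = image.convert("L")
    text = pytesseract.image_to_string(gray, lang="jpn").replace(" ", "")
    st.subheader("Recognized text")
    st.text(text)

    # Translate to English with textblob
    translated = str(TextBlob(text).translate(to="en"))
    st.subheader("Translation")
    st.text(translated)

    # Offer the translation as a downloadable file
    st.download_button("Download translation", translated, file_name="translation.txt")
```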
You can execute the app from a terminal with streamlit run followed by your script name (for example, streamlit run app.py):
The final results will look like this:
Please note that at the end of the translated file a link is generated, from which you can download the translation. In Figure 6 you can see the generated file.
Future
And that's all, folks! I hope you enjoyed learning how I built this tool and that it will be useful in your activities. I'd like to thank Prof. X for introducing me to Streamlit and Bismo for his time and comments while I was writing this article.