Enhancing OCR Efficiency with Digital Image Processing

Published in crossML Blog

Nov 25, 2022

Digital formats are widely used to store and transmit information. In many situations, however, the text contained in images of documents must be extracted so that it can be stored as digital text. Text recognition software based on optical character recognition (OCR) has transformed how this extraction is done.

Text extraction from images has numerous applications:

Identity card verification: Recognizing identity cards and extracting their information at airports, banks, examination halls, and similar venues.

Extracting financial information: Automated data entry from business documents such as checks, passports, invoices, bank statements, and receipts.

Text-based searching: Searching for text in electronic images of printed documents, as in Google Books, and real-time conversion of handwritten text, as in pen computing.

Vehicle number plate recognition: Enables a wide range of applications, such as fast-track payment at toll plazas, vehicle tracking, automated parking systems, and road safety systems.

Shortcomings of current open-source OCR systems:

Most open-source OCR systems can reliably recognize typed documents with black text on a white background. The main challenges are:

Extracting text from distorted images.
Extracting text from images with different background colors.
Extracting text from images with noise (random noise, fixed-pattern noise, and banding noise).

Improved Text Extraction using Digital Image Processing

The proposed methodology uses digital image processing to improve the accuracy of text extraction with open-source OCR on both low- and high-resolution images.

The methodology consists of two phases:

Detection Phase
Correction Phase

Detection Phase

Using contour detection, estimate the skew angle of the image, then use a warp perspective transformation to rotate the image until its angle with respect to the x-axis is 0.
The pipeline has two distinct correction strategies, one for low-resolution and one for high-resolution images. Every image receives a quality score, and the algorithm uses that score to decide which correction strategy to apply.
An image is categorized as low resolution if its quality score is below the proposed quality score value; otherwise, it is categorized as high resolution [Table 1].
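A minimal sketch of the two detection steps above, written in plain Python so it runs without extra dependencies. It assumes the contour step has already produced two points along one edge of the document; in OpenCV these might come from cv2.findContours followed by cv2.minAreaRect. The helper names and the `threshold` parameter are hypothetical, and the threshold only stands in for the Table 1 value, which is not reproduced here. The rotation matrix uses the form OpenCV documents for cv2.getRotationMatrix2D. A pure rotation can also be passed to cv2.warpPerspective by appending the row [0, 0, 1].

```python
import math

# Sketch under stated assumptions; names are hypothetical, not the article's code.

def skew_angle(p1, p2):
    """Angle in degrees of the edge p1 -> p2 against the x-axis.
    Image coordinates are assumed: x grows right, y grows down."""
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix in the form documented for cv2.getRotationMatrix2D.
    Rotating by the measured skew angle brings that edge back to 0 degrees."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

def transform_point(m, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def classify_resolution(quality_score, threshold):
    """'low' if the score is below the (hypothetical) Table 1 threshold, else 'high'."""
    return "low" if quality_score < threshold else "high"

# Usage: an edge that drops 10 px over 100 px is skewed by about 5.7 degrees.
left, right = (10, 20), (110, 30)
angle = skew_angle(left, right)
m = rotation_matrix(left, angle)
print(round(angle, 2), [round(v, 2) for v in transform_point(m, right)])  # prints: 5.71 [110.5, 20.0]
print(classify_resolution(0.42, threshold=0.5))                           # prints: low
```

After rotation about the left point, the right point lands at the same y value, so the edge is level. In a real pipeline the same matrix would be applied to the whole image rather than to a single point.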

Analyze the blurriness of high-resolution images using convolution operations. If the blur index is below the proposed blur index value (BI), apply a Laplacian filter to sharpen the image [Table 2].
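The article does not define its blur index. The sketch below assumes one common convolution-based choice: the variance of the 3x3 Laplacian response. Sharp edges produce large responses, so a blurred image gives a small variance. Sharpening is done by subtracting the Laplacian from the image. The image is a grayscale list of rows of 0-255 integers, at least 3x3. `blur_index`, `sharpen`, and `bi_threshold` are hypothetical names, and the threshold stands in for the Table 2 value. With OpenCV and NumPy, the same measure is often written as `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
from statistics import pvariance

# Sketch under stated assumptions: blur index = variance of the Laplacian (hypothetical choice).

# 4-neighbour Laplacian kernel (symmetric, so correlation equals convolution).
KERNEL = ((0, 1, 0),
          (1, -4, 1),
          (0, 1, 0))

def laplacian_at(img, y, x):
    """Laplacian response at an interior pixel (y, x)."""
    return sum(KERNEL[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def interior(img):
    """Yield (y, x) for every pixel that has a full 3x3 neighbourhood."""
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            yield y, x

def blur_index(img):
    """Variance of the Laplacian over interior pixels; low values suggest blur."""
    return pvariance([laplacian_at(img, y, x) for y, x in interior(img)])

def sharpen(img):
    """Subtract the Laplacian (img - L) and clip to 0..255; border pixels are copied."""
    out = [row[:] for row in img]
    for y, x in interior(img):
        out[y][x] = min(255, max(0, img[y][x] - laplacian_at(img, y, x)))
    return out

def deblur_if_needed(img, bi_threshold):
    """Sharpen only when the blur index falls below the (hypothetical) threshold."""
    return sharpen(img) if blur_index(img) < bi_threshold else img

# Usage: a single bright dot has strong edges; a smooth ramp has none.
dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 255
ramp = [[10 * x for x in range(5)] for _ in range(5)]
print(blur_index(dot), blur_index(ramp))
```

The kernel has a negative centre, so subtracting its response boosts the centre pixel relative to its neighbours. That is why `sharpen` uses `img - L` rather than `img + L`.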

Analyze the presence of noise in high-resolution images using mean and median filters along with additional preprocessing operations. If the noise index exceeds the proposed noise index value (NI), apply the denoising function [Table 3].
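The article does not define its noise index either. The sketch below assumes a simple measure and uses only the median filter for brevity. The noise index is the mean absolute difference between the image and a 3x3 median-filtered copy. Isolated noise spikes are removed by the median, so they show up as large differences, while clean regions barely change. Denoising is then the median filter itself. `noise_index`, `denoise_if_needed`, and `ni_threshold` are hypothetical names, and the threshold stands in for Table 3. In OpenCV, the filtering step would typically be `cv2.medianBlur`, or a stronger denoiser such as `cv2.fastNlMeansDenoising`.

```python
from statistics import median

# Sketch under stated assumptions: noise index = mean |image - median-filtered image| (hypothetical).

def median3x3(img):
    """3x3 median filter on interior pixels; border pixels are copied unchanged."""
    out = [row[:] for row in img]
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            out[y][x] = median(img[y + j][x + i]
                               for j in (-1, 0, 1) for i in (-1, 0, 1))
    return out

def noise_index(img):
    """Mean absolute difference from the median-filtered image; higher means noisier."""
    filtered = median3x3(img)
    diffs = [abs(img[y][x] - filtered[y][x])
             for y in range(1, len(img) - 1)
             for x in range(1, len(img[0]) - 1)]
    return sum(diffs) / len(diffs)

def denoise_if_needed(img, ni_threshold):
    """Apply the median filter only when the noise index exceeds the threshold."""
    return median3x3(img) if noise_index(img) > ni_threshold else img

# Usage: one "salt" pixel on a flat grey background.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
print(round(noise_index(noisy), 2))          # prints: 17.22
print(denoise_if_needed(noisy, 5.0)[2][2])   # prints: 100
```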

Correction Phase

Low-resolution images (PAN cards, voter IDs, driving licenses, etc.) are corrected using a built-in computer vision package.
For high-resolution images, morphological operations with static kernel values and structuring elements (disc, oval, etc.) are applied to remove distortions such as overexposed or underexposed areas, and to equalize sharpness levels, intensity/brightness levels, homogeneity, and PSNR.
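The morphological step can be illustrated with grayscale erosion and dilation over an elliptical (disc or oval) structuring element. This is a sketch under assumptions. The article does not list its kernel values, so the radii, the choice of opening versus closing, and the helper names are all hypothetical. Opening (erosion, then dilation) removes small bright specks. Closing (dilation, then erosion) fills small dark gaps. PSNR, one of the measures the paragraph mentions, compares a result with a reference image using PSNR = 10·log10(255² / MSE). In OpenCV, the corresponding functions are `cv2.getStructuringElement(cv2.MORPH_ELLIPSE, ...)`, `cv2.morphologyEx`, and `cv2.PSNR`.

```python
import math

# Sketch under stated assumptions; radii and helper names are hypothetical.

def ellipse_offsets(rx, ry):
    """(dy, dx) offsets inside an ellipse with radii rx, ry >= 1 (a disc when rx == ry)."""
    return [(dy, dx)
            for dy in range(-ry, ry + 1)
            for dx in range(-rx, rx + 1)
            if (dx / rx) ** 2 + (dy / ry) ** 2 <= 1]

def _morph(img, offsets, pick):
    """Apply min (erosion) or max (dilation) over the structuring element.
    Neighbours that fall outside the image are ignored."""
    h, w = len(img), len(img[0])
    return [[pick(img[y + dy][x + dx] for dy, dx in offsets
                  if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)]
            for y in range(h)]

def erode(img, offsets):
    return _morph(img, offsets, min)

def dilate(img, offsets):
    return _morph(img, offsets, max)

def opening(img, offsets):
    """Erosion then dilation: removes bright specks smaller than the element."""
    return dilate(erode(img, offsets), offsets)

def closing(img, offsets):
    """Dilation then erosion: fills dark gaps smaller than the element."""
    return erode(dilate(img, offsets), offsets)

def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    sq = [(r - t) ** 2 for rr, tr in zip(reference, test) for r, t in zip(rr, tr)]
    mse = sum(sq) / len(sq)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

# Usage: remove a one-pixel speck with a small disc, then measure the change.
disc = ellipse_offsets(1, 1)          # plus-shaped 3x3 disc with 5 offsets
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 255
cleaned = opening(speck, disc)
print(round(psnr(speck, cleaned), 2))  # prints: 13.98
```

A wider element, for example `ellipse_offsets(2, 1)` (an oval), removes horizontally elongated specks as well. A larger element also erodes thin strokes of text, which is why fixed kernel sizes need tuning per document type.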

The Design of the Methodology

Result

The performance of the proposed methodology was evaluated using two parameters: Word Count (the number of words extracted from images by the open-source OCR) and Confidence Score (the average OCR confidence per document) [Fig 2, 3].
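Both parameters are simple to compute once the OCR engine reports a confidence for each word. The sketch below assumes the OCR output has already been converted into (text, confidence) pairs on a 0-100 scale. With Tesseract, for example, `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns parallel `text` and `conf` lists, and rows that are not words carry a confidence of -1. The helper names are hypothetical and the sample data is made up for illustration. `relative_gain` shows how a before/after comparison, such as the roughly 20% improvement stated in the conclusion, is expressed as a percentage.

```python
# Sketch under stated assumptions; helper names and sample data are hypothetical.

def document_metrics(words):
    """Return (word_count, mean_confidence) for one document.

    `words` is a list of (text, confidence) pairs; entries with blank text or a
    negative confidence (Tesseract marks non-word rows with -1) are ignored.
    """
    kept = [conf for text, conf in words if text.strip() and conf >= 0]
    count = len(kept)
    mean_conf = sum(kept) / count if count else 0.0
    return count, mean_conf

def relative_gain(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before

# Usage with made-up OCR output for a single document.
sample = [("", -1), ("INVOICE", 71.0), ("TOTAL", 64.0), ("  ", 12.0)]
print(document_metrics(sample))   # prints: (2, 67.5)
print(relative_gain(50, 60))      # prints: 20.0
```

To compare against a baseline, compute these metrics per document with and without preprocessing, average them over the test set, and then pass the two averages to `relative_gain`.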

Conclusion

The goal of the methodology is to bring the accuracy of open-source OCR up to that of the various paid OCR products already available on the market.
The proposed methodology includes the fundamental components of a digital image processing system that modifies images to improve their quality (enhancement and restoration). Digital image processing of this kind has also driven advances in modern OCR technology and in the real-world applications built on it.
The methodology was tested on 1500 images, including ID cards, invoices, and resumes.
According to the comparison bar graph, the proposed methodology increases both the word count and the confidence score by approximately 20%.
The team is working on additional features based on market requirements.