Techniques for enhancing OCR accuracy by assessing image sharpness

Published in

Data Science at Microsoft

8 min readMay 14, 2024

Optical Character Recognition (OCR) technology has revolutionized the way we digitize text from images. However, one persistent challenge that OCR systems face is accurately deciphering text from blurry images. Blurriness can significantly degrade OCR accuracy, leading to misinterpretations and errors in the extracted text. In this article, I present the issue of blurriness in images for OCR applications and explore three techniques to assess image sharpness, thereby enhancing OCR accuracy.

What is blurriness in images?

Blurriness in images occurs due to various factors such as motion blur, defocus blur, or low image resolution. These factors can distort the fine details and edges of text, making it difficult for OCR systems to accurately recognize and extract characters.

How does OCR benefit businesses?

OCR is widely used by businesses for the following applications, among many others:

Document digitization: Converting printed or handwritten documents, forms, and receipts into digital text.
Data extraction: In industries such as finance, retail, and healthcare, OCR automates the extraction of data or key information from forms, price tags, and prescriptions.
Accessibility and assistive technology: Text recognition software converts printed text into audible speech or Braille output, enabling visually impaired users to access and interact with textual content.

Why is the accuracy of OCR models with our data so important?

Evaluating models on blurred images gives a false representation of the accuracy of the model. Accuracy is of paramount importance due to the following:

Reliability of data: High accuracy ensures that the extracted text reflects the original content accurately, allowing businesses to rely on the OCR output for data analysis, decision-making, and regulatory compliance.
Minimization of errors: Inaccurate OCR results can lead to errors in data entry, document processing, and information retrieval, which may lead to financial losses, compliance violations, reputational damage, and legal liabilities.
Efficiency: Eliminating the need for manual verification and correction of OCR output saves time and resources for businesses.

Adopting OCR technology is likely to generate additional revenue for many businesses. It also results in cost savings through better resource utilization, reducing the risk of fines, penalties, and legal liabilities associated with data errors or non-compliance.

Techniques for assessing image sharpness

On a sample of 200 cropped images containing text, I ran an experiment to determine the approach that gives the best metric to determine the sharpness of an image. The sample images included some blurred and grainy images too. Here, I have explored three techniques to identify blurriness in images using OpenCV: Laplacian Operator, Gradient Magnitude Method, and Fast Fourier Transformations. To compare different technologies, I have added the sharpest and blurriest images as references for each technique.

In this section, I explore the following techniques in detail.

Laplacian Operator method

The Laplacian operator is a classic technique used in image processing to detect edges and assess image sharpness. It calculates the second derivative of an image to detect edges and assess image sharpness.

Mathematically, the Laplacian operator ∇² is defined as the divergence of the gradient of the image function f (x, y), where (x, y) represents the spatial coordinates of the image:

The Laplacian operator measures the local curvature of the image intensity function at each pixel. High positive values indicate regions of rapid intensity change (edges), while low values indicate smooth regions.

By calculating the variance of the Laplacian operator, we can quantify the sharpness of the image. Higher variance values indicate sharper edges and more pronounced features.

Here is Python code to implement the Laplacian operator:

On application of this technique to the sample dataset, we can observe below that the images in Figure 1 have the lowest sharpness score while appearing clearer, unlike the images in Figure 2 with a higher sharpness score when they are actually more distorted.

*Figure 1: Lowest sharpness score with Laplacian operator.*

*Figure 2: Highest sharpness score with Laplacian operator.*

Sensitivity to noise limitation: In noisy images, the Laplacian operator produces inaccurate sharpness scores as it may detect noise as edges, resulting in higher variance and misleading sharpness measurements. As a result, sharp and blurred images are mostly misclassified.

Gradient Magnitude method

Edge detection is the process of finding edges in an image that reveal structure information about the image. Sobel filters are a type of edge detection filter commonly used in image processing. They are often used to detect edges in an image by computing the gradient magnitude.

The gradient represents the rate of change in pixel intensity in the image. The gradient magnitude is the magnitude of this gradient vector, which indicates how rapidly the pixel intensities change from one point to another in the image.

Mathematically, the gradient of an image function f (x, y) represents the rate of change of pixel intensities with respect to spatial coordinates (x, y):

The magnitude of the gradient ∥∇f (x, y)∥ is calculated as the Euclidean norm of the gradient vector:

This magnitude represents the rate of change of pixel intensities in all directions at each pixel. High magnitude values indicate regions of rapid intensity change, such as edges and textures, while low values indicate smooth regions.

Then we calculate the mean (average) value of all these gradient magnitudes across the entire image. This mean gradient magnitude provides an overall measure of the image’s sharpness or the presence of edges. Higher mean or sum values indicate sharper edges and more pronounced features.

Here is Python code to implement the Sobel operator:

On application of this technique to the sample dataset, we can observe below that images in Figure 3 that have the lowest sharpness score are correctly identified, unlike images in Figure 4 that have a higher sharpness score when they are actually distorted.

*Figure 3: Lowest sharpness score using Gradient Magnitude.*

*Figure 4: Highest sharpness score using Gradient Magnitude.*

Limitation: In images with uniform regions or low-contrast features, the gradient magnitude may yield lower magnitude values, leading to underestimation of sharpness. Conversely, in images with complex textures or high-contrast edges, the gradient magnitude may produce higher magnitude values, even if the image is not necessarily sharp overall.

Fast Fourier Transformation method

The Fourier Transformation is a mathematical technique that is used for analyzing the frequency components of an image.

The Fourier Transform F(u, v) of an image function f (x, y) is defined as:

where (u, v) represents the spatial frequency coordinates in the frequency domain.

The magnitude spectrum |F(u, v)| of the Fourier Transform represents the distribution of spatial frequencies present in the image. High magnitude values correspond to high-frequency components, such as edges and textures, while low values correspond to low-frequency components, such as smooth regions.

To assess image sharpness using the Fast Fourier Transformation (FFT) method, we typically analyze the magnitude spectrum and calculate relevant statistics, such as mean, to quantify the distribution of high-frequency content across the image. Higher statistics values indicate sharper edges and more pronounced features.

Here is Python code to implement the Fourier Transformation:

On application of this technique to the sample dataset, we can observe below that images in Figure 5 are correctly identified as blurred with the lowest sharpness score while images in Figure 6 are correctly identified as sharp with a high sharpness score.

*Figure 5: Lowest sharpness Score with FFT.*

*Figure 6: Highest sharpness score with FFT.*

Which method worked best on our business problem?

On the sample of images used, the Fast Fourier transformations worked best due to:

Sensitivity to high-frequency components: The Fast Fourier Transform is highly sensitive to high-frequency components, making it particularly effective for detecting sharp image features. High-frequency components contribute significantly to the magnitude spectrum of the Fourier Transform, allowing for precise identification and quantification of image sharpness.
Shift invariance property: Regardless of the position or orientation of the image, the Fourier Transform captures the frequency content consistently, making it a robust and reliable method for assessing image sharpness.

How to determine the threshold?

The threshold serves as a cutoff point for distinguishing between sharp and blurry images based on their sharpness scores. The ideal threshold for determining whether text in an image is sharp or blurry depends on various factors such as the quality of the images, the characteristics of the text (font size, style, contrast with background), and the specific requirements of the application. Images with sharpness scores above the threshold are considered sharp, while those below the threshold are deemed blurry. Adjusting the threshold value allows for flexibility in determining the acceptable level of sharpness for OCR processing.

Some of the methods to determine threshold include:

Visual inspection: Trying to find a threshold value by inspecting images in ways that separate the sharp from the blurry ones.
Empirical testing: Testing with a set of images with ground truth labels indicating whether the text is sharp or blurry.
Iterative refinement: Starting with a conservative threshold and then adjusting the threshold based on performance.

Conclusion

In conclusion, blurriness in images poses a significant challenge for OCR systems. By leveraging the basic principles of image processing and incorporating thresholding techniques, the Fourier Transformation method provides a reliable approach for evaluating image sharpness, thereby enhancing OCR accuracy and performance.

Monica Kadlay is on LinkedIn.