Faded, Torn, Rotated Receipt OCR with Image Preprocessing
How to extract specific information from faded, torn, rotated receipts?
Background
There are several Optical Character Recognition (OCR) engines available, including Tesseract OCR, Google Vision AI, and ML Kit. Mobile apps such as Cam Scanner, Receipt Jar, and ReceiptLense appear to have incorporated superior OCR engines. However, receipts scanned with a mobile camera pose unique challenges compared to flatbed-scanned receipts, documents, or books.
Mobile-scanned receipts are often creased, faded, torn, rotated, stained, or otherwise imperfect. Therefore, a scanned-receipt OCR workflow often asks the human user for help: to provide a better-quality scan, to apply filters that enhance the input image quality, and to drag the four corners of the receipt into place. After this manual assistance, the text is fairly aligned and ready to be processed by the OCR engine.
Introduction
In this blog, we will explore ways to automatically enhance a scanned receipt's readability by applying image preprocessing techniques. Disclaimer: this will not be a perfect solution, but it will definitely help improve the quality of the OCR results. We will use the Tesseract OCR model without image preprocessing as our baseline, and compare its results with the preprocessed version.
Topics Covered
- Tesseract OCR
- OpenCV
- Package Installation with automated script (.sh)
- Image Augmentation
- Rotate, Gray, Blur, Dilate, Canny, Binarize, Crop
- Edge Detection
- Bounding Box
- Read, Write, View Image
- Baseline vs Enhanced OCR Output Comparison
- [System Used] Terminal, VS Code, Intel x86_64 macOS
Tesseract OCR
--psm — Page Segmentation Mode (0–13). Option 1 worked best for our scanned receipt example. Run the following to see all modes:
tesseract --help-psm
-l eng — Language option (English). The results are fairly good for a high-quality scanned receipt.
normal.JPG — input image path
normal — output text file name
tesseract normal.JPG normal --psm 1 -l eng
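The same invocation can also be wrapped from Python. The helper names below are illustrative assumptions for this sketch, not an established API:

```python
import subprocess

def build_tesseract_cmd(input_path, output_base, psm=1, lang="eng"):
    """Assemble the Tesseract CLI invocation above as an argument list."""
    return ["tesseract", input_path, output_base, "--psm", str(psm), "-l", lang]

def run_ocr(input_path, output_base, psm=1, lang="eng"):
    # Writes the OCR text to f"{output_base}.txt"; raises CalledProcessError on failure.
    subprocess.run(build_tesseract_cmd(input_path, output_base, psm, lang), check=True)
```

For example, `run_ocr('normal.JPG', 'normal')` reproduces the command line shown above.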
Baseline Grocery Receipt — Run OCR Without Preprocessing
The OCR result of this faded grocery receipt is very bad: it is not readable at all. Let's see if we can improve it with the preprocessing steps in the next section.
tesseract faded.JPG faded --psm 1 -l eng
Enhanced Grocery Receipt — Run OCR With Preprocessing
1. Install Numpy, OpenCV, imutils, Pillow
- Create and activate a virtual environment. Check my previous post to learn how to do this correctly in VS Code with a specific Python version.
- Create a script (install_packages.sh)
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow
pip3 install autopep8
pip3 install numpy
pip3 install opencv-python
pip3 install imutils
- Execute the script. This is a better approach for installing the required packages than doing it from a requirements.txt file, as it helps avoid package conflicts.
chmod +x install_packages.sh
sh install_packages.sh
2. Load Image
raw_img — the JPG/PNG image at the given path, loaded as a NumPy array.
cv2.imread() — OpenCV reads the image in the BGR scheme by default.
cv2.cvtColor() — to view the image in the RGB scheme.
import cv2
import imutils
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

raw_path = 'faded.JPG'  # Enter the path to your scanned receipt
raw_img = cv2.imread(raw_path)

# View the image in RGB
raw_rgb = cv2.cvtColor(raw_img, cv2.COLOR_BGR2RGB)
plt.imshow(raw_rgb)
plt.show()
3. Orient Receipt to Vertical (Optional)
Sometimes, the loaded image is in landscape orientation. If it is, we correct this first.
- Orientation — Image orientation information is stored in the metadata of the image file. However, the Python libraries we use to read the image may not apply that metadata, which is why we sometimes see mis-oriented images.
- (height, width, channels) — An OpenCV-opened image's .shape is in this format. A Pillow-opened image's .size is in (width, height) format.
- imutils.rotate() — This function is compatible with OpenCV-opened images, but not with Pillow-opened images, because it requires the shape information.
def orient_vertical(img):
    width = img.shape[1]
    height = img.shape[0]
    if width > height:
        rotated = imutils.rotate(img, angle=270)
    else:
        rotated = img
    return rotated

rotated = orient_vertical(raw_img)
4. Sharpen Receipt Edge — Gray, Blur, Dilate, Canny
To detect the contour of the receipt, first we need to binarize the image. To binarize, we should convert the image to grayscale (Gray) and remove all the texts and other small objects from the scanned receipt (Blur, Dilate). Then, we sharpen the edges to easily detect the receipt contour (Canny).
- Gray — Converts pixel values from BGR/RGB colour representation to brightness representation, i.e. from 3 channels (0–255, 0–255, 0–255) to 1 channel (0–255).
- Blur — Removes fine objects (i.e. text) from the scanned receipt so that only the receipt shape will be detected as a contour.
- Dilate — Further removes fine objects from the scanned receipt. Dilation expands the boundaries of an object in an image and fills in small gaps or holes within it.
- Canny — Sharpens the edge of the receipt.
CAUTION: White background — If the preprocessing steps include dilation and the receipt sits on a white background, the receipt contour can become undetectable. A white background will then result in an incorrectly cropped receipt and, in turn, a poor OCR result. If deployed to production, error handling is required to catch failed cropping and zero-character OCR output.
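Such a guard could be sketched as a pair of sanity checks. The function names and thresholds below are illustrative assumptions, not part of the original pipeline:

```python
def crop_looks_valid(crop_shape, full_shape, min_area_ratio=0.2):
    """Flag crops that are suspiciously small relative to the full image,
    a typical symptom of failed contour detection."""
    crop_area = crop_shape[0] * crop_shape[1]
    full_area = full_shape[0] * full_shape[1]
    return full_area > 0 and crop_area / full_area >= min_area_ratio

def ocr_looks_valid(text, min_chars=10):
    """Flag OCR runs that produced (almost) no characters."""
    return len(text.strip()) >= min_chars
```

In production, failing either check could trigger a fallback such as running OCR on the uncropped image or asking the user for a rescan.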
def sharpen_edge(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (15, 15), 0)
    rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
    dilated = cv2.dilate(blurred, rectKernel, iterations=2)
    edged = cv2.Canny(dilated, 75, 200, apertureSize=3)
    return edged

edged = sharpen_edge(rotated)
5. Binarize (Black & White)
threshold — Applies a threshold to reassign each pixel value to 0 or 255. If the threshold is set to 100, any pixel between 0–100 becomes black (0), and any pixel between 100–255 becomes white (255).
rectKernel — To thicken the edges of the receipt, increase the kernel size. Choose (15, 15) instead of (1, 1).
thresh — The actual threshold used by the cv2.threshold() operation. This should be identical to the threshold value.
binary — The output: the binarized image as a NumPy array.
def binarize(img, threshold):
    thresh, binary = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)
    rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    dilated = cv2.dilate(binary, rectKernel, iterations=2)
    return dilated

threshold = 100
binary = binarize(edged, threshold)
6. Draw Bounding box
rect — ((cx, cy), (w, h), angle), the minimum-area rotated rectangle.
box — (top-left, top-right, bottom-right, bottom-left) corner points of that rectangle.
- Largest Contour — Selected based on area, which accounts for both closed and open contours (cv2.contourArea).
def find_receipt_bounding_box(binary, img):
    global largest_cnt
    contours, hierarchy = cv2.findContours(
        binary, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    largest_cnt = max(contours, key=cv2.contourArea)
    rect = cv2.minAreaRect(largest_cnt)
    box = np.intp(cv2.boxPoints(rect))
    boxed = cv2.drawContours(img.copy(), [box], 0, (0, 255, 0), 20)
    return boxed

boxed = find_receipt_bounding_box(binary, rotated)
boxed_rgb = cv2.cvtColor(boxed, cv2.COLOR_BGR2RGB)
plt.imshow(boxed_rgb)
plt.show()
7. Adjust Tilted Receipt
- Tilt Angle — The concept of the angle here is tricky. The starting point of the 0 angle is unclear, but adding 90 degrees when the angle is < -45 degrees seems to correctly adjust the tilted receipt.
def find_tilt_angle(largest_contour):
    rect = cv2.minAreaRect(largest_contour)
    angle = rect[2]  # Angle of the near-vertical edge
    print("Angle_0 = ", round(angle, 1))
    if angle < -45:
        angle += 90
        print("Angle_1:", round(angle, 1))
    uniform_angle = abs(angle)
    print("Uniform angle = ", round(uniform_angle, 1))
    return rect, uniform_angle

rect, angle = find_tilt_angle(largest_cnt)
def adjust_tilt(img, angle):
    if angle >= 5 and angle < 80:
        rotated_angle = 0
    elif angle < 5:
        rotated_angle = angle
    else:
        rotated_angle = 270 + angle
    tilt_adjusted = imutils.rotate(img, rotated_angle)
    delta = 360 - rotated_angle
    return tilt_adjusted, delta

tilted, delta = adjust_tilt(boxed, angle)
print(delta)
tilted_rgb = cv2.cvtColor(tilted, cv2.COLOR_BGR2RGB)
plt.imshow(tilted_rgb, cmap='gray')
plt.show()
0.7866973876953125
8. Crop Receipt
def crop(img, largest_contour):
    x, y, w, h = cv2.boundingRect(largest_contour)
    cropped = img[y:y+h, x:x+w]
    return cropped

cropped = crop(tilted, largest_cnt)
plt.imshow(cropped)
plt.show()
9. Enhance Text on Receipt
- Average brightness (np.mean(ROI)) — Using the average brightness as a threshold value is a good starting point. Here we calculate the mean of the grayscale pixel values and adjust it by taking 98% of that value as the threshold for binarization (you can experiment with different values). Gamma correction is similar in approach, except that it works on a non-linear scale.
- Region of Interest (ROI) — To account for the brightness only within the receipt area, we take the central 95% of the cropped image to calculate the average brightness mentioned above.
def enhance_txt(img):
    w = img.shape[1]
    h = img.shape[0]
    w1 = int(w*0.05)
    w2 = int(w*0.95)
    h1 = int(h*0.05)
    h2 = int(h*0.95)
    ROI = img[h1:h2, w1:w2]  # Central 95% of the image
    threshold = np.mean(ROI) * 0.98  # 98% of the average brightness
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (1, 1), 0)
    thresh, binary = cv2.threshold(blurred, threshold, 255, cv2.THRESH_BINARY)
    return binary

enhanced = enhance_txt(cropped)
enhanced_rgb = cv2.cvtColor(enhanced, cv2.COLOR_GRAY2RGB)
plt.imsave('enhanced.jpg', enhanced_rgb)
Run OCR — Preprocessed
Overall, the preprocessed images showed a huge improvement over the baseline. The number of words detected, the number of comprehensible words, and the visual clarity of the text were all greatly enhanced. Now we can extract the information below:
- Individual grocery items
- Store name
- Total price
- Date of purchase
- Card number
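As an illustrative sketch of that extraction step, a few regular expressions can pull structured fields out of the raw OCR text. The patterns and sample text below are assumptions for demonstration; real receipts vary widely by store and locale:

```python
import re

def extract_fields(ocr_text):
    """Pull a few common fields out of raw receipt OCR text with regexes."""
    fields = {}
    # Total price, e.g. "TOTAL $12.87"
    total = re.search(r'TOTAL\s+\$?(\d+\.\d{2})', ocr_text, re.IGNORECASE)
    if total:
        fields['total'] = total.group(1)
    # Purchase date, e.g. "03/14/2023"
    date = re.search(r'(\d{2}/\d{2}/\d{2,4})', ocr_text)
    if date:
        fields['date'] = date.group(1)
    # Masked card number, e.g. "XXXXXXXXXXXX4321"
    card = re.search(r'X{4,}(\d{4})', ocr_text)
    if card:
        fields['card_last4'] = card.group(1)
    return fields
```

On a sample OCR output like "TOTAL $12.87\nVISA XXXXXXXXXXXX4321\n03/14/2023", this returns the total, date, and last four card digits as a dictionary.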
1. Faded #1 ⭐️⭐️
Improved in terms of extracted word count, but the words are mostly incomprehensible. The store name, purchase date, and some individual item prices are captured.
2. Faded #2 ⭐️⭐️⭐️ ⭐️
Improved a lot overall, not only in the extracted word count but also in the correctness of the extracted words. Now we can read some individual grocery items, the total price, the card number, the date of purchase, and the store name.
3. Torn Receipt / Crumbled ⭐️⭐️
Improved a bit, same as the faded receipt #1 example above. However, it's worth noting that the poor result is not only due to the torn condition of the receipt itself; it's a combination of many factors, including lighting, focus, creasing, etc. So it is very likely that another torn receipt could give a very good outcome, just as faded receipt #2 gave us a much better result than faded receipt #1 above.
4. Tilted / Crumbled ⭐️⭐️⭐️⭐️
Improved a lot. Same as above #2.
Conclusion
We successfully improved the Tesseract OCR results with image preprocessing. We used the OpenCV, Pillow, and imutils libraries to augment the scanned receipts. We detected the edges of the receipt with operations such as Rotate, Gray, Blur, Dilate, Canny, Binarize, Draw Bounding Box, and Crop. Then we enhanced the text content of the receipt using the average brightness of the centre area of the receipt. We also learned how OpenCV and Pillow differ in the way they report image shape, and in their compatibility with the imutils package.
Next Step
In my next post, I will show you how to deploy this entire enhanced OCR preprocessing and model pipeline as a Flask app, and further to a GCP microservice (Google Cloud Run) using Docker.
Source Code
- See the sample run in Kaggle notebook
- See the full code in GitHub