Faded, Torn, Rotated Receipt OCR with Image Preprocessing
How to extract specific information from faded, torn, rotated receipts?
Background
There are several Optical Character Recognition (OCR) engines available, including Tesseract OCR, Google Vision AI, and ML Kit. Mobile apps such as Cam Scanner, Receipt Jar, and ReceiptLense appear to have incorporated superior OCR engines. However, receipts scanned with a mobile camera pose unique challenges compared to flatbed-scanned receipts, documents, or books.
Mobile-scanned receipts are often creased, faded, torn, rotated, stained, or otherwise imperfect. Therefore, a scanned-receipt OCR workflow often asks the human user for help: to provide a better-quality scan, to apply filters that enhance the input image quality, and to drag the four corners of the receipt into place. After this manual assistance, the text is fairly aligned and ready to be processed by the OCR engine.
Introduction
In this blog, we will explore ways to automatically enhance a scanned receipt's readability by applying image preprocessing techniques. Disclaimer: this will not be a perfect solution, but it will definitely help improve the quality of the OCR results. We will use the Tesseract OCR model without image preprocessing as our baseline, and compare its results with the preprocessed version.
Topics Covered
- Tesseract OCR
- OpenCV
- Package Installation with automated script (.sh)
- Image Augmentation
- Rotate, Gray, Blur, Dilate, Canny, Binarize, Crop
- Edge Detection
- Bounding Box
- Read, Write, View Image
- Baseline vs Enhanced OCR Output Comparison
- [System Used] Terminal, VS Code, Intel x86_64 macOS
Tesseract OCR
--psm — Page Segmentation Mode (0–13). Option 1 worked best for our scanned receipt example. Run the following to see all modes:
tesseract --help-psm
-l eng — Language option (English). The results are fairly good for a high-quality scanned receipt.
normal.JPG — input image path
normal — output text file name
tesseract normal.JPG normal --psm 1 -l eng
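The same invocation can also be wrapped from Python. The helper names below are illustrative assumptions for this sketch, not an established API:

```python
import subprocess

def build_tesseract_cmd(input_path, output_base, psm=1, lang="eng"):
    """Assemble the Tesseract CLI invocation above as an argument list."""
    return ["tesseract", input_path, output_base, "--psm", str(psm), "-l", lang]

def run_ocr(input_path, output_base, psm=1, lang="eng"):
    # Writes the OCR text to f"{output_base}.txt"; raises CalledProcessError on failure.
    subprocess.run(build_tesseract_cmd(input_path, output_base, psm, lang), check=True)
```

For example, `run_ocr('normal.JPG', 'normal')` reproduces the command line shown above.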
Baseline Grocery Receipt — Run OCR Without Preprocessing
The OCR result of this faded grocery receipt is very bad: it is not readable at all. Let's see if we can improve it with the preprocessing steps in the next section.
tesseract faded.JPG faded --psm 1 -l eng
Enhanced Grocery Receipt — Run OCR With Preprocessing
1. Install Numpy, OpenCV, imutils, Pillow
- Create and activate a virtual environment. Check my previous post to learn how to do this correctly in VS Code with a specific Python version.
- Create a script (install_packages.sh)
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow
pip3 install autopep8
pip3 install numpy
pip3 install opencv-python
pip3 install imutils
- Execute the script. This is a better approach for installing the required packages than doing it from a requirements.txt file, as it helps avoid package conflicts.
chmod +x install_packages.sh
sh install_packages.sh
2. Load Image
raw_img — the JPG/PNG image at the given path, loaded as a NumPy array.
cv2.imread() — OpenCV reads the image in the BGR scheme by default.
cv2.cvtColor() — to view the image in the RGB scheme.
import cv2
import imutils
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

raw_path = 'faded.JPG'  # Enter the path to your scanned receipt
raw_img = cv2.imread(raw_path)

# View the image in RGB
raw_rgb = cv2.cvtColor(raw_img, cv2.COLOR_BGR2RGB)
plt.imshow(raw_rgb)
plt.show()
3. Orient Receipt to Vertical (Optional)
Sometimes, the loaded image is in landscape orientation. If it is, we correct this first.
- Orientation — Image orientation information is stored in the metadata of the image file. However, the Python libraries we use to read the image may not apply that metadata, which is why we sometimes see mis-oriented images.
- (height, width, channels) — An OpenCV-opened image's .shape is in this format. A Pillow-opened image's .size is in (width, height) format.
- imutils.rotate() — This function is compatible with OpenCV-opened images, but not with Pillow-opened images, because it requires the shape information.
def orient_vertical(img):
    width = img.shape[1]
    height = img.shape[0]
    if width > height:
        rotated = imutils.rotate(img, angle=270)
    else:
        rotated = img
    return rotated

rotated = orient_vertical(raw_img)
4. Sharpen Receipt Edge — Gray, Blur, Dilate, Canny
To detect the contour of the receipt, first we need to binarize the image. To binarize, we should convert the image to grayscale (Gray) and remove all the texts and other small objects from the scanned receipt (Blur, Dilate). Then, we sharpen the edges to easily detect the receipt contour (Canny).
- Gray — Converts pixel values from BGR/RGB colour representation to brightness representation, i.e. from 3 channels (0–255, 0–255, 0–255) to 1 channel (0–255).
- Blur — Removes fine objects (i.e. text) from the scanned receipt so that only the receipt shape will be detected as a contour.
- Dilate — Further removes fine objects from the scanned receipt. Dilation expands the boundaries of an object in an image and fills in small gaps or holes within it.
- Canny — Sharpens the edge of the receipt.
CAUTION: White background — If the preprocessing steps include dilation and the receipt sits on a white background, the receipt contour can become undetectable. A white background will then result in an incorrectly cropped receipt and, in turn, a poor OCR result. If deployed to production, error handling is required to catch failed cropping and zero-character OCR output.
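Such a guard could be sketched as a pair of sanity checks. The function names and thresholds below are illustrative assumptions, not part of the original pipeline:

```python
def crop_looks_valid(crop_shape, full_shape, min_area_ratio=0.2):
    """Flag crops that are suspiciously small relative to the full image,
    a typical symptom of failed contour detection."""
    crop_area = crop_shape[0] * crop_shape[1]
    full_area = full_shape[0] * full_shape[1]
    return full_area > 0 and crop_area / full_area >= min_area_ratio

def ocr_looks_valid(text, min_chars=10):
    """Flag OCR runs that produced (almost) no characters."""
    return len(text.strip()) >= min_chars
```

In production, failing either check could trigger a fallback such as running OCR on the uncropped image or asking the user for a rescan.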
def sharpen_edge(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (15, 15), 0)
    rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
    dilated = cv2.dilate(blurred, rectKernel, iterations=2)
    edged = cv2.Canny(dilated, 75, 200, apertureSize=3)
    return edged

edged = sharpen_edge(rotated)
5. Binarize (Black & White)
threshold — Applies a threshold to reassign each pixel value to 0 or 255. If the threshold is set to 100, any pixel between 0–100 becomes black (0), and any pixel between 100–255 becomes white (255).
rectKernel — To thicken the edges of the receipt, increase the kernel size. Choose (15, 15) instead of (1, 1).
thresh — The actual threshold used by the cv2.threshold() operation. This should be identical to the threshold value.
binary — The output: the binarized image as a NumPy array.
def binarize(img, threshold):
    thresh, binary = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)
    rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    dilated = cv2.dilate(binary, rectKernel, iterations=2)
    return dilated

threshold = 100
binary = binarize(edged, threshold)
6. Draw Bounding box
rect — ((cx, cy), (w, h), angle), the minimum-area rotated rectangle.
box — (top-left, top-right, bottom-right, bottom-left) corner points of that rectangle.
- Largest Contour — Selected based on area, which accounts for both closed and open contours (cv2.contourArea).
def find_receipt_bounding_box(binary, img):
    global largest_cnt
    contours, hierarchy = cv2.findContours(
        binary, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    largest_cnt = max(contours, key=cv2.contourArea)
    rect = cv2.minAreaRect(largest_cnt)
    box = np.intp(cv2.boxPoints(rect))
    boxed = cv2.drawContours(img.copy(), [box], 0, (0, 255, 0), 20)
    return boxed

boxed = find_receipt_bounding_box(binary, rotated)
boxed_rgb = cv2.cvtColor(boxed, cv2.COLOR_BGR2RGB)
plt.imshow(boxed_rgb)
plt.show()
7. Adjust Tilted Receipt
- Tilt Angle — The concept of the angle here is tricky. The starting point of the 0 angle is unclear, but adding 90 degrees when the angle is < -45 degrees seems to correctly adjust the tilted receipt.
def find_tilt_angle(largest_contour):
    rect = cv2.minAreaRect(largest_contour)
    angle = rect[2]  # Angle of the near-vertical edge
    print("Angle_0 = ", round(angle, 1))
    if angle < -45:
        angle += 90
        print("Angle_1:", round(angle, 1))
    uniform_angle = abs(angle)
    print("Uniform angle = ", round(uniform_angle, 1))
    return rect, uniform_angle

rect, angle = find_tilt_angle(largest_cnt)
def adjust_tilt(img, angle):
    if angle >= 5 and angle < 80:
        rotated_angle = 0
    elif angle < 5:
        rotated_angle = angle
    else:
        rotated_angle = 270 + angle
    tilt_adjusted = imutils.rotate(img, rotated_angle)
    delta = 360 - rotated_angle
    return tilt_adjusted, delta

tilted, delta = adjust_tilt(boxed, angle)
print(delta)
tilted_rgb = cv2.cvtColor(tilted, cv2.COLOR_BGR2RGB)
plt.imshow(tilted_rgb, cmap='gray')
plt.show()
0.7866973876953125
8. Crop Receipt
def crop(img, largest_contour):
    x, y, w, h = cv2.boundingRect(largest_contour)
    cropped = img[y:y+h, x:x+w]
    return cropped

cropped = crop(tilted, largest_cnt)
plt.imshow(cropped)
plt.show()
9. Enhance Text on Receipt
- Average brightness (np.mean(ROI)) — Using the average brightness as a threshold value is a good starting point. Here we calculate the mean of the grayscale pixel values and adjust it by taking 98% of that value as the threshold for binarization (you can experiment with different values). Gamma correction is similar in approach, except that it works on a non-linear scale.
- Region of Interest (ROI) — To account for the brightness only within the receipt area, we take the central 95% of the cropped image to calculate the average brightness mentioned above.
def enhance_txt(img):
    w = img.shape[1]
    h = img.shape[0]
    w1 = int(w*0.05)
    w2 = int(w*0.95)
    h1 = int(h*0.05)
    h2 = int(h*0.95)
    ROI = img[h1:h2, w1:w2]  # Central 95% of the image
    threshold = np.mean(ROI) * 0.98  # 98% of the average brightness
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (1, 1), 0)
    thresh, binary = cv2.threshold(blurred, threshold, 255, cv2.THRESH_BINARY)
    return binary

enhanced = enhance_txt(cropped)
enhanced_rgb = cv2.cvtColor(enhanced, cv2.COLOR_GRAY2RGB)
plt.imsave('enhanced.jpg', enhanced_rgb)
Run OCR — Preprocessed
Overall, the preprocessed images showed a huge improvement over the baseline. The number of words detected, the number of comprehensible words, and the visual clarity of the text were all greatly enhanced. Now we can extract the information below:
- Individual grocery items
- Store name
- Total price
- Date of purchase
- Card number
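As an illustrative sketch of that extraction step, a few regular expressions can pull structured fields out of the raw OCR text. The patterns and sample text below are assumptions for demonstration; real receipts vary widely by store and locale:

```python
import re

def extract_fields(ocr_text):
    """Pull a few common fields out of raw receipt OCR text with regexes."""
    fields = {}
    # Total price, e.g. "TOTAL $12.87"
    total = re.search(r'TOTAL\s+\$?(\d+\.\d{2})', ocr_text, re.IGNORECASE)
    if total:
        fields['total'] = total.group(1)
    # Purchase date, e.g. "03/14/2023"
    date = re.search(r'(\d{2}/\d{2}/\d{2,4})', ocr_text)
    if date:
        fields['date'] = date.group(1)
    # Masked card number, e.g. "XXXXXXXXXXXX4321"
    card = re.search(r'X{4,}(\d{4})', ocr_text)
    if card:
        fields['card_last4'] = card.group(1)
    return fields
```

On a sample OCR output like "TOTAL $12.87\nVISA XXXXXXXXXXXX4321\n03/14/2023", this returns the total, date, and last four card digits as a dictionary.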
1. Faded #1 ⭐️⭐️
Improved in terms of extracted word count, but the words are mostly incomprehensible. The store name, purchase date, and some individual item prices are captured.
2. Faded #2 ⭐️⭐️⭐️ ⭐️
Improved a lot overall, not only in the extracted word count but also in the correctness of the extracted words. Now we can read some individual grocery items, the total price, the card number, the date of purchase, and the store name.
3. Torn Receipt / Crumbled ⭐️⭐️
Improved a bit, same as the faded receipt #1 example above. However, it's worth noting that the poor result is not only due to the torn condition of the receipt itself; it's a combination of many factors, including lighting, focus, creasing, etc. So it is very likely that another torn receipt could give a very good outcome, just as faded receipt #2 gave us a much better result than faded receipt #1 above.
4. Tilted / Crumbled ⭐️⭐️⭐️⭐️
Improved a lot. Same as above #2.
Conclusion
We successfully improved the Tesseract OCR results with image preprocessing. We used the OpenCV, Pillow, and imutils libraries to augment the scanned receipts. We detected the edges of the receipt with operations such as Rotate, Gray, Blur, Dilate, Canny, Binarize, Draw Bounding Box, and Crop. Then we enhanced the text content of the receipt using the average brightness of the centre area of the receipt. We also learned how OpenCV and Pillow differ in the way they report image shape, and in their compatibility with the imutils package.
Next Step
In my next post, I will show you how to deploy this entire enhanced OCR preprocessing and model pipeline as a Flask app, and further to a GCP microservice (Google Cloud Run) using Docker.
Source Code
- See the sample run in Kaggle notebook
- See the full code in GitHub