Check Digitization using Deep Learning

Paras Ghai
Sep 14, 2021

A payment cheque is a document, written and signed by a customer, that instructs a bank to debit the customer's account and pay another person or organization. Cheques are scanned along with other business documents such as contracts, remittances, claim payment advices and invoices. They remain a preferred mode of payment worldwide because they are secure and easy to use.

In this article, we share an approach to extract critical fields (Payee Name, Payee Address, Bank Name, Cheque Date, Cheque Amount, Routing Number, Cheque Number and Account Number) using Amazon Textract as the OCR engine and a deep learning based object detection algorithm to detect the various regions of a cheque.

Cheque digitization using image processing is a powerful enabler for automating the processing of bank cheques with minimal human intervention. It can play a critical role in streamlining document-intensive processes and office automation in many financial, accounting and taxation areas.

The various components of a cheque can be seen in the figure below:

Sample Cheque

Challenges

· Scanning artifacts such as small fonts, noisy images and faded images

· Multiple cheques can be present on a single scanned image

· Cheque formats vary from bank to bank

· Cheque amounts can be handwritten

· The account number (given in the MICR strip) can have spacing issues, e.g. it can be merged with the routing number or contain unusual white space

· OCR can miss some characters due to magnetic characters and symbols, and can extract text in the wrong order on tilted cheques

· Information can be difficult to extract from logos, overlapping text or portions that were not properly scanned

Sample Cheque — 1
Sample Cheque — 2

Logical Flowchart:

Flowchart of Code Logic

Methodology

1. Image Preprocessing: First, the image needs to be enhanced, as it may contain noise. This can be removed with a Gaussian noise filter, or any other suitable noise removal filter depending on the quality of the input image.
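As an illustration, a separable Gaussian filter can be sketched with NumPy alone; the kernel size and sigma below are illustrative defaults, and in practice a library routine such as OpenCV's cv2.GaussianBlur would typically be used instead:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """1-D Gaussian kernel, normalised to sum to 1."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def denoise(image, size=5, sigma=1.0):
    """Apply a separable Gaussian blur: filter rows first, then columns."""
    k = gaussian_kernel(size, sigma)
    blurred = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, image)
    blurred = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, blurred)
    return blurred
```

Because the Gaussian is separable, two 1-D convolutions give the same result as one 2-D convolution at lower cost.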

2. Object Detection: Object detection, in simple terms, is a method used to recognize and locate the different objects present in an image or video and label them into classes. The Faster R-CNN model can be explained in the steps below:

  1. Take an input image and pass it to the ConvNet which returns feature maps for the image.
  2. Apply Region Proposal Network (RPN) on these feature maps and get object proposals.
  3. Apply ROI pooling layer to bring down all the proposals to the same size.
  4. Finally, pass these proposals to a fully connected layer to classify them and predict the bounding boxes for the image.

As the number of classes increases, so does the complexity of building a model that differentiates between all of these objects. Earlier region-proposal pipelines classified proposals with an external model such as a support vector machine (SVM), whereas Faster R-CNN learns proposal generation and classification end to end. For these reasons we chose Faster R-CNN as the ROI detection algorithm for our solution design. We trained a custom Faster R-CNN model to identify the critical ROIs: Date, Amount, MICR Strip, Bank Logo, Customer Information (Address, Name) and Cheque Number. To train the custom object detector, we annotated 250 images for all these classes (using an online annotation tool such as VoTT v1) and prepared a training .csv file which contains the label as well as the bounding boxes for each class:

Sample of Training Format
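The exact column layout depends on the training pipeline; a hypothetical row format (filename, box corners, class label — all values below are invented for illustration) can be written with the standard csv module:

```python
import csv
import io

# Hypothetical annotation rows: image file, bounding box (xmin, ymin, xmax, ymax), class label.
rows = [
    ["cheque_001.jpg", 412, 88, 610, 122, "date"],
    ["cheque_001.jpg", 520, 210, 780, 250, "amount"],
    ["cheque_001.jpg", 60, 330, 700, 370, "micr_strip"],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["filename", "xmin", "ymin", "xmax", "ymax", "class"])
writer.writerows(rows)
train_csv = buf.getvalue()
```

One row per annotated box means an image with several ROIs simply appears on several rows.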

It is not always the case that the last weight file gives the best results; candidate checkpoints need to be evaluated on the basis of their mAP (mean Average Precision) score. mAP is calculated by taking the mean AP over all classes and/or over all IoU thresholds, depending on the detection challenge.

Below is the TensorBoard loss graph of our object detection model:

Using our custom trained model, we were able to detect ROIs as shown below:

Model Inference Results

3) MICR check: If at least one MICR strip is detected on a page, pass the image through Textract OCR. Otherwise, no cheque is found on the image.

4) Text Detection: Using Amazon Textract, you can easily extract text and data from images and scanned documents, going beyond simple optical character recognition (OCR) to extract data from tables and forms. Amazon Textract operations return the location and geometry of items found on a document page, and this geometry information can be used to draw bounding boxes around the detected items.

SOUTHSTATE BANK 5515 Map InsurancE LLC www.southstatebank.com CLOSING TRUST 9007401 601 12TH STREET W. 9004 7 7 63-1403/631 BRADENTON,FL 34205 (941) 747-1871 07/27/21 PAY TO THE ORDER OF Title Insurance Company $ *****1,247.47 One Thousand Two Hundred Forty Seven and 47/100 DOLLARS TRUST ACCOUNT MEMO VOID APTEM 180 DAYS Underwriter Remittance 005515 :063114030 : 3001674   07/27/21 5515 CLOSING TRUST Title Insurance Company 1,247.47 BUYER (S) : Loan too LD Enterprises SELLER (S) : SouthState Bank, N.A. PRIOR over  1,247.47
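Textract returns its results as a list of Blocks carrying text and geometry. The sketch below pulls LINE text and bounding boxes out of a response; the response dict here is a hand-made stub in the documented shape, not real API output, and a live call would instead come from boto3's detect_document_text:

```python
# Hand-made stub in the shape Textract returns. A real call would be:
#   client = boto3.client("textract")
#   response = client.detect_document_text(Document={"Bytes": image_bytes})
response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Text": "PAY TO THE ORDER OF Title Insurance Company",
         "Geometry": {"BoundingBox": {"Left": 0.08, "Top": 0.42, "Width": 0.55, "Height": 0.04}}},
        {"BlockType": "LINE", "Text": "$ *****1,247.47",
         "Geometry": {"BoundingBox": {"Left": 0.72, "Top": 0.42, "Width": 0.18, "Height": 0.04}}},
    ]
}

def extract_lines(response):
    """Return (text, bounding box) pairs for every LINE block."""
    return [
        (b["Text"], b["Geometry"]["BoundingBox"])
        for b in response["Blocks"]
        if b["BlockType"] == "LINE"
    ]
```

The bounding boxes are expressed as ratios of page width and height, which is what lets us intersect them with the object detector's ROIs in the next step.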

5) Extract the text that intersects the object detection ROIs from the Textract output, using the Intersection over Union (IoU) algorithm.

Code Snippet
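The IoU-based matching can be sketched as follows; the 0.5 threshold in the helper is an illustrative assumption, not a value from the original pipeline:

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def text_in_roi(textract_boxes, roi, threshold=0.5):
    """Keep the text whose OCR box overlaps the ROI above the threshold."""
    return [text for text, box in textract_boxes if iou(box, roi) >= threshold]
```

Both box sets must be in the same coordinate system (e.g. page-relative ratios) before the comparison.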

6) Custom tagger to detect entities: Once we have extracted all the text inside the bounding boxes detected by our model, the next task is to identify fields like company names, amounts and dates from the respective regions. While most generic fields such as amounts, numbers and dates can be detected with off-the-shelf NER models, we took a shot at training a custom tagger for detecting the Bank Name only. For this we collated a list of 5,248 US bank names from online sources, as shown below:

US Bank Names List

spaCy provides an exceptionally efficient statistical system for NER in Python, which can assign labels to contiguous groups of tokens. Apart from its default entities, spaCy also gives us the liberty to add arbitrary classes to the NER model by updating it with newly labelled training examples. The major part of the work is creating custom entity annotations for the input text, where the named entity to be identified by the model is marked with character offsets. Below is a sample of the training data format:

["pinnacle title agency, inc. escrow account 32200 w county line road investors bank", {'entities': [(68, 82, 'bank')]}]

In this dataset we are trying to predict the name of the Bank.

Note: spaCy v3.1, however, no longer takes the .json format; the data has to be converted to the binary .spacy format. Please refer to the spaCy documentation.
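Before converting annotations, it is worth sanity-checking that each character offset actually slices out the intended entity, since off-by-one offsets silently degrade training. The helper below is a hypothetical utility, not part of spaCy:

```python
def check_offsets(example):
    """Return the substrings that the (start, end, label) offsets point at."""
    text, annotations = example
    return [(text[start:end], label) for start, end, label in annotations["entities"]]

# Sample annotation: the (68, 82) span should cover "investors bank".
sample = (
    "pinnacle title agency, inc. escrow account 32200 w county line road investors bank",
    {"entities": [(68, 82, "bank")]},
)
```

Running check_offsets over every training example before conversion catches misaligned spans early.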

Training and Prediction Using CLI

1. Creating the configuration:

Before training, you need to create a config file. To create a custom config file, follow the spaCy documentation:

!python -m spacy init config --lang en --pipeline ner /content/ner_demo/configs/config.cfg --force

2. Training the Model

Once that's done, you're ready to train your model.

At this point, you should have three files on hand: (1) the config.cfg file, (2) your training data in the .spacy format, and (3) an evaluation dataset.
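For reference, the generated config contains (among many other sections) the data paths and pipeline definition; the paths below are placeholders to adapt to your own layout:

```ini
[paths]
train = "/content/ner_demo/corpus/train.spacy"
dev = "/content/ner_demo/corpus/dev.spacy"

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
```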

!python -m spacy train /content/ner_demo/configs/config.cfg --output /content/ner_demo/training/

3. Predict entities on test data

Inference results on custom input

Business Logic and Validation Rules

We have built custom business rules and NLP logic to validate the extracted information for fields like Routing Number, Cheque Amount and Cheque Number.

1) The 9-digit routing number must satisfy the following condition*:

3(d1 + d4 + d7) + 7(d2 + d5 + d8) + (d3 + d6 + d9) ≡ 0 (mod 10), where dN denotes the Nth digit of the routing number.

*ABA routing transit number — Wikipedia
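The checksum can be implemented directly; the example uses the routing number 063114030 that appears on the sample cheque above:

```python
def valid_routing_number(routing):
    """ABA checksum: 3(d1+d4+d7) + 7(d2+d5+d8) + (d3+d6+d9) must be divisible by 10."""
    if len(routing) != 9 or not routing.isdigit():
        return False
    d = [int(c) for c in routing]
    checksum = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return checksum % 10 == 0
```

Besides validating OCR output, the check can also repair it: when one digit is misread, at most one candidate digit in that position makes the checksum pass.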

2) The Cheque Number identified from the ROI should be present in the MICR code.

3) The Cheque Amount identified from the ROI should match its word format printed on the cheque.

If the model has detected $1,247.47 as the cheque amount, then its word format, "One Thousand Two Hundred Forty Seven", should be present as a string in the raw text extracted from the cheque. A snippet of the code is as follows:

Code Snippet
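A sketch of this check, using a simplified number-to-words converter that handles whole-dollar amounts below one million and ignores the cents:

```python
ONES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine",
        "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen", "Sixteen",
        "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]

def under_thousand(n):
    """Spell out 0 <= n < 1000."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " Hundred")
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10])
        n %= 10
    if 0 < n < 20:
        parts.append(ONES[n])
    return " ".join(parts)

def dollars_in_words(amount):
    """Word form of the whole-dollar part, for amounts below one million."""
    n = int(amount)
    parts = []
    if n >= 1000:
        parts.append(under_thousand(n // 1000) + " Thousand")
        n %= 1000
    if n:
        parts.append(under_thousand(n))
    return " ".join(parts) or "Zero"

def amount_matches_words(amount, raw_text):
    """True if the spelled-out amount appears in the OCR'd cheque text."""
    return dollars_in_words(amount) in raw_text
```

A production version would also normalise OCR casing and spacing, and verify the cents against the "and 47/100" fraction.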

Conclusion

Model Results

We evaluated our model on 380 images and were able to detect a cheque in 362 samples. Out of these, we were able to extract information with ~90% accuracy. As we enhance our models with more training data, results are expected to improve over time. I hope this makes it easier for you to start your own deep learning object detection project.

About Me

I am a passionate machine learning enthusiast currently exploring the vast and exciting fields of Deep Learning and NLP. I am presently working as a Senior Data Scientist at EXL Digital, where my work involves building analytics models and deploying them on cloud native platforms.

I would also like to credit my colleagues Divyansh Sharma and Deelip Kumar for their efforts on this project. You can follow us:

  1. Paras Ghai — Connect on LinkedIn
  2. Divyansh Sharma — Connect on LinkedIn
  3. Deelip Kumar — Connect on LinkedIn

Resources

  1. IRJET-V7I1394.pdf
  2. Bank check OCR with OpenCV and Python (Part II) — PyImageSearch
  3. MTAP paper: Automated Bank Cheque Verification Using Image Processing and Deep Learning Methods | ATHENA Christian Doppler (CD) Pilot Laboratory (aau.at)
  4. Faster RCNN Python | Faster R-CNN For Object Detection (analyticsvidhya.com)
  5. Banking Dataset — Marketing Targets | Kaggle
