Automating Invoice Data Extraction with Deep Learning

Published in Re-Invent with Digital Business Transformation, Nov 5, 2020

“In any architecture, there is the equity between the pragmatic function and the symbolic function” — Michael Graves

What is ‘Deep Learning’?

Deep learning, also called deep structured learning, is part of the broader family of machine learning methods based on artificial neural networks (ANNs). Rather than relying on hand-crafted features, a deep learning model learns to extract features from raw data through successive layers, which is why it works well for tasks such as image recognition. An ANN consists of a collection of simulated neurons that are loosely modeled on biological neurons. With adequate training, an ANN can accomplish tasks such as recognizing an object in an image.
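To make the idea of simulated neurons concrete, here is a minimal sketch of a forward pass through a tiny network in plain Python. The function names, the sigmoid activation, and the hand-picked weights are all assumptions for illustration; a real model would learn its weights from training data and would normally use a deep learning library.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One simulated neuron: weighted sum of inputs, then a nonlinearity."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical, hand-picked weights (not learned from data).
hidden = layer([0.5, 0.8], [[0.4, -0.6], [0.3, 0.9]], [0.0, -0.1])
output = layer(hidden, [[1.2, -0.7]], [0.05])
print(output)  # a single score between 0 and 1
```

Stacking more such layers is what makes a network "deep"; training adjusts the weights so that the final score matches known examples.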

Deep Learning for Invoice Data Extraction

Invoices raised by companies vary in size, layout, font, and other visual attributes. The fields themselves change from one invoice to the next, and they do not necessarily appear in a fixed order or format. OCR has been applied successfully to automated invoice processing. It works well where the variety of invoices is low, but it has drawbacks for larger organizations for the following reasons:

Invoices must be classified manually, which is difficult and time-consuming.
Extractors need to be pre-trained, which requires ground-truth sample documents similar to the ones that will be processed in the live environment.
A different extraction engine is required for each class of invoice documents. These engines rely on templates, which in turn depend heavily on positional vectors for key-value data extraction.
OCR extraction systems cannot handle layout or structural changes to invoices. When a layout changes, the extractor has to be re-trained and its templates updated.

All of the above translates into a lot of effort and expense when an OCR system alone has to process invoices of great variety and volume.
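The fragility of positional templates can be illustrated with a short sketch. It assumes the OCR step returns words with page coordinates and that a template maps each field to a fixed rectangle. The Word class, the template contents, and the coordinates are hypothetical and do not come from any particular OCR product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position on the page
    y: float  # vertical position on the page


# Hypothetical template: field name -> (x_min, y_min, x_max, y_max)
TEMPLATE_VENDOR_A = {
    "invoice_number": (400, 40, 560, 60),
    "total": (400, 700, 560, 720),
}


def extract_with_template(words, template):
    """Collect the words that fall inside each field's fixed region."""
    result = {}
    for field_name, (x0, y0, x1, y1) in template.items():
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        result[field_name] = " ".join(inside) or None
    return result


# The layout the template was built for.
original = [Word("INV-1001", 410, 50), Word("1,250.00", 420, 710)]
# Same content after a redesign moved both fields.
shifted = [Word("INV-1001", 60, 50), Word("1,250.00", 420, 760)]

print(extract_with_template(original, TEMPLATE_VENDOR_A))
print(extract_with_template(shifted, TEMPLATE_VENDOR_A))
```

The first call finds both fields, while the second finds neither even though the text is unchanged. That is why a layout change forces a template update.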

How is Deep Learning relevant to Intelligent Data Extraction?

The real value of deep learning in invoice data extraction is that it recognizes and classifies invoice documents accurately and routes each document to the associated extractor of the OCR system.

It can also alert the support team to changes in invoice structure and layout, preventing errors from flowing to downstream processes. A deep learning invoice classification sub-system thus acts as a pre-processing stage for the OCR invoice data extraction system and improves its overall accuracy and efficiency.
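One way to picture this pre-processing role is as a small routing function that sits between the classifier and the extractors. The sketch below assumes the classifier returns a score per invoice class. The class names, extractor names, and the 0.8 threshold are hypothetical choices for illustration.

```python
def route_invoice(class_scores, extractors, threshold=0.8):
    """Pick the extractor for the most likely class, or flag for review.

    class_scores: dict of invoice class -> classifier confidence (0..1)
    extractors:   dict of invoice class -> name of the OCR extractor
    """
    label = max(class_scores, key=class_scores.get)
    confidence = class_scores[label]
    # A low score or an unknown class may mean the layout has changed,
    # so the document goes to the support team instead of downstream.
    if confidence < threshold or label not in extractors:
        return ("manual_review", label, confidence)
    return (extractors[label], label, confidence)


# Hypothetical extractor registry.
EXTRACTORS = {"vendor_a": "extractor_vendor_a", "vendor_b": "extractor_vendor_b"}

print(route_invoice({"vendor_a": 0.95, "vendor_b": 0.05}, EXTRACTORS))
print(route_invoice({"vendor_a": 0.55, "vendor_b": 0.45}, EXTRACTORS))
```

Sending low-confidence documents to review rather than guessing is the part that keeps errors out of downstream processes.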

How does it work?

Object recognition can be broken down into tasks performed by the ANN. Let’s discuss these tasks at a high level:

Image Classification: predicts the class of an object in an image or a document. The image is assigned one label from a fixed set of categories. The model learns to do this beforehand by extracting features from a labeled set of training images.
Object Localization: identifies the location of one or more objects in an image and delineates their extent with bounding boxes.
Object Detection: combines the two tasks above to locate every instance of an object in the image and report its class or category.
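The three tasks differ mainly in what they output, which the following sketch tries to show. It assumes a model that produces raw scores per label and boxes as corner coordinates. The label set, the Box and Detection types, and the use of softmax and intersection-over-union (a common way to compare a predicted box against a reference box) are illustrative assumptions, not the method of any specific product.

```python
import math
from dataclasses import dataclass

LABELS = ["invoice", "receipt", "purchase_order"]  # hypothetical fixed label set


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Image classification: one label from the fixed set."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


@dataclass
class Box:
    """Object localization: the extent of an object as a bounding box."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Detection:
    """Object detection: a class label plus a box, one per instance."""
    label: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union: overlap between two boxes, from 0 to 1."""
    width = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    height = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = width * height
    union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return inter / union if union > 0 else 0.0


label, prob = classify([2.0, 0.5, 0.1])
overlap = iou(Box(0, 0, 2, 2), Box(1, 1, 3, 3))
```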

The more training data you provide, the greater the variety of documents your algorithm learns to classify.

Deep learning also makes image correction more accurate. The algorithms can be trained to detect images and text even when the images are blurry, have low contrast, or were scanned on uneven surfaces. Isn’t that great?

Summary

In principle, text classification and identification can also be done using deep neural networks as described above. Text is harder than images, however, because of the complexity of decoding semantic information.

In fact, deep learning can be applied to the entire invoice data extraction process without an OCR system and without any structured templates. The only setback is that this is not business viable as of today.

‘Business viable’ is not just about cost but also about applicability, reliability, accuracy, training, and support. Every business needs to preconfigure validation rules, insert accounting codes, and perform processing activities that are separate from data extraction.

This reconfigurability is easier to achieve with an OCR system that uses flexible templates to extract key-value pairs, because integrations with third-party applications are readily available through web APIs, web services, or webhooks.
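As a rough sketch of what such configuration can look like, the example below applies validation rules to extracted key-value pairs and attaches an accounting code. The field names, the rules, the vendor name, and the code table are all hypothetical; a real deployment would load them from the business's own configuration.

```python
import re

# Hypothetical validation rules: field name -> check on the extracted string.
RULES = {
    "invoice_number": lambda v: bool(re.fullmatch(r"INV-\d+", v)),
    "total": lambda v: float(v.replace(",", "")) > 0,
}

# Hypothetical mapping of vendor name -> accounting code.
ACCOUNT_CODES = {"Acme Supplies": "6100"}


def validate(fields):
    """Return a list of problems found in the extracted key-value pairs."""
    errors = []
    for name, rule in RULES.items():
        value = fields.get(name)
        if value is None:
            errors.append(f"{name}: missing")
            continue
        try:
            ok = rule(value)
        except ValueError:  # e.g. a total that is not a number
            ok = False
        if not ok:
            errors.append(f"{name}: invalid value {value!r}")
    return errors


def enrich(fields):
    """Attach an accounting code without modifying the original dict."""
    enriched = dict(fields)
    enriched["account_code"] = ACCOUNT_CODES.get(fields.get("vendor"), "UNASSIGNED")
    return enriched


good = {"vendor": "Acme Supplies", "invoice_number": "INV-1001", "total": "1,250.00"}
bad = {"vendor": "Unknown Co", "invoice_number": "1001", "total": "n/a"}
```

Here validate(good) returns an empty list, while validate(bad) reports both fields. Because these steps operate on plain key-value pairs, they stay the same regardless of which extractor produced the data.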

We might soon see advanced invoice data extraction systems built on deep learning and NLP models. For now, a practical assessment will probably lead you to a more reliable OCR-based data extraction system with facilities to incorporate NLP and automation using RPA.
