Document AI — Custom Processors

Vasu Mittal
Google Cloud - Community
6 min read · Jul 16, 2024

In the previous post of this Document AI series, we covered what Document AI is, its key features, and what document processors are, and we saw the general processor in action along with some evaluation metrics. In this post we cover Document AI custom processors: their types, modes, and benefits.

Custom Processors (Your Content, Your Model, Your Customizations): Custom processors let you build models for your own document types. You can train custom models from scratch or up-train existing models without writing any machine learning code.

Types of Custom Processors

Document AI supports the following 4 types of Custom Processors:

  1. Custom Extractors: Custom extractors identify and extract specific data from your documents. They extract entities from a document, similar to the form or invoice parser. For example: extracting the chemical components and their percentages from a medical report.
  2. Custom Classifiers: Custom classifiers group your documents into categories. They classify documents into types, similar to the procurement and lending classifiers. A custom classifier can be used for moderating social media content, detecting fraudulent insurance claims, or classifying loan applications (for example: home loan, vehicle loan, commercial property loan).
  3. Custom Splitters: Custom splitters identify document boundaries in a large file, i.e. they find the page split points in a file that contains multiple documents. For example: a custom splitter can break a large archive file into smaller, more manageable documents by category, so that specific details can be retrieved quickly later.
  4. Summarizer: As the name suggests, the summarizer generates summaries of short and long documents.

Why do we need a Custom Processor?

Fully customized processors are used when your document type isn’t similar to anything the existing specialized processors handle. You can create a new processor from scratch that extracts entities from your own document types.

For example, a medical test report such as a “Blood Test Report” is a good use case. These reports follow a similar structure, but there isn’t a specialized parser that works with them. To process this kind of document, we can create a custom extractor that recognizes and extracts the fields in these reports using a completely unique entity schema.

You can create custom extractors that are specifically suited to your documents, and that are trained and evaluated with your data. A custom extractor identifies and extracts entities from documents of a particular type, and once trained it can be used on additional documents of that type. For example, it can extract the items in a menu or the name and contact information from a resume.
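To make "use this trained processor on additional documents" concrete, here is a minimal sketch of calling a deployed processor with the google-cloud-documentai Python client. It assumes the package is installed and credentials are configured; the function names and the project, location, and processor IDs are placeholders for illustration, not values from this post.

```python
def processor_resource_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the resource name that identifies a Document AI processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def extract_entities(pdf_path: str, project_id: str, location: str, processor_id: str):
    """Send one PDF to a deployed processor and return (type, text, confidence) tuples."""
    # Imported lazily so the helper above works without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    # The API endpoint is regional, e.g. "us-documentai.googleapis.com".
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )

    with open(pdf_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")

    request = documentai.ProcessRequest(
        name=processor_resource_name(project_id, location, processor_id),
        raw_document=raw_document,
    )
    result = client.process_document(request=request)

    # Each entity carries the schema field it was labeled with and the matched text.
    return [(e.type_, e.mention_text, e.confidence) for e in result.document.entities]
```

A call might look like extract_entities("report.pdf", "my-project", "us", "abc123"), where all four arguments are hypothetical.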

The goal of the custom extractor is to enable Document AI users to build custom entity extraction solutions for new document types for which no pre-trained processors are available. Custom extractor includes a combination of layout-aware deep learning models (for generative AI and custom models) and template-based models.

The 4 main steps to train a custom processor are:

  1. Define Fields & Schema.
  2. Label Documents & Train Model.
  3. Evaluate for desired performance.
  4. Deploy & Use.
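As a rough illustration of the first two steps, the sketch below models a schema for the blood test example as plain Python data and checks one document's labels against it before training. The field names, data types, and the validate_labels helper are all hypothetical; the four occurrence values mirror the choices Document AI offers when you define a field, but this is not the product's own schema format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Occurrence(Enum):
    OPTIONAL_ONCE = "optional_once"
    OPTIONAL_MULTIPLE = "optional_multiple"
    REQUIRED_ONCE = "required_once"
    REQUIRED_MULTIPLE = "required_multiple"


@dataclass(frozen=True)
class Field:
    name: str
    data_type: str
    occurrence: Occurrence


# Hypothetical schema for a blood test report.
SCHEMA = [
    Field("patient_name", "plain_text", Occurrence.REQUIRED_ONCE),
    Field("test_date", "datetime", Occurrence.REQUIRED_ONCE),
    Field("hemoglobin", "number", Occurrence.OPTIONAL_ONCE),
    Field("remark", "plain_text", Occurrence.OPTIONAL_MULTIPLE),
]


def validate_labels(labels, schema):
    """Return a list of problems for one document's (field_name, value) labels."""
    problems = []
    known = {field.name for field in schema}
    counts = Counter(name for name, _ in labels)

    # Labels that use a field the schema does not define.
    for name in counts:
        if name not in known:
            problems.append(f"unknown field: {name}")

    # Occurrence rules for every field in the schema.
    for field in schema:
        n = counts.get(field.name, 0)
        required = field.occurrence in (Occurrence.REQUIRED_ONCE, Occurrence.REQUIRED_MULTIPLE)
        single = field.occurrence in (Occurrence.OPTIONAL_ONCE, Occurrence.REQUIRED_ONCE)
        if required and n == 0:
            problems.append(f"missing required field: {field.name}")
        if single and n > 1:
            problems.append(f"field labeled {n} times but allowed once: {field.name}")
    return problems
```

Catching labeling mistakes like these before training is cheaper than discovering them later as low scores in the evaluation step.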

Custom Extractor Modes (Training Methods)

As of this writing, the custom extractor supports a wide range of use cases through three different modes.

Source: https://cloud.google.com/document-ai/docs/custom-extractor-overview

Custom Extractor With Generative AI: Generative AI training and extraction lets you use zero-shot and few-shot technology to get a high-performing model from the foundation model with little to no training data. You can also fine-tune the model to further boost accuracy as you provide more training data.

Custom Model Based Extraction: Custom model training and extraction lets you build your own model, designed specifically for your documents, without the use of generative AI. It’s ideal if you don’t want to use generative AI and want to control all aspects of the trained model.

Template Based Extraction: For fixed-layout use cases, you can train a high-performing model with as few as three training and three test documents. This accelerates development and reduces time to production for templated document types such as W9, 1040, ACORD, surveys, and questionnaires.

Custom Processor: Sample Use Case

As an example, consider a publicly available dataset from Kaggle: SEC Edgar Annual Financial Filings. It consists of Form 10-K filings, the annual report that provides a comprehensive analysis of a company’s financial condition. A Form 10-K is made up of several parts and can be very large, ranging from 100 to over 200 pages, which makes it difficult to find a specific piece of information in the form.

To process this kind of document, we can build a solution that combines a Custom Splitter and a Custom Extractor:

We first use a Custom Splitter, because a document of 100 to 200 pages is hard to search and extract from directly. The splitter divides the document into relevant sections; for this use case it is created with labels such as title page, table of contents, balance sheet, statement of operations or income, and signature. We then use a Custom Extractor to pull out fields such as company name, current assets, current liabilities, net income, shareholder equity, and operating expenses. This way we can extract specific information from a lengthy, complex document simply and efficiently.
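The split-then-extract flow can be sketched in a few lines of plain Python. This is a simplified illustration, not the splitter's real output format: it assumes we already have one predicted label per page (a real splitter response reports sections as entities with page references), and the function names are made up for this example.

```python
from itertools import groupby


def sections_from_page_labels(page_labels):
    """Group consecutive pages with the same label into (label, first_page, last_page)."""
    sections = []
    start = 0
    for label, run in groupby(page_labels):
        length = sum(1 for _ in run)
        sections.append((label, start, start + length - 1))
        start += length
    return sections


def pages_to_extract(sections, wanted_labels):
    """Collect the page indexes of the sections worth sending to the extractor."""
    pages = []
    for label, first, last in sections:
        if label in wanted_labels:
            pages.extend(range(first, last + 1))
    return pages


# Hypothetical per-page labels for a tiny 7-page filing (page indexes start at 0).
labels = [
    "title_page",
    "table_of_contents",
    "table_of_contents",
    "balance_sheet",
    "balance_sheet",
    "statement_of_operations",
    "signature",
]
sections = sections_from_page_labels(labels)
financial_pages = pages_to_extract(sections, {"balance_sheet", "statement_of_operations"})
```

Only the pages in financial_pages would then go to the extractor, which is what keeps a 200-page filing manageable.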

Evaluation: Go to the “Evaluate & test” section, select the version that you just trained, and then select “View full evaluation”. You can now see metrics such as F1, precision, and recall for the entire document as well as for each field. The evaluation engine supports both exact and fuzzy matching. With exact matching, the extracted value must exactly match the ground truth or it is counted as a miss. With fuzzy matching, extractions with slight differences, such as capitalization, still count as a match. You can switch between the two on the Evaluation screen using the “Fuzzy Matching” toggle.
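To see how the matching mode changes the numbers, here is a small sketch that computes precision, recall, and F1 for one document under both modes. The fuzzy rule used here (ignore case and extra whitespace) is an assumption for illustration; the exact rules Document AI applies are not described in this post. The company name and values are made-up sample data.

```python
def normalize(value: str) -> str:
    """Assumed fuzzy rule: ignore case and collapse runs of whitespace."""
    return " ".join(value.split()).casefold()


def score(predicted: dict, truth: dict, fuzzy: bool = False) -> dict:
    """Compute precision, recall and F1 over field/value pairs."""
    def matches(a: str, b: str) -> bool:
        return normalize(a) == normalize(b) if fuzzy else a == b

    # A prediction is a true positive only if the field exists and the value matches.
    tp = sum(1 for k, v in predicted.items() if k in truth and matches(v, truth[k]))
    fp = len(predicted) - tp  # extracted, but wrong or unexpected
    fn = len(truth) - tp      # ground-truth fields not correctly extracted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up ground truth and extraction for one filing.
truth = {"company_name": "Acme Corp", "net_income": "1,200", "shareholder_equity": "5,000"}
predicted = {"company_name": "ACME CORP", "net_income": "1,200"}

exact = score(predicted, truth)              # capitalization difference counts as a miss
fuzzy = score(predicted, truth, fuzzy=True)  # capitalization difference is forgiven
```

In this sample the F1 score is 0.4 with exact matching and 0.8 with fuzzy matching, purely because of one capitalization difference, which is why it helps to check both views before deciding a model is underperforming.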

Benefits of using Custom Classifier/Splitter:

  1. Accuracy: A custom classifier/splitter is trained on your data, so it is more likely to process your documents accurately than a general-purpose parser.
  2. Flexibility: A custom classifier/splitter can be trained to classify documents on a wide variety of criteria, such as document type, content, or purpose. This gives you more control over how your documents are processed.
  3. Efficiency: A custom classifier/splitter can automate your document classification process, freeing up your employees to focus on other tasks.
  4. Compliance: Custom classifiers can help you comply with regulations by ensuring that your documents are classified correctly.

And that’s it!

Thank you so much for reading and please follow me for more such amazing blogs on Google Cloud services!

Keep Learning, Keep Growing!!!
