Getting Started with Document AI: Introduction, Processors & Evaluation Metrics

Vasu Mittal
Google Cloud - Community
7 min read · Mar 11, 2024

Document AI turns unstructured content into structured data, making it easier to understand, analyze, and consume. It extracts and classifies information from unstructured documents.

It's an end-to-end, cloud-based platform for document processing.

Along with reading and ingesting your documents, it also understands their spatial structure. For example, if you run a customer feedback form (Q&A type) through a parser, Document AI recognizes that the form contains questions and answers, and you'll get those back as key-value pairs. Because this data is structured, it becomes far more useful: you can run quick analytics over it to understand customer sentiment from the feedback, and you can easily incorporate the output into your applications by calling an API.

Document AI Processors

A Document AI Processor is an interface between the document file and a Machine Learning model designed for a document-focused task.

You will have to create unique instances of Document Processors in each project.

Document AI Processor Functions:

  1. OCR: Document OCR can be used to identify and extract text in different types of documents.
  2. Form Parsing: Form Parser can be used to extract form elements such as text fields and checkboxes.
  3. Quality Analysis: Document Quality Processor can be used to assess the quality of a document, for example by detecting defects such as blurriness.
  4. Splitting: Document Splitter can be used to identify document boundaries in a large file so that it can be split.
  5. Classification: For example, Lending Doc Splitter/Classifier can be used to identify documents in a large file and classify known lending document types.
  6. Entity Extraction: For example, Invoice Parser can be used to extract 30+ fields from invoices, such as invoice ID, amount, and line items.

Document AI Processor Categories:

  1. General Processors (Your Content, DocAI's General Models): These are designed to work with any document. For example: Document OCR or Form Parser.
  2. Specialised Processors (Your Content, DocAI's Specialised Models): For these, Document AI has pre-built models for common types of business documents. For example: the Invoice Parser or the Driver License parser.
  3. Custom Processors (Your Content, Your Model, Your Customizations): Custom Processors offer the capability to build models for your own document types. You can train custom models from scratch or up-train existing models without having to write any machine learning code.

Document AI in Action

Follow the below steps to get started with Document AI:

Step 1: In the GCP console, navigate to “Document AI” from the left side pinned products or search for “Document AI” in the search bar.

Step 2: If the "Document AI" API is not enabled, enable it first.

Step 3: Click on "Explore Processors".

Step 4: Select the processor type you want to create. For example: Document OCR.

Step 5: Give your processor a name, specify a region for it, and click "CREATE".

Step 6: Note the Processor ID from this screen, as you will need it when accessing the APIs.

Step 7: Now upload a test document file by clicking on “UPLOAD TEST DOCUMENT” and view the output.
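The console flow above has a direct API counterpart. The sketch below (standard library only, with placeholder project, location, and processor IDs) shows how the processor resource name and the JSON body for the v1 `:process` REST method are assembled; in practice you would send this request with an OAuth token, or use the `google-cloud-documentai` client library instead.

```python
import base64

def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name Document AI expects for a processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"

def build_process_request(file_bytes: bytes, mime_type: str) -> dict:
    """Build the JSON body for the v1 :process REST method (inline rawDocument)."""
    return {
        "rawDocument": {
            "content": base64.b64encode(file_bytes).decode("utf-8"),
            "mimeType": mime_type,
        }
    }

# Placeholder IDs — substitute the Processor ID you noted in Step 6.
name = processor_name("my-project", "us", "abcdef123456")
# The request is POSTed to:
#   https://us-documentai.googleapis.com/v1/{name}:process
endpoint = f"https://us-documentai.googleapis.com/v1/{name}:process"
body = build_process_request(b"%PDF-1.4 ...", "application/pdf")
```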

Evaluate Processor Performance

  • Document AI generates evaluation metrics, such as precision and recall, to help you determine the predictive performance of your processors.
  • These evaluation metrics are generated by comparing the entities returned by the processor (the predictions) against the annotations in the test documents.
  • If your processor does not have a test set, then you must first create a dataset and label the test documents.
  • An evaluation is automatically run whenever you train or uptrain a processor version.
  • You can also manually run an evaluation. This is required to generate updated metrics after you’ve modified the test set, or if you are evaluating a pretrained processor version.
  • An important point to note: Document AI cannot and does not calculate evaluation metrics for a label if the processor version cannot extract that label (for example, the label was disabled at the time of training) or if the test set does not include annotations for that label. Such labels are not included in aggregated metrics.

Evaluation Process

Step 1: If your processor does not have a test set, then you must first create a dataset and label the test documents.

Step 2: Collect extracted entities from the processor output.

Step 3: Compare extracted entities from the processor output with the test set. A predicted entity matches an annotation if:

  • the type of the predicted entity (entity.type) matches the annotation’s label name
  • the value of the predicted entity (entity.mention_text or entity.normalized_value.text) matches the annotation’s text value, subject to fuzzy_matching if it is enabled.

Note that the type and text value are the only fields used for matching. Other information, such as text anchors and bounding boxes, is not used.
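As a toy illustration, this matching rule can be written as a predicate over the two fields the comparison actually uses (exact matching shown here; the fuzzy matching rules described later relax the text comparison):

```python
def entities_match(predicted: dict, annotation: dict) -> bool:
    """A predicted entity matches an annotation when the type (label name)
    and the text value both agree; anchors and bounding boxes are ignored."""
    return (
        predicted["type"] == annotation["type"]
        and predicted["text"] == annotation["text"]
    )

entities_match({"type": "invoice_id", "text": "INV-001"},
               {"type": "invoice_id", "text": "INV-001"})  # True
```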

Evaluation metrics

To understand the evaluation metrics, i.e. precision, recall, and F1 score, let's first define the terms required to arrive at them.

  • True Positives: The predicted entities that match an annotation in the test document.
  • False Positives: The predicted entities that do not match any annotation in the test document.
  • False Negatives: The annotations in the test document that do not match any of the predicted entities.

Now let’s understand what are Precision, Recall and F1 Score.

  • Precision: The proportion of predictions that match the annotations in the test set.
Precision = True Positives / (True Positives + False Positives)
  • Recall: The proportion of annotations in the test set that are correctly predicted.
Recall = True Positives / (True Positives + False Negatives)
  • F1 score: The harmonic mean of precision and recall, which combines them into a single metric, giving equal weight to both.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
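The three formulas above can be sketched as a small helper (the zero-division conventions here are my own, not Document AI's documented behavior):

```python
def evaluation_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from match counts.
    Degenerate cases (no predictions / no annotations) return 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# e.g. 8 correct predictions, 2 spurious predictions, 2 missed annotations:
evaluation_metrics(tp=8, fp=2, fn=2)  # precision 0.8, recall 0.8, f1 0.8
```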

Evaluation Metrics for Single Labels

Evaluation Metrics for All Labels

Metrics for All labels are computed based on the number of true positives, false positives, and false negatives in the dataset across all labels, and thus, are weighted by the number of times each label appears in the dataset.

Please note a very important point here: Document AI does not provide a metric for accuracy. The accuracy metric, often defined as the proportion of instances that are predicted correctly, is less meaningful here because:

  1. Not all labels appear in the test set (for example, optional fields), and
  2. There may be multiple values for a single label (for example, line items in an invoice).

F1, on the other hand, can be treated as a rough stand-in for accuracy, as it more meaningfully accommodates these scenarios.

Confidence Threshold

Document AI automatically computes the optimal threshold, which maximizes the F1 score, and by default, sets the confidence threshold to this optimal value.

You are free to choose your own confidence threshold by moving the slider bar. In general, a higher confidence threshold results in:

  • higher precision, because the predictions are more likely to be correct.
  • lower recall, because there are fewer predictions.
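This precision/recall trade-off can be simulated with a toy threshold sweep. In this simplified sketch, predictions are `(type, text, confidence)` tuples, annotations are `(type, text)` tuples, and duplicate-match bookkeeping is ignored:

```python
def metrics_at_threshold(predictions, annotations, threshold):
    """Keep only predictions at/above the threshold, then score them."""
    kept = [(t, x) for (t, x, conf) in predictions if conf >= threshold]
    tp = sum(1 for p in kept if p in annotations)
    fp = len(kept) - tp
    fn = len(annotations) - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / (tp + fn) if annotations else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

preds = [("invoice_id", "INV-1", 0.95), ("total", "100", 0.60), ("total", "999", 0.30)]
golds = [("invoice_id", "INV-1"), ("total", "100")]
# Raising the threshold from 0.5 to 0.9 drops the correct "total" prediction:
metrics_at_threshold(preds, golds, 0.5)  # precision 1.0, recall 1.0
metrics_at_threshold(preds, golds, 0.9)  # precision 1.0, recall 0.5
```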

Single-occurrence vs. Multi-occurrence Labels

Single-occurrence labels have one value per document (for example, invoice ID), even if that value is annotated multiple times in the same document (for example, the invoice ID appears on every page). Even if the multiple annotations have different text, they are considered equivalent: if a predicted entity matches any of the annotations, it is counted as a match.

Multi-occurrence labels can have multiple, different values. Thus, each predicted entity and annotation is considered and matched separately. If a document contains N annotations for a multi-occurrence label, then there can be N matches with the predicted entities.
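A toy sketch of the difference (values are plain strings here; real matching compares label type and text as described above):

```python
def count_matches(predictions, annotations, multi_occurrence: bool) -> int:
    """Single-occurrence: at most one match, scored if any prediction
    equals any annotation. Multi-occurrence: each annotation can be
    matched by one prediction, so up to N matches for N annotations."""
    if not multi_occurrence:
        return 1 if any(p in annotations for p in predictions) else 0
    remaining = list(annotations)
    matches = 0
    for p in predictions:
        if p in remaining:
            remaining.remove(p)
            matches += 1
    return matches

# invoice_id annotated on two pages, one correct prediction -> one match
count_matches(["INV-42"], ["INV-42", "INV-42"], multi_occurrence=False)  # 1
# line items: each of the N annotations is matched separately
count_matches(["pen", "ink"], ["pen", "ink", "paper"], multi_occurrence=True)  # 2
```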

Fuzzy Matching

The Fuzzy Matching toggle lets you tighten or relax some of the matching rules to decrease or increase the number of matches.

For example, without fuzzy matching, the string “ABC” will not match “abc” due to capitalization. But with fuzzy matching, they will match.

When fuzzy matching is enabled, here are the rule changes:

  • Whitespace normalization: removes leading/trailing whitespace and condenses consecutive intermediate whitespace (including newlines) into single spaces.
  • Leading/trailing punctuation removal: removes the following leading/trailing punctuation characters: !,.:;-"?|
  • Case-insensitive matching: converts all characters to lowercase.
  • Money normalization: for labels with the data type money, removes leading/trailing currency symbols.
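A rough approximation of these normalization rules in Python (the rule ordering and the currency-symbol set here are assumptions, not the documented implementation):

```python
import re

# Punctuation stripped at the edges, per the rules above.
_EDGE_PUNCT = '!,.:;-"?|'

def fuzzy_normalize(text: str, is_money: bool = False) -> str:
    text = re.sub(r"\s+", " ", text.strip())   # whitespace normalization
    text = text.strip(_EDGE_PUNCT).strip()     # leading/trailing punctuation removal
    text = text.lower()                        # case-insensitive matching
    if is_money:
        text = text.strip("$€£¥").strip()      # money: strip edge currency symbols
    return text

fuzzy_normalize("ABC")                         # "abc" — now matches "abc"
fuzzy_normalize("  ABC\nDEF, ")                # "abc def"
fuzzy_normalize("$1,000.00", is_money=True)    # "1,000.00"
```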

Keep Learning, Keep Growing!!!
