GCP Document AI and Node-RED

Neil Kolban
Google Cloud - Community
4 min read · Feb 20, 2021

Google’s Document AI service lets you process documents and parse their content into structured, machine-readable data. The documents in question are scanned documents, as opposed to Google Docs or Word documents, which already have structured content. Think of documents that have hand-written or typed values entered into them. Examples of documents in this class include:

  • W2 earnings statements
  • 1099 income declarations
  • Your driver’s license or passport
  • The current medications form you fill in when visiting a new doctor

The Document AI product/service is described in detail in the Google Cloud documentation. The service is intended to be used within your own applications by calling its documented APIs. These APIs are available through REST calls and through client libraries for a variety of programming languages.

Completely separate from Document AI is the open source project Node-RED, which provides a low-code visual environment for wiring composable building blocks of discrete function into solutions. Node-RED has been around for many years and continues to grow in popularity. Within a Node-RED environment, one can visually plug together components that are invoked when an incoming request arrives. Node-RED is highly extensible, meaning that as new technologies become available, they can be integrated as additional building blocks. These building blocks (called Nodes) are available from a repository that is searchable from within the Node-RED environment. One package of Nodes is dedicated to GCP integration. It includes support for:

  • Pub/Sub
  • BigQuery
  • Cloud Storage
  • Logging
  • Firestore
  • IoT
  • others …

With Document AI available as a new GCP service, a new Node that supports it has been added to the list.

The Node takes the document data as input and then invokes the GCP Document AI service. The result from this call is the parsed data which is then immediately available for processing by downstream nodes. The following shows a simplified diagram where we receive a message from Pub/Sub (containing an original document), pass that to Document AI and then insert the parsed results into a BigQuery table.

Now let us look at the new Document AI node in more detail. I am not going to explain Node-RED or general GCP concepts, on the assumption that you can study those elsewhere. Links are included in the references at the end.

The Document AI node expects its input in the incoming msg.payload. Since we are passing in binary data, the payload should be a base64-encoded string. This is typically what is passed via a REST request that is used to kick off a Node-RED flow … however, Node-RED can also convert binary data (such as that read from a GCS object) into its equivalent base64 form. Currently, Document AI supports the PDF, GIF and TIFF data formats. We need to tell Document AI the format of the data, and we can do this in two ways. The first is to set the msg.mimeType field to the MIME type of the data. This would be one of:

  • application/pdf
  • image/gif
  • image/tiff

Alternatively, we can specify a fixed MIME type in the configuration properties for the node.

If both are set, msg.mimeType takes precedence over the configured MIME type.
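As a rough illustration of the input rules above, here is a minimal sketch of logic that could sit in a Function node ahead of the Document AI node. The helper names prepareDocumentMessage and effectiveMimeType are hypothetical and are not part of the GCP Node package; the second one only restates the precedence rule and is not the node's actual source.

```javascript
// MIME types listed in the article as supported by Document AI.
const SUPPORTED_MIME_TYPES = ["application/pdf", "image/gif", "image/tiff"];

// Hypothetical helper: make msg.payload a base64 string and set msg.mimeType.
function prepareDocumentMessage(msg, mimeType) {
    if (!SUPPORTED_MIME_TYPES.includes(mimeType)) {
        throw new Error(`Unsupported MIME type: ${mimeType}`);
    }
    // Binary data (for example, read from a GCS object) usually arrives as a Buffer.
    if (Buffer.isBuffer(msg.payload)) {
        msg.payload = msg.payload.toString("base64");
    }
    msg.mimeType = mimeType;
    return msg;
}

// Hypothetical restatement of the precedence rule: the message wins over the config.
function effectiveMimeType(msg, configuredMimeType) {
    return msg.mimeType || configuredMimeType;
}

// Example with a tiny stand-in for real PDF bytes.
const prepared = prepareDocumentMessage(
    { payload: Buffer.from("%PDF-1.4") },
    "application/pdf"
);
console.log(prepared.mimeType, typeof prepared.payload);
```

Inside a real Function node, the body would end with "return msg;" so that the prepared message flows on to the Document AI node.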

Before Document AI can be invoked, we must have configured a Document AI processor. When we do that, we are given a Processor ID value. The triple of Project ID, Processor ID and the location where we want Document AI to run must be supplied in the corresponding node parameters.
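To see why all three values are needed, it helps to know that the Document AI API identifies a processor by a resource name that combines them. The sketch below builds that name; the function name processorName and the sample values are placeholders, and whether the node assembles the name in exactly this way internally is an assumption.

```javascript
// Build the resource name that identifies a Document AI processor.
// All three arguments are required; the values used below are placeholders.
function processorName(projectId, location, processorId) {
    if (!projectId || !location || !processorId) {
        throw new Error("Project ID, location and Processor ID are all required");
    }
    return `projects/${projectId}/locations/${location}/processors/${processorId}`;
}

const exampleName = processorName("my-project", "us", "abc123");
console.log(exampleName);
```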

After invoking the Document AI node, the result is found in the new msg.payload output from the node. This is an object that corresponds to the Document AI data structure described in the Document AI API reference. Each of the fields returned by Document AI can be processed directly by downstream Nodes in the Node-RED flow.
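As a small example of downstream processing, the sketch below reads two fields of the returned document: text (the raw text of the whole document) and pages. The helper name summarizeDocument and the sample object are hypothetical, and the sample is far smaller than a real response.

```javascript
// Hypothetical helper: summarize a Document AI result found in msg.payload.
function summarizeDocument(document) {
    const text = document.text || "";
    const pages = document.pages || [];
    return {
        pageCount: pages.length,
        characterCount: text.length,
        // First line of the raw text, handy for a quick debug message.
        firstLine: text.split("\n")[0],
    };
}

// Tiny stand-in for a real Document AI result.
const summary = summarizeDocument({
    text: "Current Medications\nName: Jane Doe\n",
    pages: [{}, {}],
});
console.log(summary);
```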

When a Document AI processor is defined, we can explicitly declare that it is a Form Processor type. Document AI will then parse out the form fields that it finds into name/value pairs. This is extremely clever of the service, but the data is not the easiest to consume. What we get back is a list of data structures that give us the start/end indices into the raw text of the parsed document for the name and the value of each field. Work is normally still required to turn those indices into usable values. The Document AI node has a selectable option called “Extract Form Fields”. When this is selected, any form fields in the parsed document are processed and a new array of name/value pairs is added at msg.payload.formFields. Think of this as an assistance feature.
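To make the index-based structure concrete, here is a minimal sketch of how the start/end indices can be resolved into name/value pairs by hand. It assumes the Document AI layout in which each page carries a formFields list whose fieldName and fieldValue hold a textAnchor with textSegments. The helper names and the { name, value } output shape are my own illustration and are not necessarily what the “Extract Form Fields” option produces.

```javascript
// Resolve a textAnchor into the matching substring(s) of the raw document text.
function anchorText(fullText, textAnchor) {
    if (!textAnchor || !textAnchor.textSegments) {
        return "";
    }
    return textAnchor.textSegments
        // Indices may arrive as strings, and a startIndex of 0 may be omitted.
        .map((seg) => fullText.substring(Number(seg.startIndex || 0), Number(seg.endIndex)))
        .join("")
        .trim();
}

// Hypothetical helper: collect name/value pairs from every page of the document.
function extractFormFields(document) {
    const fields = [];
    for (const page of document.pages || []) {
        for (const field of page.formFields || []) {
            fields.push({
                name: anchorText(document.text, field.fieldName && field.fieldName.textAnchor),
                value: anchorText(document.text, field.fieldValue && field.fieldValue.textAnchor),
            });
        }
    }
    return fields;
}

// Tiny stand-in for a parsed form with two fields.
const sampleDocument = {
    text: "Name: Jane Doe\nCity: Austin\n",
    pages: [{
        formFields: [
            {
                fieldName: { textAnchor: { textSegments: [{ endIndex: "5" }] } },
                fieldValue: { textAnchor: { textSegments: [{ startIndex: "6", endIndex: "15" }] } },
            },
            {
                fieldName: { textAnchor: { textSegments: [{ startIndex: "15", endIndex: "20" }] } },
                fieldValue: { textAnchor: { textSegments: [{ startIndex: "21", endIndex: "28" }] } },
            },
        ],
    }],
};

const extracted = extractFormFields(sampleDocument);
console.log(extracted);
```

Trimming each resolved string removes the trailing newline that the value segments often include.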

A video illustrating the use of the Document AI node within Node-RED follows:

References

Neil Kolban
Google Cloud - Community

IT specialist with 30+ years industry experience. I am also a Google Customer Engineer assisting users to get the most out of Google Cloud Platform.