Processing and Extracting Form Data with Box Skills & Google Document Understanding AI

Box Developers
Published in Box Developer Blog · 5 min read · Apr 10, 2019
Extracting rich metadata insights from an invoice

As part of our ongoing exploration of machine learning systems, and how they can be applied in the Box Skills process to improve intelligent data processing, we’ve spent quite a bit of time looking into methods for extracting and processing form data. That data can come from many kinds of documents, such as invoices, contracts, and policies.

That’s why we’re excited to be partnering with Google on the launch of a brand new service, Google Document Understanding AI. We’ve had a chance to experiment with the service and explore how to apply it to a real-world use case.

Let’s start by looking at the problem, the workflow services, and the process considerations involved in using Document Understanding AI to drive direct insights and knowledge services from this data.

The Issue & Machine Learning Solution

At its simplest, the problem we’re trying to solve is extracting and categorizing data in a uniform way from a series of files. At a more complex level, once we have a large enough set of documents containing the same type of data, we want to build intelligent querying abilities around those documents.

That’s where the Vision API Document Parsing Alpha within Google’s Document Understanding AI comes in. In essence, the API will allow you to create mini knowledge bases out of particular business functions and outcomes.

Let’s look at one way this can help an industry that we work with often at Box: mortgage processing. A typical mortgage loan process has over 50 steps that must be completed before the loan closes, involving underwriting, legal, and other teams. Flagging the completion of these steps is usually a very manual and time-consuming process. Using a system like Google Document Understanding AI, we could automate the validation of these documents and steps, then store that validation information back into the metadata of the underlying file on Box, allowing extensive automated workflows to be built on top of the data insights.

Taking a different approach, let’s discuss the possibility of automating sensitive information detection for highly regulated industries. When working with large volumes of unstructured, sensitive data, one of the most important tasks is ensuring that the data is protected, and the first step is knowing that it’s there to begin with. Documents uploaded to Box can be run through Google Document Understanding AI to identify and categorize the sensitive information they contain. Once identified, these documents can be flagged appropriately to ensure they are stored in a compliant fashion and sharing is locked down.

All in all, having multifaceted knowledge tools trained for specific business goals can help unify and standardize systems while creating rich data insights and transparent processes.

Box Skills as the Engine for Knowledge Population

Now that we understand the problem we’re trying to solve and the machine learning technology that will get us there, let’s turn our focus to the process of connecting the technology and creating persistent storage for this categorized information.

Let’s start simply with connecting the technology together and creating a knowledge store. Using the Box Skills process we are able to solve our simple workflow: A file is uploaded, the process connects to the Vision API Document Parsing Alpha for data insights, and that data is stored back into persistent categorized storage within the originating Box file.

As more files of the same type are processed, the knowledge base grows and becomes more effective. Box metadata may be used as the knowledge base storage for the machine learning system, or that data may be unified with other systems to allow smart querying of the data.
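To make the idea of metadata acting as a queryable knowledge base more concrete, here is a minimal in-memory sketch. The `KnowledgeBase` class, its methods, and the field names (`vendor`, `currency`) are hypothetical illustrations and not part of any Box or Google API; in practice the records would live in Box metadata or an external store rather than in a Python dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class KnowledgeBase:
    """Hypothetical in-memory store of per-file metadata with exact-match lookup."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}
        # Maps a (field, value) pair to the ids of files carrying that pair.
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, file_id: str, metadata: Dict[str, str]) -> None:
        """Store (or replace) the metadata extracted for one file."""
        # Drop stale index entries if this file was processed before.
        for pair in self._records.get(file_id, {}).items():
            self._index[pair].discard(file_id)
        self._records[file_id] = dict(metadata)
        for pair in metadata.items():
            self._index[pair].add(file_id)

    def find(self, **criteria: str) -> List[str]:
        """Return sorted ids of files whose metadata matches every criterion."""
        if not criteria:
            return sorted(self._records)
        matches = [self._index.get(pair, set()) for pair in criteria.items()]
        return sorted(set.intersection(*matches))


# Each processed file adds one record; queries get richer as records accumulate.
kb = KnowledgeBase()
kb.add("file-1", {"vendor": "Acme", "currency": "USD"})
kb.add("file-2", {"vendor": "Acme", "currency": "EUR"})
kb.add("file-3", {"vendor": "Globex", "currency": "USD"})

acme_usd = kb.find(vendor="Acme", currency="USD")
all_usd = kb.find(currency="USD")
```

The sketch only supports exact matches on field values; range queries or fuzzy matching would need a real search or database layer.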

Exploring a Solution Avenue

Keeping these concepts in mind, let’s talk about a practical integration of this tech stack. We wanted to explore a simple concept: whether we could extract fields from invoices in a consistent fashion. Using the resulting metadata, we could start building more efficient invoice processing (through the knowledge base), reducing the cycle time needed to curate this information, decreasing the risk of errors, driving faster collections for better cash flow, and enabling quicker payments to secure better discount terms.
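Extracting fields "in a consistent fashion" usually also means normalizing whatever the extraction step returns before storing it. The sketch below shows one way that could look. The field names (`invoice_number`, `invoice_date`, `total`), the list of accepted date layouts, and the assumption of US-style amounts are all illustrative choices, not something defined by Box or by the Document Parsing API.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional

# Date layouts we assume may appear on invoices; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(raw: str) -> Optional[str]:
    """Return a two-decimal amount string, or None if the text is not numeric.

    Assumes US-style formatting: "," groups thousands, "." marks decimals.
    """
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return str(Decimal(cleaned).quantize(Decimal("0.01")))
    except InvalidOperation:
        return None


def normalize_invoice(raw: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map raw extracted strings onto one consistent set of metadata fields."""
    return {
        "invoice_number": raw.get("invoice_number", "").strip() or None,
        "invoice_date": normalize_date(raw.get("invoice_date", "")),
        "total": normalize_amount(raw.get("total", "")),
    }


sample = normalize_invoice(
    {"invoice_number": " INV-001 ", "invoice_date": "04/10/2019", "total": "$1,234.50"}
)
```

Returning `None` for values that cannot be parsed, rather than guessing, keeps bad data out of the knowledge base and makes it easy to route those invoices to manual review.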

Example of the Box Skills process, end to end.

Let’s walk through the technology stack and how these technologies can be combined to create this workflow on a Google-centric stack:

  • Box Skills are enabled for certain folders where the documents will be uploaded. When a new file is uploaded to one of those folders, Box Skills sends an event notification to the invocation URL specified in the Skills app.
  • The listener (the invocation URL location) is a serverless function running on Google Cloud Functions. That function retrieves the file stream from Box and pushes it to the Vision API Document Parsing Alpha.
  • The Vision API Document Parsing Alpha extracts data insights from the uploaded file. Once ready, the results are sent back to the serverless function.
  • The function pushes the new data insights both back into the Box file as metadata (for visualization) and into a Google Cloud Storage bucket, where they help further build the knowledge base for smarter data insights.
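The steps above can be sketched as a single handler function. This is only an outline under stated assumptions: the event payload shape (`source.id`, `token.read.access_token`, `token.write.access_token`) is our understanding of the Skills invocation event and should be checked against the Box documentation, and the `download`, `extract`, `write_metadata`, and `archive` callables are hypothetical stand-ins for the Box file download, the Document Parsing call, the Box metadata write, and the Cloud Storage upload. They are injected so the flow can be read and tested without real credentials.

```python
from typing import Callable, Dict, Optional


def parse_skill_event(event: dict) -> Dict[str, str]:
    """Pull the file id and scoped tokens out of the (assumed) event payload."""
    return {
        "file_id": event["source"]["id"],
        "read_token": event["token"]["read"]["access_token"],
        "write_token": event["token"]["write"]["access_token"],
    }


def handle_upload(
    event: dict,
    download: Callable[[str, str], bytes],
    extract: Callable[[bytes], Dict[str, Optional[str]]],
    write_metadata: Callable[[str, str, Dict[str, str]], None],
    archive: Callable[[str, Dict[str, str]], None],
) -> Dict[str, str]:
    """Run one upload through download, extraction, and both storage targets."""
    ctx = parse_skill_event(event)
    content = download(ctx["file_id"], ctx["read_token"])
    fields = extract(content)
    # Keep only fields that actually carry a value.
    metadata = {k: str(v) for k, v in fields.items() if v not in (None, "")}
    write_metadata(ctx["file_id"], ctx["write_token"], metadata)  # back onto the Box file
    archive(ctx["file_id"], metadata)  # copy for the knowledge base
    return metadata


# Stub collaborators standing in for the real services (all hypothetical).
box_metadata: Dict[str, Dict[str, str]] = {}
bucket: Dict[str, Dict[str, str]] = {}

fake_event = {
    "source": {"id": "12345", "type": "file"},
    "token": {
        "read": {"access_token": "read-token"},
        "write": {"access_token": "write-token"},
    },
}

result = handle_upload(
    fake_event,
    download=lambda file_id, token: b"%PDF- fake invoice bytes",
    extract=lambda content: {"vendor": "Acme", "total": "1234.50", "po_number": ""},
    write_metadata=lambda file_id, token, md: box_metadata.update({file_id: md}),
    archive=lambda file_id, md: bucket.update({file_id: md}),
)
```

Passing the service calls in as parameters keeps the orchestration logic separate from the vendor SDKs, so each real integration can be swapped in or mocked independently.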

Smarter Business Insights

As we’ve seen through this example, the coupling of different technology stacks can generate extensive data insights for your business processes.

At a base level, we can see how a machine learning technology can be applied to a file to pull intelligent metadata, but that’s just the tip of the iceberg. The real power behind these technologies comes when the machine learning model becomes smarter (through repeated use), which in turn allows you to enhance its capabilities to create things like deep querying of all data, geared around the exact business problem that you’re trying to solve.

Google Document Understanding AI is a prime example of the powerful knowledge services that can be created from simple data sets, moving your business from manual, tedious processing through to the future of smart data insights and design.
