Image used from https://developers.google.com/learn/topics/document-ai

Entity Extraction made Easy with Google’s Document AI!!!

Daisy
Google Cloud - Community
4 min read · Aug 21, 2022


Entity extraction is a prominent use case across industries, especially finance and banking, where automating customer address and identity verification can be a big advantage. It is commonly used in document processing, document analysis, document verification, and validation. I remember working on projects that extracted entities from driving licenses, passports, forms, etc., using open-source OCR engines like EasyOCR, PaddleOCR, and Tesseract. Although these OCR solutions work very well for extracting text from documents, they require extensive image preprocessing of the input data and postprocessing of the extracted text to pull out the required entities. I have also worked on entity extraction use cases with named entity recognition (NER) models built with BERT and CRFsuite. Though all these models have shown proven results in the past, using them requires knowledge of deep learning and computer vision, plus custom post-processing logic, which usually calls for experienced machine learning engineers or data scientists.

Google has made this easy by introducing Document AI in Google Cloud Platform (GCP). Document AI is a Google product that uses natural language processing and computer vision (OCR) technology to provide pre-trained models for processing high-value, high-volume documents. The Document AI API can be used to build customized customer solutions that support faster decisions. See https://cloud.google.com/document-ai for more information on Document AI. The API offers a range of processors, and we can select the one that fits our use case.

Available Document AI Parsers

  1. Document OCR: The general-purpose processor, which can extract text from any document. It identifies and extracts printed text in over 200 languages and handwritten text in 50 languages.
  2. Form parser: Use this parser to extract form elements, i.e., the information present in a form document. It returns all the key-value pairs present in the document.
  3. Invoice parser: Extracts header and line-item fields from invoices, such as invoice number, supplier name, invoice amount, tax amount, invoice date, due date, and line-item amounts.
  4. Payslip parser: This parser is used to process payslip document information.
  5. Driving license parser: Extracts field values from driving licenses. Two driving license parsers are currently available: the US Driver License parser and the France Driver License parser.
  6. Passport parser: Extracts entities from passport documents. Two passport parsers are currently available: the US passport parser and the France passport parser.
  7. National ID parser: Extracts entities from national ID cards. Currently, the France National ID parser is available in Document AI.
  8. Utility parser: Extract text and values from utility bills such as supplier name and previously paid amount.
Figure 1: Example output of the utility parser

Google continuously updates its parsers, and more are expected to become available in the future. A list of all available parsers can be found at https://cloud.google.com/document-ai/docs/processors-list. All these parsers have been trained on tens of billions of pages of documents from lending, insurance, government, and other industries. I have used most of them and found their extraction accuracy very high for my use cases.
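Since the set of parsers changes over time, it can be handy to ask the API which processor types your project can use instead of hard-coding them. Below is a minimal sketch, assuming the google-cloud-documentai Python package (v1 API), Application Default Credentials, and placeholder PROJECT_ID and LOCATION values. group_by_category and fetch_available_types are hypothetical helpers of ours, not part of the library.

```python
"""List the Document AI processor types available to a project (sketch)."""
from collections import defaultdict
from typing import Dict, Iterable, List

# Placeholder values (assumptions): replace with your own project and region.
PROJECT_ID = "your-project-id"
LOCATION = "us"  # Document AI regions include "us" and "eu"


def group_by_category(processor_types: Iterable) -> Dict[str, List[str]]:
    """Group processor type names by category.

    Each item only needs `.category` and `.type_` attributes, which is what
    the v1 ProcessorType message exposes in the Python client.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for pt in processor_types:
        grouped[pt.category].append(pt.type_)
    # Return a plain dict with each category's types sorted alphabetically.
    return {category: sorted(types) for category, types in grouped.items()}


def fetch_available_types(project_id: str, location: str) -> Dict[str, List[str]]:
    """Call the API and return the available processor types grouped by category."""
    # Imported lazily so group_by_category works without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    parent = client.common_location_path(project_id, location)
    response = client.fetch_processor_types(parent=parent)
    return group_by_category(response.processor_types)


# Usage (requires network access and credentials):
# for category, types in fetch_available_types(PROJECT_ID, LOCATION).items():
#     print(category, types)
```

The grouping helper is kept separate from the API call so it can be reused on any list of processor types, for example one loaded from a saved response.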

How to use Document AI

Using the Document AI API in Google Cloud is super easy:

  1. Log in to your GCP account.
  2. In the search bar, search for Document AI.
  3. Click Create processor.
  4. Choose a processor suitable for your use case.
  5. Document AI creates a prediction endpoint to which you can send your documents.
  6. Call this prediction endpoint from your Python code to get the prediction.
  7. The prediction is returned in JSON format.
  8. Once you get the JSON response, read it and extract the entities you need.
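Steps 7 and 8 can be sketched in Python as below. This assumes the document part of the response has been turned into a dict in Document AI's JSON shape, where entities is a list of objects with type, mentionText, and confidence keys. extract_entities, document_to_dict, and the min_confidence threshold are hypothetical names for illustration, not part of the Document AI library.

```python
"""Extract entities from a Document AI JSON response (sketch)."""
import json
from typing import Dict, List


def extract_entities(document: dict, min_confidence: float = 0.0) -> Dict[str, List[str]]:
    """Map each entity type to the list of text values found for it.

    `document` is the Document JSON as a dict (the "document" object of a
    process response). Entities below `min_confidence` are skipped.
    """
    entities: Dict[str, List[str]] = {}
    for entity in document.get("entities", []):
        if entity.get("confidence", 0.0) < min_confidence:
            continue
        entity_type = entity.get("type", "")
        entities.setdefault(entity_type, []).append(entity.get("mentionText", ""))
    return entities


def document_to_dict(document) -> dict:
    """Convert a documentai.Document returned by the Python client into a dict.

    Proto-plus messages provide to_json, which emits camelCase keys by default.
    """
    from google.cloud import documentai  # lazy import: only needed for live responses

    return json.loads(documentai.Document.to_json(document))


# Usage with a live response (assumes `result` came from client.process_document):
# fields = extract_entities(document_to_dict(result.document), min_confidence=0.8)
```

With the Python client you can also read document.entities directly (each entity has type_, mention_text, and confidence). Nested values such as invoice line items appear under each entity's properties, which this sketch does not walk.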

You can create the processor using the Document AI GUI in the console or Google-provided Python client libraries. See https://cloud.google.com/document-ai/docs/create-processor to create a processor using the client library.
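Creating a processor from code could look roughly like the sketch below. It assumes google-cloud-documentai is installed and credentials are configured; the project ID, location, display name, and the "OCR_PROCESSOR" type string are placeholders, and api_endpoint and create_processor are our own hypothetical helper names.

```python
"""Create a Document AI processor with the Python client library (sketch)."""


def api_endpoint(location: str) -> str:
    """Regional API endpoint for Document AI, e.g. for location "us"."""
    return f"{location}-documentai.googleapis.com"


def create_processor(project_id: str, location: str, processor_type: str, display_name: str):
    """Create a processor and return it.

    The returned Processor's `name` is its full resource name, which is what
    prediction requests are sent to.
    """
    # Lazy imports so this module can be loaded without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=api_endpoint(location))
    )
    parent = client.common_location_path(project_id, location)
    return client.create_processor(
        parent=parent,
        processor=documentai.Processor(type_=processor_type, display_name=display_name),
    )


# Usage (requires credentials; all values are placeholders):
# processor = create_processor("your-project-id", "us", "OCR_PROCESSOR", "my-ocr-processor")
# print(processor.name)  # full resource name of the new processor
```

The regional endpoint matters: a processor created in one location has to be called through that location's endpoint.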

Document AI supports both synchronous and asynchronous API calls. Use a synchronous (online) call to process a single document, and an asynchronous (batch) call to process multiple documents. See https://cloud.google.com/document-ai/docs/send-request to learn how to send a prediction request.
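The two request styles can be sketched as follows, assuming google-cloud-documentai (v1 API) and existing credentials. process_one sends a local file synchronously; process_many starts an asynchronous batch job that reads its inputs from Cloud Storage and writes its results back there. The function names, the MIME-type table (only a subset of supported formats), and the 600-second timeout are assumptions for illustration.

```python
"""Synchronous (online) and asynchronous (batch) Document AI requests (sketch)."""
from pathlib import Path
from typing import List

# Subset of input types Document AI accepts (assumption: check the docs for the full list).
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".gif": "image/gif",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a local path or gs:// URI, or raise if unsupported."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported file type: {suffix or path}")
    return MIME_TYPES[suffix]


def _client(location: str):
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    return documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )


def process_one(project_id: str, location: str, processor_id: str, file_path: str):
    """Synchronous call: send one local file and get the Document back."""
    from google.cloud import documentai

    client = _client(location)
    name = client.processor_path(project_id, location, processor_id)
    raw_document = documentai.RawDocument(
        content=Path(file_path).read_bytes(), mime_type=mime_type_for(file_path)
    )
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw_document)
    )
    return result.document


def process_many(project_id: str, location: str, processor_id: str,
                 gcs_uris: List[str], gcs_output_uri: str, timeout: int = 600):
    """Asynchronous call: batch-process files in Cloud Storage into gcs_output_uri."""
    from google.cloud import documentai

    client = _client(location)
    name = client.processor_path(project_id, location, processor_id)
    input_config = documentai.BatchDocumentsInputConfig(
        gcs_documents=documentai.GcsDocuments(
            documents=[documentai.GcsDocument(gcs_uri=uri, mime_type=mime_type_for(uri))
                       for uri in gcs_uris]
        )
    )
    output_config = documentai.DocumentOutputConfig(
        gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(gcs_uri=gcs_output_uri)
    )
    operation = client.batch_process_documents(
        request=documentai.BatchProcessRequest(
            name=name, input_documents=input_config, document_output_config=output_config
        )
    )
    # Blocks until the long-running operation finishes or the timeout expires.
    return operation.result(timeout=timeout)
```

The batch call returns only the operation's outcome; the processed Document JSON files are written under gcs_output_uri and have to be read from Cloud Storage afterwards.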

Document AI Limits and Quotas

The most important thing to remember while using the Document AI API is its limits and quotas. Every GCP service or component is used through API calls, so request limits are typical for any API. Google documents the limits and quotas for all its GCP components; this documentation is very useful when designing solutions, and a developer should read it before using a service. The documentation for Document AI limits and quotas can be found at https://cloud.google.com/document-ai/quotas.
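When a quota is exceeded, Google APIs return a RESOURCE_EXHAUSTED (HTTP 429) error, surfaced in Python as google.api_core.exceptions.ResourceExhausted, and a common way to cope is to retry with exponential backoff. The library-agnostic sketch below shows the idea; backoff_delays, call_with_retry, and the default delay values are hypothetical choices, not Google's recommended settings. The generated client methods also accept a retry argument built from google.api_core.retry.Retry, which is an alternative to hand-rolling this.

```python
"""Retry a call with exponential backoff when a quota is exceeded (sketch)."""
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, initial: float = 1.0,
                   multiplier: float = 2.0, maximum: float = 60.0) -> List[float]:
    """Delays in seconds before each retry: initial, initial*multiplier, ..., capped at maximum."""
    return [min(initial * multiplier**i, maximum) for i in range(attempts)]


def call_with_retry(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> T:
    """Call fn(); on a retryable error, wait and try again, up to `attempts` retries."""
    for delay in backoff_delays(attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # non-quota errors (bad input, auth) should fail immediately
            # Random jitter spreads out retries from many workers sharing one quota.
            sleep(delay * random.uniform(0.5, 1.0) if jitter else delay)
    return fn()  # final attempt; any error now propagates to the caller


# Example with the Document AI client (assumes `client` and `request` already exist):
# from google.api_core import exceptions
# result = call_with_retry(
#     lambda: client.process_document(request=request),
#     lambda exc: isinstance(exc, exceptions.ResourceExhausted),
# )
```

Passing sleep as a parameter keeps the helper easy to test, and retrying only on quota errors avoids hammering the API with requests that will never succeed.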

Start Using, Keep Developing!!!
