Let’s Sail Together with AI: Azure AI Document Intelligence, Part I

Chaskarshailesh
Published in Javarevisited
7 min read · Jun 13, 2024

Azure AI Document Intelligence uses Azure AI Services to analyze the content of scanned forms and convert them into data. It can recognize text values in both common forms and forms that are unique to your business.

Azure AI Document Intelligence is an AI solution that can replace manual data entry with automatic analysis of data in printed and hand-written forms and documents. You can use this tool to extract information from forms such as key-value pairs, text, selection marks, and more.

Azure AI Document Intelligence is an Azure service that you can use to analyze forms completed by your customers, partners, employers, or others and extract the data that they contain.

Azure AI Document Intelligence is easy to use, but to create a reliable solution you must understand its components, such as models, APIs, and tools.

As an Azure AI Service, Azure AI Document Intelligence is a high-level AI service that enables developers to access data in forms quickly. It’s built on the lower-level Azure AI Services, including Azure AI Vision.

Use a model to inform Azure AI Document Intelligence about the type of data you expect to be in the documents you’re analyzing. If your forms have a common structure or layout, you can increase the accuracy of the results and control the structure of the output data by using the most appropriate model.

Azure AI Document Intelligence outputs data in JSON format, which is widely compatible with many databases, other storage locations, and programming languages.
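
As a rough illustration of working with that JSON, the sketch below pulls key-value pairs out of a result. The sample data is made up, and the field names (analyzeResult, keyValuePairs, key, value, content, confidence) are assumed to follow the shape returned for the general document model in the 2023-07-31 REST API, so check them against your own output.

```python
import json
from typing import Dict, Optional

# Made-up sample standing in for the JSON text returned by the service.
SAMPLE_RESPONSE = json.dumps({
    "status": "succeeded",
    "analyzeResult": {
        "modelId": "prebuilt-document",
        "keyValuePairs": [
            {"key": {"content": "Name:"}, "value": {"content": "Contoso"}, "confidence": 0.95},
            {"key": {"content": "Total:"}, "value": {"content": "42.00"}, "confidence": 0.91},
            # A key can be detected without a value (for example, an empty field).
            {"key": {"content": "Signature:"}, "confidence": 0.40},
        ],
    },
})


def key_value_pairs(response_text: str, min_confidence: float = 0.5) -> Dict[str, Optional[str]]:
    """Return {key text: value text} for pairs at or above min_confidence."""
    result = json.loads(response_text)["analyzeResult"]
    pairs: Dict[str, Optional[str]] = {}
    for pair in result.get("keyValuePairs", []):
        if pair.get("confidence", 0.0) < min_confidence:
            continue  # skip low-confidence detections
        value = pair.get("value")  # assumed optional in the result
        pairs[pair["key"]["content"]] = value["content"] if value else None
    return pairs
```

Calling key_value_pairs(SAMPLE_RESPONSE) keeps the two confident pairs and drops the low-confidence one. The 0.5 threshold is an arbitrary choice for this sketch, not a recommended value.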

Azure AI Document Intelligence includes several prebuilt models for common types of forms and documents. If your forms are of one of these types, you can extract information from them without training your own custom models. It’s very quick to create and deploy an Azure AI Document Intelligence solution when you use prebuilt models.

Three of the prebuilt models are designed to handle general documents and extract words, lines, structure and other information such as the language the document is written in:

  • Read: Use this model to extract words and lines from both printed and hand-written documents. It also detects the language used in the document.
  • General document: Use this model to extract key-value pairs and tables in your documents.
  • Layout: Use this model to extract text, tables, and structure information from forms. It can also recognize selection marks such as check boxes and radio buttons.

The other prebuilt models are each designed to handle, and trained on, a specific and commonly used type of document. Some examples include:

  • Invoice. Use this model to extract key information from sales invoices in English and Spanish.
  • Receipt. Use this model to extract data from printed and handwritten receipts.
  • W-2. Use this model to extract data from United States government’s W-2 tax declaration form.
  • ID document. Use this model to extract data from United States driver’s licenses and international passports.
  • Business card. Use this model to extract names and contact details from business cards.
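
In code, each prebuilt model is selected by a model ID string. The lookup below is a small sketch that maps the model names used in this article to the IDs I believe the 2023-07-31 API version uses. The helper name model_id_for is hypothetical, and the availability of individual models varies by API version, so verify the IDs against the version your resource supports.

```python
# Model names from this article mapped to their assumed model ID strings.
PREBUILT_MODEL_IDS = {
    "read": "prebuilt-read",
    "general document": "prebuilt-document",
    "layout": "prebuilt-layout",
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "w-2": "prebuilt-tax.us.w2",
    "id document": "prebuilt-idDocument",
    "business card": "prebuilt-businessCard",
}


def model_id_for(name: str) -> str:
    """Look up a model ID by its display name, ignoring case and outer spaces."""
    key = name.strip().lower()
    if key not in PREBUILT_MODEL_IDS:
        known = ", ".join(sorted(PREBUILT_MODEL_IDS))
        raise ValueError(f"Unknown prebuilt model {name!r}; expected one of: {known}")
    return PREBUILT_MODEL_IDS[key]
```

For example, model_id_for("Invoice") returns the string you would pass as the model ID when analyzing a sales invoice.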

If you have an unusual or unique type of form, you can use the general document analysis prebuilt models described above to extract information from it. However, if you want to extract more specific information than the prebuilt models support, you can create a custom model and train it by using examples of completed forms.

You can also associate multiple custom models, trained on different types of document, into a single model, known as a composed model. With a composed model, users can submit forms of different types to a single service, which identifies them and selects the most appropriate custom model to use in their analysis.

  1. If you want to extract simple words and text from a picture of a form or document, without contextual information, Azure AI Vision OCR is an appropriate service to consider. You might want to use this service if you already have your own analysis code, for example.
  2. However, Azure AI Document Intelligence provides more sophisticated document analysis. For example, it can identify key-value pairs, tables, and context-specific fields. If you want to deploy a complete document analysis solution that enables users to both extract and understand text, consider Azure AI Document Intelligence.

If you want to try many features of Azure AI Document Intelligence without writing any code, you can use Azure AI Document Intelligence Studio. This provides a visual tool for exploring and understanding the capabilities of Azure AI Document Intelligence and its support for your forms.

For example, you can use Azure AI Document Intelligence Studio to try analyzing your sales invoices and to explore the data produced by the Invoice prebuilt model. Then you could decide whether the prebuilt model extracts the values you need or whether to create your own custom model for a more unusual type of invoice.

Azure AI Document Intelligence includes Application Programming Interfaces (APIs) for each of the model types you’ve seen. The following languages are supported:

  • C#/.NET
  • Java
  • Python
  • JavaScript

If you prefer to use another language, you can call Azure AI Document Intelligence by using its RESTful web service.
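
To make the REST option concrete, here is a minimal sketch in Python that only builds the analyze request, without sending it. The URL path, the api-version value, and the Ocp-Apim-Subscription-Key header are assumed from the 2023-07-31 REST API; newer API versions use a different path, so treat them as assumptions to verify. The function name build_analyze_request is hypothetical.

```python
import json
import urllib.request

# Assumption: a generally available REST API version; your resource may use another.
API_VERSION = "2023-07-31"


def build_analyze_request(endpoint: str, key: str, model_id: str,
                          document_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks a model to analyze a document URL."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    # The document is passed by URL in a small JSON body.
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # the resource key authenticates the call
        },
    )
```

Passing the returned request to urllib.request.urlopen would submit it. In that API version the service is expected to reply with 202 Accepted and an Operation-Location header, which you then poll with a GET request until the JSON result is ready.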

Let’s explore using prebuilt Document Intelligence models:

Step 1 : Create an Azure AI Document Intelligence resource

Before you can call the Azure AI Document Intelligence service, you must create a resource to host that service in Azure:

  1. In a browser tab, open the Azure portal at https://portal.azure.com, signing in with the Microsoft account associated with your Azure subscription.
  2. On the Azure portal home page, type Document Intelligence in the top search box and then press Enter.
  3. On the Document Intelligence page, select Create.
  4. On the Create Document Intelligence page, use the following settings to configure your resource:
  • Subscription: Your Azure subscription.
  • Resource group: Select or create a resource group with a unique name, such as Azure-AI-Challenge.
  • Region: Select a region near you.
  • Name: Enter a globally unique name, for example azureaidocintligence1.
  • Pricing tier: Select Free F0 (if you don’t have a Free tier available, select Standard S0).
  5. Then select Review + create, and then Create. Wait while Azure creates the Azure AI Document Intelligence resource.
  6. When the deployment is complete, select Go to resource.
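
Once the resource exists, code that calls it needs the resource’s endpoint and key, which the portal lists on the resource’s Keys and Endpoint page. The sketch below is one possible way to wire that up with the Python SDK. The environment variable names DOCINTEL_ENDPOINT and DOCINTEL_KEY are made up for this example, and the SDK calls assume the azure-ai-formrecognizer package (version 3.2 or later) is installed.

```python
import os
from typing import Mapping, NamedTuple


class ResourceConfig(NamedTuple):
    endpoint: str
    key: str


def load_config(env: Mapping[str, str] = os.environ) -> ResourceConfig:
    """Read the endpoint and key from (hypothetically named) environment variables."""
    endpoint = env.get("DOCINTEL_ENDPOINT", "").strip().rstrip("/")
    key = env.get("DOCINTEL_KEY", "").strip()
    if not endpoint.startswith("https://"):
        raise ValueError("DOCINTEL_ENDPOINT must be an https:// URL")
    if not key:
        raise ValueError("DOCINTEL_KEY is not set")
    return ResourceConfig(endpoint, key)


def analyze_with_read(config: ResourceConfig, document_url: str):
    """Run the prebuilt Read model on a document URL and return the SDK result."""
    # Imported lazily so load_config works even when the SDK is not installed
    # (pip install azure-ai-formrecognizer).
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(config.endpoint, AzureKeyCredential(config.key))
    poller = client.begin_analyze_document_from_url("prebuilt-read", document_url)
    return poller.result()  # blocks until the analysis finishes
```

Keeping the key in an environment variable rather than in source code is a common precaution; a real deployment might use a secret store instead.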

Step 2 : Use the Read model

Let’s start by using the Azure AI Document Intelligence Studio and the Read model to analyze a document with multiple languages. You’ll connect Azure AI Document Intelligence Studio to the resource you just created to perform the analysis:

  1. Open a new browser tab and go to the Azure AI Document Intelligence Studio at https://documentintelligence.ai.azure.com/studio.
  2. Under Document Analysis, select the Read tile.
  3. If you are asked to sign into your account, use your Azure credentials.
  4. If you are asked which Azure AI Document Intelligence resource to use, select the subscription and resource name you used when you created the Azure AI Document Intelligence resource.
  5. In the list of documents on the left, select read-german.pdf.
  6. At the top-left, select Analyze options, then enable the Language check-box (under Optional detection) in the Analyze options pane and select Save.
  7. At the top-left, select Run Analysis.
  8. When the analysis is complete, the text extracted from the image is shown on the right in the Content tab. Review this text and compare it to the text in the original image for accuracy.
  9. Select the Result tab. This tab displays the extracted JSON code.
  10. Scroll to the bottom of the JSON code in the Result tab. Notice that the Read model has detected the language of each span. Most spans are in German (language code de), but you can find other language codes in the spans (for example, English, language code en, in one of the last spans).
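
The language information at the bottom of that JSON can also be read in code. The sketch below groups the extracted text by detected language. It assumes the analyzeResult object has a content string and a languages list whose entries carry a locale and spans with offset and length, and that the offsets count Unicode code points (which is what Python string slicing uses). The sample data is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Invented stand-in for the "analyzeResult" part of a Read model result.
SAMPLE_ANALYZE_RESULT = {
    "content": "Guten Morgen. Good morning.",
    "languages": [
        {"locale": "de", "spans": [{"offset": 0, "length": 13}], "confidence": 0.98},
        {"locale": "en", "spans": [{"offset": 14, "length": 13}], "confidence": 0.95},
    ],
}


def text_by_language(analyze_result: dict) -> Dict[str, List[str]]:
    """Group slices of the extracted content by detected language code."""
    content = analyze_result["content"]
    grouped: Dict[str, List[str]] = defaultdict(list)
    for language in analyze_result.get("languages", []):
        for span in language["spans"]:
            start = span["offset"]
            end = start + span["length"]
            # Each span points at a slice of the full content string.
            grouped[language["locale"]].append(content[start:end])
    return dict(grouped)
```

With the sample above, the function returns one German and one English snippet. On a real result you would pass the analyzeResult object parsed from the JSON shown in the Result tab.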

Let’s stay connected and sail together with AI!

I am a Site Reliability Engineer and aspiring Cloud Solutions Architect, further exploring the horizon into MLOps.