Azure Cognitive Service — Language Service

Reshma Ladakhan
Brillio Data Science
4 min read · Apr 1, 2022

Co-authors: Reshma Ladakhan and Basavaraj Jakkannavar

When you want to build an intelligent AI/ML application quickly without direct AI or data science skills, you can look into Azure's Cognitive Services. It provides APIs and services that help developers quickly and easily build efficient models, so you can add cognitive features to your application. It also provides the capability to add your own custom models based on business requirements.

The API families that Cognitive Services provides are Decision, Language, Speech, and Vision. The Text Analytics API is a Language API that provides NLP over text. This article explains how to integrate the Language API (Language Service) into a business use case.

Searching PDFs or other documents is difficult when they contain large amounts of text or images, especially across a large corpus. It is also hard to read the text in images, particularly when that text is tilted at an angle. To overcome these problems, we built an application/solution for our pharma customers. Below is our use case.

Document digitization and search capability

The goal is to build an intelligent model using the Cognitive Services Language Service API to search content across documents, and to classify those documents based on the content within the corpus and provide insights. These insights should ultimately allow users across the whole organization to use this data easily and automatically for research.

Process/steps involved

1. Upload the documents/files you want to import into Azure Blob Storage. Once the request is submitted, use the built-in skills/algorithms of the Language Service (NER) to extract key phrases and organization, people, and location entities from the uploaded documents.

2. It is difficult to read information from scanned images in the documents, especially when the text is tilted at an angle. In that case, use the OCR API to extract the information from the images.

3. To classify the documents based on their content, build a classifier and add that model to the pipeline. You can then search the content through the Azure Search pipeline.

4. Enable synonym mapping, i.e., attach a dictionary to the pipeline that contains synonyms for the specific words you are searching for.

5. Run the synonym-mapping scripts from Azure DevOps.
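Step 1 above can be sketched in Python with the azure-ai-textanalytics package. This is a minimal sketch, not the production pipeline: the endpoint and key are placeholders, and the batch size of 5 is illustrative (check the service's per-request document limits for your tier).

```python
# Sketch: batch documents, then run NER and key-phrase extraction on each batch.
# Assumes the azure-ai-textanalytics package; endpoint/key are placeholders.

def batch(docs, size=5):
    """Split documents into request-sized groups."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def enrich(docs, endpoint, key):
    """Run entity recognition and key-phrase extraction over document texts."""
    # Imports are kept local so the batching helper works without the SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient

    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    results = []
    for group in batch(docs):
        entities = client.recognize_entities(group)
        phrases = client.extract_key_phrases(group)
        for ent, kp in zip(entities, phrases):
            results.append({
                "entities": [] if ent.is_error else [(e.text, e.category) for e in ent.entities],
                "key_phrases": [] if kp.is_error else kp.key_phrases,
            })
    return results
```

A call would look like `enrich(["Contoso Pharma opened a lab in Bangalore."], "https://<your-resource>.cognitiveservices.azure.com/", "<your-key>")`, returning the people, organization, and location entities plus key phrases for each document.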

Azure Text Analytics for health

Clinical documents contain a lot of information about a patient's health status, medications, past medical history, family members' health information, etc. Extracting, analyzing, and interpreting this information is difficult and time-consuming.

Azure language service (Text Analytics for health) extracts and labels relevant medical information from unstructured texts such as doctor’s notes, discharge summaries, clinical documents, and electronic health records.

The Language service returns a complete analysis of the text in a single API call, which makes it cost-effective.
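A minimal sketch of that single call, again assuming the azure-ai-textanalytics package with placeholder endpoint and key. Health analysis is a long-running operation in the SDK, so the call goes through a poller; the `summarize` helper is our own illustrative post-processing, not part of the API.

```python
# Sketch: label medical entities in clinical notes with Text Analytics for health.

def summarize(entity_pairs):
    """Group (text, category) pairs by category, e.g. {"MedicationName": ["ibuprofen"]}."""
    grouped = {}
    for text, category in entity_pairs:
        grouped.setdefault(category, []).append(text)
    return grouped

def analyze_health(docs, endpoint, key):
    """Extract and summarize medical entities from unstructured clinical text."""
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient

    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    poller = client.begin_analyze_healthcare_entities(docs)  # long-running operation
    return [
        summarize([(e.text, e.category) for e in doc.entities])
        for doc in poller.result()
        if not doc.is_error
    ]
```

Passing a discharge summary such as "Patient was prescribed 100 mg ibuprofen" would yield entities grouped under categories like dosage and medication name.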

Final observations/conclusion

Benefits

· Azure Search service — provides an easy option to add custom skills to the existing pipeline based on business requirements.

· Azure Cognitive Services — provides APIs to extract several named entities and data from images in the documents.

· Azure App Service — provides an option to deploy custom code from Visual Studio Code and connect it to the Azure Search pipeline.

· Synonym mapping — enables adding different synonyms to Azure Search.

· Azure DevOps — helps deploy and run the synonym-mapping code using the pipeline.
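The synonym-mapping benefit above can be sketched with the azure-search-documents package. Azure Cognitive Search stores synonym maps in the Solr synonym format (one comma-separated equivalence group per line); the endpoint, key, and map name below are placeholders, and the SDK class/method names are worth verifying against the package version you use.

```python
# Sketch: build and upload a synonym map for Azure Cognitive Search.

def solr_lines(groups):
    """Render equivalence groups in the Solr synonym format, one group per line."""
    return [", ".join(group) for group in groups]

def upload_synonym_map(name, groups, endpoint, key):
    """Create or update a named synonym map on the search service."""
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents.indexes import SearchIndexClient
    from azure.search.documents.indexes.models import SynonymMap

    client = SearchIndexClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    client.create_or_update_synonym_map(SynonymMap(name=name, synonyms=solr_lines(groups)))
```

Once the map is attached to a searchable field in the index definition, a query for "USA" would also match documents containing "United States".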

Caveats: some common errors the indexer may generate while running

(Since most of these errors stem from the built-in skills that Azure Cognitive Services provides, you can mitigate many of them by changing the service tier.)

1. Document is '195905788' bytes, which exceeds the maximum size '134217728' bytes for document extraction for your current service tier.

Reason — the maximum size allowed for processing is 134 MB (it depends on the service tier). You might resolve this by upgrading your current service tier.

2. A skill may fail to execute because the Web API request fails.

Reason — the problem is due to custom skills. You need to debug your custom code to resolve the issue.

3. A skill fails to execute when the skill input 'text' is too large: the text had to be truncated to '50000' characters to enable execution.

Reason — cognitive skills have limits on the length of text that can be analyzed at once. If the input text exceeds that limit, the skill truncates the text to meet the limit and then performs the enrichment on the truncated text. This means the skill executes, but not over all of your data.
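One way to avoid silent truncation is to split oversized text yourself before it reaches the skill, so every character is enriched. A minimal sketch, assuming a 50,000-character limit as in the warning above and preferring to break on whitespace:

```python
# Sketch: split text into pieces under the skill's character limit,
# breaking on whitespace where possible so words are not cut in half.

def chunk_text(text, limit=50000):
    """Return pieces of `text`, each at most `limit` characters long."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit)  # last space before the limit
        if cut <= 0:
            cut = limit  # no space found: hard cut at the limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be submitted as its own document, and the enrichments merged afterwards.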

4. Extraction of content or metadata from the document may sometimes fail.

Reason — the indexer won't be able to process the document. This may be because the blob is over the size limit (max allowed is 134 MB), the blob is encrypted, the blob has an unsupported content type, or there was an unexpected connectivity issue.
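The size-limit part of this caveat (and caveat 1) can be caught before upload with a simple preflight check on the local files. A minimal sketch; the limit constant is the 134,217,728-byte figure quoted in the indexer error above:

```python
# Sketch: flag files that would exceed the document-extraction size limit
# before they are uploaded to blob storage and hit the indexer.
import os

MAX_EXTRACTION_BYTES = 134_217_728  # 134 MB, as quoted by the indexer error

def oversized(paths, limit=MAX_EXTRACTION_BYTES):
    """Return the paths whose file size exceeds the extraction limit."""
    return [p for p in paths if os.path.getsize(p) > limit]
```

Files returned by `oversized` can be split, compressed differently, or skipped, instead of failing inside the pipeline.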
