Translate Documents using Azure AI Translator Service - Part 1

Venkatesh Maddukuri
4 min read · Jun 13, 2024


Use Case: Translate documents with the Azure AI Translator service, running the code from Databricks.

Prerequisites:

1. Create a Translator resource in the Azure portal.

2. Enable the system-assigned managed identity on the Translator resource.

Managed identities for Azure resources are service principals that create a Microsoft Entra identity and specific permissions for Azure managed resources. Managed identities are a safer way to grant access to storage data and replace the requirement for you to include shared access signature (SAS) tokens.

To understand more about managed identities, please refer to the article below.

The reason for enabling the system-assigned identity in this scenario is that the Translator resource needs to access data in blob storage. Instead of using a SAS URL, an account key, or a Service Principal, we use the managed identity for authentication.

3. We need one blob storage account that holds the documents to be translated, and a Databricks workspace with a cluster to run the code.

4. One Service Principal to authenticate to the Translator resource from Databricks.

Please note that in Step 2 we enabled the system-assigned managed identity, which is used to connect from the Translator resource to blob storage.

However, our code executes in Databricks, so we also need to authenticate to the Translator resource from Databricks. For this connection we use a Service Principal, and this Service Principal should have the “Cognitive Services User” role on the Translator resource.

5. As mentioned in Step 2, the managed identity should have the “Storage Blob Data Contributor” role on the blob storage account that holds the documents for translation.

6. The libraries below need to be installed to interact with blob storage and the Translator resource from Databricks.

%pip install azure-identity azure-ai-translation-document azure-storage-blob 

All 6 steps above are prerequisites for integrating Databricks with AI Translator to translate documents.

Please refer to the code below for document translation using the Azure AI Translator service.

Step 1: Import all required libraries

from azure.identity import DefaultAzureCredential
from azure.ai.translation.document import DocumentTranslationClient
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient
import urllib.parse

Step 2:

Set up the blob storage details for the documents to be translated, and the target language.

# Azure Blob Storage details
storage_account_name = "<blob_storage>"
src_container_name = "src"
target_container_name = "target"
blob_service_url = f"https://{storage_account_name}.blob.core.windows.net"

# Target language code ("te" is Telugu)
trans_target_lang = "te"
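
The source and target locations passed to the translator later on are simply container URLs built from these values. Here is a minimal sketch of that construction, using the urllib.parse module imported in Step 1. The helper name build_container_url and the account name "mystorageaccount" are hypothetical placeholders, not part of any Azure SDK.

```python
import urllib.parse


def build_container_url(account_name: str, container_name: str) -> str:
    """Return the HTTPS URL of a blob container (hypothetical helper)."""
    # Percent-encode the container name so unexpected characters cannot break the URL.
    safe_container = urllib.parse.quote(container_name, safe="")
    return f"https://{account_name}.blob.core.windows.net/{safe_container}"


# "mystorageaccount" is a placeholder account name used only for illustration.
source_url = build_container_url("mystorageaccount", "src")
target_url = build_container_url("mystorageaccount", "target")
print(source_url)
print(target_url)
```

Keeping the URL construction in one place makes it harder for the source and target URLs to drift apart if the account or container names change.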

Step 3:

Read the Service Principal client ID, client secret, and tenant ID used for authentication.

doc-endpoint is the Document Translation endpoint, which we take from the Translator resource.

Navigate as below to get it.

Resource Management → Keys and Endpoint → Virtual Network → Document translation → copy the value from here

# Set your Microsoft Entra (Azure AD) tenant ID, client ID, and client secret
# Get the secrets from Key Vault using a Databricks secret scope
tenant_id = dbutils.secrets.get(scope="<Scope_Name>", key="tenant-id")
client_id = dbutils.secrets.get(scope="<Scope_Name>", key="client-id")
client_secret = dbutils.secrets.get(scope="<Scope_Name>", key="client-secret")
doc_trans_endpoint = dbutils.secrets.get(scope="<Scope_Name>", key="doc-endpoint")

# Default credential; Step 4 replaces it with the Service Principal credential
credential = DefaultAzureCredential()

Here client-id and client-secret are the Service Principal client ID and its secret value.
tenant-id is the ID of the Microsoft Entra tenant your subscription belongs to; you can refer to the page below to get this value.

https://learn.microsoft.com/en-us/entra/fundamentals/how-to-find-tenant

doc-endpoint is the Azure AI Translator resource endpoint. Please follow the navigation below from your Translator resource to get it.

Resource Management → Keys and Endpoint → Virtual Network
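
Document translation expects the resource's own custom-domain endpoint rather than the global Translator endpoint, so a quick sanity check on the value read from the secret scope can save a failed run. The sketch below assumes the usual https://<resource-name>.cognitiveservices.azure.com/ form. The function name and the sample resource name "my-translator" are hypothetical, and resources in sovereign clouds or behind private endpoints may use a different host suffix.

```python
from urllib.parse import urlparse

# Assumed host suffix for custom-domain endpoints in the public Azure cloud.
ASSUMED_HOST_SUFFIX = ".cognitiveservices.azure.com"


def looks_like_doc_translation_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint is HTTPS and uses the assumed custom-domain suffix."""
    parsed = urlparse(endpoint)
    host = parsed.hostname or ""
    return parsed.scheme == "https" and host.endswith(ASSUMED_HOST_SUFFIX)


# "my-translator" is a placeholder resource name.
print(looks_like_doc_translation_endpoint("https://my-translator.cognitiveservices.azure.com/"))
print(looks_like_doc_translation_endpoint("https://api.cognitive.microsofttranslator.com/"))
```

In the notebook, the same check could be applied to doc_trans_endpoint before the client is created.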

Step 4:

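Pass the Service Principal credential to authenticate to blob storage and list the documents stored in the source container. This assumes the Service Principal also has a role that allows it to read the container, for example “Storage Blob Data Reader”.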

# Initialize service principal credential
credential = ClientSecretCredential(tenant_id, client_id, client_secret)

# Authenticate with BlobServiceClient using service principal credential
blob_service_client = BlobServiceClient(account_url=blob_service_url, credential=credential)

# Get a client to interact with the specified container
container_client = blob_service_client.get_container_client(src_container_name)

# List blobs in the container
blob_list = container_client.list_blobs()
print(f"blob_list is {blob_list} ")

for blob in blob_list:
    print(f"file names {blob.name}")
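
Because the translation call in the next step processes everything in the source container, it can help to check the listed names first. Here is a small sketch, assuming a hand-picked set of extensions. The set is illustrative only and is not the service's authoritative list, so check the Translator documentation for the formats your resource supports. The helper name is hypothetical.

```python
from pathlib import PurePosixPath

# Illustrative assumption only; not the service's authoritative list of formats.
ASSUMED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".txt", ".html"}


def split_by_extension(blob_names, allowed=ASSUMED_EXTENSIONS):
    """Return (matching, other) lists of blob names based on file extension."""
    matching, other = [], []
    for name in blob_names:
        # Blob names use "/" as a virtual folder separator, so treat them as POSIX paths.
        suffix = PurePosixPath(name).suffix.lower()
        (matching if suffix in allowed else other).append(name)
    return matching, other


matching, other = split_by_extension(
    ["report.DOCX", "notes/readme.txt", "archive.zip", "noext"]
)
print("likely translatable:", matching)
print("check before translating:", other)
```

In the notebook, the names could come from [blob.name for blob in container_client.list_blobs()].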

Step 5:

Now create the “document_translation_client” and call begin_translation with the source container, the target container, and the target language to run the translation, as below.

document_translation_client = DocumentTranslationClient(doc_trans_endpoint, credential)
poller = document_translation_client.begin_translation(
    f"{blob_service_url}/{src_container_name}",
    f"{blob_service_url}/{target_container_name}",
    trans_target_lang,
)

# Wait for the translation job to finish
result = poller.result()
print(f"result is {result}")

Once we run the code above, it translates all files in the source container into the specified language and writes the translated files to the target container.
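
Printing result only shows a pageable object, not the outcome for each file. poller.result() yields one status object per document, and the sketch below summarizes them using the status, source_document_url, translated_document_url, and error attributes. To keep it runnable without an Azure connection, it is demonstrated on stand-in objects built with SimpleNamespace. The helper name, the sample URLs, and the sample error text are hypothetical.

```python
from types import SimpleNamespace


def summarize_translation(documents):
    """Return (succeeded, failed) lists describing each document's outcome."""
    succeeded, failed = [], []
    for doc in documents:
        if doc.status == "Succeeded":
            succeeded.append((doc.source_document_url, doc.translated_document_url))
        else:
            # error may be missing for documents that were canceled or never started,
            # so fall back to the status string in that case.
            reason = doc.error.message if doc.error else doc.status
            failed.append((doc.source_document_url, reason))
    return succeeded, failed


# Stand-in objects that mimic only the attributes used above (made-up values).
sample = [
    SimpleNamespace(
        status="Succeeded",
        source_document_url="https://example/src/a.docx",
        translated_document_url="https://example/target/a.docx",
        error=None,
    ),
    SimpleNamespace(
        status="Failed",
        source_document_url="https://example/src/b.pdf",
        translated_document_url=None,
        error=SimpleNamespace(code="SampleError", message="Sample failure message"),
    ),
]

succeeded, failed = summarize_translation(sample)
print(f"{len(succeeded)} succeeded, {len(failed)} failed")
for url, reason in failed:
    print(f"failed: {url} ({reason})")
```

In the notebook this would be called as summarize_translation(poller.result()).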

Hope you enjoyed reading this article. Thank you.
