Get Text from Images with Azure Cognitive Services

Michael Hannecke
Published in Bluetuple.ai · 13 min read · Aug 26, 2023

Image made by the author by arguing a bit with DALL-E

Building a simple AI project: Extracting text from images using Azure Cognitive Services and Terraform.

Introduction

In this article, I will walk you through a small AI project that showcases the power of AI-driven image processing. We’ll leverage Microsoft Azure Cognitive Services, a collection of AI services and APIs, to extract text from images. To streamline the deployment of these services, I’ll employ Terraform, an infrastructure-as-code tool. By the end of this tutorial, you’ll have a better understanding of how to integrate AI capabilities into your projects and automate the provisioning of resources.

This marks the beginning of a series of articles centered around a small ML project, culminating in the creation of a functional web application. This application aims to offer additional features to visitors of a wine enthusiast website. For instance, it will enable users to identify and classify wines based on photos of the bottles. Furthermore, the application will provide additional information about the wines.

At this juncture, I’d like to extend my gratitude to my friend and TYPO3 specialist, Christoph Schweres. He has not only provided the images for this project but also shared a wealth of information about wine.

I have obtained the site owner's explicit permission to use the images. He combines deep expertise in wine with the CMS TYPO3, and our business association spans several years.

All the images used are sourced from the website:

https://www.wein-fuer-jedermann.de/

If you’re eager to expand your knowledge about wines, I wholeheartedly recommend paying the site a visit.

0. Baseline

In this project, I will utilize Azure Cognitive Services, which provides various AI functionalities, including image analysis, speech recognition, and language understanding. The focus will be on the Computer Vision service, which enables us to extract text from images. To achieve this, we will follow these steps:

  • Deploying the infrastructure with Terraform.
  • Loading sample images to the provisioned Azure Cognitive Services endpoint.
  • Receiving a JSON-formatted response with the information extracted from the images.
  • Destroying the infrastructure via Terraform once done with the sample project.

All source code and some sample images can be found here:

One remark: Using the Cognitive Services endpoint incurs costs, depending on how many pictures you send to the endpoint for text extraction. Please check the Azure pricing calculator upfront to be clear about the costs!

1. Requirements

If you want to follow along, you will need:

  • An Azure subscription for which you have at least "Contributor" rights
  • Azure CLI installed on your device
  • Python 3.10 or higher installed
  • A recent version of Terraform
  • An IDE like VSCode

Setting up the environment

To start, create a project folder to store the files we'll create during this little project. Open a terminal, navigate to this folder, and log in to Azure:

az login

You'll be asked for your credentials: a browser window will open and you have to log in to Azure. Note the list of available subscriptions shown in the terminal and copy the ID of the subscription you want to use.
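
If the subscription list has already scrolled out of view, you can display it again at any time:

az account list --output table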

Now set the subscription:

az account set --subscription=<your-subscription-id>

Now we have to create a Service Principal:

az ad sp create-for-rbac --role="Contributor" \
  --scopes="/subscriptions/<your-subscription-id>"
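
The command returns the new service principal's credentials as JSON, roughly like the following (all values here are placeholders); you'll need appId, password, and tenant in the next step:

{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "azure-cli-2023-08-26-00-00-00",
  "password": "<client-secret>",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

appId maps to ARM_CLIENT_ID, password to ARM_CLIENT_SECRET, and tenant to ARM_TENANT_ID.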

It is recommended NOT to store the provided credentials in any Terraform script; instead, export them as environment variables kept in a hidden file that you source in the console.

Keep in mind that anyone who has access to your local console could read these variables as well, so this would not be the preferred approach for a production environment. For production I would recommend storing these variables in a key vault, but that is out of scope for this project.

Let's store the variables locally in a hidden file. Make sure not to expose this file to any version control system you may use (… a .gitignore might be your friend).

touch .secrets
echo "export ARM_CLIENT_ID=xyz" >> .secrets
echo "export ARM_CLIENT_SECRET=12345" >> .secrets
echo "export ARM_SUBSCRIPTION_ID=xyz" >> .secrets
echo "export ARM_TENANT_ID=xyz" >> .secrets

Make sure to fill in your specific values. The client secret in particular should not be shared with anyone.

Activate the settings in the terminal:

source .secrets
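
To double-check that the variables are now set in your current shell session, you can list them (keep in mind this prints the client secret to the screen):

env | grep ARM_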

2. Terraform Configuration

With the prerequisites in place, we now have to create a couple of Terraform declaration files.

main.tf

# Configure the Azure provider and required providers
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.70" # Define a compatible version of the AzureRM provider
    }

    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.40" # Define a compatible version of the AzureAD provider
    }
  }
}

# Read the client configuration of the account used by the AzureRM provider
# (needed below to hand the tenant ID over to the AzureAD provider)
data "azurerm_client_config" "current" {}

# Configure the AzureAD provider for managing Azure Active Directory resources
provider "azuread" {
  tenant_id = data.azurerm_client_config.current.tenant_id # Set the tenant ID from the Azure client configuration
}

# Define the AzureRM provider configuration with a features block
provider "azurerm" {
  features {
    api_management {
      # Soft-delete behavior on destroy; some services, such as Cognitive Services,
      # otherwise keep soft-deleted resources around
      purge_soft_delete_on_destroy = true  # Purge soft-deleted resources on destroy
      recover_soft_deleted         = false # Do not recover soft-deleted resources
    }
  }
}

This manifest will enable Terraform to download the required providers. For the ease of this project I'll go with a local state file. If you want to use a remote state, e.g. because you want to share the project with your teammates, I'd recommend another article diving into this a bit deeper:

resourcegroups.tf

# Define a resource group for the Azure Cognitive Services

# Create a resource group to contain the Azure Cognitive Services resources
resource "azurerm_resource_group" "rgcognitive" {
  name     = var.cognitive_resgroup # Name of the resource group
  location = var.default_location   # Location where the resource group will be created
}

We will place all Azure resources within one resource group, which we declared in the resourcegroups.tf file.

cognitive_service.tf

# Terraform script to manage Azure Cognitive Services

# Define a resource for creating an Azure Cognitive Services account
resource "azurerm_cognitive_account" "cognitive_service" {
  name                = var.cognitive_service_name # Name of the Cognitive Services account
  location            = var.default_location       # Location where the resource will be created
  resource_group_name = var.cognitive_resgroup     # Resource group where the account will be placed
  kind                = "CognitiveServices"        # Specify the kind of service as CognitiveServices
  sku_name            = "S0"                       # Specify the SKU for the service

  # Configure a custom domain name for the Cognitive Services account
  custom_subdomain_name = var.cognitive_service_domain_name

  # Configure network access control rules for the Cognitive Services account
  network_acls {
    default_action = "Deny"             # Set the default action to deny traffic

    ip_rules = [var.own_ip_address]     # Allow traffic only from the specified IP address
  }

  # Depend on the creation of the specified resource group
  depends_on = [azurerm_resource_group.rgcognitive]

  # Define timeouts for resource operations
  timeouts {
    delete = "5m" # Specify the timeout for resource deletion
  }
}

This file describes the main resource we need to carry out image text extraction. The main resource to be defined is "azurerm_cognitive_account", which takes a couple of parameters I would like to explain in a bit more detail:

  • "kind" defines the service we want to utilize out of the available options.
  • "sku_name" = "S0" limits the service to the "Standard" service level; have a look into the Azure documentation for more details about the different SKUs Azure provides for this service (or query them via the CLI as shown after this list).
  • "custom_subdomain_name" sets the domain part preceding the standard Cognitive Services domain (<subdomain>.cognitiveservices.azure.com).
  • "network_acls" is optional. I'd recommend only allowing dedicated IP addresses to call the endpoint; this network ACL therefore denies access for all but our own IP.
  • "depends_on" ensures that the service will not be deployed by Terraform before the resource group is available. It's not mandatory to define this rule, as Terraform normally does a good job figuring out the correct order for deploying resources, but in the past I had issues with the deployment order of Cognitive Services, so I'm used to setting the depends_on trigger.
  • "timeouts": the default deletion timeout is 30 minutes, way too long to wait for the terraform destroy command to finish in the worst case, so setting it to 5 minutes seemed to be a good compromise, at least for testing.
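
If you want to check which SKUs are available for this kind of service in your region before settling on "S0", the Azure CLI can list them; a query along these lines should work (adjust the location to your region):

az cognitiveservices account list-skus --kind CognitiveServices --location westeurope --output table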

variables.tf

# variables.tf

# Default region
variable "default_location" {
  type        = string
  description = "Default location used for management assets (mostly westeurope)"
}

# Name of the resource group for cognitive services
variable "cognitive_resgroup" {
  type        = string
  description = "The resource group for all cognitive services"
}

# Own IP address
variable "own_ip_address" {
  type        = string
  description = "Own external IP address"
}

# Variable for the generic cognitive service
variable "cognitive_service_name" {
  type        = string
  description = "Name of the generic cognitive service"
}

# Variable for the domain name of the cognitive service
variable "cognitive_service_domain_name" {
  type        = string
  description = "Name of the generic cognitive service custom domain"
}

The variables.tf file declares all variables used; the comments in the source code should be sufficiently self-explanatory.

image-recognition.tfvars

# image-recognition.tfvars
# Within this file you can set individual values for the variables used in the Terraform declarations

# Resource group for cognitive services
cognitive_resgroup = "<resource-groupname>"

# Default location for development
default_location = "<region>"

# Own IP
own_ip_address = "<own-external-ip>"

# Cognitive service name (generic service)
cognitive_service_name        = "sbx_eu_cognitiveservice"
cognitive_service_domain_name = "sbx-cs"

Here you have to set your specific values, which Terraform will in turn insert into the variables during the plan and apply runs.
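
The own_ip_address value has to be your current public IP, because the network ACL above only allows that single address. A quick way to look it up from the terminal is to ask a public echo service such as ifconfig.me (any similar service works just as well):

curl ifconfig.me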

output.tf

# output.tf

# This data block retrieves the existing Azure Cognitive Services account details.
data "azurerm_cognitive_account" "csa" {
  # Set the name and resource group of the cognitive service we want to retrieve details for.
  name                = var.cognitive_service_name
  resource_group_name = var.cognitive_resgroup

  # Ensure this data block depends on the creation of the actual cognitive service.
  # This ensures that the Terraform plan command does not throw an error when the resource
  # does not exist initially. This is particularly important for the first run.
  depends_on = [azurerm_cognitive_account.cognitive_service]
}

# Output the endpoint of the Azure Cognitive Service for external consumption.
# Marking it as 'sensitive' ensures it does not get displayed directly in Terraform outputs.
output "endpoint" {
  value     = data.azurerm_cognitive_account.csa.endpoint
  sensitive = true
}

# Output the primary key of the Azure Cognitive Service for external consumption.
# Like the endpoint, it's marked 'sensitive' to avoid direct display in Terraform outputs.
output "primary-key" {
  value     = data.azurerm_cognitive_account.csa.primary_access_key
  sensitive = true
}

# This resource block is responsible for creating a local ".env" file.
# It saves key details like the endpoint and primary access key of the Azure Cognitive Service.
# Useful for applications that need these details in environment variables.
resource "local_file" "env" {
  # Set the name of the file where the details will be saved.
  filename = ".env"

  # Construct the content of the file using string interpolation.
  # The resulting file will have lines like:
  # COG_SERVICE_ENDPOINT=<endpoint_value>
  # COG_SERVICE_KEY=<primary_key_value>
  content = "COG_SERVICE_ENDPOINT=${data.azurerm_cognitive_account.csa.endpoint}\nCOG_SERVICE_KEY=${data.azurerm_cognitive_account.csa.primary_access_key}"
}

Our Python script needs some sensitive information (the endpoint and the primary key) when calling the Azure Cognitive Services endpoint. To secure this information we'll place both values as variables in a hidden file called ".env".

The output.tf resource will read the values from our deployed service and write them into the .env file automatically.

With all declarations in place it is now time to bring the infrastructure into action:

Start with some cosmetics: run terraform fmt within the folder where all the files are located. This will format the Terraform scripts properly:
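
terraform fmt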

Next, the required Terraform providers must be downloaded and initialized:

terraform init

Next, it is recommended to check whether there are any typos or other linting errors in our scripts:

terraform validate

This will check all files. If there are no errors, you can proceed with

terraform plan

and if everything is fine, we can now deploy the cognitive service endpoint in your Azure subscription:

terraform apply -auto-approve

Now terraform will check all requirements and carry out the setup of the infrastructure to reach the desired state. This should not take more than three to five minutes.
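
Because both outputs are marked as sensitive, terraform apply will only display them as (sensitive). If you want to inspect the values manually instead of relying on the generated .env file, you can print them explicitly:

terraform output -raw endpoint
terraform output -raw primary-key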

When everything is deployed successfully, we can leave Terraform behind for now and concentrate on the Python part of our little project:

3. Python Environment

I assume that Python is already installed on your device. You need to create a new Python environment, activate it, and install the required packages. Ensure that your terminal is still in the project folder and run the following commands:

# Create a virtual environment (named ".venv" here so it does not clash with the ".env" file written by Terraform)
python -m venv .venv

# on Windows
.\.venv\Scripts\activate

# on macOS/Linux:
source .venv/bin/activate

pip install -r requirements.txt

requirements.txt

azure-cognitiveservices-vision-computervision==0.9.0
python-dateutil==2.8.2
python-dotenv==1.0.0
python-env==1.0.0

This file will instruct pip to download and install the libraries we’ll need for our python application later on.

4. Sample images

As already stated in the introduction, this project aims to extract text information from pictures of wine bottles. To keep it simple for this demo, a subfolder in the project containing the pictures to be analyzed will be sufficient. The pictures used look like the examples below.

Now let's get to the real hero of our little project: the text-extraction.py script, which will do the heavy lifting. At a high level, the Python code performs the following tasks:

  • Import necessary libraries: `dotenv` for managing environment variables, `os` for interacting with the operating system, `json` for working with JSON data, `time` for adding delays, and several modules from the `azure.cognitiveservices.vision.computervision` package for utilizing the Azure Cognitive Services Computer Vision API.
  • The `GetTextRead` function takes an instance of the Azure `ComputerVisionClient` and a path to an image file as input. It sends the image to Azure Cognitive Services' Read API for text extraction. The function polls the API until the asynchronous operation is complete and then extracts the text from the results.
  • The `process_folder` function takes a `folder_path` as input, retrieves the list of filenames in the specified directory (excluding macOS-specific files like “.DS_Store”), and returns the list of filenames.
  • The `main` function is the main processing function that orchestrates the text extraction process. It loads the necessary environment variables for the Cognitive Service details, sets up the Cognitive Services client using the provided credentials, retrieves the list of image files to be processed from the specified directory using the `process_folder` function, and then iterates through each image file.
    For each image file, it calls the `GetTextRead` function to extract the text and stores it in a dictionary (`text_data`) with the image file name as the key. After processing all image files, the extracted text data is saved to a JSON file named “wine-extraction.json”. The code then loads and prints the content of the JSON file for verification.

In summary, this script reads images from the “images” directory, extracts text from each image using Azure Cognitive Services’ Read API, and saves the extracted text along with the corresponding image filenames in a JSON file.

text-extraction.py

# Import necessary libraries
from dotenv import load_dotenv
import os
import json
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

# Path for the images to be processed
folder_path = "images"


# Function to extract text from a given image using Azure Cognitive Services
def GetTextRead(client, image_file):
    """
    This function sends an image to the Azure Read API, waits for the asynchronous
    operation to complete, and then extracts the text from the returned results.
    """
    extracted_text = []

    # Open and read the image
    with open(image_file, mode="rb") as image_data:
        # Send image data to Read API
        read_op = client.read_in_stream(image_data, raw=True)

        # Retrieve the operation ID for monitoring
        operation_location = read_op.headers["Operation-Location"]
        operation_id = operation_location.split("/")[-1]

        # Continuously check the status until the operation is done
        while True:
            read_results = client.get_read_result(operation_id)
            if read_results.status not in [OperationStatusCodes.running, OperationStatusCodes.not_started]:
                break
            time.sleep(1)  # Introduce a delay before checking again

        # Extract text line by line once the reading operation is successful
        if read_results.status == OperationStatusCodes.succeeded:
            for page in read_results.analyze_result.read_results:
                for line in page.lines:
                    extracted_text.append(line.text)

    return extracted_text


# Function to retrieve the list of files in the specified directory
def process_folder(folder_path):
    """
    Return a list of filenames within a directory, excluding .DS_Store.
    """
    file_list = []

    # Ensure the provided path exists and is a directory
    if os.path.exists(folder_path) and os.path.isdir(folder_path):
        for filename in os.listdir(folder_path):
            if filename != ".DS_Store":  # Ignore macOS-specific file
                file_list.append(filename)
    else:
        print("Invalid path:", folder_path)

    return file_list


# Main processing function
def main():
    """
    Main function to drive the processing of images and extraction of text.
    """
    # Load environment variables for Cognitive Service details
    load_dotenv()
    cog_endpoint = os.getenv("COG_SERVICE_ENDPOINT")
    cog_key = os.getenv("COG_SERVICE_KEY")

    # Set up the Cognitive Services client with the provided credentials
    credential = CognitiveServicesCredentials(cog_key)
    cv_client = ComputerVisionClient(cog_endpoint, credential)

    # Retrieve the list of image files to be processed
    file_names = process_folder(folder_path)

    text_data = {}  # Dictionary to hold extracted text for each image

    for item in file_names:
        image_file = os.path.join(folder_path, item)
        text_list = GetTextRead(cv_client, image_file)
        text_data[item] = text_list

    # Save the extracted data to a JSON file
    with open("wine-extraction.json", "w", encoding="utf-8") as json_file:
        json.dump(text_data, json_file, ensure_ascii=False, indent=4)

    # Load the JSON file and print its content, useful for verification
    with open("wine-extraction.json", "r", encoding="utf-8") as json_file:
        loaded_data = json.load(json_file)
        print(loaded_data)


# Script entry point
if __name__ == '__main__':
    main()
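
With the virtual environment activated and the .env file generated by Terraform sitting next to the script, the extraction is started from the project folder:

python text-extraction.py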

Response JSON (excerpt)

{
.....
"Baron de Eguia Reserva 2018.jpg": [
"Selekt egen",
"-5",
"Baron de Eguia",
"Viñas Seleccionadas",
"RIOJA",
"Denominación de Origen Calificada",
"RESERVA 2018",
"89",
"Jalstaff"
],
"Cabernet Merlot Bio 2021.jpg": [
"BIO",
"CABERNET SAUVIGNON",
"MERLOT",
"TROCKEN",
"IGP PAYS D'OC - FRANKREICH - 2021"
],
"Barolo 2017.jpg": [
"+",
"BAROLO",
"DENOMINAZIONE DI ORIGINE",
"CONTROLLATA E GARANTITA",
"2017"
]
.....
}

Now we have data objects, properly formatted as JSON, which can be utilized further, but that will be another story coming soon ;)
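
As a small teaser of what "utilized further" could look like, here is a minimal sketch that loads the generated file and prints a short summary per image (the filename wine-extraction.json comes from the script above; the summary logic itself is purely illustrative):

import json

# Load the extraction results produced by text-extraction.py
with open("wine-extraction.json", "r", encoding="utf-8") as json_file:
    extractions = json.load(json_file)

# Print how many text lines were found per image, plus the first line as a preview
for image_name, lines in extractions.items():
    preview = lines[0] if lines else "n/a"
    print(f"{image_name}: {len(lines)} lines, first line: {preview}")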

Conclusion

Of course you can use the topics we discussed in this small project as a starting point for your own text-extraction adventures. Play around with your own images if you want, but always keep an eye on potential costs.

I hope this article was able to give you a first feel for how easy it is to use Azure Cognitive Services with the help of Terraform and the Azure Python SDK. In later articles we want to start from here and extend the application with further functions from the area of machine learning or generative AI, so stay tuned!

Do not forget to delete all provisioned resources when you do not need them anymore to avoid unnecessary costs.

Simply run a

terraform destroy

and wait until Terraform finishes and everything configured above is deleted. You can re-provision it at any time.

If you have read it to this point, thank you! You are a hero (and a Nerd ❤)! I try to keep my readers up to date with “interesting happenings in the AI world,” so please 🔔 clap | follow
