AI Image analysis with Azure

Introduction to Azure AI Vision’s Image Analysis

Gianpiero Andrenacci
Data Bistrot
9 min read · Jun 4, 2024

Azure AI Vision’s Image Analysis is a service that extracts rich information from digital images, enabling automated understanding and categorization of visual data. It can also modify images, for example by removing backgrounds. The service uses machine learning models to analyze image content and return detailed insights without manual intervention.

Image Analysis is part of the Azure AI Vision suite, a comprehensive set of tools and services that lets developers build intelligent applications that analyze and understand visual content.

Image Analysis 4.0

Azure Vision Image Analysis 4.0, now generally available, represents a significant leap forward in image processing capabilities within the Azure ecosystem. This latest version introduces several advanced features, including synchronous Optical Character Recognition (OCR) and people detection, making it a versatile tool for a wide range of applications.

You can integrate Image Analysis 4.0 into your applications using either the client library SDK or by making direct calls to the REST API. Both methods offer robust support for the new features, ensuring that you can choose the approach that best fits your development needs.
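As a rough illustration of the REST route, the sketch below builds an Analyze request using only the Python standard library. The URL path, the `2023-10-01` API version, the `Ocp-Apim-Subscription-Key` header, and the lowercase feature names are assumptions taken from the public REST documentation, so check them against the current reference before relying on them. The helper names `build_analyze_request` and `analyze` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2023-10-01"  # assumed GA version of the Image Analysis 4.0 REST API


def build_analyze_request(endpoint, key, image_url, features):
    """Build (but do not send) a POST request for the Analyze operation."""
    query = urllib.parse.urlencode(
        {"api-version": API_VERSION, "features": ",".join(features)},
        safe=",",  # keep the comma-separated feature list unescaped
    )
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed authentication header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze(endpoint, key, image_url, features):
    """Send the request and return the parsed JSON response (needs network access)."""
    request = build_analyze_request(endpoint, key, image_url, features)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `analyze(endpoint, key, "https://example.com/photo.jpg", ["tags", "caption"])` with the endpoint and key of your own Azure AI Vision resource would return the analysis as a Python dictionary. The client library wraps the same operation behind a typed client, which is usually the more convenient choice in application code.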

Azure AI Vision’s Image Analysis empowers organizations to harness the power of visual data, turning images into actionable insights with minimal effort. Whether optimizing business processes, enhancing user experiences, or creating new ways to engage with content, Image Analysis offers a powerful toolset for comprehensive visual understanding.

The Image Analysis 4.0 API offers a comprehensive suite of functionalities that can be tailored to specific use cases. Here’s an introduction to the main features you can leverage through this API:

Visual Features

1. TAGS: Automatically generates tags for the contents of the image, identifying objects, themes, and actions, which helps in categorizing and searching images.

2. OBJECTS: Detects and outlines objects and living beings with bounding boxes, providing their locations within the image. This is useful for object recognition and spatial analysis.

3. Recognize and analyze shelf products: Find specific objects within a single image for use cases such as locating products on shelves, merchandise on a store display, or items in an assembly line.

4. CAPTION: Provides a concise, human-readable sentence that summarizes the overall content of the image, aiding in quick understanding and accessibility.

5. DENSE_CAPTIONS: Generates detailed descriptions for all significant elements in the image, explaining not only what is present but also their interactions, offering a deeper level of understanding.

6. SMART_CROPS: Suggests optimal cropping of images based on visual content, ensuring key aspects of the image are highlighted when resized or displayed in different formats.

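7. PEOPLE: Detects people in the image and returns a bounding box and a confidence score for each one, which supports tasks such as counting people or locating them in a scene. It does not identify individuals or estimate attributes such as age or emotion; face analysis belongs to the separate Azure AI Face service.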

8. Search photos with image retrieval: Retrieve specific moments within your photo album. For example, you can query: a wedding you attended last summer, your pet, your favorite city. Search for images based on the content of the image itself, rather than relying solely on manually assigned keywords or tags.

In the following sections, we’ll describe the main functionalities of Azure AI Vision Image Analysis.

Extract common tags from images

The Extract Common Tags from Images functionality within Azure’s image analysis services automatically identifies and labels thousands of recognizable objects, living beings, scenery, and actions within an image. This feature provides a set of tags that succinctly describe the primary and secondary elements of the image, facilitating content categorization, searchability, and organization. It’s particularly useful for managing large image libraries, improving the discoverability of digital assets, and supporting content-based filtering and recommendation systems.
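To make the tagging output concrete, here is a minimal sketch that keeps only the confident tags from an analysis result. It assumes the REST response exposes tags under `tagsResult.values`, each with a `name` and a `confidence`. The sample dictionary is hand-written for illustration, not real service output, and `confident_tags` is a hypothetical helper.

```python
def confident_tags(analysis, min_confidence=0.8):
    """Return tag names with confidence >= min_confidence, most confident first."""
    values = analysis.get("tagsResult", {}).get("values", [])
    kept = [tag for tag in values if tag["confidence"] >= min_confidence]
    kept.sort(key=lambda tag: tag["confidence"], reverse=True)
    return [tag["name"] for tag in kept]


# Hand-written sample in the assumed response shape (not real service output).
sample = {
    "tagsResult": {
        "values": [
            {"name": "grass", "confidence": 0.91},
            {"name": "dog", "confidence": 0.99},
            {"name": "frisbee", "confidence": 0.42},
        ]
    }
}

print(confident_tags(sample))  # ['dog', 'grass']
```

Filtering by a threshold like this is a common way to keep noisy, low-confidence tags out of a search index. The right cutoff depends on the image library and is worth tuning on real data.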

Portal: Vision Studio

Page: Vision Studio (azure.com)

Python SDK: Call the Image Analysis 4.0 Analyze API — Azure AI services | Microsoft Learn

Python library: azure-ai-vision-imageanalysis

Detect common objects in images

The Detect Common Objects in Images functionality identifies and locates a wide range of recognizable objects and living beings within an image. It provides bounding boxes around each detected item, allowing for easy identification and analysis of the content in the visual data. This feature is useful for applications needing object recognition, tracking, and automated content tagging.
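The bounding boxes can be turned into plain corner coordinates with a few lines of Python. This sketch assumes the REST response lists objects under `objectsResult.values`, each with a `boundingBox` of `x`, `y`, `w`, `h` in pixels and a `tags` list holding the label. The sample data is invented and `object_boxes` is a hypothetical helper.

```python
def object_boxes(analysis):
    """Yield (label, (left, top, right, bottom)) for each detected object."""
    for obj in analysis.get("objectsResult", {}).get("values", []):
        box = obj["boundingBox"]
        # The first entry of "tags" is assumed to be the object's main label.
        label = obj["tags"][0]["name"] if obj.get("tags") else "unknown"
        yield label, (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


# Hand-written sample in the assumed response shape (not real service output).
sample = {
    "objectsResult": {
        "values": [
            {
                "boundingBox": {"x": 10, "y": 20, "w": 100, "h": 50},
                "tags": [{"name": "dog", "confidence": 0.93}],
            },
            {
                "boundingBox": {"x": 200, "y": 40, "w": 30, "h": 30},
                "tags": [{"name": "ball", "confidence": 0.71}],
            },
        ]
    }
}

for label, corners in object_boxes(sample):
    print(label, corners)
```

Corner coordinates in this form are convenient for drawing rectangles over the image or for cropping each object out with an imaging library.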

Portal: Vision Studio

Page: Vision Studio (azure.com)

Python SDK: Call the Image Analysis 4.0 Analyze API — Azure AI services | Microsoft Learn

Python library: azure-ai-vision-imageanalysis

Recognize and analyze shelf products

The functionality to recognize and analyze shelf products is an advanced feature of Azure’s computer vision capabilities. It is specifically designed for retail and manufacturing scenarios where precise object recognition in images is crucial.

Here’s a breakdown of how this functionality works and its potential applications:

  • Object Detection: This feature can identify specific products or items within a single image, distinguishing between different types of merchandise or components.
  • Shelf Analysis: It helps in analyzing how products are arranged on shelves, which can be crucial for inventory management, planogram compliance, and visual merchandising.
  • Automation and Efficiency: By automating the detection and analysis of items, businesses can streamline operations, reduce labor costs, and improve accuracy in tasks such as stock checks and replenishment.

Applications

  • Retail: In retail environments, this functionality can be used to monitor shelf stocks in real-time, verify product placement against planned layouts (planograms), and analyze customer interactions with products for insights into buying behavior.
  • Manufacturing: On assembly lines, it can ensure that components are present and correctly positioned before further processing, helping to maintain quality control standards.
  • Warehouse Management: In warehouses, it can assist in inventory management by quickly identifying and counting items, potentially integrating with an automated management system to track stock levels.

Benefits

  • Efficiency: Reduces the time spent manually checking and managing stock.
  • Accuracy: Minimizes human error in inventory and compliance checks.
  • Insights: Provides valuable data on product performance and customer preferences based on how products are interacted with on shelves.

This functionality, by leveraging deep learning and computer vision, allows businesses to maintain an accurate, real-time understanding of product placement and availability, leading to better customer satisfaction and operational efficiency. It’s currently in preview, indicating ongoing development and optimization by Microsoft to refine the technology further.

Portal: Vision Studio

Page: Vision Studio (azure.com)

Computer Vision API: Cognitive Services APIs Reference (microsoft.com)

Quickstart: Product Recognition — Image Analysis 4.0 — Azure AI services | Microsoft Learn

Add captions/dense captions to images

· Add Captions to Images: Generates a simple, human-readable sentence that describes the overall content of an image. It provides a broad overview.

· Add Dense Captions to Images: Provides detailed captions that describe all important objects and their interactions within the image, offering a more comprehensive understanding of the visual content.

The main difference between adding dense captions to images and adding captions to images lies in the level of detail.
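The difference shows up directly in the shape of the results: one sentence for the whole image versus one sentence per region. The sketch below assumes the REST response carries the single caption under `captionResult` and the regional captions under `denseCaptionsResult.values`, each with its own `boundingBox`. The sample is hand-written and `describe` is a hypothetical helper.

```python
def describe(analysis):
    """Return the overall caption and a list of (region caption, bounding box) pairs."""
    caption = analysis.get("captionResult", {}).get("text")
    regions = [
        (item["text"], item["boundingBox"])
        for item in analysis.get("denseCaptionsResult", {}).get("values", [])
    ]
    return caption, regions


# Hand-written sample in the assumed response shape (not real service output).
sample = {
    "captionResult": {"text": "a dog catching a frisbee", "confidence": 0.88},
    "denseCaptionsResult": {
        "values": [
            {
                "text": "a dog jumping in the air",
                "confidence": 0.81,
                "boundingBox": {"x": 10, "y": 20, "w": 100, "h": 50},
            },
            {
                "text": "a red frisbee",
                "confidence": 0.77,
                "boundingBox": {"x": 200, "y": 40, "w": 30, "h": 30},
            },
        ]
    },
}

caption, regions = describe(sample)
print(caption)       # a dog catching a frisbee
print(len(regions))  # 2
```

The single caption fits well as alt text, while the regional captions are more useful when you need to say what is where in the image.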

The functionality to add captions to images is a powerful feature within Azure’s AI vision services, designed to enhance the accessibility and usability of visual content by providing detailed, human-readable descriptions of images. This tool is particularly valuable for content creators, marketers, and accessibility initiatives.

Applications

  • Accessibility: Makes visual content more accessible to people with visual impairments by providing detailed descriptions of image content.
  • Content Management Systems (CMS): Enhances media libraries by automatically generating captions, improving searchability and organization.
  • Social Media Platforms: Automates the captioning process for images uploaded by users, enhancing engagement and the user experience.
  • E-Commerce: Provides detailed product descriptions in images, aiding in customer decision-making and improving SEO.

Portal: Vision Studio

Pages: Dense Caption — Vision Studio (azure.com)

Caption — Vision Studio (azure.com)

Python SDK: Call the Image Analysis 4.0 Analyze API — Azure AI services | Microsoft Learn

Python library: azure-ai-vision-imageanalysis

Remove Backgrounds from Images

The Remove Backgrounds from Images feature in Azure’s vision services allows you to seamlessly extract foreground elements by automatically erasing the background. Azure’s image processing functionalities offer two distinct modes for handling image backgrounds, specifically focusing on different outputs based on the mode selected:

Background Removal Mode

  • Functionality: This mode separates the foreground (main subjects) from the background.
  • Output: The response is a four-channel PNG image (RGBA), where the foreground is preserved and the background is made transparent. This is useful for images where you want to isolate the subject from its surroundings without losing the original image context.
  • Example: For a photo of a city near water, the buildings and water would be kept intact while the sky would become transparent.

Foreground Matting Mode

  • Functionality: This mode focuses on creating a detailed mask that represents the opacity of each pixel in the image concerning the foreground.
  • Output: The result is a one-channel PNG image (grayscale), where the white areas represent the foreground and black areas represent the background. This mask is useful for more complex image editing tasks where precision in foreground selection is needed.
  • Application: It can be used in advanced graphic design and photo editing where blending or compositing of the foreground with different backgrounds is required.

These functionalities enhance flexibility in graphic and media applications, providing tools for precise image manipulation and editing.
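To show what the foreground matte is good for, here is a small sketch of standard alpha compositing in pure Python: each output pixel is `alpha * foreground + (1 - alpha) * background`, with `alpha` taken from the matte. Images are represented as flat lists of RGB tuples to keep the example dependency-free. In practice you would decode the PNG returned by the service with an imaging library. The function name `composite` is hypothetical.

```python
def composite(foreground, background, matte):
    """Blend two RGB images pixel by pixel using a grayscale matte (0-255).

    foreground and background are flat lists of (r, g, b) tuples and matte is
    a flat list of integers of the same length. A matte value of 255 keeps the
    foreground pixel, 0 keeps the background pixel, and values in between mix
    the two proportionally.
    """
    if not (len(foreground) == len(background) == len(matte)):
        raise ValueError("images and matte must have the same number of pixels")
    result = []
    for fg, bg, m in zip(foreground, background, matte):
        alpha = m / 255
        result.append(
            tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))
        )
    return result


red = [(255, 0, 0)] * 3   # stand-in for the original photo
blue = [(0, 0, 255)] * 3  # stand-in for the new background
print(composite(red, blue, [255, 0, 51]))
# [(255, 0, 0), (0, 0, 255), (51, 0, 204)]
```

The partially transparent third pixel is why a matte gives cleaner edges than a hard cut-out: soft boundaries such as hair or fur blend into the new background instead of ending abruptly.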

Portal: Vision Studio

Page: Vision Studio (azure.com)

Python SDK: Background removal is only available through direct REST API calls. It is not available through the SDKs.

Quickstart: Remove the background in images — Azure AI services | Microsoft Learn

Create smart-cropped images

The Create Smart-Cropped Images functionality in Azure’s vision services automatically adjusts the framing of an image to highlight the most important areas. This feature uses advanced algorithms to analyze an image’s content and determine the optimal crop that maintains the focus on key elements, ensuring that the most visually significant parts are emphasized. This is particularly useful for creating thumbnails, optimizing images for different device screens, or enhancing visual presentations in marketing and media applications.
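As a sketch of how the suggestions might be consumed, the helper below picks the suggested crop whose aspect ratio is closest to the one you need. It assumes the REST response lists crops under `smartCropsResult.values`, each with an `aspectRatio` (width divided by height) and a `boundingBox` of `x`, `y`, `w`, `h`. The sample values are invented and `pick_crop` is a hypothetical helper.

```python
def pick_crop(analysis, target_ratio):
    """Return (left, top, right, bottom) of the suggested crop closest to target_ratio."""
    crops = analysis.get("smartCropsResult", {}).get("values", [])
    if not crops:
        return None
    best = min(crops, key=lambda crop: abs(crop["aspectRatio"] - target_ratio))
    box = best["boundingBox"]
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


# Hand-written sample in the assumed response shape (not real service output).
sample = {
    "smartCropsResult": {
        "values": [
            {"aspectRatio": 1.0, "boundingBox": {"x": 100, "y": 0, "w": 600, "h": 600}},
            {"aspectRatio": 1.5, "boundingBox": {"x": 0, "y": 50, "w": 900, "h": 600}},
        ]
    }
}

print(pick_crop(sample, 1.0))  # (100, 0, 700, 600)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed straight to that method when generating thumbnails.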

Portal: Vision Studio

Page: Vision Studio (azure.com)

Python sample code: azure-ai-vision-sdk/samples/python/image-analysis at main · Azure-Samples/azure-ai-vision-sdk · GitHub

REST API: Cognitive Services APIs Reference (microsoft.com)

Quickstart: Image Analysis 4.0 — Azure AI services | Microsoft Learn

Search Photos with Image Retrieval

The Search Photos with Image Retrieval functionality is part of Azure’s vision services, offering a sophisticated way to navigate and manage large photo collections by querying based on the content of the images themselves. This is particularly useful in digital asset management, where manual tagging can be labor-intensive and sometimes inaccurate. It’s currently in preview, and you have to apply for access.

Key Features

· Content-based Image Retrieval: This feature uses advanced machine learning models to analyze and understand the visual content of photos without relying on metadata or tags. It allows users to find images based on visual similarities and content descriptors.

· Natural Language Queries: Users can perform searches using natural language, making the system intuitive and user-friendly. For example, you can find images by describing scenes, objects, or activities, such as “a wedding I attended last summer” or “my pet.”

· Deep Learning and AI: The backend of this functionality leverages deep learning algorithms to extract features and understand the context of images, enabling more accurate retrieval based on the queries.
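The article does not spell out the retrieval mechanics, but content-based search of this kind is commonly built by embedding images and text queries into the same vector space and ranking images by cosine similarity to the query. The sketch below illustrates only that ranking step, under that assumption. The three-dimensional vectors are toy values rather than service output, and `cosine_similarity` and `rank_images` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query_vector, image_vectors, top_k=3):
    """Return the top_k (name, score) pairs, most similar to the query first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in image_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real ones would come from an embedding model and be much longer.
library = {
    "wedding.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.0, 1.0, 0.0],
    "city.jpg": [0.5, 0.5, 0.0],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of a text query

for name, score in rank_images(query, library, top_k=2):
    print(name, round(score, 3))
```

Because image vectors can be computed once and stored, only the query needs to be embedded at search time, which is what makes this approach practical for large photo collections.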

Applications

· Personal Photo Management: Helps individuals organize and find photos in personal collections, like finding pictures from specific events, locations, or featuring particular people or pets.

· Media and Publishing: Assists in managing large digital asset libraries, enabling quick retrieval of relevant images for content creation and publication.

· Retail and E-commerce: Can be used in retail settings to match products with images from catalogs or user-generated content, enhancing customer experience and interaction.

· Tourism and Travel: Enables easy management and retrieval of images based on landmarks, destinations, or experiences, useful for promotional materials or travel planning.

Portal: Vision Studio

Page: Vision Studio (azure.com)

Image Retrieval Private Preview Access: Image Retrieval Private Preview Access (office.com)

Gianpiero Andrenacci
Data Bistrot

AI & Data Science Solution Manager. Avid reader. Passionate about ML, philosophy, and writing. Ex-BJJ master competitor, national & international titleholder.