Revolutionizing Retail: Automated Attribute Identification with Google’s Gemini Pro Model

Published in

Google Cloud - Community

Jun 17, 2024

In the rapidly evolving world of e-commerce, ensuring the accuracy and quality of product information is crucial for both customer satisfaction and operational efficiency. For online retailers, managing vast product catalogs with consistent and accurate attribute data can be a daunting task. This is where advanced AI solutions come into play. By leveraging Google’s Gemini Pro Model and Vertex AI, retailers can automate attribute identification and validation, significantly improving the quality control of their product listings.

The Challenge

Retail organisations face the challenge of ensuring that the product attributes provided by sellers are accurate and complete. This involves:

Identifying discrepancies in product registration forms.
Validating uploaded product images for duplicates, obscenity, cropping issues, and infographics.
Automating the comparison of seller-uploaded product attributes against those extracted from product images and descriptions.

The goal is to achieve higher accuracy rates for anomaly detection and minimize manual intervention.

The Solution: Gemini Pro Model and Vertex AI

To tackle this challenge, the Google team has implemented a solution using Google’s Gemini Pro Vision model on Vertex AI. The system is designed to automate the extraction and validation of product attributes, ensuring high-quality product information with minimal manual effort.

Key Objectives

Defining Technical Architecture: Developing a robust architecture tailored to the specific use case.
Demonstrating Capabilities: Showcasing the power of Generative AI and the Gemini Pro Vision model through working processes and sample code.
Attribute Extraction: Utilizing AI strategies to extract product attributes from images and descriptions.
Optimization: Experimenting with various approaches to find the best solution, focusing on high accuracy and efficiency.

System Overview

The system architecture comprises several key components, each playing a vital role in the attribute identification process:

1. Data Input: Collects seller registration forms (JSON), product images, and attribute eligibility values (YAML).
2. Prompt Engineering: Crafts dynamic prompts for Gemini Pro Vision to accurately extract attributes from images and descriptions.
3. Generative AI Models: Gemini Pro Vision extracts attributes from product images.
4. Attribute Comparison: Custom functions compare extracted attributes with those in the registration form, using cosine similarity for semantic matching.
5. Output: Generates a structured report detailing mismatches, confidence scores, and reasoning.
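To make the cosine-similarity step concrete, here is a minimal sketch. It assumes character-trigram count vectors as a stand-in for whatever embedding the real system uses; the function names and the 0.5 threshold are hypothetical and not taken from the original implementation.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Turn a string into a bag of character trigrams (a stand-in for an embedding)."""
    padded = f"  {text.strip().lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(form_value: str, extracted_value: str, threshold: float = 0.5) -> bool:
    """Treat two attribute values as equivalent when their similarity clears the threshold."""
    score = cosine_similarity(trigram_vector(form_value), trigram_vector(extracted_value))
    return score >= threshold
```

With this kind of scoring, a form value such as "Regular" can still be matched to the eligible value "Regular Collar", while unrelated values such as "Black" and "Blue" fall below the threshold. In practice the threshold would need tuning on real catalog data.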

High Level Architecture

Processing Flow

Load Data: Registration forms, product images, and attribute eligibility values are loaded into the system.
Extract Attributes: Gemini Pro Vision is prompted to extract attributes from images and descriptions.
Normalize and Compare: Extracted attributes are parsed, normalized, and compared with registration form attributes.
Generate Report: A detailed report is generated, highlighting mismatches, confidence scores, and reasoning.
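The parse and normalize steps could look roughly like the sketch below. It assumes the model replies in the "Attribute_Name / Reasoning / Confidence_Score" layout that the prompt asks for; the function names and the record keys are illustrative choices, not the original code.

```python
from typing import Dict, List, Optional


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not count as mismatches."""
    return " ".join(value.lower().split())


def parse_confidence(value: str) -> Optional[float]:
    """Convert '95%' to 0.95; return None when the score is not numeric."""
    try:
        return float(value.rstrip("%")) / 100
    except ValueError:
        return None


def parse_response(text: str) -> List[Dict[str, object]]:
    """Group 'Name: Value', 'Reasoning: ...' and 'Confidence_Score: ...' lines into records."""
    records: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or free text without a key
        key, value = key.strip(), value.strip()
        if key in ("Reasoning", "Confidence_Score"):
            if current is None:
                continue  # detail line with no attribute before it
            if key == "Reasoning":
                current["reasoning"] = value
            else:
                current["confidence"] = parse_confidence(value)
        else:
            current = {"attribute": key, "value": value, "reasoning": "", "confidence": None}
            records.append(current)
    return records
```

Real model output is not guaranteed to follow the requested layout exactly, so a production parser would likely need more defensive handling than this.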

Implementation Details

The system is implemented using Python, leveraging several key libraries and tools:

Vertex AI: For accessing Gemini Pro Vision and other models.
YAML: For parsing attribute eligibility values.
Pandas: For data manipulation and report generation.
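As a rough illustration of how these pieces connect, the sketch below sends a prompt and a set of Cloud Storage images to the model through the Vertex AI Python SDK. The project ID, region, bucket name, and the "gemini-pro-vision" model ID are assumptions for the example; model IDs and availability change over time, so check the current Vertex AI documentation before relying on them.

```python
from typing import List


def to_gcs_uri(bucket: str, object_path: str) -> str:
    """Build a gs:// URI from a bucket name and an object path (bucket name is hypothetical)."""
    return f"gs://{bucket}/{object_path.lstrip('/')}"


def extract_attributes(prompt: str, image_uris: List[str], project: str,
                       location: str = "us-central1") -> str:
    """Send the images plus the prompt to the model and return its raw text reply."""
    # Imported lazily so the helper above works without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project=project, location=location)
    model = GenerativeModel("gemini-pro-vision")  # assumed model ID
    parts = [Part.from_uri(uri, mime_type="image/jpeg") for uri in image_uris]
    response = model.generate_content(parts + [prompt])
    return response.text
```

Calling extract_attributes requires the google-cloud-aiplatform package and valid Google Cloud credentials; the MIME type is hard-coded to JPEG here only because the sample images in this article are .jpg files.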

Inputs:

1. Seller Registration Form (JSON): Contains seller-provided product information.

{
   "test_123_sku": {
       "Product_Type": "Casual Shirt",
       "Attributes": {
  
           "SKU Code": "test_123_sku",
           "Closure": "Button",
           "Hemline": "Curved",
           "Pattern Coverage": "All Over",
           "Placket": "Button Placket",
           "Product Length": "Regular",
           "Transparency": "Opaque",
           "Collar": "Regular",
           "Color": "Black",
           "Fit": "Regular Fit",
           "Pack": "Pack of 1",
           "Pattern or Print Type": "Solid",
           "Sleeves Length": "Full Sleeves"
       },
       "GCS_Image_list": [
           "input/test_image/shirt_1.jpg",
           "input/test_image/shirt_2.jpg",
           "input/test_image/shirt_3.jpg"
       ]
   }
 }
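A form in this shape can be loaded with the standard json module. The sketch below is one possible way to walk it per SKU; the function names are hypothetical, and the key names are taken from the sample above.

```python
import json
from typing import Dict, Iterator, List, Tuple


def load_registration_form(path: str) -> Dict[str, dict]:
    """Read the registration form JSON file into a dict keyed by SKU."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_products(form: Dict[str, dict]) -> Iterator[Tuple[str, str, Dict[str, str], List[str]]]:
    """Yield (sku, product_type, attributes, image_paths) for every product in the form."""
    for sku, entry in form.items():
        yield sku, entry["Product_Type"], entry["Attributes"], entry.get("GCS_Image_list", [])
```

Each yielded tuple carries everything needed for one model call: the product type selects the eligibility values, the attributes are what the seller claimed, and the image paths point at the evidence.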

2. Attribute Eligibility Values (YAML): A structured file defining the valid attribute values for each product type.

Product_Type: 'Casual Shirts' 
Attribute_Name_1: 'Closure'
Eligible_Values_1:
- 'Button'
- 'Zip'
- 'Velcro'
- 'Drawstrings'
- 'Pullover'
Attribute_Name_2: 'Collar'
Eligible_Values_2:
- 'Hooded'
- 'Round Neck'
- 'Mandarin Collar'
- 'Regular Collar'
Attribute_Name_3: 'Color'
Eligible_Values_3:
- 'Beige'
- 'Black'
- 'Blue'
- 'Brown'
- 'Gold'
- 'Green'
- 'Grey'
- 'Khaki'
- 'Turquoise'
- 'White'
- 'Yellow'
- 'Light Blue'
- 'Mustard'
Attribute_Name_4: 'Hemline'
Eligible_Values_4:
- 'Curved'
- 'Straight'
- 'Asymmetrical'
Attribute_Name_5: 'Pack'
Eligible_Values_5:
- 'Pack of 1'
- 'Pack of 2'
- 'Pack of 3'
- 'Pack of 4'
- 'Pack of 5'
Attribute_Name_6: 'Pattern'
Eligible_Values_6:
- 'Checks'
- 'Solids'
- 'Self Design'
- 'Embroidered'
- 'Printed'
Attribute_Name_7: 'Pattern Coverage'
Eligible_Values_7:
- 'Chest'
- 'All Over'
- 'Full Front'
- 'Back'
- 'Placement'
Attribute_Name_8: 'Pattern or Print Type'
Eligible_Values_8:
- 'Solid'
- 'Vertical Striped'
- 'Horizontal Striped'
- 'Diagonal Striped'
- 'Big Checks'
- 'Small Checks'
- 'Ethnic' 
Attribute_Name_9: 'Product Length'
Eligible_Values_9:
- 'Short'
- 'Regular'
- 'Long'
Attribute_Name_10: 'Sleeves Length'
Eligible_Values_10:
- 'Half Sleeves'
- 'Rollup Sleeves'
- 'Full Sleeves'
- 'Sleeveless'
- '3/4th Sleeves'
Attribute_Name_11: 'Placket'
Eligible_Values_11:
- 'Button Placket'
- 'Snap Button Placket'
- 'Concealed Button Placket'
- 'Button and Loop'
- 'Zip'
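Because the file uses numbered key pairs (Attribute_Name_N and Eligible_Values_N), it helps to reshape it into a plain mapping from attribute name to allowed values. The sketch below assumes PyYAML for loading; the function names are hypothetical.

```python
from typing import Dict, List


def pair_attributes(raw: dict) -> Dict[str, List[str]]:
    """Pair each Attribute_Name_N key with its Eligible_Values_N list."""
    prefix = "Attribute_Name_"
    eligible: Dict[str, List[str]] = {}
    for key, name in raw.items():
        if key.startswith(prefix):
            index = key[len(prefix):]
            eligible[name] = list(raw.get(f"Eligible_Values_{index}") or [])
    return eligible


def load_eligibility(path: str) -> Dict[str, List[str]]:
    """Load the YAML file and return {attribute name: eligible values}."""
    import yaml  # PyYAML, imported lazily so pair_attributes works without it

    with open(path, encoding="utf-8") as handle:
        return pair_attributes(yaml.safe_load(handle))
```

The reshaped mapping is easier to render into a prompt and to use for validating seller-provided values.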

3. Product Images: Visual representations of the product.

Output Report:
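A report of this kind can be assembled as one row per attribute and handed to Pandas. The column names below are hypothetical, chosen to mirror the mismatches, confidence scores, and reasoning the report is described as containing. The article describes cosine-similarity matching; this sketch uses a plain case-insensitive comparison to stay short.

```python
from typing import Dict, List


def build_report_rows(form_attributes: Dict[str, str],
                      extracted: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Create one report row per extracted attribute, flagging disagreements with the form."""
    rows = []
    for record in extracted:
        name = str(record["attribute"])
        extracted_value = str(record["value"])
        form_value = form_attributes.get(name)
        mismatch = (form_value is None
                    or form_value.strip().lower() != extracted_value.strip().lower())
        rows.append({
            "Attribute": name,
            "Form_Value": form_value,
            "Extracted_Value": extracted_value,
            "Mismatch": mismatch,
            "Confidence_Score": record.get("confidence"),
            "Reasoning": record.get("reasoning", ""),
        })
    return rows


def to_dataframe(rows: List[Dict[str, object]]):
    """Wrap the rows in a DataFrame, e.g. for export with DataFrame.to_csv."""
    import pandas as pd  # imported lazily so the row builder works without Pandas

    return pd.DataFrame(rows)
```

An attribute that the model reports but the seller left out is flagged as a mismatch here; whether that is the right policy depends on the catalog's rules.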

Example Prompt

The following dynamic prompt is crafted to guide Gemini Pro Vision in extracting attributes from product images:

You are an expert in understanding product images. Your task is to analyze all the images of a particular product and identify the following attributes of the product. You are provided with product attributes and their eligible values:

Attributes:

{attributes}

Identify and return the LLM identified attribute value in case the eligible values are not present.

Instructions:

1. Analyze all the provided product images one by one for each product to identify the attribute values.
2. There can be multiple products in the pack of images provided. Analyze each image to identify individual attributes, e.g., color of the product in each image.

3. Return the identified attributes and their values, reasoning for identified values, and confidence score in the format below:

Attribute_Name: Attribute_Value
Reasoning: Reasoning_Value
Confidence_Score: Confidence_Score_Value

Example:

Color: Grey
Reasoning: The shirt appears to be a light grey color with a white leaf pattern
Confidence_Score: 95%
Sleeves Length: Half Sleeves
Reasoning: The shirt has half sleeves that end above the elbow
Confidence_Score: 95%



4. Consider Registration Form attribute values and correct, rectify the registration form values wherever required.

REGISTRATION FORM:
{registration_form}

PRODUCT IMAGES:
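The {attributes} and {registration_form} placeholders suggest the prompt is filled in per product. A minimal sketch of that substitution follows, using an abbreviated version of the template; the helper names and the "name: value, value" rendering of eligible values are assumptions, not the original formatting.

```python
import json
from typing import Dict, List

# Abbreviated template; the full instructions from the prompt above would go here.
PROMPT_TEMPLATE = (
    "You are provided with product attributes and their eligible values:\n\n"
    "Attributes:\n\n{attributes}\n\n"
    "REGISTRATION FORM:\n{registration_form}\n\n"
    "PRODUCT IMAGES:\n"
)


def render_attributes(eligible: Dict[str, List[str]]) -> str:
    """Render each attribute and its eligible values on one line."""
    return "\n".join(f"{name}: {', '.join(values)}" for name, values in eligible.items())


def build_prompt(eligible: Dict[str, List[str]], form_attributes: Dict[str, str]) -> str:
    """Fill the template with the eligibility values and the seller's form data."""
    return PROMPT_TEMPLATE.format(
        attributes=render_attributes(eligible),
        registration_form=json.dumps(form_attributes, indent=2),
    )
```

The product images themselves are not part of the text; they would be passed to the model as separate image parts alongside this prompt.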

Conclusion

By implementing a solution that leverages Google’s Gemini Pro Vision model on Vertex AI, retailers can automate the process of product attribute identification and validation. This system not only improves the accuracy and consistency of product information but also enhances operational efficiency by reducing the need for manual intervention. As a result, retailers can provide a better shopping experience for their customers while maintaining a high standard of product quality.