A Visual Remix: Swap Objects with Ease

Published in Google Cloud - Community · 3 min read · Mar 30, 2024

Artificial intelligence (AI) has revolutionized the way we create marketing images. With a simple text prompt, we can generate stunning visuals tailored to our campaigns. The following image, generated using Imagen on Vertex AI, is a sample marketing image for a targeted campaign.

AI Generated Image with Google’s Imagegeneration@005 model from Vertex AI Studio

The Missing Piece: Real Products

AI image generators are amazing at producing synthetic images from prompts like “A young woman standing in a gym and holding a ‘Specific Brand’ sneaker in her hand”. The problem? Even if the brand logo appears correctly, that sneaker may not be an actual, purchasable product from the brand. These images, while visually appealing, lack the crucial link to real merchandise, which makes them less useful for personalized marketing campaigns. Techniques like subject tuning are promising but not yet refined enough to consistently produce marketing-grade content.

The Solution: Product Replacement

What if we could easily replace generic products in AI-generated images with the specific products we want to promote? Imagine you have a stock image of a shoe from a particular brand, like the one shown below.

Sample AI generated imaginary product [Replace with your branded product stock image]

We can achieve easy object replacement, without manually drawing bounding boxes or marking segmentation masks, using the following steps:

(1) Gemini Model: LLM for understanding the name (subject) of the product to be replaced.

(2) Google Cloud Vision API: Object detection to find the subject in the source images.

(3) Segment Anything (SAM) Model: for image segmentation.

(4) Diffuser Model: for image inpainting.

Let’s understand each step in some detail below.

(1) Gemini for Text Processing:

Imagine the end user of your tool will use a simple English command like: replace ‘object name’ in the given image with target. As a first step, you need to find which object is to be replaced.
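To make this step concrete, here is a minimal sketch of how the object name could be pulled out of the user's command. The prompt wording, the JSON reply format, and the helper names (`build_prompt`, `parse_object_name`, `find_object_to_replace`, `fake_llm`) are all assumptions for illustration; the actual prompt used with Gemini is not given here. The LLM call itself is passed in as a plain function so the sketch runs without any cloud credentials.

```python
import json
import re
from typing import Callable

# Hypothetical prompt wording; the article does not give the exact prompt.
PROMPT_TEMPLATE = (
    "Extract the object that should be replaced from the instruction below.\n"
    'Reply with JSON only, in the form {{"object": "<name>"}}.\n'
    "Instruction: {command}"
)


def build_prompt(command: str) -> str:
    """Wrap the user's command in the extraction prompt."""
    return PROMPT_TEMPLATE.format(command=command)


def parse_object_name(reply: str) -> str:
    """Read the object name from the reply, tolerating text around the JSON."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object found in reply: {reply!r}")
    return str(json.loads(match.group(0))["object"]).strip().lower()


def find_object_to_replace(command: str, ask_llm: Callable[[str], str]) -> str:
    """ask_llm is any callable that sends a prompt to the LLM and returns its text."""
    return parse_object_name(ask_llm(build_prompt(command)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real Gemini call, so the sketch runs offline."""
    return '```json\n{"object": "Sneaker"}\n```'


print(find_object_to_replace(
    "replace the sneaker in the given image with target", fake_llm
))  # prints: sneaker
```

In a real pipeline, `ask_llm` would wrap a Gemini call, for example something along the lines of `GenerativeModel(...).generate_content(prompt).text` from the Vertex AI SDK. Asking for JSON keeps the parsing step simple even when the model adds extra text around its answer.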

(2) Google Cloud Vision API:

Use the Google Cloud Vision API to detect the desired object in the given image.

Object detection using Google Cloud Vision API and segmentation with SAM model
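The detection step can be sketched roughly as follows, assuming the `google-cloud-vision` Python client and its object localization feature. The helper names (`to_pixel_box`, `detect_object_box`) and the simple substring match on the label are assumptions made for this sketch. The Vision API returns bounding polygons in normalized (0 to 1) coordinates, so the first helper converts them to a pixel box.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def to_pixel_box(
    normalized_vertices: Iterable[Tuple[float, float]], width: int, height: int
) -> Box:
    """Convert normalized (0..1) polygon vertices to a clamped pixel box."""
    vertices = list(normalized_vertices)
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x0 = max(0, round(min(xs) * width))
    y0 = max(0, round(min(ys) * height))
    x1 = min(width, round(max(xs) * width))
    y1 = min(height, round(max(ys) * height))
    return (x0, y0, x1, y1)


def detect_object_box(
    image_bytes: bytes, name: str, width: int, height: int
) -> Optional[Box]:
    """Return the pixel box of the best-scoring detection whose label matches name."""
    # Imported lazily so the pure helper above works without the client installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.object_localization(image=vision.Image(content=image_bytes))
    # Assumption: a case-insensitive substring match is good enough for the label.
    matches = [
        obj for obj in response.localized_object_annotations
        if name.lower() in obj.name.lower()
    ]
    if not matches:
        return None
    best = max(matches, key=lambda obj: obj.score)
    vertices = [(v.x, v.y) for v in best.bounding_poly.normalized_vertices]
    return to_pixel_box(vertices, width, height)
```

One thing to watch for: the label vocabulary of the Vision API may not match the word in the user's command (a "sneaker" may come back labeled as a shoe), so some label mapping may be needed in practice.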

(3) Segment Anything (SAM) Model:

Once the object is detected by the Google Cloud Vision API, we use the SAM model to segment it. We perform the same operations on the target image.
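As a rough sketch of this step, the detected box can be passed to SAM as a box prompt, assuming the `segment_anything` package and a locally downloaded checkpoint. The function names (`segment_object`, `mask_bbox`), the `checkpoint_path` parameter, and the default model type are assumptions for illustration. The second helper, which finds the tight box around a mask, is an addition of this sketch that is handy when resizing the target later.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def segment_object(image_rgb, box: Box, checkpoint_path: str, model_type: str = "vit_h"):
    """Segment the object inside box with SAM; returns a boolean H x W mask.

    image_rgb is expected to be an H x W x 3 uint8 RGB array.
    """
    # Imported lazily so mask_bbox below works without these packages installed.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)
    # The detection box is the only prompt; ask for a single mask.
    masks, _scores, _logits = predictor.predict(
        box=np.array(box), multimask_output=False
    )
    return masks[0]


def mask_bbox(mask) -> Optional[Box]:
    """Tight (x0, y0, x1, y1) box around the truthy pixels of a 2D mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


demo_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
print(mask_bbox(demo_mask))  # prints: (1, 1, 3, 3)
```

Using the box as the prompt is what removes the need for any manual clicks or hand-drawn masks: the Vision API supplies the location, and SAM supplies the pixel-level outline.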

(4) Diffuser Model:

In most cases, the object in the AI-generated image (or in the given input image) will differ in shape and size from the target product. Hence, if we just resize the target image and superimpose it on the given image, the result looks like the image below.

Superimposed masked images of the object and the target product
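The naive resize-and-superimpose step described above could look something like the sketch below. It assumes the target has been cut out as an RGBA Pillow image (transparent outside its mask) and that we want to keep the target's aspect ratio while centering it in the original object's box. Both of these are choices made for this sketch, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def fit_target_into_box(target_size: Tuple[int, int], box: Box) -> Tuple[int, int, int, int]:
    """Scale (width, height) to fit inside box, keeping the aspect ratio.

    Returns (new_width, new_height, left, top), with the target centered in box.
    """
    target_w, target_h = target_size
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0
    scale = min(box_w / target_w, box_h / target_h)
    new_w = max(1, round(target_w * scale))
    new_h = max(1, round(target_h * scale))
    left = x0 + (box_w - new_w) // 2
    top = y0 + (box_h - new_h) // 2
    return (new_w, new_h, left, top)


def superimpose(scene, target_rgba, box: Box):
    """Paste the resized RGBA target over a copy of the scene (Pillow images)."""
    new_w, new_h, left, top = fit_target_into_box(target_rgba.size, box)
    resized = target_rgba.resize((new_w, new_h))
    result = scene.copy()
    # The alpha band of the RGBA image is used as the paste mask.
    result.paste(resized, (left, top), mask=resized)
    return result


print(fit_target_into_box((200, 100), (10, 20, 110, 120)))  # prints: (100, 50, 10, 45)
```

Because the two shapes rarely match, some pixels of the original object stay uncovered after the paste, and that is the region the next step has to deal with.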

The black portion in the above image represents the part of the original object that the target image does not cover. This region needs to be filled intelligently. We use an image inpainting technique with a diffusers model to perform this task. The following image shows the final output.
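As a rough illustration of the fill step, the region to inpaint is the set of pixels that belong to the original object's mask but not to the pasted target's mask. The sketch below computes that region and hands it to an inpainting pipeline from the `diffusers` library. The helper names, the `prompt` text, and the `model_id` are assumptions; the specific inpainting checkpoint is not named here, so it is left as a parameter.

```python
from typing import List, Sequence


def hole_mask(object_mask: Sequence[Sequence], pasted_mask: Sequence[Sequence]) -> List[List[bool]]:
    """Pixels covered by the original object but not by the pasted target."""
    return [
        [bool(o) and not bool(p) for o, p in zip(object_row, pasted_row)]
        for object_row, pasted_row in zip(object_mask, pasted_mask)
    ]


def mask_to_image(mask: Sequence[Sequence]):
    """Turn a 2D mask into a Pillow 'L' image: white = repaint, black = keep."""
    from PIL import Image  # imported lazily so hole_mask works without Pillow

    height, width = len(mask), len(mask[0])
    image = Image.new("L", (width, height))
    image.putdata([255 if value else 0 for row in mask for value in row])
    return image


def inpaint_hole(scene, hole: Sequence[Sequence], prompt: str, model_id: str):
    """Fill the hole region of a Pillow scene image with an inpainting model.

    model_id is whichever inpainting checkpoint you choose (an assumption here).
    """
    from diffusers import AutoPipelineForInpainting

    pipe = AutoPipelineForInpainting.from_pretrained(model_id)
    result = pipe(prompt=prompt, image=scene, mask_image=mask_to_image(hole))
    return result.images[0]


object_mask = [[1, 1, 0], [1, 1, 0]]
pasted_mask = [[0, 1, 0], [1, 1, 0]]
print(hole_mask(object_mask, pasted_mask))
# prints: [[True, False, False], [False, False, False]]
```

Restricting the mask to the uncovered region means the model only repaints the leftover gap, so the pasted product pixels themselves stay untouched.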

We can also use methods like the Hough transform to calculate the angle of rotation. However, this method works only if the desired objects are rotated within the 2D image plane. As a future step, we need to incorporate 3D rotations.
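To illustrate the idea, here is a small pure-Python sketch of Hough voting that estimates the dominant line angle from a set of edge points (for example, pixels on the outline of a segmentation mask). This is a toy version written for clarity, with a hypothetical function name. In practice, a library routine such as OpenCV's `cv2.HoughLines` would likely be used instead. Running it on both the original object and the target and taking the difference would give one possible estimate of the in-plane rotation between them.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# Precompute cos/sin for every whole-degree normal angle in [0, 180).
_TRIG = [(math.cos(math.radians(t)), math.sin(math.radians(t))) for t in range(180)]


def dominant_line_angle(points: Iterable[Tuple[int, int]]) -> int:
    """Angle in degrees, in [-90, 90), of the strongest line through the points.

    Uses the normal form x*cos(theta) + y*sin(theta) = rho, with image
    coordinates (x to the right, y downward) and 1-degree / 1-pixel bins.
    """
    votes: Counter = Counter()
    for x, y in points:
        for theta, (cos_t, sin_t) in enumerate(_TRIG):
            rho = round(x * cos_t + y * sin_t)
            votes[(theta, rho)] += 1
    if not votes:
        raise ValueError("at least one point is required")
    (theta, _rho), _count = max(votes.items(), key=lambda item: item[1])
    # theta is the angle of the line's normal; the line itself is 90 degrees off.
    return theta - 90


diagonal = [(i, i) for i in range(100)]
horizontal = [(i, 5) for i in range(100)]
print(dominant_line_angle(diagonal))    # prints: 45
print(dominant_line_angle(horizontal))  # prints: 0
```

Since every vote lives in the 2D image plane, this kind of estimate cannot capture a product that is tilted toward or away from the camera, which is the limitation noted above.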

Written by Bhushan Garware