Product photography with AI — Tech Note of Planningo

Planningo.inc
7 min read · Jan 1, 2024


TL;DR

  • There are many image-generation AI models these days.
  • To control generation with both text and an image, we need Stable Diffusion with ControlNet.
  • We built a service that creates product images with Stable Diffusion.
  • We will keep researching and applying AI models for creating product content.

The age of searching for the right image is over. Now is the age of creating it yourself.

News that AI can create photograph-like images has been around for years. Recently, however, models have emerged that produce genuinely realistic, natural-looking results.

Some notable examples include DALL-E, Midjourney, and Stable Diffusion. It has been reported that major companies like Google and Meta are also developing their own AI systems for image generation.

Planningo, which focuses on creating product content, researched which AI to use and how to apply it so that product content can be made easily. If AI makes product content easy to create, we expect it to be widely used for detailed product pages and promotional materials in online stores.

The representative text-to-image models we identified are as follows.

DALL-E

DALL-E is a text-to-image model created by OpenAI, the company best known for ChatGPT.

Image generated by DALL-E

It mainly focuses on generating images from text descriptions.

This model creatively generates images that match a specific sentence or word. For example, if the text input is “banana-shaped sofa,” it generates an image of a banana-shaped sofa in a living room, as shown in the picture above. DALL-E can transform text into visual elements to create creative and unique images.

Midjourney

Midjourney is a proprietary AI image-generation model; its exact architecture has not been publicly disclosed.

Image generated by Midjourney

Midjourney is known for producing realistic, creative images that are difficult to distinguish from real photographs. It was also Midjourney that generated the artwork at the center of the controversy when an AI-generated piece was selected as the winner of an art competition.

Stable Diffusion

Stable Diffusion is an open-source image-generation model developed by Stability AI. It is a latent diffusion model: it iteratively denoises a compressed latent representation of the image, which makes generation efficient and keeps results consistent.

Image generated by Photio's Stable Diffusion model

Furthermore, there is ControlNet, an add-on that provides partial control over how the image-generation model works.

After researching the features of these models, we decided to use Stable Diffusion for our service. Of course, the other image-generation models also perform excellently.

So why did we choose Stable Diffusion?

Why we had to use Stable Diffusion

In truth, Stable Diffusion was the only practical option for us.

What we need for generating product content is not text-to-image, but image-to-image.

To create product content, we need to generate a natural background around the product based on the input product image. Therefore, an image must be part of the input.
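The core idea of image-to-image generation can be sketched without any model: the input image is partially noised according to a strength parameter before denoising begins, so low strength preserves the input while high strength approaches pure text-to-image. This is a toy numpy illustration of that blending step, not an actual diffusion pipeline.

```python
import numpy as np

def img2img_init_latent(image: np.ndarray, strength: float) -> np.ndarray:
    """Blend the input image with Gaussian noise according to `strength`.

    In image-to-image generation the input is noised up to an intermediate
    diffusion step: strength=0 keeps the image untouched, strength=1
    replaces it with pure noise (equivalent to text-to-image). This toy
    version mirrors that idea with a simple variance-preserving blend.
    """
    rng = np.random.default_rng(0)  # fixed seed so results are repeatable
    noise = rng.standard_normal(image.shape)
    return np.sqrt(1.0 - strength) * image + np.sqrt(strength) * noise

# A flat gray "image": low strength preserves it, high strength destroys it.
img = np.full((8, 8), 0.5)
mild = img2img_init_latent(img, strength=0.2)
wild = img2img_init_latent(img, strength=0.9)
print(abs(mild - img).mean() < abs(wild - img).mean())  # True: more strength, more deviation
```

In a real pipeline the denoiser then reconstructs an image from this noised starting point, guided by the prompt, which is why the product photo survives at low strength.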

The existence of ControlNet was also significant. With ControlNet, Planningo was able to create images far more tailored to the product using the Stable Diffusion model.

What is ControlNet?

Example images from the ControlNet canny model (from the ControlNet GitHub repository)

As shown in the picture above, ControlNet is a model that allows control over how Stable Diffusion generates images.

We wanted the generated background to look as if the product had originally been photographed in it. To do this, we used ControlNet's canny model.
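The canny variant conditions generation on an edge map of the product, so the model preserves the product's outline while repainting everything else. As a hedged illustration, here is a crude gradient-threshold edge detector standing in for OpenCV's actual Canny algorithm (which adds smoothing, non-maximum suppression, and hysteresis):

```python
import numpy as np

def edge_map(image: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Crude gradient-magnitude edge detector (a stand-in for OpenCV's Canny).

    ControlNet's canny model takes a binary edge map like this as an extra
    input, so the diffusion process keeps the product's outline while
    generating a new background around it.
    """
    gy, gx = np.gradient(image.astype(float))  # gradients along rows/cols
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

# A white square on black: edges appear only along the square's border.
img = np.zeros((10, 10))
img[3:7, 3:7] = 1.0
edges = edge_map(img)
print(edges.sum() > 0 and edges[0, 0] == 0)  # True: border detected, corner empty
```

In production the edge map is passed to the ControlNet-augmented Stable Diffusion pipeline alongside the text prompt.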

Canny image / generation without ControlNet / generation with ControlNet

Without the ControlNet canny model, the generated background ignores the product entirely.

With ControlNet, the image is generated with reference to the product's rough outline.

Furthermore, ControlNet also offers a reference-only mode, which generates images by drawing on an additional input reference image.

Of course, techniques such as fine-tuning, parameter adjustment, and stacking additional ControlNets can be combined to produce natural-looking outputs that follow the reference.

Partial regeneration with Stable Diffusion

Stable Diffusion also allows for partial regeneration of desired parts.

In the image-to-image module, masking makes it possible to regenerate only the desired parts. This works both on user-uploaded images and on parts of previously generated images.
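The final step of masked regeneration is a simple composite: regenerated pixels are taken where the mask is set, and the original pixels are kept everywhere else. A minimal numpy sketch of that blend:

```python
import numpy as np

def partial_regenerate(original: np.ndarray, generated: np.ndarray,
                       mask: np.ndarray) -> np.ndarray:
    """Composite a regenerated region back into the original image.

    Inpainting pipelines regenerate only the masked pixels (mask == 1)
    and leave the rest untouched; this is the final blend step.
    """
    mask = mask.astype(float)
    return mask * generated + (1.0 - mask) * original

original = np.full((4, 4), 0.2)   # user-uploaded image
generated = np.full((4, 4), 0.9)  # freshly generated content
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                # regenerate the center only
out = partial_regenerate(original, generated, mask)
print(out[0, 0], out[1, 1])  # 0.2 0.9 — outside kept, center replaced
```

In a full inpainting pipeline the masked region is also denoised with awareness of the surrounding pixels, so the seam between old and new content stays smooth.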

Partial image regeneration in our service

Through partial regeneration, unwanted parts can be removed, and different objects can be created based on prompts.

The Importance of Prompts

One of the most important parts of AI image generation is the prompt. The way prompts are written affects the model's output so strongly that the practice has its own name: prompt engineering.

Prompt engineering provides guidelines on what information to include and how to include it in prompts. However, it is not reasonable to expect users who simply want to create product photos to undergo such complex engineering processes.
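One common way to spare users that work is a templated prompt draft: the service collects a few simple fields through a form and assembles the prompt itself. The field names and quality tags below are hypothetical, purely to illustrate the pattern:

```python
def draft_prompt(product: str, background: str,
                 style: str = "photorealistic") -> str:
    """Assemble a prompt draft from a few form fields (hypothetical template).

    Prompt-engineering guides typically recommend stating the subject,
    the setting, the style, and quality keywords; collecting these via a
    form means users never have to write a prompt by hand.
    """
    quality_tags = "high resolution, professional product photography, soft lighting"
    return f"{product} on {background}, {style}, {quality_tags}"

print(draft_prompt("a ceramic mug", "a marble table"))
```

A template like this produces serviceable prompts, but it cannot adapt to the content of a reference image, which is where an image-to-text model comes in.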

We observed that users were struggling to write prompts, so we implemented a way for users to receive a draft of the prompt.

Planningo’s AI service Photio, which generates product backgrounds, requires reference images as an essential part.

Using the reference images, we created an image-to-text AI model that can generate prompts in reverse. This model allows users to generate images that closely resemble the reference images without having to write additional prompts.

Planningo’s image-to-text model is designed to perform the following functions.

Planningo built a dedicated image-to-text model for the Photio service

When analyzing a reference image, the model describes the image while excluding its main object.

If the prompt included a description of the reference image's main object, that object could bleed into the generated image alongside the user's product.
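Photio's actual image-to-text model is proprietary, but the exclusion step can be illustrated with a toy post-filter: given a caption and a list of product terms, drop the comma-separated phrases that mention the product, keeping only the background description.

```python
def strip_main_object(caption: str, product_terms: list[str]) -> str:
    """Remove phrases mentioning the product from a generated caption.

    Toy approximation of excluding the reference's main object: split the
    caption into comma-separated phrases and drop any phrase containing
    one of the product terms.
    """
    phrases = [p.strip() for p in caption.split(",")]
    kept = [p for p in phrases
            if not any(t in p.lower() for t in product_terms)]
    return ", ".join(kept)

caption = "a perfume bottle, on a wooden table, surrounded by dried flowers, warm light"
print(strip_main_object(caption, ["perfume", "bottle"]))
# → on a wooden table, surrounded by dried flowers, warm light
```

The filtered caption can then be used directly as the background prompt, so the generated scene matches the reference without duplicating its subject.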

Using AI to generate prompts significantly reduced the difficulty of prompt creation, and we observed a significant increase in the quality of the images users generate.

One of our users, @innoadrie, created various product images with Photio.

Future Development of Photio Service

What distinguishes the AI background-generation service Photio from other such services is the unlimited number of reference images that can be used. References are essential to Photio.

We already have many references, but we need more!

To move Photio from beta to official release, we need more reference images.

Photio's AI model can produce an unlimited number of product images from a product photo and reference images. However, we have observed that repetitive reference images reduce user appeal, and finding good references is time-consuming. We also currently evaluate the quality of the generated images one by one.

Based on this feedback, Planningo is developing an image classification AI that can generate reference images and evaluate the quality of the generated images.

Furthermore, using Planningo's WebGL technology, we are preparing an editor that adjusts product images and generated images to produce more professional, higher-quality results.

Through WebGL, we can adjust planar properties of the image, such as brightness and shadows. Additionally, by using AI to estimate the image's depth, we can add lighting effects and relight product photos or generated results.
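The brightness and contrast adjustments a WebGL fragment shader runs per pixel on the GPU boil down to simple arithmetic on normalized pixel values. Here is the same math mirrored in numpy (the actual editor runs this in GLSL, not Python):

```python
import numpy as np

def adjust(image: np.ndarray, brightness: float = 0.0,
           contrast: float = 1.0) -> np.ndarray:
    """Per-pixel brightness/contrast adjustment on values in [0, 1].

    Contrast scales distances from mid-gray (0.5); brightness shifts all
    values; the result is clamped back into the valid range, exactly as a
    fragment shader would do per pixel.
    """
    out = (image - 0.5) * contrast + 0.5 + brightness
    return np.clip(out, 0.0, 1.0)

img = np.array([[0.2, 0.5, 0.8]])
print(adjust(img, brightness=0.1))  # lifts every pixel by 0.1
print(adjust(img, contrast=2.0))    # pushes values away from mid-gray, clamped
```

Shadow and relighting effects extend this idea with spatially varying multipliers, which is where the AI-estimated depth map comes in.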

Image adjusted with WebGL (Planningo) / image relit with WebGL (Prome AI; we can cover this with WebGL)

In addition, we plan to introduce the Stable Diffusion SDXL Turbo model, consider video content production, and improve the overall UX of the Photio service, among other evolutions.

The research and development of technologies related to Photio are broad and deep, and we will delve deeper into them through other blog posts.

Planningo focuses on creating product content in many forms. AI has greatly expanded how we create that content, and we are working hard to drive meaningful change in the product-content market through various technologies.


Written by Planningo.inc

Commercial solution startup. WebXR, AR/VR for products, AI product background image generation technology research company.
