What Are AI-Generated Images? The Technology You Need to Know in 2023

AGEDB
Published in Bitnine Global · Jun 27, 2023

It is only in the past few years that we started hearing phrases like ‘ChatGPT,’ ‘AI-generated images,’ and ‘AIs taking over our jobs!’ Why is this technology suddenly drawing so much attention? A large part of the answer lies in its impact on computing. This article unravels the technology behind AI-generated images, offering insight into the advancements, techniques, and applications that make them an indispensable part of our digital future.

How do AIs generate images?
The Stable Diffusion Model is a generative AI model that produces images matching a text description. It can create a detailed image from a text input, and it can also recreate a specific part of an existing image. Let’s break things down.

Figure 1. Extracting objects (nouns) and relationships (predicates) from images and structuring them into a scene graph (Source: Scene Graph Generation by Iterative Message Passing)

What is a Scene Graph?
A scene graph is a method of detecting objects in an image or video and then inferring the relationships between them in graph form. Figure 1 shows that each object in the image (e.g., the mountain and the horse) is extracted as a noun and expressed as a node, while each relationship between objects (e.g., ‘behind’) is extracted as a predicate, together forming the heterogeneous nodes and edges of the graph.
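To make the structure concrete, here is a minimal sketch of a scene graph as a plain data structure: objects become nodes and (subject, predicate, object) triples become labeled edges. The class name and the mountain/horse/grass triples are illustrative, taken from the Figure 1 example, not from any particular library.

```python
# A scene graph as nodes (objects) plus labeled, directed edges (predicates).
class SceneGraph:
    def __init__(self):
        self.nodes = set()   # object labels (nouns)
        self.edges = []      # (subject, predicate, object) triples

    def add_relation(self, subject, predicate, obj):
        """Register both objects as nodes and the predicate as an edge."""
        self.nodes.update([subject, obj])
        self.edges.append((subject, predicate, obj))

    def relations_of(self, subject):
        """All outgoing (predicate, object) pairs for one subject node."""
        return [(p, o) for s, p, o in self.edges if s == subject]

g = SceneGraph()
g.add_relation("mountain", "behind", "horse")
g.add_relation("horse", "on", "grass")

print(sorted(g.nodes))          # ['grass', 'horse', 'mountain']
print(g.relations_of("horse"))  # [('on', 'grass')]
```

Real scene-graph toolkits add node and edge attributes (bounding boxes, embeddings), but the triple structure above is the core idea.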

This graph-based representation is gaining attention because, unlike the existing ‘image captioning’ approach, it visually expresses complex relationships between objects and supports inference within images. A scene graph enables Stable Diffusion to create more accurate and specific images by structuring the text prompt. Complex text prompts can confuse a generative model that learns from natural language; specifying the relationships between the words in such sentences through a scene graph therefore yields clearer and more diverse images. In addition, generated images can be easily modified by controlling specific words and relationships within the scene graph.
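The "easily modified by controlling specific words and relationships" point can be sketched as a local edit on the triple set: changing one predicate changes only that relationship, leaving the rest of the prompt intact. The helper names and the serialization format below are hypothetical, for illustration only.

```python
# Edit one relationship in a scene graph expressed as triples, then
# re-serialize it into a text form a generator could consume.
def edit_relation(triples, subject, obj, new_predicate):
    """Replace the predicate on the (subject, obj) edge, if present."""
    return [
        (s, new_predicate if (s, o) == (subject, obj) else p, o)
        for s, p, o in triples
    ]

def to_prompt(triples):
    """Flatten triples back into a simple semicolon-separated prompt."""
    return "; ".join(f"{s} {p} {o}" for s, p, o in triples)

triples = [("sheep", "left of", "sheep"), ("boat", "on", "water")]
edited = edit_relation(triples, "boat", "water", "next to")
print(to_prompt(edited))  # sheep left of sheep; boat next to water
```

Only the boat–water edge changed; the two sheep and their relationship were untouched, which is exactly the kind of targeted control the prose describes.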

So how do texts turn into images?
Methods for generating images by turning text prompts into scene graphs include ‘Frido,’ ‘SceneGenie,’ ‘SGDiff,’ ‘SG2Im,’ and ‘DiffuScene.’ Among these, SceneGenie is a representative algorithm that uses a Stable Diffusion Model and achieves high accuracy in its generated images. Let’s look at how this algorithm converts text prompts into scene graphs.

Stable Diffusion Model Process

Figure 2. Noising and denoising: processing the latent information from the CLIP model using an encoder and decoder (Image source)

The Stable Diffusion Model uses OpenAI’s ‘Contrastive Language-Image Pre-training (CLIP) model.’ CLIP is a large pre-trained model that pairs text with corresponding image information and provides latent information by mapping images to suitable text. After the text is mapped into this latent space, the image goes through a noising and denoising process. Noising is the process of adding noise to an image; denoising refers to restoring the image from that noise. Repeating this noising and denoising cycle is how the AI learns: by adding noise to a given image and restoring it many times, the model improves its ability to identify and predict the noise (see Figure 2). The Variational Autoencoder (VAE) is also a crucial step in image generation: it encodes and decodes images, supporting image optimization by speeding up the process.
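The noising step described above can be sketched numerically. In DDPM-style diffusion training, a noisy sample at step t is drawn in closed form as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, and the network is trained to predict ε. The linear schedule and array sizes below are illustrative defaults, not Stable Diffusion’s exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # per-step noise amounts (illustrative)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

def noise_image(x0, t):
    """Sample x_t from q(x_t | x_0) in a single step; eps is the target."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

x0 = rng.standard_normal((8, 8))      # stand-in for a latent "image"
xt, eps = noise_image(x0, t=T - 1)    # late step: almost pure noise
print(xt.shape)
```

Because alphas_bar decreases toward zero, early steps keep the image nearly intact while late steps are dominated by noise; denoising learns to run this process in reverse.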

SceneGenie – Stable Diffusion Using a Scene Graph

Figure 3. Architecture of SceneGenie (Source: SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis)

A simple text-to-image Stable Diffusion Model does not produce accurate images from complex text prompts. In particular, when many objects perform many actions in various relationships, the result is often inaccurate. SceneGenie, however, in addition to handling simple text prompts, also predicts the approximate location of each object using the scene graph that encodes the relationships between them.

In the existing Stable Diffusion Model, only the latent information from CLIP is learned for restoration and generation through noising and denoising. In SceneGenie, a Graph Convolutional Network (GCN) is trained on the classification process, allowing both the latent information and each object’s region to be created from the scene graph. The collected and optimized information makes it possible to clearly render images with more objects and relationships than existing generative models can handle.
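To show how a GCN propagates information over a scene graph, here is a minimal single-layer sketch in the Kipf–Welling style, H' = ReLU(D^(-1/2)(A + I)D^(-1/2) H W). The three-node mountain–horse–grass graph and the feature sizes are illustrative; SceneGenie’s actual network is larger and learns W during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Adjacency for: mountain -- horse -- grass (undirected for simplicity).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                     # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization

H = rng.standard_normal((3, 4))           # node features (e.g. embeddings)
W = rng.standard_normal((4, 2))           # layer weights (learned in practice)

# One message-passing step: each node mixes in its neighbors' features.
H_next = np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation
print(H_next.shape)                        # one 2-dim vector per node
```

After this step, the ‘horse’ node’s features already reflect both ‘mountain’ and ‘grass,’ which is how relational context reaches the diffusion model’s conditioning.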

Figure 4. A text prompt that cannot distinguish objects (left) and a scene graph that can distinguish between objects and relationships (right)
Figure 5. Comparison of images generated by DALL·E 2 (left) and SceneGenie (right) (Source: SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis)

In the example above, you can see that SceneGenie, which uses the scene graph, generated better images for complex objects and relationships. In the image generated by DALL·E 2, the word ‘sheep’ was recognized as a single object. The scene graph, however, classified the ‘sheep’ mentioned twice in the text prompt as two different objects. Other objects, such as the ‘boat,’ are also positioned more accurately as requested.

In essence, image generation using scene graphs produces more accurate images by decoding the relationships among complex objects. Existing image-generating AI often struggles to produce the desired image from complex text prompts, especially when multiple objects are present. A scene graph expresses the relationships between objects as a graph, making it easier to infer the situation within an image or text. With the continued rise of generative AI, the adoption of scene graphs is becoming increasingly essential to accurate and meaningful synthesis.

Do scene graphs and graph technology still feel too advanced? Rest assured! AGEDB provides graph technology as a plug-in to your existing data model.

If you’re interested in learning more about our pluggable graph technology, do not hesitate to contact us through our website or at contact@agedb.io!
