Imagen 3: First look at the new age of GenAI image generation

guillaume blaquiere
Google Cloud - Community
9 min read · Aug 20, 2024
Logo generated by Imagen 3

Generative AI is a very hot topic with large companies and startups alike locked in fierce competition. The field is vast, encompassing text, video, slides, music, and even multimodal and reasoning capabilities.
Among these, image generation is one of the most complex and, in my view, the most fascinating domains. You provide a text description, and magically, an image is generated!

Early models were often imprecise, even disappointing. However, through multiple iterations, these models have significantly improved, and the competitive landscape has accelerated the pace of innovation and breakthroughs.

Imagen, Google Cloud’s image generation offering, is one such model. I was privileged to test version 3 in a private preview.

Here are some of the cool and funny things I achieved with Imagen 3.

The 2 flavors of Imagen 3

This new iteration of Imagen comes with 2 models:

  • Imagen 3: A photorealistic model that generates very high-quality images
  • Imagen 3 Fast: Generates high-quality images with low latency

The tradeoff between the 2 models is quality vs. speed. While Imagen 3 Fast can generate 4 good-quality images in less than 4 seconds, Imagen 3 can generate 4 very high-quality images (with superior details, lighting, shadow, background blur, bokeh effect, etc.) in under 10 seconds.

Typically, Imagen 3 Fast is great for fine-tuning your prompt, while Imagen 3 is the right choice for the final result.
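The draft-then-final workflow above can be sketched in a few lines of Python. This is only an illustration: the model IDs are assumptions based on the Vertex AI naming at the time of writing, `pick_model` and `generate` are hypothetical helper names, and the actual call needs the `google-cloud-aiplatform` package plus a project configured with `vertexai.init(...)`.

```python
# Model IDs are assumptions; check the current Vertex AI model list before use.
MODEL_IDS = {
    "draft": "imagen-3.0-fast-generate-001",  # Imagen 3 Fast: low latency
    "final": "imagen-3.0-generate-001",       # Imagen 3: highest quality
}


def pick_model(stage: str) -> str:
    """Return the model ID for a workflow stage ("draft" or "final")."""
    try:
        return MODEL_IDS[stage]
    except KeyError:
        raise ValueError(
            f"unknown stage {stage!r}, expected one of {sorted(MODEL_IDS)}"
        ) from None


def generate(prompt: str, stage: str = "draft", count: int = 4):
    """Generate `count` images with the model matching `stage`.

    The SDK import is deferred so that pick_model() works without it.
    """
    from vertexai.preview.vision_models import ImageGenerationModel

    model = ImageGenerationModel.from_pretrained(pick_model(stage))
    return model.generate_images(prompt=prompt, number_of_images=count)
```

In practice you would iterate on the prompt with `generate(prompt, "draft")` and, once happy, run the same prompt once with `generate(prompt, "final")`.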

The prompt, the base of GenAI

All generative AI models require input — a prompt that guides the model toward the desired outcome.

For image generation, you should describe the image content, including the foreground, background, style (photorealistic, comic, painting, etc.), and lighting.
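One way to avoid forgetting any of these elements is to assemble the prompt from named fields. The sketch below is a hypothetical helper (`ImagePrompt` is not part of any SDK), and the "Foreground:", "Style:" labels are just one possible layout, not a format the model requires.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """The elements of an image description, kept as separate fields."""
    subject: str
    foreground: str = ""
    background: str = ""
    style: str = ""
    lighting: str = ""

    def render(self) -> str:
        """Join the non-empty fields into a single prompt string."""
        labeled = [
            ("", self.subject),
            ("Foreground: ", self.foreground),
            ("Background: ", self.background),
            ("Style: ", self.style),
            ("Lighting: ", self.lighting),
        ]
        return " ".join(
            f"{label}{text.strip().rstrip('.')}."
            for label, text in labeled
            if text.strip()
        )


prompt = ImagePrompt(
    subject="A bag of Thai rice on a kitchen counter",
    background="blurred rice paddies on a hillside",
    style="photorealistic",
    lighting="soft morning light",
).render()
print(prompt)
```

Empty fields are simply skipped, so you can start with a subject and a style and add detail as you refine the result.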

While this may seem straightforward, as a non-native English speaker, I find it challenging to use the precise vocabulary needed to guide the machine effectively. Consequently, my experiments involve some creative workarounds (see below!).

Image to image generation

To overcome my language limitations, I leveraged Gemini’s multimodal capabilities to generate image descriptions, which I then used to test Imagen 3’s capabilities.
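This two-step flow (describe with Gemini, then generate with Imagen) can be sketched as a small pipeline. To keep the example independent of any SDK, the two model calls are passed in as functions: `describe` and `generate` are hypothetical placeholders that you would implement with the client library of your choice.

```python
from typing import Callable

# The instruction sent to the multimodal model along with the image.
DESCRIBE_PROMPT = "Describe this image with very precise details"


def image_to_image(
    image_bytes: bytes,
    describe: Callable[[bytes, str], str],
    generate: Callable[[str], list],
) -> tuple[str, list]:
    """Describe an image, then use that description as the generation prompt.

    describe(image_bytes, instruction) -> text description (e.g. via Gemini)
    generate(prompt) -> list of generated images (e.g. via Imagen 3)
    """
    description = describe(image_bytes, DESCRIBE_PROMPT).strip()
    if not description:
        raise ValueError("the description model returned no text")
    return description, generate(description)
```

Returning the description alongside the images makes it easy to inspect, and manually tweak, what the image model was actually asked to draw.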

Let’s start with one of my corporate product images. I work for Carrefour, a major global retailer.

I asked Gemini 1.5 Pro to provide a detailed description using this prompt:

Describe this image with very precise details

The output, which I used as my Imagen 3 prompt, is the following:

This is a photograph of a 500g bag of Carrefour Bio Thai rice.

The bag is light blue at the top, with a serrated edge along the opening. The bottom ⅔ of the bag are a light, sandy brown color.

The text “Carrefour BIO” is written in large, green letters near the top of the bag. The word “BIO” is encircled. Below that, the words “Riz — Rijst THAI” are written in smaller, dark brown letters.

The left side of the bag contains a nutritional label with some of the information cut off. The right side of the bag shows a picture of cooked white rice in a teal blue bowl, topped with a green sprig of basil. Behind the bowl is a blurry picture of rice paddies on a hillside.

The bottom left corner of the bag features a green and white “AB Agriculture Biologique” certified organic logo, the European Union’s green leaf organic logo, and a Nutri-Score rating of ‘A’ (in green) for this product. The bottom right corner has “500ge” printed in blue, indicating the weight of the rice inside the bag.

Using this description, I generated images with Imagen 3, Imagen 3 Fast, and Imagen 2 to highlight the differences in quality and improvements.

Imagen 3

Imagen 3 Fast

Imagen 2

My assessment

Imagen 3’s output is truly incredible. The images are of such high quality that I would readily use them in marketing campaigns or other external communications.

Imagen 3 Fast is excellent for ideation and sparking creativity, but its output is best suited for internal use.

Imagen 2, on the other hand, is outdated. The text and logo are unacceptable, the contrast is poor, and the style is bland. It’s simply unusable.
An additional drawback: it is much slower than Imagen 3, and the images it produces are worse. Definitely switch to version 3!

Literature illustration

My second experiment involved using famous literary scenes as prompts to generate images, with the goal of illustrating books and novels.

I selected excerpts from James and the Giant Peach and submitted them to Imagen 3.

There was an Old-Green-Grasshopper as large as a large dog sitting directly across the room from James now. And next to the Old-Green-Grasshopper, there was an enormous spider. And next to the Spider, there was a giant Ladybird with nine black spots on her scarlet shell. Each of these three was squatting upon a magnificent chair. On a sofa nearby, reclining comfortably in curled-up positions, there were a Centipede and an Earthworm. On the floor over in the far corner, there was something thick and white that looked as though it might be a Silkworm.

The tunnel was damp and murky, and all around him there was the curious bittersweet smell of fresh peach. The floor was soggy under his knees, the walls were wet and sticky, and peach juice was dripping from the ceiling. James opened his mouth and caught some of it on his tongue. It tasted delicious.

My assessment

I was amazed by the accuracy of the images, the dark and unsettling atmosphere, and the sense of anxiety they conveyed. I thoroughly enjoyed this experiment and would love to test it with other descriptive texts.

Feel free to share your own!

Storytelling image generation

My final experiment was to create a three-panel comic strip illustrating a marriage proposal. The strip would depict a man buying flowers, then a ring, and finally proposing to his girlfriend.

I attempted many prompts using my limited English vocabulary, but the results were always unsatisfactory.

Generate a 3 stages horizontal comic strip. The comic talks about a marriage proposal. In the first stage, a man enters a flower store to buy roses. In the second stage, the same man enters a jewelry store to buy a ring. In the 3 stage, the man is a knee on the floor, in front of his girlfriend, offering her the roses and the ring, both are smiling

Since the results were poor, I asked Gemini 1.5 Pro for help improving the image description, using the same prompt, slightly adapted.

Generate a detailed description of a 3 stages horizontal comic strip for image generation. The comic talks about a marriage proposal. In the first stage, a man enters a flower store to buy roses. In the second stage, the same man enters a jewelry store to buy a ring. In the 3 stage, the man is a knee on the floor, in front of his girlfriend, offering her the roses and the ring, both are smiling

The output was long and detailed, and I submitted it to Imagen 3:

Comic Strip: A Proposal to Remember

Panel 1: Petals and Promises

Setting: The interior of a charming, light-filled flower shop, overflowing with colorful blooms.

Characters: A slightly nervous-looking man in his late 20s, dressed in a nice shirt and slacks, stands at the counter. He’s holding a bouquet of a dozen long-stemmed red roses, their velvety petals a vibrant splash of color against his clothes.

Action: The man smiles sheepishly at the florist, a kind-looking older woman with her hair in a bun, who’s tying a beautiful ribbon around the roses. He seems a bit flustered, fidgeting with his phone as if waiting for a message.

Speech Bubble (Man): “They’re perfect, thank you! And maybe… could you add some baby’s breath? You know… for extra romance?”

Panel 2: Bling and Butterflies

Setting: The interior of a high-end jewelry store, elegant and sparkling with glass cases full of glimmering necklaces and rings.

Characters: The same man from the first panel, now a picture of barely-contained excitement, examines a beautiful diamond ring presented to him on a velvet cushion by a sharply dressed salesperson.

Action: The man’s eyes are wide with a mixture of awe and anticipation. He holds his breath slightly, completely captivated by the sparkle of the diamond.

Speech Bubble (Salesperson): “Excellent choice, sir. This particular cut is known for its brilliance. It will truly take her breath away.”

Panel 3: Forever Starts Now

Setting: A romantic outdoor location — perhaps a park with a blooming cherry blossom tree, a candlelit rooftop with a city skyline, or a beach with the sun setting over the ocean.

Characters: The man, now down on one knee with a look of pure love and adoration on his face, holds out the bouquet of roses and the open ring box towards his partner. She’s visibly surprised, hand flying to her mouth in a gesture of happy shock. Tears well up in her eyes, which sparkle brighter than the diamond.

Action: The woman’s surprise is giving way to overwhelming joy. She’s about to say “Yes!”

Speech Bubble (Man): “Marry me?”

Thought Bubble (Woman): “A thousand times YES!”

My assessment

This time, the result was disappointing. It’s not possible to generate multi-panel images or comic strips with a single prompt. The model mixed all the information together and generated inconsistent images.

The solution is to generate the strip panel by panel. Because each panel is generated individually, you must repeat a detailed description of your characters in every prompt so that the same man appears in all of them.
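The panel-by-panel approach described above can be sketched as follows. This is a hypothetical helper, not something I ran against Imagen 3: it simply builds one prompt per panel and repeats the same character descriptions in each, so that every individually generated image starts from the same "character sheet". The character details are invented for the example.

```python
def panel_prompts(style: str, characters: dict[str, str], scenes: list[str]) -> list[str]:
    """Build one prompt per panel, repeating the character sheet each time."""
    cast = " ".join(
        f"{name}: {description.rstrip('.')}."
        for name, description in characters.items()
    )
    return [
        f"{style}. Single comic panel. Characters: {cast} Scene: {scene.rstrip('.')}."
        for scene in scenes
    ]


characters = {
    "The man": "late 20s, short brown hair, light blue shirt, grey slacks",
    "The girlfriend": "late 20s, long black hair, yellow summer dress",
}
scenes = [
    "The man buys a bouquet of red roses in a flower shop",
    "The man looks at a diamond ring in a jewelry store",
    "The man kneels in a park and offers the roses and the ring to the girlfriend, both smiling",
]
prompts = panel_prompts("Colorful comic book illustration", characters, scenes)
for prompt in prompts:
    print(prompt)
```

Each prompt is then submitted separately, and the three resulting images are assembled into the strip afterwards.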

However, future improvements to Imagen 3, planned for release later this year, should help solve this issue!

One last try

Because Imagen 3 promises higher quality and fewer defects (or hallucinations), I asked it to generate a topical image for the Paris 2024 Summer Olympics!
Here is the prompt:

Photorealist group of break-dancers dancing in a battle contest at the Paris Olympics, with Eiffel Tower in the background

Imagen 3

Imagen 3 Fast

Imagen 2

My assessment

This prompt is particularly challenging for the model, as it requires generating images of dancers with upside-down faces.

Imagen 3 handles this task quite well.
Imagen 3 Fast produces more unusual results, with blurry faces and two Eiffel Towers!
Imagen 2’s output is not photorealistic at all, and the model struggles to get the number of feet and legs right!

Despite some difficulties, Imagen 3 performs very well in this complex situation.
Imagen 2 is simply out of the running!

The promise of photorealism with fewer defects has been fulfilled by Imagen 3.

Now, your turn!

Imagen 3 has demonstrated significant advancements in image generation, both in terms of speed and quality. I encourage you to try it out — it’s now generally available!

But this is just the beginning! The roadmap presented by the Google Cloud team promises an exciting future: inpainting, background updates, removing parts of images (like Magic Eraser in Google Photos), and more!

These features will be released later this year, so stay tuned. The magic is only getting bigger!

guillaume blaquiere
Google Cloud - Community

GDE cloud platform, Group Data Architect @Carrefour, speaker, writer and polyglot developer, Google Cloud platform 3x certified, serverless addict and Go fan.