I generated all images in this article using a text-to-image AI model (and the results are incredible)
How can we use Stable Diffusion?
Stable Diffusion has undoubtedly taken the world of art and images by storm. This article explores a few ways to create good prompts for these kinds of AI models.
All the prompts below were fed to a Stable Diffusion text-to-image model run via the Hugging Face implementation in Google Colab. Note that I use only the base model, with no modifications or fine-tuning.
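For reference, here is a minimal sketch of that setup. It assumes the diffusers library and the public CompVis/stable-diffusion-v1-4 checkpoint; the exact checkpoint and settings I used may differ.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model in half precision so it fits on a free Colab GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Every prompt in this article is passed in the same way.
image = pipe("A blue teddy bear in outer space, painted in anime").images[0]
image.save("teddy_bear.png")
```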
This is just my own approach to and exploration of prompting; other approaches may well work better.
The Basics
Most prompts have the following components:
- Objects 🪁🧸
- Background 🖼️🏖️
- Colors 🌈🎨
- Style 🖌️👨‍🎨
Let us try something.
A blue teddy bear in outer space, painted in anime
Wow! That’s quite something.
- Blue is the color.
- The object is a teddy bear.
- The background is outer space.
- The style is anime.
Let’s try a variation.
A green teddy bear in front of a lake, painted in disney cartoon style
Not exactly what I imagined, but ok.
Let’s try one more.
A blue teddy bear in outer space, pencil sketch
These basic observations suggest that a good prompt combines an object, a background, a color, and a style.
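To make that recipe concrete, here is a tiny, hypothetical helper that assembles a prompt from those four ingredients. The function and its wording template are my own, not part of any library:

```python
def build_prompt(color: str, obj: str, background: str, style: str) -> str:
    """Combine the four basic ingredients into a single prompt."""
    return f"A {color} {obj} in {background}, painted in {style}"

print(build_prompt("blue", "teddy bear", "outer space", "anime"))
# -> A blue teddy bear in outer space, painted in anime
```

Swapping any one ingredient gives the kinds of variations tried above.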
The Enhancers
We can enhance a prompt and add specificity with a few kinds of words:
- Adjectives to describe something in a specific way.
- Verbs to describe actions on objects.
- Figures of speech to add a comparative layer.
Let us modify our prompt.
A beautiful blue teddy bear in outer space, playing the guitar like a rockstar, painted in anime
Hmm. Better. Accurate. I believe generic words work better in prompts than specific proper nouns, since the model will have seen more training data for them.
Does the model pick context automatically?
Consider this prompt.
Blue teddy bear, outer space, guitar, painting, anime
Stable Diffusion inherently understands context and fills it in by itself, even when we don’t mention it. However, if you want something unusual, you should specify it explicitly.
How specific can we get?
Consider something like this.
A #FFC0CB colored teddy bear painting
It doesn’t seem to understand hex colors. The code I gave is pink, yet most of the generated images came out in random colors or a generic brown.
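One workaround, a sketch of my own rather than anything the model supports, is to translate the hex code into the nearest common color name before prompting. The palette below is a tiny illustrative sample:

```python
# Tiny sample palette; a real mapping would cover far more names.
PALETTE = {
    "pink": (255, 192, 203),   # #FFC0CB
    "blue": (0, 0, 255),
    "green": (0, 128, 0),
    "brown": (165, 42, 42),
}

def nearest_color_name(hex_code: str) -> str:
    """Return the palette name closest to the hex code in RGB space."""
    h = hex_code.lstrip("#")
    rgb = tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    return min(
        PALETTE,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(PALETTE[name], rgb)),
    )

print(nearest_color_name("#FFC0CB"))  # -> pink
# e.g. prompt = f"A {nearest_color_name('#FFC0CB')} colored teddy bear painting"
```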
Fusion Prompts
Stable Diffusion shines with sceneries and fusions of concepts.
a city skyline of a cyberpunk utopia
a sandwich made of green bricks
You can easily combine two concepts with Stable Diffusion. Notice how well it picks colors too.
Practical Applications
- Generating cover photos for articles on the internet, useful to news media and everyday bloggers alike. Stable Diffusion generated the cover photo of this article too!
Let us take another example: an online tech magazine publishing an opinion piece on whether AI will eventually take away all our jobs.
A robot playing basketball
- You can use these models to generate quick concept art for any idea you have. Movie and video game production houses can use them for inspiration and rapid iteration.
a concept art of a cyberpunk monster
- People have already used these models to create collectible NFT images. Let us try one similar to the famous Bored Ape NFTs.
a portrait of a bored cat in a cowboy dress
- Can we create business and media assets like logos? Let us try to create a new logo for The Research Nest.
A logo of abstract research information being dumped into a nest
- Let us go a step further. Is it possible to create an entire comic panel with four coherent images? I do not see a direct way, but I think we can achieve interesting results with some simple tweaking. Here’s what I made by combining four AI-generated images in Canva.
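If you would rather script that last step than use Canva, here is a minimal sketch with Pillow, assuming four equally sized panels saved as panel_1.png through panel_4.png (hypothetical filenames):

```python
from PIL import Image

# Load the four generated panels (assumed to be the same size).
panels = [Image.open(f"panel_{i}.png") for i in range(1, 5)]
w, h = panels[0].size

# Paste them into a 2x2 grid: panels 1-2 on top, 3-4 below.
comic = Image.new("RGB", (2 * w, 2 * h), "white")
for idx, panel in enumerate(panels):
    comic.paste(panel, ((idx % 2) * w, (idx // 2) * h))
comic.save("comic_panel.png")
```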
With this technology as it stands, it is extremely difficult to keep characters, expressions, and recurring objects/elements consistent across images. However, I think we will soon have custom models and platforms built on top of it to create comics and get those finer details right.
- For the next application, I want to create some imagery for a poetry book I am writing: sketches to accompany the poems and give readers a stronger visual.
I made a prompt from the keywords my poems are based on.
painting snowflakes summer ice trees fall colors
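Since I want a few candidate sketches per poem, one simple trick is to rerun the prompt with different fixed seeds. This reuses the pipe object from the setup sketch earlier; the seed values are arbitrary:

```python
import torch

prompt = "painting snowflakes summer ice trees fall colors"

# Generate three variations; fixing the seed also makes each one reproducible.
for seed in (7, 42, 1234):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"poem_sketch_{seed}.png")
```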
Are these images unique and novel?
I had my doubts. Is the model truly creating something new, rather than regurgitating training data with minimal tweaks or a random copy-paste mash?
I reverse-image searched my results on Google. Every image I tested had a uniqueness of its own; the search returned nothing relevant or similar. Some results share similar motifs, colors, and shapes, but they are different enough to be considered unique.
The Future
- Models can be fine-tuned for tasks like generating video game assets, NFTs, comics, or 3D graphics.
- Text-to-video is the next direct evolution. Text-to-reels, text-to-animation, text-to-GIFs: the possibilities are endless.
- Text-to-music will also grow popular. Perhaps we will be able to create a video, background music included, from a single text prompt.
- The real deal will be the fine-tuned customization layers built on this tech to bring your imagination to life.
I think we will eventually progress to a single model that turns text (or natural language) into anything and everything.
At this point, it is a given that these technologies will improve exponentially and change the world order.
Think about how powerful a DALL-E 4 or a GPT-5 would be. And models like Stable Diffusion, being open source, supercharge development in many creative ways.
Fascinating times are ahead.