Descending the AI-Generated Images Rabbit Hole (Day 1)
On the weekend, after generating and comparing AI-generated images on Leonardo.ai and Ideogram, and seeing what OpenAI are doing in the video space, I decided it was time to immerse myself in AI image generation, to understand what's happening under the hood and learn how these things work.
I spent 8–10 hours digging around the internet, watching YouTube videos, reading Reddit threads, installing different projects from GitHub and playing around with scripts, tools, models and prompts. I know I've only just scratched the surface.
Here are some things I learned, in note form. They should act as pointers for further investigation and might need some deciphering.
Getting set up
- Using Windows 10, GTX 1050Ti with 4GB VRAM.
- Automatic1111 Stable Diffusion WebUI (https://github.com/AUTOMATIC1111/stable-diffusion-webui)
- Use Python 3.10.6 on Windows, not the latest version (RTFM)
- Download models to the stable-diffusion-webui\models\Stable-diffusion directory
- My spec was probably as low as you could go: it was very slow to render images and I ran into memory issues (see the note after this list).
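If you hit the same memory issues, the WebUI has launch flags for small GPUs: setting `COMMANDLINE_ARGS=--medvram` (or `--lowvram`) in `webui-user.bat` trades speed for VRAM. The scripted route has similar options. Below is a minimal low-VRAM sketch using the Hugging Face diffusers library; the model ID, prompt and settings are placeholder assumptions of mine, not something from my WebUI setup.

```python
# Minimal low-VRAM text-to-image sketch using Hugging Face diffusers.
# Assumes diffusers, transformers, accelerate and a CUDA build of torch are installed;
# the model ID and prompt are placeholders, not from my actual runs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,      # half precision roughly halves VRAM usage
)
pipe.enable_attention_slicing()     # compute attention in slices: slower, less VRAM
pipe.enable_model_cpu_offload()     # park idle submodules in system RAM (needs accelerate)

image = pipe("a rabbit in a meadow, photo").images[0]
image.save("rabbit.png")
```

The CPU offload keeps only the active submodule on the GPU, which is the sort of trade-off that might make a 4GB card workable, at the cost of speed.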
Models
- Downloaded Stable Diffusion 1.5, 2.1 and XL
- Hardly anyone uses these base models directly, however; most people use fine-tuned custom models called checkpoints
- There are thousands of free checkpoint models for different use cases on sites like https://civitai.com and https://huggingface.co/
- Also tried RealisticVision V6 (https://civitai.com/models/4201/realistic-vision-v60-b1) and Dreamshaper (https://civitai.com/models/4384/dreamshaper)
- They are each several GB in size and will eat up your hard drive space
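As an aside, the checkpoints you download from civitai are usually single .safetensors files, and you can load those directly from a script as well. A rough diffusers sketch, where the file path is a placeholder for wherever you saved the download:

```python
# Sketch: loading a civitai-style single-file checkpoint with diffusers.
# The path is a placeholder; Realistic Vision is SD 1.5-based, so the
# standard Stable Diffusion pipeline applies.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "models/Stable-diffusion/realisticVisionV60B1.safetensors",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("portrait photo of a man, natural light").images[0]
image.save("portrait.png")
```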
Generating images
- You must write a prompt (and optionally a negative prompt) of course, but it's not as simple as it sounds
- There are guides on how to do this, and opinionated formats to use, such as https://stable-diffusion-art.com/how-to-come-up-with-good-prompts-for-ai-image-generation and https://civitai.com/articles/3770/how-to-prompting-with-style-and-quality
- Adjusting the Steps and CFG scale has a huge influence on the image generated. But it's not just a case of dialing up the numbers; a lot of the time, lower numbers generated more desirable images (see the sketch after this list).
- CivitAI (and others) have galleries with the exact prompts, models, settings etc., but when you copy them the results still don't look as good. What is the issue here?
- A lot of trial and error will be involved to get things to look the way you want, and you need to know what good looks like.
- When trying to create a more realistic face I asked for pores and wrinkles, but "pores" added strange cracks to the skin; "wrinkles" by itself worked much better. Try, try, try again…
- I noticed my headshots always look at the camera. I'd like to get them in different poses.
- 80–90% of my images were rubbish
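To make the knobs above concrete, here's how prompt, negative prompt, Steps and CFG scale map onto a script, along with the one setting gallery copy-pastes often omit: the seed. My best guess on the gallery mystery is that an image only reproduces exactly when everything matches (seed, sampler, exact model version, steps, CFG, resolution). All values below are illustrative, not a validated recipe:

```python
# Sketch: the same knobs the WebUI exposes, as diffusers parameters.
# Model ID, prompts and all numbers here are illustrative examples.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Fixing the seed makes a run repeatable on the same model/settings/hardware.
generator = torch.Generator("cuda").manual_seed(1234)

image = pipe(
    prompt="headshot of a woman, wrinkles, natural skin, soft light",
    negative_prompt="blurry, deformed, extra fingers",
    num_inference_steps=25,   # "Steps" in the WebUI; more isn't always better
    guidance_scale=7.0,       # "CFG scale"; lower values can look more natural
    generator=generator,
).images[0]
image.save("headshot.png")
```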
Other great resources
These websites and resources kept coming up in my Google searches:
Final thoughts
- While people say AI is going to take our jobs (usually people who don't do those jobs themselves), at the moment anyway, this is a job that still requires skill, experience, creativity and effort. Job roles might shuffle around, but as with all technological leaps, we always seem to end up creating more jobs than we destroy.
- I think I’d need 10 more weekends to get competent with this.
- From an automation point of view, which is my background with Shotstack, there is a lot of trial and error required to get images right. Is there a perfect set of settings you can use to create great images, unattended, every time? (See the toy sketch after this list.)
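The closest thing I can imagine to "unattended" right now is brute force: sweep seeds and settings in a batch, save everything, and pick winners afterwards, either by eye or with a separate scoring step. A toy sketch of that loop, where the model, prompt and ranges are placeholders:

```python
# Toy unattended batch: sweep seeds and CFG values, save everything for review.
# Model ID, prompt and ranges are placeholders; picking "great" images
# automatically would still need a human or a separate scoring step.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "product shot of a ceramic mug on a wooden table"
for seed in range(4):
    for cfg in (5.0, 7.0, 9.0):
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(prompt, guidance_scale=cfg, generator=generator).images[0]
        image.save(f"mug_seed{seed}_cfg{cfg}.png")
```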
To do
- Image to image
- Text to video
- Inpainting/outpainting
- Embeddings, LoRAs, DreamBooth and training
- Creating a custom model/checkpoint
- Doing everything via command line/scripts/hosted
Note: The rabbit image at the top of this article was generated by Leonardo.ai. This was my attempt: