Descending the AI Generated Images Rabbit Hole (Day 1)

Jeff Shillitto


On the weekend, after generating and comparing AI-generated images on Leonardo.ai and Ideogram, and seeing what OpenAI are doing in the video space, I decided it was time to immerse myself in AI image generation, understand what’s happening under the hood and learn how these things actually work.

I spent 8–10 hours digging around the internet, watching YouTube videos, reading Reddit threads, installing different projects from GitHub and playing around with scripts, tools, models and prompts. I know I’ve only just scratched the surface.

Here are some things I learned, in note form. They should act as pointers for further investigation and might need some deciphering.

Getting set up

  • Using Windows 10 with a GTX 1050 Ti (4GB VRAM).
  • Automatic1111 Stable Diffusion WebUI (https://github.com/AUTOMATIC1111/stable-diffusion-webui)
  • Use Python 3.10.6 on Windows, not the latest version (RTFM)
  • Download models to the models\Stable-diffusion directory inside the WebUI install folder
  • My spec was probably as low as you can go: rendering was very slow and I ran into memory issues (some low-VRAM launch flags are sketched below this list).
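On that last point, the WebUI has documented launch flags for low-memory cards. This is only a rough sketch of how you might pass them if launching from a Python script instead of editing webui-user.bat; the install path is an assumption, and the flags are worth checking against the WebUI wiki for your version.

```python
# Rough sketch: launching the Automatic1111 WebUI from Python with low-VRAM
# flags rather than editing webui-user.bat. The install path is an assumption;
# --medvram, --lowvram and --xformers are documented WebUI launch options,
# but check them against the wiki for your version.
import subprocess

WEBUI_DIR = r"C:\stable-diffusion-webui"  # assumed install location

subprocess.run(
    [
        "python", "launch.py",
        "--medvram",   # offload parts of the model; try --lowvram on a 4GB card if this still runs out of memory
        "--xformers",  # memory-efficient attention, usually faster on NVIDIA cards
    ],
    cwd=WEBUI_DIR,
    check=True,
)
```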

Models

Generating images

A lizard man headshot. Left: 16 steps, right: 30 steps. It’s subjective, but I’d say the image on the left is better; the right image looks more like a mask (a sketch of reproducing this kind of fixed-seed comparison follows this list).
  • CivitAI (and others) have galleries with the exact prompts, models, settings and so on, but when you copy them the results still don’t look as good. What is the issue here?
  • A lot of trial and error is involved in getting things to look the way you want, and you need to know what good looks like.
  • When trying to create a more realistic face I asked for pores and wrinkles, but "pores" added strange cracks to the skin; "wrinkles" by itself worked much better. Try, try, try again…
A Dune inspired desert woman. Left: pores and wrinkles in prompt. Right: just wrinkles.
  • Notice my headshots always look at the camera. I’d like to get them in different poses.
  • 80–90% of my images were rubbish
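To make comparisons like the lizard man one fair, the trick seems to be fixing the seed and changing only one setting per run. Below is a rough sketch of that idea using the Hugging Face diffusers library rather than the WebUI itself; the model name, prompt and seed are just placeholders.

```python
# Rough sketch of a 16-vs-30-steps comparison using Hugging Face diffusers
# (not the WebUI). Fixing the seed means only num_inference_steps changes
# between runs, so the two outputs can be compared fairly.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # helps on low-VRAM cards like a 4GB 1050 Ti

prompt = "headshot portrait of a lizard man, detailed scales, studio lighting"

for steps in (16, 30):
    generator = torch.Generator("cuda").manual_seed(42)  # same seed each run
    image = pipe(prompt, num_inference_steps=steps, generator=generator).images[0]
    image.save(f"lizard_man_{steps}_steps.png")
```

The same comparison can be done inside the WebUI by pasting the same number into the seed field before each run.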

Other great resources

These websites and resources kept coming up in my Google searches:

Final thoughts

  1. While people are saying AI is going to take our jobs (normally people who don’t do those jobs), at the moment this is still a job that requires skill, experience, creativity and effort. Job roles might shuffle around, but like all technological leaps we always end up creating more jobs than we destroy.
  2. I think I’d need 10 more weekends to get competent with this.
  3. From an automation point of view, which is my background with Shotstack, there is a lot of trial and error required to get images right. Is there a perfect set of settings you can use to create great images, unattended, every time? (A rough sketch of what unattended generation could look like follows this list.)
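For the record, the WebUI exposes an HTTP API when started with the --api flag, so unattended batch runs are at least mechanically possible. The sketch below shows roughly what a scripted run with fixed settings could look like; it assumes the default local port, and the prompt and field values are placeholders worth checking against your install.

```python
# Rough sketch of unattended generation against the Automatic1111 WebUI API
# (start the WebUI with the --api flag). Endpoint and field names match the
# documented API, but verify them against your WebUI version; the prompt and
# settings are placeholders.
import base64
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # assumes the default local port

payload = {
    "prompt": "headshot portrait of a desert woman, wrinkles, cinematic lighting",
    "negative_prompt": "blurry, deformed",
    "steps": 30,
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
    "seed": 42,  # fix the seed so batch runs are repeatable
}

response = requests.post(URL, json=payload, timeout=300)
response.raise_for_status()

# The API returns generated images as base64-encoded strings.
for i, img_b64 in enumerate(response.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```

Whether a fixed payload like this produces consistently good images, unattended, is exactly the open question above.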

To do

  • Image to image
  • Text to video
  • Inpainting/outpainting
  • Embeddings, Loras, Dreambooth and training
  • Creating a custom model/checkpoint
  • Doing everything via command line/scripts/hosted

Note: The rabbit image at the top of this article was generated by Leonardo.ai. This was my attempt:


Written by Jeff Shillitto

Software Engineer and multiple startup founder. Inventor of Shotstack.io. Working on SwellDiary.com. Compulsive tinkerer and part-time indie hacker.
