It’s been a long time since I was motivated to blog — hello!
I’ve found myself so utterly compelled by AI text-to-image generators over the last few months that I may be starting to bore my partner, friends, colleagues and literally anyone who will listen. Hopefully Medium will generate a more self-selecting audience for my ramblings!
I’m going to assume you know vaguely what I mean when I talk about AI text-to-image generators. There are a bunch out there; I started with Latent Diffusion and I see MidJourney popping up a lot too. Last week I finally got access to DALLE by OpenAI, and I’ve been hooked ever since.
The image at the top of this article is one I made the first night I had access to DALLE. In the rest of this post, I’ll focus on a different image. I’m also going to skip over the bit where I explain in detail every little dalliance, and focus on something approaching a creative workflow.
I’ll be the first to say I have no real artistic talent when it comes to drawing, painting etc. To me, those humans that can create basically any two dimensional art are fantastically talented and I wish I had the time to devote to being even remotely competent.
I think this is a large part of the reason I’ve found DALLE so deeply fascinating. It arguably bypasses the need for me to learn how to illustrate and lets me create using a tool I feel much more capable with: words.
Step 1: The Spark
I wanted to create a fun ice breaker for the weekly team meeting, so building off an idea I played with on Twitter, I started feeding DALLE song lyrics. The idea was simple: could my team guess the song title by looking at the generated images?
One extract gave some particularly odd results:
The seed text?
When I’m not with you I lose my mind, Give me a sign, Hit me baby one more time, digital art
NB: I added “digital art” at the end of every prompt to try and give a consistent visual style across the quiz.
The fourth image really stood out to me:
I found myself asking questions about the composition:
- Why are her eyes closed?
- Why is she wearing headphones?
- What is she holding against her face?
In the context of an AI image generator, all these questions have no “true” answer, because there is no reasoning. DALLE doesn’t “think”, so there was never a moment where it creatively reasoned through any of these decisions. This in and of itself is fascinating, but again, this isn’t what this blog is about.
I liked the image, and decided I would take it further. We’ll refer to the person in this image as “Girl Alpha” from now on.
Step 2: Ideate and iterate
DALLE allows you to remove parts of an image and ask it to fill them in based on a prompt. So I decided to remove the words on the left of the image and change the prompt to something more literal. I wanted to solidify the mystery of the thing touching her face first, and posited that it might be a feather.
Once the feathers were added, it was clear to me that the thing in her hand was NOT a feather and I wasn’t going to gain anything by going any further down this route.
So I went back to the root and channeled Britney. Maybe this girl is a singer? I removed her hand and the not-feather, and beefed out the quote:
Definitely better! But it still felt a bit too subtle to me, and the “huge crowd” element wasn’t quite coming through. I kind of liked image #5, especially the more studio-style microphone. So I went again, getting rid of the crowd but keeping the idea of a large, dark space:
Hmmmm! This wasn’t what I was envisaging at all. The concept of a microphone dangling from the ceiling really wasn’t one DALLE grasped. But maybe I just had the language wrong? After a quick google to look for proper microphone parlance, I tried again:
Now it was starting to evolve into something more interesting. I tried a few more microphone types (old fashioned, vocal and unidyne) with varying success. But eventually, I settled on this condenser-style mic as I liked the over-the-top nature of it:
Step 3: Variations
DALLE also allows you to pick an image you like and quickly make variations — images in a similar style and composition, but ultimately different. I felt like I had become quite attached to this particular image, so forcing me away from it might open up some other paths:
I especially liked the second image, as it really feels like she is singing, and it feels like the concept of an audience is being brought back into the composition:
Feeling like we were on a roll, I asked for variations on Girl Beta:
But I couldn’t let go of that first image, so I tried one more set of variations:
Step 4: Decide
It felt like I was becoming a little trigger happy with the variation button, so I had to decide on one or two to take further. In case the foreshadowing wasn’t enough, I decided to go with Girl Alpha and Girl Beta:
Step 5: UNcropping
Now comes the REAL fun. I was fortunate to meet Lewis Hackett (director of Prefix Studios) through some online lockdown events centred around our shared love of Virtual Reality. It was Lewis who first showed me the potential of AI art, and he’s been generous with his time in helping me get up to speed.
One neat method he passed on is “Uncropping”. It’s easier to explain in steps:
1. Take an image (AI generated or otherwise) and shift it in any direction (using Photoshop or similar). Here I’ve moved it 50% (512 pixels) to the left.
I also used Photoshop’s “content aware” feature to remove the DALLE watermark from the bottom corner (you’ll see why eventually).
2. Next, upload the shifted image to DALLE and instruct it to fill in the transparent area. You can also add a text prompt to guide it, though at this stage we’re really just looking for the back of Alpha’s head.
Hitting “generate” gives 5 “inpainted” versions of the same image. Let’s look at the results:
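The shift in step 1 doesn’t need Photoshop, for what it’s worth. Here’s a minimal sketch of the same move using the Pillow library: paste the image half its width to the left onto a transparent canvas, leaving the right half empty for the inpainting pass. (The solid-colour stand-in image and all names here are just placeholders; in practice you’d load your own PNG.)

```python
from PIL import Image

# Stand-in for the DALLE output; in practice, load your own 1024x1024 PNG.
src = Image.new("RGBA", (1024, 1024), (200, 50, 120, 255))
w, h = src.size

# Fully transparent canvas the same size as the original.
shifted = Image.new("RGBA", (w, h), (0, 0, 0, 0))

# Paste the source half its width to the left. The left half of the canvas
# now holds the right half of the original; the right half of the canvas
# stays transparent, ready for the generator to inpaint.
shifted.paste(src, (-w // 2, 0))
```

Saving `shifted` as a PNG (to preserve the alpha channel) gives you exactly the kind of half-transparent image DALLE’s edit mode expects.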
My mistake was forgetting that DALLE has no concept of the microphone that had moved “out of shot”. The solution was to change the prompt to reference just the part of the image we are currently generating. So I ran it again, this time with the prompt “A singing girl wearing headphones, standing in a darkened theatre”:
Now it felt like we were getting somewhere. Something about the size of her head and neck felt a little off though, so I picked my favourite, erased the back of her head and asked for more inpainted versions.
Eventually, I decided on this full headshot of Alpha:
Obviously, she’s still not perfect. Her neck still kind of vanishes and should have more of a nape. There’s an imperfection on her lips and the neon hair underneath her left ear is odd. But I just didn’t care, because of how much of her was so expressive.
Of course, I didn’t stop at the back of her head — I wanted to fill out the scene just a little more. I imagined she might be sat at, or holding, a musical instrument. So of course, I extended and played with that idea:
I tried a guitar, which felt pretty good! I worried about where the fretboard would go though.
The drums felt like they would fit into the scene better, so I went with them to fill out the whole picture.
And to cut an already long story a little shorter, I carried out a few more extensions, and pieced the final composite together:
I am sure I could have taken her much further, with more time and tweaking. Honestly, though, once the drum kit had filled out the scene a little, that’s where I started to run out of steam.
What I think is also really interesting is seeing the iterations over time that put her together:
To be clear, the following includes only the nine images that form a direct path to the final output. There were 151 other images I ultimately discarded (including Girl Beta).
What about Girl Beta?
I know! I really wanted to develop her too. I had designs on trying to bring them together as a duo (though I think ultimately the styles might have been a little too divergent).
Unfortunately, I ran out of time before DALLE went into Software Beta (ironic), which limits the number of iterations I can run without paying. I don’t begrudge this — I think OpenAI have made something really special and ultimately usable, and it was only a matter of time before they would need to monetise it. It’s highly likely I will pay to use DALLE at some point, when I have a specific project in mind.
Summary
I’m aware that this has been a very unstructured post; I just wanted to get it out because I was enjoying the process so much and wanted to share it. I could ramble on for ages.
The original clickbait title I wrote for this blog was: “AI image generation with DALLE will change workflows forever”. Having got to a natural endpoint with my own exploration, I still believe this is true, especially when it comes to DALLE’s ability to generate variations.
Some top-of-the-head use cases for DALLE in a workflow:
- Storyboarding a scene
- Creating characters
- Stock photos (at least without famous people in them)
I still haven’t quite figured out how it might help me in my day job, but I’m genuinely excited to see where generative AI tools go in the near future. More than that, I’m looking forward to seeing more folk using tools like this to explore their creativity.