AI image generation with DALLE will change workflows forever

Rob Scott
9 min read · Jul 21, 2022
A digital illustration of a dark red, classic American muscle car parked on a ridge overlooking a city at night. No-one is sitting in the car, and in the background you can see lines of lights cutting through the hills.
My first experiment with “UNcropping”, created with DALLE.

It’s been a long time since I was motivated to blog — hello!

I’ve found myself so utterly compelled by AI text-to-image generators over the last few months that I may be starting to bore my partner, friends, colleagues and literally anyone that will listen. Hopefully Medium will generate a more self-selecting audience for my ramblings!

I’m going to assume you know vaguely what I mean when I talk about AI text-to-image generators. There are a bunch out there; I started with Latent Diffusion and I see MidJourney popping up a lot too. Last week I finally got access to DALLE by OpenAI, and I’ve been hooked ever since.

The image at the top of this article is one I made the first night I had access to DALLE. In the rest of this post, I’ll focus on a different image. I’m also going to skip over the bit where I explain in detail every little dalliance, and focus on something approaching a creative workflow.

I’ll be the first to say I have no real artistic talent when it comes to drawing, painting, etc. To me, those humans who can create basically any two-dimensional art are fantastically talented, and I wish I had the time to devote to being even remotely competent.

I think this is a large part of the reason I’ve found DALLE so deeply fascinating. It arguably bypasses the need for me to learn how to illustrate and lets me create using a tool I feel much more capable with: words.

Step 1: The Spark

I wanted to create a fun ice breaker for the weekly team meeting, so building off an idea I played with on Twitter, I started feeding DALLE song lyrics. The idea was simple: could my team guess the song title by looking at the generated images?

One extract gave some particularly odd results:

The seed text?

When I’m not with you I lose my mind, Give me a sign, Hit me baby one more time, digital art

NB: I added “digital art” at the end of every prompt to try and give a consistent visual style across the quiz.

The fourth image really stood out to me:

A girl in profile on the right of the image. She has long hair, with a few strands waving in front of her face. Her eyes are closed and her mouth is slightly open. She is wearing headphones. She is touching her left cheek with something that looks like a hair. There is nonsense text overlayed on the left of the image.

I found myself asking questions about the composition:

  1. Why are her eyes closed?
  2. Why is she wearing headphones?
  3. What is she holding against her face?

In the context of an AI image generator, all these questions have no “true” answer, because there is no reasoning. DALLE doesn’t “think”, so there was never a moment where it creatively reasoned through any of these decisions. This in and of itself is fascinating, but again, this isn’t what this blog is about.

I liked the image, and decided I would take it further. We’ll refer to the person in this image as “Girl Alpha” from now on.

Step 2: Ideate and iterate

DALLE allows you to remove parts of an image and ask it to fill them in based on a prompt. So I decided to remove the words on the left of the image and change the prompt to something more literal. I wanted to solidify the mystery of the thing touching her face first, and posited that it might be a feather.
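If you’d rather prepare the erased region yourself before uploading, the idea is simple: DALLE regenerates wherever the image is transparent. Here’s a minimal sketch using Pillow — the in-memory image stands in for a downloaded 1024x1024 generation (in practice you’d `Image.open()` the real PNG file):

```python
from PIL import Image

# Stand-in for a downloaded DALLE generation (they are 1024x1024 PNGs);
# in practice you would Image.open("girl_alpha.png").convert("RGBA").
img = Image.new("RGBA", (1024, 1024), (200, 30, 30, 255))

# Erase the left half by making it fully transparent;
# DALLE's edit mode regenerates wherever alpha == 0.
left, top, right, bottom = 0, 0, 512, 1024
hole = Image.new("RGBA", (right - left, bottom - top), (0, 0, 0, 0))
img.paste(hole, (left, top))

img.save("girl_alpha_masked.png")
```

The box coordinates are illustrative — in my case I was erasing the overlayed nonsense text on the left of the image.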

5 variations on girl alpha. The text is gone and there are feathers floating in front of her where the text used to be.
Prompt: “A girl holding a feather to her face, digital art”

Once the feathers were added, it was clear to me that the thing in her hand was NOT a feather and I wasn’t going to gain anything by going any further down this route.

So I went back to the root and channeled Britney. Maybe this girl is a singer? I removed her hand and the not-feather, and beefed up the quote:

A series of 5 images of girl alpha, this time with a microphone in different positions. Sometimes she is holding it, and sometimes it is just creeping into shot. In the 5th image, the microphone has a more “studio” quality to it and is high above her face in the top left corner.
Prompt: “A girl holding a microphone in front of her face, as she sings before a huge crowd in a darkened theatre”

Definitely better! But it still felt a bit too subtle to me, and the “huge crowd” element wasn’t quite coming through. I kind of liked image #5, especially the more studio-style microphone. So I went again, getting rid of the crowd but keeping the idea of a large, dark space:

5 more images of girl alpha, this time with vague items touching and dangling down from the ceiling, but none of them really looking like a microphone.
Prompt: “A girl singing into a studio mic dangling from the ceiling in a darkened theatre”

Hmmmm! This wasn’t what I was envisaging at all. The concept of a microphone dangling from the ceiling really wasn’t one DALLE grasped. But maybe I just had the language wrong? After a quick google to look for proper microphone parlance, I tried again:

Another 5 images of girl alpha, this time with slightly different styles of microphone hanging down from the top of the image in front of her face.
Prompt: “A girl singing into a studio microphone hanging in front of her in a darkened theatre”

Now it was starting to evolve into something more interesting. I tried a few more microphone types (old fashioned, vocal and unidyne) with varying success. But eventually, I settled on this condenser-style mic as I liked the over-the-top nature of it:

Girl alpha, eyes closed, headphones on, hair moving in front of her face, with her lips slightly parted, singing into a large condenser style microphone rig hanging in front of her face.

Step 3: Variations

DALLE also allows you to pick an image you like and quickly make variations — images in a similar style and composition, but ultimately different. I felt like I had become quite attached to this particular image, so forcing me away from it might open up some other paths:

5 variations on girl alpha in profile, each with slightly different hair. In every variation, the girl’s head is leaning backwards to varying degrees. The mouth also ranges from closed, to slightly open, to clearly open. Three are on the right of the image, two on the left. The microphones are all large and of condenser types, but vary quite significantly in visual style and orientation.

I especially liked the second image, as it really feels like she is singing, and it feels like the concept of an audience is being brought back into the composition:

A variant of girl alpha. She is in profile on the right of the image, headphones on, long hair down the side of her face, eyes closed and mouth clearly open. She is singing into a large, black condenser microphone just in front of her face. There is the hint of a crowd in the background. We’re calling her “girl beta”.
We’re going to call this one “Girl Beta”.

Feeling like we were on a roll, I asked for variations on Girl Beta:

A series of 5 images, all variations of girl beta. They are different enough that the variant girls feel like different characters now, as their hair varies in style and colour. The backgrounds are also more dynamic and different, and there is also variety in the style of microphone. The only common facet across all 5 images is that her eyes are closed and her mouth is open, singing into the microphone.
It was at this point I realised we probably needed some kind of forking/versioning system to keep track…

But I couldn’t let go of that first image, so I tried one more set of variations:

5 more variants on girl alpha. This time the mouth is mostly closed again. The background changes from picture to picture and so does the microphone, but they all look to be generally the same girl.

Step 4: Decide

It felt like I was becoming a little trigger happy with the variation button, so I had to decide on one or two to take further. In case the foreshadowing wasn’t enough, I decided to go with Girl Alpha and Girl Beta:

A composite of two images, girl alpha on the left and girl beta on the right. The major difference is that girl beta is more expressive, with her mouth open wide, whereas girl alpha seems to be moving, as a few long locks of hair dance in front of her face. Both are wearing headphones.
Girl Alpha on the left, and Girl Beta on the right

Step 5: UNcropping

Now comes the REAL fun. I was fortunate to meet Lewis Hackett (director of Prefix Studios) through some online lockdown events centred around our shared love of Virtual Reality. It was Lewis who first showed me the potential of AI art, and he’s been generous with his time in helping me get up to speed.

One neat method he passed on is “Uncropping”. It’s easier to explain in steps:

  1. Take an image (AI generated or otherwise) and shift it in any direction (using Photoshop or similar). Here I’ve moved it 50% (512 pixels) to the left.
A screenshot of a 1024x1024 pixel image of girl alpha in photoshop. The image has been shifted 512 pixels to the left, leaving a transparent area that is 512 pixels wide and 1024 pixels high behind girl alpha. The image cuts off just above her fringe, so the back of her head is not part of the image (and at this point, has never been generated)

I also used Photoshop’s “Content-Aware Fill” feature to remove the DALLE watermark from the bottom corner (you’ll see why eventually).
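You don’t strictly need Photoshop for the shift itself — it can be done in a few lines. A sketch with Pillow (the in-memory image stands in for the real 1024x1024 source file):

```python
from PIL import Image

# Stand-in for the 1024x1024 source image; in practice,
# src = Image.open("girl_alpha.png").convert("RGBA")
src = Image.new("RGBA", (1024, 1024), (40, 40, 60, 255))

# Shift the image 50% (512 px) to the left onto a transparent canvas.
# The empty 512x1024 strip on the right is what DALLE will "uncrop" into.
shift = 512
canvas = Image.new("RGBA", src.size, (0, 0, 0, 0))
canvas.paste(src, (-shift, 0))  # negative x pushes content off the left edge

canvas.save("girl_alpha_shifted.png")
```

Upload the result and DALLE treats the transparent strip exactly like an erased region.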

2. Next, upload the shifted image to DALLE and instruct it to fill in the transparent area. You can also add a text prompt to guide it, though at this stage we’re really just looking for the back of Alpha’s head.

A screengrab from the DALLE tool, with Alpha’s face moved to the right. The 50% of space behind her head is transparent and ready for inpainting by DALLE. Above the image, the text prompt has been set to “A girl singing into a studio microphone hanging in front of her in a darkened theatre”. Beside the prompt is a button labelled “generate”.
Screenshot of the DALLE tool for editing/inpainting images.

Hitting “generate” gives 5 “inpainted” versions of the same image. Let’s look at the results:

The original 50% image with 5 new inpainted versions. Each of the images has a microphone right next to Alpha’s head, which now has a back and more hair.

My mistake was forgetting that DALLE has no concept of the microphone that had moved “out of shot”. The solution was to change the prompt to reference just the part of the image we were currently generating. So I ran it again, this time with the prompt “A singing girl wearing headphones, standing in a darkened theatre”:

The original 50% image again, with 5 new inpainted variants. This time the variance is in the size of Alpha’s head, how much hair she has, and slight variations in headphones.

Now it felt like we were getting somewhere. Something about the size of her head and neck felt a little off though, so I picked my favourite, erased the back of her head and asked for more inpainted versions.

Eventually, I decided on this full headshot of Alpha:

A full extended picture of alpha, only now we can see the back of her head. Her hair is cut in a messy bob and ends above her shoulders. The back of her head is tinted by a neon blue light.
Honestly, I thought she would have longer hair.

Obviously, she’s still not perfect. Her neck still kind of vanishes and should have more of a nape. There’s an imperfection on her lips and the neon hair underneath her left ear is odd. But I just didn’t care, because of how much of her was so expressive.

Of course, I didn’t stop at the back of her head — I wanted to fill out the scene just a little more. I imagined she might be sat at, or holding, a musical instrument. So of course, I extended and played with that idea:

An extension to the bottom of the image. Just creeping into shot from the bottom is what looks like an acoustic guitar, with the fretboard passing up under Alpha’s left arm.

I tried a guitar, which felt pretty good! I worried about where the fretboard would go though.

The drums felt like they would fit into the scene better, so I went with them to fill out the whole picture.

And to cut an already long story a little shorter, I carried out a few more extensions, and pieced the final composite together:

I am sure I could have taken her much further, with more time and tweaking. Honestly, though the drum kit did fill the scene a little, that’s where I started to run out of steam.

What I think is also really interesting is seeing the iterations over time that put her together:

An animated gif showing each individual image appear in turn to make up the composite. The first image appears at the top in the middle and goes through two iterations, before the rest of the image is filled in in a clockwise direction.
Notice I left the DALLE signature in the bottom right of the final composite.

To be clear, the animation above includes only the nine images that form a direct path to the final output. There were 151 other images I ultimately discarded (including Girl Beta).

What about Girl Beta?

I know! I really wanted to develop her too. I had designs on trying to bring them together as a duo (though I think ultimately the styles might have been a little too divergent).

Unfortunately, I ran out of time before DALLE went into Software Beta (ironic), which limits the number of iterations I can run without paying. I don’t begrudge this — I think OpenAI have made something really special and ultimately usable, and it was only a matter of time before they would need to monetise it. It’s highly likely I will pay to use DALLE at some point, when I have a specific project in mind.

Summary

I’m aware that this has already been a very unstructured post that I just wanted to get out because I was enjoying the process so much and wanted to share it. I could ramble on for ages.

The original clickbait title I wrote for this blog was: “AI image generation with DALLE will change workflows forever”. Having got to a natural endpoint with my own exploration, I still believe this is true, especially when it comes to DALLE’s ability to generate variations.

Some top-of-the-head use cases for DALLE in a workflow:

  • Storyboarding a scene
  • Creating characters
  • Stock photos (at least without famous people in them)

I still haven’t quite figured out how it might help me in my day job, but I’m genuinely excited to see where generative AI tools go in the near future. More than that, I’m looking forward to seeing more folk using tools like this to explore their creativity.


Rob Scott

@BBC UX Architect, IA Practitioner, Tech & xR enthusiast, @VRManchester co-organiser, Ex-Summer Camp Counselor and UK Labour Party Member. Views are my own.