The Roll Monitor

Alex Tully
4 min read · Aug 5, 2023

After you’ve been using AI art generators for a while, you’ll notice that more complicated compositions are often tricky to get right. This happened to me recently when I wanted Midjourney to smudge black paint on the lips of a high school student holding a clipboard. When I tried prompting with only text, the paint kept appearing in the wrong places in the image. I’d actually given up on this project, but after my success using the /blend function to make Rice Terrace Ziggurat, I began to wonder if I could do something similar here, i.e. decompose my image into smaller parts, generate each part as a separate sub-image, and finally blend them.

I decided to start with the hardest part: How could I get the smudge of paint around the lips? Midjourney was struggling with this, so I wanted to avoid potential distractions in the prompt for this portion of the image. My prompt for this was: closeup of ring of black_paint smudged surrounding lips, red_lipstick

EDITED TO ADD: Please see this post for why I used underscores between some of the words in the above prompt.

Three of the four results looked awful, with the paint dripping like chocolate sauce. But one of them was good enough:

Then I put everything else I wanted in the prompt for the second image, which was: upper_body_shot, Australian teenage girl, holding a clipboard, in school uniform with white polo_shirt, glazed_eyes, dilated_pupils, expressionless, thick_makeup, chain_link fence, asphalt, schoolyard, film_noir, harsh_lighting, harsh_shadows, chiaroscuro, tinted monochrome, A7R II --s 50 --style raw --no tie, schoolbag

The closest match to my vision was:

There was good and bad news when I blended them. The good news was that the blend yielded many images with the paint in the right place. The bad news was that all of them not only got rid of the clipboard, but changed the clothes on my subject!

Clearly the bot was going to need more guidance. I solved the problem with the following multi-prompt (for easy reference, I’ve split it into a numbered list, one element per item):

  1. <URL to Image of Girl>
  2. <URL to Image of Paint on Lips>
  3. upper_body_shot, Australian teenage girl, holding a clipboard, in school uniform with white polo_shirt, glazed_eyes, dilated_pupils, expressionless, thick_makeup, chain_link fence, asphalt, schoolyard, film_noir, harsh_lighting, harsh_shadows, chiaroscuro, tinted monochrome, A7R II::
  4. closeup of ring of black_paint smudged surrounding lips, red_lipstick::
  5. white polo shirt, school uniform, clipboard::
  6. tie, schoolbag::-1
  7. --s 50 --style raw

#3 and #4 are the same prompts that I already used to generate the images of the girl and the paint, respectively. The important element is #5. This is a list of everything that Midjourney was leaving out from the blended image.
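To see how the text portion of that multi-prompt breaks down, here is a minimal Python sketch. It assumes, going by Midjourney's documented multi-prompt syntax, that a number directly after :: is the weight of the text before it, that a missing number means a weight of 1, and that the weights have to add up to a positive number. parse_multi_prompt is a hypothetical helper written for illustration, not Midjourney's own parser, and the first sub-prompt is abbreviated.

```python
import re

# A weight is an optional minus sign and a number placed directly after "::".
WEIGHT = re.compile(r"^(-?\d*\.?\d+)")


def parse_multi_prompt(text: str) -> list[tuple[str, float]]:
    """Split the text portion of a multi-prompt into (sub-prompt, weight) pairs."""
    pieces = text.split("::")
    parts = []
    current = pieces[0]
    for piece in pieces[1:]:
        # A number at the very start of this piece weights the previous sub-prompt.
        match = WEIGHT.match(piece)
        weight = float(match.group(1)) if match else 1.0
        if current.strip():
            parts.append((current.strip(), weight))
        # Whatever follows the weight starts the next sub-prompt.
        current = piece[match.end():] if match else piece
    if current.strip():
        # Trailing text with no "::" after it gets the assumed default weight.
        parts.append((current.strip(), 1.0))
    return parts


text = (
    "upper_body_shot, Australian teenage girl, holding a clipboard:: "
    "closeup of ring of black_paint smudged surrounding lips, red_lipstick:: "
    "white polo shirt, school uniform, clipboard:: "
    "tie, schoolbag::-1"
)
parts = parse_multi_prompt(text)
for sub_prompt, weight in parts:
    print(f"{weight:+.1f}  {sub_prompt}")

# Under the stated assumption, the total must stay above zero.
total = sum(weight for _, weight in parts)
print("total weight:", total)
```

Read this way, #3, #4 and #5 each pull the image towards their contents with equal weight, while #6 pushes the tie and schoolbag away.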

And it gave me what I wanted!

Going from specifics to generalities: If you’ve got some item A that Midjourney’s failing to include in its output, then you can follow the procedure below:

  • Generate a separate image containing just item A and another item B, where B is something Midjourney can easily generate both alongside A and as part of the rest of the image. In the above case item A was the paint, and item B was the lips. I’ll call this image the “micro-image”.
  • Generate another “macro-image” that includes the other elements of the composition besides item A. Depending on what item B is, you may need to explicitly tell Midjourney to include it in the macro-image. Or Midjourney may include item B by default because of the nature of the macro-image. The latter case occurred with my image in question, because if you generate an upper body shot of a person then it will naturally include lips.
  • Use the /blend function to combine the above two images.
  • Then use your blended image as an image prompt, together with the text prompts for the micro-image and the macro-image, as one massive prompt. Those two text prompts must be separated by double colons :: .
  • In either of the above two stages, Midjourney may omit one or more crucial items. Let’s call these C (for me C was the clipboard and the clothes). In that case, rerun the combined prompt from the previous step, but include C again as a third text element. Use double colons :: to separate this sub-prompt from the text prompts for the micro-image and the macro-image. For my image in question, this third prompt was white polo shirt, school uniform, clipboard:: (i.e. #5 on the numbered list above).
  • Bang it all in and fingers crossed it works!
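If you find yourself assembling these prompts often, the last two steps of the procedure can be sketched as a small Python helper. This is only an illustration: build_blend_prompt and its argument names are made up for this example, the ::-1 weight for unwanted items simply copies what I used above, and the output is just a string for you to paste into Midjourney yourself.

```python
def build_blend_prompt(image_urls, macro_text, micro_text,
                       missing_items="", unwanted="", params=""):
    """Assemble image prompts, '::'-separated text prompts and parameters."""
    # Macro-image and micro-image prompts always go in.
    text_parts = [macro_text, micro_text]
    # Item C: whatever Midjourney dropped, repeated as its own sub-prompt.
    if missing_items:
        text_parts.append(missing_items)
    chunks = [part.strip() + "::" for part in text_parts]
    # Things to keep out of the image get a negative weight.
    if unwanted:
        chunks.append(unwanted.strip() + "::-1")
    # Image prompts first, then the text sub-prompts, then the parameters.
    return " ".join([*image_urls, *chunks, params]).strip()


prompt = build_blend_prompt(
    image_urls=["<URL to Image of Girl>", "<URL to Image of Paint on Lips>"],
    macro_text="upper_body_shot, Australian teenage girl, holding a clipboard",
    micro_text="closeup of ring of black_paint smudged surrounding lips, red_lipstick",
    missing_items="white polo shirt, school uniform, clipboard",
    unwanted="tie, schoolbag",
    params="--s 50 --style raw",
)
print(prompt)
```

Leaving missing_items empty gives you the prompt for the first attempt; fill it in only once you know what Midjourney is dropping.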

Happy prompting!

Alex Tully

Into Generative AI, but 100% Human-Written Blog (every word)・Bachelor’s in Maths・Master’s in Linguistics (@ANU 🇦🇺 )・Taught myself 🇯🇵 and 🇹🇭・Digital Nomad