Rice Terrace Ziggurat
The Internet’s full of posts about engineering text prompts for Midjourney and other AI art generators. But there’s almost nothing about how to integrate image prompts into that. And that’s unfortunate, because it’s a powerful technique for creating something novel, as opposed to just rehashing what others have done before. So here I’ll describe the step-by-step process I went through to create the image below with that technique. I hope you can gain some ideas from it to bring your visions to life when they overstep the bounds of Midjourney’s creativity.
I was stuck at square one with this. Midjourney would give me rice terraces, and it would give me a step pyramid, but not a step pyramid made of rice terraces. Rather, it would draw them side-by-side. And actually I hoped to go further than this. I wanted:
- The shape to be concave, sloping more gently at the bottom and more steeply at the top
- Subtle details to indicate that the pyramid had been carved out of living rock
- A fog-filled void around the pyramid
But I couldn’t even get around to the above 3 points until I figured out how to get the rice terraces on my pyramid.
After going around in circles for an embarrassing length of time, I hit on an idea: Generate one picture just of rice terraces, and another picture just of a step pyramid, and blend them together. But how to make sure the bot wouldn’t draw the two side by side?
The solution is something I’ve only seen mentioned in one other place online so far, but which I think will become really important in synthography. To quote from that article: “merge two or more images that have some similar key elements. For example, you can keep the colors and tones of multiple images the same while blending them or combine images where the subjects or environments are somewhat similar.”
I’m going to call these elements linking elements. When the bot begins the process of merging the images, it will fuse that pair of elements as a starting point. Then, taking that initial join as set in stone, it’ll work around that to try and combine the rest of the images.
As a practical example, I generated the following, using the prompt: ziggurat on a clifftop, palm trees, sunrise, aerial by Sony A7R II, Hanging Gardens of Babylon --style raw --s 50
I’d added the keyword “Hanging Gardens of Babylon” to the end, because what I was hoping for was thick patches of vegetation on the terraces. This green vegetation on the flat parts of the steps would hopefully have enough in common with rice terraces that, when the time came to make the blend, it would serve as a linking element and make the bot render the rice terraces on the pyramid. To that end, I upscaled the top right image, as it had the greatest amount of vegetation on the steps.
Next, I needed my rice terraces. Ideally, they would be on a steep slope, so the steepness would function as another linking element in order to railroad Midjourney into putting the rice terraces on the pyramid. I got the best results by inserting “canyon” into the prompt, which was: focus on a pyramidal peak, covered with rice terraces, on a mesa, tropical sunrise, aerial by Sony A7R II --style raw --s 50
Notice how both prompts specified that I wanted an aerial photograph at sunrise. This was to lay the groundwork so the bot would have an easier time blending the images down the line. Anyway, here was the result:
Image #4 was the closest to my final vision, so I upscaled that for use in the blend. Cropped down, it looked like this:
But what about the other elements? Neither of the above images has a fog-filled void or the living rock effect. The rice terrace picture might have the concave slope, but I would need to ensure that it was preserved in the final blend.
I killed all those birds with one stone by generating a third image to be blended. For this I wanted a rocky mountain with a concave shape, towering out of fog, with tropical trees to blend better with the other images. I settled on the Matterhorn, stripped of snow. The prompt was: tropical cloudforest on the Matterhorn, towering out of fog, sunrise, aerial by Sony A7R II --no snow, valley --style raw --s 50
I got weird results, but Image #3 would be good enough to use. I figured the spire would disappear in the blend. I cropped it down to 3:2, which I’d actually done to the other two as well. I didn’t want Midjourney to be confused by blending images of different aspect ratios.
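If you'd rather work out the crop numerically than eyeball it in an image editor, here is a minimal sketch of the arithmetic. The function name `center_crop_box` is my own, and it assumes you want the largest possible 3:2 crop centered on the image, which may not be the right framing for every picture.

```python
def center_crop_box(width, height, ratio_w=3, ratio_h=2):
    """Return (left, top, right, bottom) for the largest centered crop
    of a width x height image with aspect ratio ratio_w:ratio_h."""
    # Compare width/height with ratio_w/ratio_h using integers only.
    if width * ratio_h > height * ratio_w:
        # Image is too wide: keep the full height and trim the sides.
        new_w = height * ratio_w // ratio_h
        new_h = height
    else:
        # Image is too tall (or already matches): keep the full width.
        new_w = width
        new_h = width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


# A square 1024 x 1024 image loses a strip from the top and bottom.
print(center_crop_box(1024, 1024))  # (0, 171, 1024, 853)
```

The returned tuple is in the (left, upper, right, lower) order that image libraries such as Pillow accept for cropping, so it could be passed to something like `Image.crop` if you script the step.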
Finally I was ready to start blending. I inputted the above 3 images into the blend function, and got:
Success! All except the last appealed to me, but Image #1 was the most concave, and the dome on the top really made it look like it had been carved from living rock.
Now I needed to think about how to play with the image to get the mood I wanted. I settled on a fisheye lens, a long exposure, chiaroscuro and sepia. What I did was take each of the three prompts discussed previously, and append fisheye, long exposure, chiaroscuro, sepia:: to each of them. So I ended up with:
focus on a pyramidal peak, covered with rice terraces, on a mesa, tropical sunrise, aerial by Sony A7R II, fisheye, long exposure, chiaroscuro, sepia::
ziggurat on a clifftop, palm trees, sunrise, aerial by Sony A7R II, Hanging Gardens of Babylon, fisheye, long exposure, chiaroscuro, sepia::
tropical cloudforest, the Matterhorn, towering out of fog, sunrise, aerial by Sony A7R II, fisheye, long exposure, chiaroscuro, sepia::
thatching, snow, valley::-1 --style raw --s 50 --ar 3:2
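The double colon syntax splits the prompt into parts that Midjourney considers separately. So I was basically telling it to repeat (on its own) the same process as I’d just gone through, but this time to generate all of the images with a fisheye lens, long exposure, chiaroscuro and sepia. The last part, weighted ::-1, listed the things I wanted it to steer away from.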
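Appending the same modifiers to every part by hand gets tedious once you start experimenting, so here is a small sketch of how the combined prompt could be assembled. It is plain string handling and doesn't talk to Midjourney at all. The function name `build_multiprompt` and its arguments are my own invention, and it assumes parameters are written with a double hyphen and that the negative part gets a weight of -1, as in my prompt.

```python
def build_multiprompt(base_prompts, modifiers, negatives, params):
    """Join several prompts into one multi-prompt string.

    Each base prompt gets the shared modifiers appended and ends in '::'.
    The negatives form one final part weighted '::-1'.
    Parameters (e.g. '--ar 3:2') go at the very end.
    """
    suffix = ", ".join(modifiers)
    parts = [f"{prompt}, {suffix}::" for prompt in base_prompts]
    parts.append(f"{', '.join(negatives)}::-1")
    return " ".join(parts + list(params))


base_prompts = [
    "focus on a pyramidal peak, covered with rice terraces, on a mesa, "
    "tropical sunrise, aerial by Sony A7R II",
    "ziggurat on a clifftop, palm trees, sunrise, aerial by Sony A7R II, "
    "Hanging Gardens of Babylon",
    "tropical cloudforest, the Matterhorn, towering out of fog, sunrise, "
    "aerial by Sony A7R II",
]
modifiers = ["fisheye", "long exposure", "chiaroscuro", "sepia"]
negatives = ["thatching", "snow", "valley"]
params = ["--style raw", "--s 50", "--ar 3:2"]

prompt = build_multiprompt(base_prompts, modifiers, negatives, params)
print(prompt)
```

Swapping out the `modifiers` list is then enough to try a different mood across all three parts at once.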
I also inputted my above image as part of the prompt, so the AI would aim to regenerate something like that, instead of going off track. The result was:
Midjourney surprised me by including a moat in half of the pictures! I’m guessing this is for the following reasons:
- Rice terraces sometimes have lakes below them.
- The Hanging Gardens of Babylon have water features in some depictions.
- I’d inputted “chiaroscuro”, and shadows on water are a good way to achieve that.
- My image in the prompt had a small pond to the top right of the pyramid, which the bot could expand around the pyramid.
But the moat wasn’t an unwelcome addition, so I went with Image #2. Here it is again, cropped down:
No matter the genre of synthography, I predict that a valuable skill will be to develop a consciousness of “linking elements” in the disparate things we see around us, enabling us to link them into a cohesive whole and take our creations beyond the real.