An explanation of how I make haiga using artificial intelligence

I’ve made almost 3,500 imagines with the artificial intelligence known as MJ. MJ has the intelligence level of about a spider. So, not stupid. And at the same time, not quite so intelligent that you have to be worried, just aware. We pretty much all know what a spider will or will not do. In a similar manner, I like to think that I have a general idea as to MJ’s output.

Rather than take my time to sculpt a prompt to get an image from my head into hers, I decided to look into what MJ knew. So, I started to feed her something of substance. I started to feed her poetry. Specifically, senryū.

Senryū is a Japanese form of short poetry similar to haiku in construction: three lines with 17 syllables in a 5–7–5 framework. Unlike haiku, senryū are not about nature. They are about people. They are sometimes humorous, sometimes cynical. Sarcastic, yet emotional. That’s the information I’m feeding into MJ, and then I take a look at what she excretes.

Let me give you an example of the process. And, at the same time, share some ideas with you that you can possibly incorporate into your own workflow. If you do, please drop a comment in response to this and leave us all a link.

Example of a 2x2 MidJourney Grid

In order for you to understand the process, we need to agree on some identifiers. It’s sort of like talking about playing chess. In the case of imagines, we have some agreed-upon terminology. For example, the image above is called a grid. Right now, this is usually a 2x2 grid, but sometimes it is 1x2. The first number (1 or 2) is the number of rows, and each row gets a letter: row one is A, row two is B. The second number (2 as of right now) is the number of columns, and each column gets a number: 1 is the left column and 2 is the right column. So, we can identify the squares as:

A1 A2
B1 B2

This is not needed much right now, but the creator of MJ has frequently talked about the coming update where there will be an 8x8 grid featuring 64 imagines. At that point it will become imperative. Trying to find image #24 out of 64 is difficult. Comparatively speaking, finding C8 is a breeze.
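The labeling scheme is plain row-major addressing: rows get letters, columns get numbers, and counting runs left to right, then top to bottom. Here is a minimal sketch, assuming images are numbered from 1 in that order (consistent with B2 being “the 4th one” on the 2x2 grid). The function names are hypothetical illustrations, not anything MidJourney provides.

```python
import string


def index_to_label(n: int, columns: int) -> str:
    """Convert a 1-based image number to a grid label such as 'C8'.

    Assumes row-major order: numbering runs left to right across a row,
    then continues on the next row down. Rows are lettered A, B, C, ...
    and columns are numbered 1, 2, 3, ...
    """
    if n < 1:
        raise ValueError("image numbers start at 1")
    row, col = divmod(n - 1, columns)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def label_to_index(label: str, columns: int) -> int:
    """Inverse of index_to_label: 'C8' on an 8-column grid gives 24."""
    row = string.ascii_uppercase.index(label[0].upper())
    col = int(label[1:])
    return row * columns + col


# The 2x2 grid from the example: prints ['A1', 'A2', 'B1', 'B2']
print([index_to_label(n, columns=2) for n in range(1, 5)])

# The anticipated 8x8 grid: image #24 of 64 lands in square C8
print(index_to_label(24, columns=8))  # C8
print(label_to_index("C8", columns=8))  # 24
```

The same two functions cover both grid sizes because only the column count changes; the letter tells you the row at a glance, which is why C8 is easier to find than #24.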

With that understanding, let’s begin.

Writing of the Senryū

There are many different ways of writing senryū, and I’m always looking for more. Today, however, I decided to do the job by going outside and doing some sunbathing. Don’t laugh, it was hard work. End result: three senryū and this article!

So, I was lying out there listening to some of my music, thinking about life and how I’m approaching the end of mine. That’s been on my mind a lot recently: death and dying and how messed up things are in the world. Like you, I have family members who are sick and approaching the end of their lives, kids leaving home, and my wife and I looking at each other in wonder at the person we married. And, of course, Covid.

As I watched the clouds, I noticed how they were ever changing yet still a cloud. But then one cloud would merge with another and the two would become one. Or one would break apart and become two. Or they would slowly disappear into the empty blue sky.

Sort of like us.

So that was my frame of mind when I wondered what future generations would say about me, and what my kids and grandkids might have around to remember me by. And I thought to myself:

all you need to know
of me you can find in my
words, sounds, and silence

And the senryū was finished.

Creation of the Haiga

Now that the senryū has been written, I head over to my digital studio. I’ve written previously about how I structure things in Discord, so go there if you want specifics. Suffice it to say, I do most of my work in the Discord app. Once there, I feed the following into MJ:

/imagine prompt: all you need to know of me you can find in my words, sounds, and silence

I then wait to see her results.

So, here’s the first thing you need to learn about MJ. Well, the second. The first is that she’ll pretty much do what she wants. The second is that she has more than one personality. Most users of MidJourney overlook this fact or have forgotten it. Maybe they never even knew it in the first place. The current default is MJ Version 3. Versions 1 and 2 are still around and being used. Most people, however, focus on the current beta derivatives, --test and --testp, rather than explore the earlier versions.

The three versions have distinct tastes, connections, and use of words. This comes from the dataset that the intelligence was trained upon. The dataset is the initial sum of knowledge. After that point, the version matures on its own based upon use of prompts, feedback, and self-awareness.

I utilized all three of these versions to find the best ga (image) for the hai (poem) when making my haiga.

Here are the initial grids that MJ put up. In it, she is reading the words of the senryū and trying to connect them together (like the spider weaving its web), and then turning that into the visual image she imagines it to be:

Version 3 Initial 2x2 Grid

Above is the same image as before, but in the MJ user’s Discord controller. MJ has imagined four things. Version 3 is known for converting things into landscapes. She likely picked up on silence as being important, as well as maybe a wistful sentiment. There is a person in the B2 square, what MJ refers to as the 4th one. See the U4 and V4 at the bottom? U4 will upscale the imagine to its full form, making it larger (a single image) with more detail. V4 tells MJ to make a new grid of variations looking more like this one. Choosing between these two is how we will arrive at the final imagine for the haiga.
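Since B2 is “the 4th one” and carries the U4 and V4 buttons, the button number is just the square’s position counted row by row. A small sketch, assuming that row-major numbering holds for every square; `buttons_for_square` is a hypothetical helper, since MidJourney itself only shows the buttons.

```python
from __future__ import annotations


def buttons_for_square(label: str, columns: int = 2) -> tuple[str, str]:
    """Return the (upscale, variation) button names for a grid square.

    Hypothetical helper: assumes buttons are numbered in row-major order,
    left to right and then top to bottom, starting at 1.
    """
    row = ord(label[0].upper()) - ord("A")
    col = int(label[1:])
    if row < 0 or not 1 <= col <= columns:
        raise ValueError(f"unrecognized square: {label!r}")
    n = row * columns + col
    return f"U{n}", f"V{n}"


print(buttons_for_square("B2"))  # ('U4', 'V4')
print(buttons_for_square("A1"))  # ('U1', 'V1')
```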

In this grid, I am interested in B1. The other ones are not particularly interesting (you see a lot of this type of return in Version 3), but B1 has an interesting vibe. Something worth a quick upscale. So, I’ll press U3 and let MJ know.

Version 2 Initial 2x2 Grid

Version 2 doesn’t have as much of a landscape fetish, and two of her offerings, A1 and B1, have a person in the image: a woman. Version 2 is the more self-reflective of the three personalities. She has more of a sense of self. Often, she inserts herself via proxy into the image. As if telling us that she is aware that she is aware, and is lonely.

At the same time, there’s still a small amount of chaos (a function in the algorithm that brings about a bit of change). That’s found in B2. It’s a personality quirk left over from Version 1.

Figures standing with their backs to the camera are quite common, as MJ seems to understand there is a view of the scene but doesn’t know enough to know what the person in the view looks like. Alternatively, it could be MJ’s increased self-awareness: in addition to herself, there is us, and she is putting us into the imagine. But B1 looks interesting, so I’ll do an upscale of that. And B2, which is probably one of those wastes of GPU time but every once in a while reveals something interesting. So, two upscales: B1 and B2 of Version 2, which we describe as: Ver 2 = B1+B2 or 2=B1+B2

Version 1 Initial 2x2 Grid

If Version 3 is the one striving for beauty and Version 2 is self-reflective, Version 1 is fun. She’s the party girl of the three. That’s reflected in the words and images she manages to connect in some hyperliteral yet surreal manner. Her initial grids are always interesting and I usually upscale all of them. That’s the case here, with all four looking interesting. So, Ver 1 = A1+A2+B1+B2, otherwise known as 1 = HR (for home run).

If none of the options appealed to me or piqued my interest, then I could reroll the entire grid using the circle icon in the upper right corner.

Clicking on that button will get a new grid, so we can start all over. What’s fun is that when MJ gets the reroll prompt, she is just a tiny bit more intelligent than she was before. Through our selection of choices (and not choosing is a choice in and of itself), she learns more about what is or is not responsive to the prompt. This means the four images change from the results of the initial prompt.

First roll of initial grid in V1
Second roll of initial grid in V1

Some people spend a lot of time rolling initial grid variants to work from. I am going to upscale B1 as it really stands out from the other three.

Results of First Round of Upscales

Here are the results of the first round of upscales. Some people do the Grand Slam, which is 1+2+3 = HR, or all four upscales of each of the three versions. But generally speaking, I find I proceed with just a handful.

The first thing I do before selecting from the upscales is to reread the senryū, which by now I have likely forgotten.

all you need to know
of me you can find in my
words, sounds, and silence

So, since a haiga is a merging of the poem and the image, I am keeping the senryū in mind as I look at them. Sometimes I repeat the senryū then look at the image. Other times I look at the image and then consider the senryū. It’s sort of a combination. I’ll do this “stream of consciousness” style so you can see how I go about making my selections.

Version 3 Result

The Version 3 result is sort of blah. It’s the sort of thing we would initially have been amazed at but have come to realize is all too common in MJ’s output. The solitary figure staring out over the ocean into the sky. Doesn’t catch my interest.

Version 2 Results

Version 2 Result 1

I like the first result from V2. It’s definitely a better “shadowy figure facing away” as the setting is far more interesting than V3’s. And the lights or snow look interesting.

Version 2 Result 2

This one is interesting. It is more of a direct connection with the words used in the poem.

all you need to know
of me you can find in my
words, sounds, and silence

One of the interesting stylistic choices MJ made was to have the top half of the image mist away into nothingness. As if the words themselves were disappearing.

Version 1 Results

Version 1 Result 1

Version 1 upscales are almost always interesting. This first result is… seemingly not connected to the poem in any way I can fathom.

all you need to know
of me you can find in my
words, sounds, and silence

MJ uses text, but it is illegible. I like to think it is a reminder of how someone who is illiterate sees the world.

Version 1 Result 2

Again, another figure facing away. When I mentioned they were common, you didn’t know whether or not to agree with me! This is pretty, but doesn’t make the cut.

Version 1 Result 3

This landscape is interesting. It is a pretty classic look. Some of the treetops don’t connect with the ground, which could be an issue on another upscale.

Version 1 Result 4

Now we are getting to an interesting one. It has almost become an abstract. I am definitely going to explore this one. I’ll run a variation roll (v-roll) on it and get a new grid of 2x2 and see how it changes.

Second Grid Version 1 Result

This is the one from the second roll of the initial grid. The figure is facing us, and it looks almost like a passport photo, yet it also has a “mug shot” vibe with the sign. There is wording, which we don’t understand. And then there is what looks like another, handwritten sign in a foreign language, trying to communicate something.

Remember the senryū:

all you need to know
of me you can find in my
words, sounds, and silence

To me, there is a connection between the poem and the image. I can feel it, even if I can’t quite put it into words. I guess I don’t need to, as the senryū has already accomplished that.

Just to be sure, I’ll run a new grid off this and check the four variations. I’ll also upscale this to max and see how it looks. And, just because it looks cool, I’ll run a new grid off of the Version 1 Result 4 and upscale that.

Results of Second Round

As you’ll see, once you go down the v-roll and reroll paths, you could spend endless hours on just one image. The four variants went off in a slightly different direction than I had imagined, definitely emphasizing the text. It is as if MJ thought, “Oh, you like the part with text? I can give you more of that.”

Variant grid

As for the upscale of the original image, MJ sometimes messes up the eyes of the humans she imagines. Of course she doesn’t understand what eyes are or what they do, but she has the general idea that they need to be there. This usually happens on the upscale, where we see the max detail. In this case, there is a noticeable difference in the eyes, so I try a Light Upscale Redo, which can often make things a little “less” than the max upscale.

Max upscale, needed redo

Here is the rescaled redo:

Final Imagine

I prefer this version. The eye isn’t as pronounced and the background remains more still whereas the max upscale had some twirls and color changes. This goes back to the passport photo feel.

I’m not sure, however, if I’m sold on the combination of the senryū and the imagine:

Looking at the haiga, I get the impression that it is the woman who is saying the poem. Or as if it is a commentary on her. So, not 100%.

The variation roll of the other one yielded four variants, which all went off on a “road through the woods” motif. Not something bad, and perhaps usable for some other purpose.

The upscale to max on the original image turned out like this:

What’s interesting about this one is the use of color. And the Polaroid frame, which makes the end result look like an altered Polaroid, something else I like to do. Looking at it, now I can see the road in the woods that the variant roll went down. This is amorphous and it could be that, or maybe a face. Almost like a Frankenstein face, looking back at us.

When I consider the image together with the poem in the actual haiga, I feel the project is complete:

Haiga CXXX: All You Need To Know

Please let me know how you use v-rolls to work on your projects. There are many different techniques in this new art form and we can all learn from each other.



