A Tale of Two AIs (Part one)

Brad Sims
Jul 18, 2023

I stumbled upon a strange thing this morning. Looking back through my earliest experiments with Midjourney, I decided to compare some outputs from my early prompts made with version 4 to outputs from the same prompts in version 5.2. Early on, I’d compiled a tarot deck in a minimalist style, and I was pretty happy with the original results. So, I copied the old prompts into Midjourney v5.2, added all the different card names as prompt permutations, and very quickly got new versions of the cards as imagined by the newer engine.

On a whim, I decided to run the same set through the niji model. It’s important to note that the niji model is entirely separate from the standard Midjourney model. So, while comparing outputs from v4 and v5.2 led to very similar results, the outputs from the niji model were wildly different, because niji is a completely different beast.

If you’re not familiar with what Niji is, how it works, and how it’s different from the standard Midjourney engine, PC Guide offers this helpful description:

Niji, Japanese for Rainbow, is an entirely separate algorithm offered by Midjourney, an anime-specific model alongside their default algorithm lineup of version 1–5.2. The difference between the different versions of Midjourney, is that v1–5.2 are general purpose algorithms, trained on a diverse dataset of visual media from vintage film photographs to digital illustrations. These are flagship models, designed to allow the broadest audience possible to create any image they want. Alongside that offering, the Niji parameter is a more specialised and experimental algorithm, trained on a vast knowledge of anime, ideal for anime fans who want to create digital art with anime aesthetics, especially character-focused compositions.

When I ran the prompts through the niji engine, I noticed something interesting about the outputs for The Lovers card. Both version 4 and version 5.2 of the standard Midjourney models depicted a young, typically white man and woman in some sort of romantic tableau: exactly as expected from traditional tarot imagery.

But in the niji outputs, there was something different.

Prompt: The Lovers tarot card --ar 2:3 --niji --style expressive

Do you see it? Three of the four images are same-sex couples (all depicting two females).

I rerolled the same niji prompt twice more for a total of 12 image outputs: six of them were of two females, six were male-female couples. That’s a 1:1 ratio of same-sex couples to opposite-sex couples.

Here are all three sets of the niji samples side by side for comparison.

Prompt: The Lovers tarot card --ar 2:3 --niji --style expressive

Of course, I had to reroll the prompts an equal number of times with the standard (non-niji) models of both Midjourney v4 and v5.2 for comparison. Almost all the outputs were of heterosexual couples. Here are all three sets from version 4.

Prompt: The Lovers tarot card, neo-minimalistic and clean design poster --ar 2:3 --v 4

Only two out of 12 appear to maybe, possibly show two females (first image, bottom left; second image, top right). I’ll leave that up to interpretation.

In the sets created using v5.2, only one out of the 12 images is arguably same-sex. Maybe. See if you can tell which one below.

Prompt: The Lovers tarot card, neo-minimalistic and clean design poster --ar 2:3

Just for comparison, I also ran the base prompt “The Lovers” without any of the “minimal, tarot card” verbiage through both the standard version 5.2 engine and the niji engine. The results were (arguably) a 4:8 ratio of same-sex to opposite-sex couples in the niji samples, and 100% heterosexual (and heavily “romance novel cover” in style) for the non-niji samples.

The non-niji outputs:

Prompt: The Lovers tarot card --ar 2:3 --s 750

The niji outputs:

Prompt: The Lovers --ar 2:3 --niji --style expressive

(Side Note: I had originally included the extra “expressive” style parameter on the niji tarot examples, because it generally gives a nicer outcome. Since I used it on the original niji tarot set, I included it in all the subsequent prompts for consistency across examples from that engine. I do not believe the “expressive” style had any effect on the genders depicted on the cards. I might be wrong. If you think otherwise, I’d love to hear your reasoning and insight.)

Admittedly, this is a small sample and obviously not a scientific survey.

So, what’s going on here? I’ve got some ideas about what I think is happening, and to some degree what it means in the bigger picture of AI data modeling.

It could simply be that Midjourney has trouble telling apart two similar elements in the same image. I’ve seen a similar phenomenon when I’ve tried to get the standard Midjourney engine to create illustrations of St. George slaying the dragon: Midjourney seems to have trouble separating the concept of “dragon” from that of “horse.” However, I don’t think this is what’s going on with The Lovers tarot examples, because the non-niji samples show nothing close to the same ratio of same-sex couples. If it were simply a case of confusing the elements of the image, we’d see it in many more images of couples from the standard Midjourney engine.

It might also be that genders are simply depicted more similarly in anime and manga. Strong female characters are often confident and imposing, with broad shoulders and strong-willed expressions. Male characters are elegant and haughty, and often sport flowing hair, ornamental clothing, and soft features. Both character types are present in the niji examples of The Lovers cards. Combined with the first idea, that Midjourney might simply be confusing the two figures, this explanation gains a bit more merit.

But there might be a third reason for the difference: one with more far-reaching implications for the way Midjourney learns and the ways in which we use it.

It could be that the outcomes differ because the two Midjourney engines learn both from the original training images AND from the collective prompts of the user community. The images from the default engine skew heteronormative, which I propose generally matches the wider base of users who aren’t necessarily niji/manga/anime enthusiasts. Niji users, on the other hand, are, very generally speaking, more gender-nonconforming and more open to non-traditional, non-heteronormative depictions of relationships, at least in the images and fiction they seek out.

Anecdotally, I feel like we see A LOT of heteronormative “male gaze” imagery in Midjourney’s explore feed and on the public Discord channels, particularly in the non-niji outputs. I often see that observation from others critiquing the collective output of Midjourney users. In short, there’s an overwhelming trend among default Midjourney images toward fantastically over-beautified, often hyper-sexualized women, along with badass, hyper-masculine depictions of stereotypically male subjects (sports cars, fantasy warriors, etc.). Sure, there are also lots of outputs of puppies, logos, flowers, landscapes, attempts at selfie avatars, and so on. But it’s hard to deny the tendency toward male-fantasy imagery. Which makes sense: the majority of Midjourney users fall into a male demographic that just finds that sort of imagery interesting, inspiring and fulfilling.

My conclusion here (based on my vaguely scientific and laughably limited “experiment”) is that user input does seem to have an effect on broader thematic content of collective outputs in Midjourney. From that, the big takeaway is, as many people have suggested, that the community of Midjourney users has a huge influence over the essential character of the AI model. The soul of the machine, as it were.

It’s overly reductive to simply say “garbage in, garbage out” in criticism of the male-centered view of images in the community. But my tests with “The Lovers” imagery tell me that if we want better, more accurate, more inclusive, and more realistic AI models, we’ve got to be more intentional about the data we use to build the system.

BIG NOTE: I want to stress that this piece is NOT in any way a judgment of the values or lifestyles of users of either of the two Midjourney engines, nor of people and the images that they enjoy. I fully realize that this kind of inquiry has a tendency to “call out” one group or another and to stereotype. That is not my intention in these observations. Please, go out and make whatever images, narratives, and depictions of your imagination that bring you joy and fulfillment. Love is Love. I hope you find a way to share your love, your vision, and your kindness with the world. ❤

Brad Sims

Nature and landscape photographer living in Little Rock, Arkansas.