Image generation AIs

These days AI can be used to create art. The quality can be unbelievable if you know how to use the tools effectively.

Nat Sothanaphan
ABACUS digital
10 min read · Apr 5, 2023

--

Here is an example from NovelAI, a popular anime image generation service:

novelai-chan, ai art, painter, goose cap
NAI Diffusion Anime (Full)
512x768
{"steps": 28, "sampler": "plms", "seed": 724761672, "strength": 0.69, "noise": 0.667, "scale": 8.0, "uc": "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, panels, multiple people", "sm": false, "sm_dyn": false}

Cute, isn’t it? By the way, the caption is the metadata extracted from the image file (by a custom script of mine), included for reference. It is well known that AIs still have trouble with certain things, such as correct anatomy (especially hands), and with some real-world common sense, such as knowing that ramen shouldn’t be eaten by hand. (Google it.) In general, these images usually still have the “AI-generated” look. But as we can see, an experienced user can sometimes bypass it and generate something that is very difficult to distinguish from human-drawn art. Even when that fails, techniques such as inpainting can be used to improve a generated image (with some limitations).
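For anyone curious, the extraction itself is straightforward: NovelAI embeds the prompt and settings as PNG text chunks. Here is a minimal sketch using Pillow; note that the "Comment" key name (where I believe the parameter JSON lives) is an assumption based on my own exported images:

```python
import json

from PIL import Image


def read_generation_metadata(path):
    """Read the text chunks embedded in a PNG file, where
    NovelAI-style generators store the prompt and settings."""
    img = Image.open(path)
    # PNG images expose their text chunks via the .text attribute.
    meta = dict(img.text) if hasattr(img, "text") else {}
    # Assumption: the parameter JSON is stored under "Comment".
    if "Comment" in meta:
        try:
            meta["Comment"] = json.loads(meta["Comment"])
        except ValueError:
            pass  # leave the raw string if it isn't valid JSON
    return meta
```

If the service ever changes its chunk names, you can simply print the whole dict and look for the keys yourself.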

We’ll go over some ethical issues, the impact on artists, and the future of image generation.

Ethical quandaries

First, the elephant in the room: yes, these AIs can be used to generate porn. Almost all online services have NSFW (not safe for work) filters for this. However, Stable Diffusion, another image generator, is open source and can be used for such purposes. The concerns range from generating nude images of celebrities against their will to child pornography.

On a somewhat unrelated note, Stable Diffusion can be run locally, on a custom server (I run it on Google Cloud Compute Engine; you can ask me how), or on Google Colab (search online for the various notebooks for running it). Setting up a custom server is especially hard, but it can be done. It can generate things such as this cute Angora rabbit plush:

cute angora rabbit, fluffy, rounddd, plush, sleepy
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 3612287222, Size: 768x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly

Angora rabbits are very fluffy and are usually bred for their wool.
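The parameter line above (Steps, Sampler, CFG scale, and so on) follows the comma-separated `Key: value` convention used by the popular Stable Diffusion web UI. A small parser, a sketch of my own rather than any official format specification, can turn such a line into a dict for reuse:

```python
def parse_parameters(line):
    """Parse a web-UI-style 'Key: value, Key: value' parameter line
    into a dict, converting numeric values where possible."""
    params = {}
    for part in line.split(","):
        if ":" not in part:
            continue
        key, _, value = part.partition(":")
        key, value = key.strip(), value.strip()
        # Try int first, then float; otherwise keep the string.
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                continue
        params[key] = value
    return params
```

Note this simple version assumes no commas inside values, which holds for the standard fields but not for the negative prompt, which the web UI puts on its own line anyway.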

Second, there is huge potential for spreading misinformation. Deepfakes, where AI swaps one person for another, are already concerning, but these image generation technologies can make things up out of nothing. Without countermeasures, this can get ugly really fast. I once saw an article about ads for veteran benefits that used Stable Diffusion images of veterans (the images are wonky, but you may not notice if you’re in a hurry). Some people will believe these people are real.

Finally, there is a huge issue about Art that I want to discuss. It is a lot of material, so it gets its own section:

Impact on artists and creatives

Needless to say, artists were shocked when these tools started to display professional-level capability and stopped being just toys to play with. Some embrace them; others respond very negatively. Let’s first go over the landscape. (Note: I’m not an expert, so I apologize in advance for any inaccuracies in the history.)

Landscape

These are what I understand to be some of the main players in this area.

  • DALL-E is probably one of the first image generators to make headlines, released in January 2021. I remember being shocked when it was able to draw an “avocado chair,” a completely new concept. Today, the most advanced version of DALL-E is probably the one served on the just-released Bing Image Creator.
  • Midjourney is probably one of the highest-quality generators at the moment, especially with V5 which was also just released. It produces really stunning “artwork”.
  • Stable Diffusion, released in August 2022, is open source. Hence it is growing very fast, with new techniques coming from the community all the time (e.g. ControlNet). It excels in customizability, but the learning curve to set it up and use it can be quite high.
  • NovelAI is Stable Diffusion fine-tuned for anime images. It is mentioned here because 1) I started with it, so yeah :), 2) it is conjectured* to be the base of nearly all current high-quality anime models for Stable Diffusion, and 3) it has probably drawn a lot of criticism from the Japanese anime art community, which is the subject of our discussion.

*Note: Some things I need to say. The NovelAI model received a lot of attention when it was released due to its unprecedented quality. However, it was leaked shortly after. Something like this can’t really be traced, but it is widely conjectured that the leaked model was merged into others and that all current high-quality anime models are derivatives of it, one exception being Waifu Diffusion. Considering that the NovelAI team contributes a lot to the Stable Diffusion community (they invented many new techniques, for example), I would implore you to avoid using these models. Some people don’t know this; others do but don’t care (on the grounds that Stable Diffusion is supposed to be open source, which doesn’t make any sense to me). That’s all I have to say.

You can Google more history, but let’s move on.

Is AI art “art”?

Apparently these AI-generated images are called “AI art.” Some call the practice “synthography” (Google it), treating the tools as something like “cameras for ideas.” Wow, sounds cool. Speaking of cameras, is photography art? Yes and no, apparently: just randomly taking pictures is probably not art, while taking pictures with the intent and skill of a professional photographer probably is. The same answer applies here. AI is just a tool; what matters is how you use it, not what you use. That’s what I think, but seemingly not everyone agrees.

Granted, some people are using AI art in disrespectful ways, such as flooding art sites with mass-produced images, to the point that AI-generated images have had to be banned on many of them. Other uses are less clear-cut. There was news of a person paying homage to a recently deceased artist (the South Korean illustrator Kim Jung Gi) by training an AI to produce art in his style. The intent was apparently good, but the backlash from the art community was very strong, because the project was seen as reducing a person to a mass-produced product.

On the other hand, someone has also won an art contest using Midjourney. You might have heard the news (Jason M. Allen’s “Théâtre D’opéra Spatial”). The judges said they would have awarded Allen the prize even if they had known the work was AI-generated. So there you go. There is also an entire subreddit, r/DefendingAIArt, specifically to, um, defend AI art. I will add something to the argument myself: this image, generated by me using NovelAI:

Coexistence of technology and art.

The title is “Coexistence of technology and art”. The image speaks for itself, doesn’t it?

Let’s see how technology and art can coexist.

Coexistence

In any case, this is a radical shift in how we perceive and practice art, and like many other social changes, laws, regulations, and many other practices need to adapt so that no one is left behind, especially artists. First, the benefits of the tech: it “democratizes” art, making it available to people who haven’t had the time to practice the skills, because it takes a lot of skill to produce really good art. It lowers the bar, basically. One person who benefits from this is me: I have always wanted to get creative, but life didn’t go that way. Thank you, AI image generation technology. It has also been argued that, as with many other technologies, small companies will be able to leverage its capabilities to compete with large, monopolizing companies. Right now, things like movie blockbusters are very difficult to create and require a lot of funding. This is another way in which the tech can democratize.

But the impact on existing artists is not to be overlooked, either. Take Greg Rutkowski, a Polish digital artist whose name is used so frequently in AI art prompts (the texts used to guide the AI in generating images) that his actual works are no longer easy to find on search engines. That’s concerning. It has also been argued that AI may replace entry-level artists in the future; since many artists start their careers doing these small-time jobs, it would become harder for new artists to develop their careers. Then there is the issue of copyright. It is now easy to use a few images from an artist to train an AI to “copy” that artist’s style, using techniques such as LoRA (note that I’m not an expert in this area yet). This “style mimicry” issue is serious enough that researchers at the University of Chicago have developed Glaze, a tool that masks an artist’s style from an AI, to protect artists from these kinds of exploits. Their research paper shows some of the ways in which an AI can mimic an art style.

Clearly, laws, regulations, and accompanying tech need to catch up with these developments. We are still in the wild west right now. Luckily, there has been some progress. “Zarya of the Dawn,” an AI-assisted comic by Kris Kashtanova made using Midjourney, has been granted partial copyright by the U.S. Copyright Office. Basically, an image generated through a prompt cannot be copyrighted no matter how complex the prompt is, because there is too much distance between the input and the resulting image. However, manual modifications of a generated image, and the arrangement of images into a story, can be copyrighted. (Note that I’m not a legal expert, and I apologize for any inaccuracies here.) As someone who has used these tools for a while, I find this largely in line with what I think. In any case, any effort to make the boundary clearer will make us all happier, both AI users and non-AI artists. Right now, everything is too chaotic for official productive work with AI. I want us all to move forward with these innovative tools for the overall creativity of the human race.

Future

To look at the future, one must also look at the past. It may be interesting to note that this is not the first time technology has disrupted art, although it initially appeared that way, to me at least. The first example was probably, surprisingly, the camera. For this I will refer to the webcomic “Paintings & Photographs” by Reddit user u/objectiveplusone, made using NovelAI: https://globalcomix.com/c/paintings-photographs. It explains this conundrum very well. Please read it.

If you read articles about the history of how photography impacted art (again, I’m not an expert), you will see many parallels with the current era of AI-generated images. Because realistic images could suddenly be created with ease, artists’ careers were affected. Some artists embraced the technology; others hated it as the commercialization of what was supposed to be a creative human endeavor. Strikingly similar, in fact. So, what came next? It turns out photography inspired new art movements such as Impressionism. Essentially, artists realized they could not compete with photography on realism and began to focus more on expressing emotions and other parts of the human experience. This is, in essence, what we currently perceive art to be. It was shocking to realize that this came about through photography, a kind of technology. Wow. It should be clear by now that technology is not an enemy of art, though many may have forgotten this in the current craze of AI disruption.

Like the camera before it, the future seems to be hybrid: artists will blend AI and non-AI tools as their needs dictate, and the industry will change. I will refer to this lunarmimi blog post: https://lunarmimi.net/artist-growth/a-digital-artist-survival-guide-for-the-ai-revolution/. By the way, the author is a Thai artist (I’m Thai). Cool. The post argues that artists will need to 1) be open-minded about the new tools, 2) build a strong personal brand or identity to differentiate themselves, and 3) develop storytelling through their art, as that is not yet something the AI can do out of the box. I wish everyone luck on this exciting journey.

I realize (after re-reading) that this is perhaps quite a bit of material for the uninitiated. [And as you can see, the organization is somewhat chaotic.] Also, there are no pictures for roughly the second half, which is kind of sad, but I don’t have suitable ones here. Midjourney and DALL-E aren’t represented in the images either, but I don’t have experience with those tools, so I might not do them justice. I considered doing a comparison of the AIs but decided against it; they have different strengths (e.g. Stable Diffusion allows more variety, but you may have to do more than a single generation to also get good quality).

I have been asked by my employer whether this tech will impact the financial industry. I do not really know, but I imagine that current developments in text-to-video AI may compromise fraud detection mechanisms, which is kind of scary. In any case, I’ve written all I know about the impact and social aspects of the tech. I hope they’ll be satisfied ;).

As stated above, I have started to use these tools to create art. I’m still learning a whole bunch. I currently post my works on Reddit as u/natso26. You can come take a look if you want. Here’s a recent work that I like:

Leap.

That’s all from me for now.
