Pretty Gen AI, Please!

Sofie Mart
Published in The Edge
7 min read · Apr 24, 2023

The buzz around Generative AI (hereafter: Gen AI) is relentless and equally polarized. For someone who entered the early, boisterous Gen AI scene, the present day can be challenging. Overwhelmed by the constant stream of information (new model releases, applications with ever more advanced effects, lawsuits against AI companies, warnings about Gen AI from reputable and not-so-reputable sources), I found myself confused to the extent that I stopped using Gen AI for a while. With a background in human-computer interaction, some experience in image processing and computer vision, a few months of eagerly trying out text-to-image gimmicks, and self-imposed mindfulness about emerging tech, I desperately needed to pause and think about how my relationship with Generative AI would move forward. I realized I don't want to break up with it, but if I had the chance to ask for three things, I would. So, pretty Gen AI, please!

1. Authorship Attribution

Probably the most mesmerizing thing about the UI of generative tools is the 'emergence' effect. You sit in front of the screen, staring at a brand-new image being generated. When the 'wow' effect wears off, you realize that it's not actually brand-new, but pieced together and diffused by an algorithm from numerous other images. Then the conscience kicks in: can you actually claim this image as yours? Aspiring to be a fair content creator, one thing I've been practicing is adding a meta-description: 'co-created with Midjourney / Stable Diffusion / ... the list goes on'.

Giving credit to AI, however, is ironically not enough. Despite all the transformations the generated image undergoes, it is still based on someone else's work. And this 'resemblance' is sometimes actively reinforced by us: when we want to reach a certain figure or aesthetic, we might provide a reference image as part of the prompt, hoping that the algorithm will 'reassemble' the result to be transformative enough. And AI graciously lets us. And we are fine, until the actual owner of the original image's copyright finds out. Then, welcome back to Cariou v. Prince, the landmark copyright infringement case between photographer Patrick Cariou and appropriation artist Richard Prince.

That said, unless I purposefully want to mimic other work, I want to be sure of an image's authenticity and that by using it I am not breaking any copyright law. Moreover, I wish to be part of a fair ecosystem, where creators of the original content benefit from the AI machine churn as well. It's great to read that industry majors are taking steps in this direction (Nvidia and Getty Images are about to establish a model for compensating the creators whose work feeds Generative AI). However, this level of information is not always accessible or obvious to the end users of Gen AI. So why not change that?

Understanding the high-level constructs of large generative models is enough to argue that it should be possible to trace back the source (if not the creator, then at least the repository) of every image feeding an algorithm, and, with enough rationale, even to establish the degree of transformation applied by that very algorithm. As an end user, I could imagine this being implemented directly in the UI of Gen AI applications as a meta-description. There is no need to be overly didactic about it (even though Stable Attribution's 'before' and 'after' demonstration of image generation is a great example), and there are plenty of good examples of sensible attribution out there (for instance, Observable's or GitHub's forking mechanisms).
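To make the idea concrete, here is a minimal sketch of what such a machine-readable attribution record might look like if a Gen AI application attached one to every generated image. All names here (`SourceAttribution`, `build_attribution_record`, the `laion/...` identifiers, the influence weights) are hypothetical illustrations, not an existing API.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SourceAttribution:
    """One source image (or repository) that influenced a generated result."""
    source_id: str   # hypothetical dataset/repository identifier
    weight: float    # estimated share of influence, 0.0 to 1.0

def build_attribution_record(model: str, sources: list) -> str:
    """Serialize an attribution record that a Gen AI UI could attach
    to a generated image as its meta-description."""
    total = sum(s.weight for s in sources)
    record = {
        "co_created_with": model,
        "sources": [asdict(s) for s in sources],
        # whatever influence is not traced to sources is credited
        # to the model's own transformative power
        "transformative_share": round(max(0.0, 1.0 - total), 2),
    }
    return json.dumps(record, indent=2)

record = build_attribution_record(
    "Stable Diffusion",
    [SourceAttribution("laion/photo-123", 0.15),
     SourceAttribution("laion/photo-456", 0.05)],
)
```

A UI could render such a record as a human-readable credit line ("co-created with Stable Diffusion, traced to 2 sources") while keeping the structured data available to downstream tools, much like a forking history.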

The Tower of Babel of modern days: humans building an AI model from countless, authorless Gen AI content. Generated with Midjourney, using Pieter Bruegel's The Tower of Babel and Beeple's The First Emoji as reference images.

I would suggest that Gen AI makers aspiring to a brighter future look up to three industries: academia, music, and farming. Academia and music probably have the most rigorous practices of authorship attribution. Farming, as surprising as it may seem at first, has a direct analogy with Gen AI, with content being equivalent to natural produce. Throughout the millennia, conscious farmers have practiced responsible use of resources, for instance giving a year of rest to exhausted fields. Some of those practices, not literally but allegorically, are there to inspire us, the new generation of content farmers.

2. AI Watermark

Beyond its triumphs in the realm of creativity, AI has also learned to simulate reality pretty well. The deep fake industry is flourishing, feeding us less provocative (imagine the Pope wearing a puffer jacket) or more provocative (imagine Trump brutally arrested) content. I am not really sure whether it's more exciting or alarming that AI has finally learned to draw human fingers. One can argue that we are on a fast track to AGI (Artificial General Intelligence). In my opinion, AGI cannot be reached until AI learns to understand the intentions of its human counterpart and respond based on its own judgment, human-like. When that level of AGI is reached, I expect it to intervene in the making of deep fakes in some shape or form, whether the content is created just for fun or with malicious intent.

This scenario doesn't have to take a dystopian route where Humanity and AI fight over content generation. It would be enough to have some sort of non-fungible AI 'watermark' on every item of generated content, discoverable whenever the image is classified as representing reality. The actual methods could range from visual masking to metadata embedding to on-chain record storing.
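As a minimal sketch of the metadata-embedding and on-chain-record end of that spectrum: a provenance record could pair a cryptographic fingerprint of the image bytes with a `synthetic` flag. The function names and record fields below are hypothetical; a real system would follow a standard such as C2PA rather than this toy schema.

```python
import hashlib
import time

def make_watermark_record(image_bytes: bytes, generator: str) -> dict:
    """Build a toy 'non-fungible watermark': a content fingerprint plus
    provenance fields that could be embedded in image metadata or
    anchored in an on-chain registry."""
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "generator": generator,   # e.g. "Midjourney v5"
        "synthetic": True,        # flags the image as AI-generated
        "created_at": int(time.time()),
    }

def verify_watermark(image_bytes: bytes, record: dict) -> bool:
    """Check that an image still matches its registered fingerprint."""
    return hashlib.sha256(image_bytes).hexdigest() == record["sha256"]
```

Note the limits of this simplest variant: a plain hash breaks on any re-encoding or crop, which is why production systems lean on perceptual hashing or watermarks embedded in the pixels themselves.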

One interpretation of an AI Watermark applied to images displaying humans: a girl's portrait with a distorted, pixelated layer and a text stripe reading "AI" across her face. Generated with Midjourney.

This feature could eventually help not only to prevent the spread of misinformation but also to facilitate a more advanced taxonomy of generative content; specifically, to help distinguish authentic content (created by humans capturing reality) from synthetic content (generated by AI from a mix of authentic and other synthetic content). If uncontrolled, the expansion of Gen AI will eventually lead to a dilution of authentic content. As AI researcher Mike Cook pointed out, describing the Gen AI thunder of last year: "The internet is now forever contaminated with images made by AI. The images that we made in 2022 will be a part of any model that is made from now on."

3. Control

Over its short history, Gen AI has broken records in user engagement.

Those engagement numbers are not surprising. It takes me, on average, seven prompts to reach the desired result with Midjourney. Eventually, human-AI interaction condenses into a dull feed of requests and responses. Whether the tone of human-AI interaction should be more conversational (human-like) or, on the contrary, AI should have clearly artificial traits, is a dilemma for another day. I am more concerned with the controllability of Gen AI, for the sake of reducing the massive impact we creators have on the environment with our skyrocketing computing power demands.

What are we going to do with all that content? A room with walls and floor covered in images, and a computer screen displaying "out of memory". Generated with Midjourney.

While industry giants push different product strategies, from centralized, super-efficient cloud rendering (NVIDIA) to decentralized "machine learning at home" farms (Hugging Face and AWS), it is rarely made obvious that responsibility for Gen AI content is shared between creators and service providers. What if, instead of the frictionless 'one-click-away' experience proliferated by social media and e-commerce platforms built to generate and share as much content as possible, we practiced responsible content generation? For example, instead of running multiple prompt variations trying to achieve the imagined visual, we could go and study prompt engineering in a lower-tech (and less computation-hungry) environment, for instance with Promptomania.

Or what if we were less perfectionist about the quality of intermediate results? I really like how 3D/VFX software makers have mitigated the shortcomings of personal computers: by prompting users to select and render a small area first, before plunging into the heavy rendering of the entire scene. Although off-the-shelf text-to-image generators do have some 'preview-first' capabilities (e.g., Midjourney first generates a set of four low-res images from which to pick one or more to upscale), there must be more ways to offload the image generation process. Giving users more control, for instance to cancel the process when they see that the emerging image is far from what they wanted, or to render only selected image channels, could do the trick. As an example, rendering only the alpha channel (the grayscale layer of the image representing its transparency) is a far lighter process, and in many cases could be enough to 'preview' the figure/ground composition of the image.
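The alpha-only preview idea can be sketched in a few lines. This is a toy illustration, not any generator's actual API: pixels are hard-coded (R, G, B, A) tuples, and `alpha_preview` is a hypothetical helper that skips the three color channels entirely, touching a quarter of the data a full render would.

```python
def alpha_preview(pixels, width):
    """Return only the alpha channel of an RGBA pixel list, reshaped
    into rows, as a cheap figure/ground composition preview."""
    alphas = [a for (_, _, _, a) in pixels]
    return [alphas[i:i + width] for i in range(0, len(alphas), width)]

# A 2x2 image: an opaque figure in the left column,
# fully transparent ground in the right column.
image = [(255, 0, 0, 255), (0, 0, 0, 0),
         (0, 255, 0, 255), (0, 0, 0, 0)]

preview = alpha_preview(image, width=2)
# preview == [[255, 0], [255, 0]]: the figure/ground split is visible
# without rendering any color information at all.
```

A real tool would of course read pixels from the diffusion process rather than a list, but the principle stands: exposing partial renders like this gives users a cheap checkpoint to cancel or redirect a generation early.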

Of course, the financial incentive matters too. Trying to phrase this carefully so as not to catch a wave of hate from advocates of tech democratization: I support equal access to Gen AI as a class of technology, but I don't believe content generation can be endlessly free. Remember, we are in a relationship with AI. So it is our responsibility to keep this relationship balanced by investing resources in it too, whether that is money, original content, or responsible use. Otherwise, as Mike Cook foresaw, reality will become a scarcity on a Web contaminated by Gen AI.

--

Sofie Mart is a writer for The Edge: a researcher and writer focusing on speculative narratives in creative tech.