GenAI Companies Need to Start Engineering Creativity Out and Predictability In…
“Creative” AI is the Enemy of “AI-in-Production”
You may notice from my posts that I’m fascinated by the use of AI in regular workflows, such as those discussed here by Martin Nebelong. Tools like Midjourney are amazing and fun and creative, but they’re not commercially very useful. Why not? Simply put, you can’t own what they produce.
You’ll read a lot of articles where people talk about this, but what I feel is missing is any sign of product design decisions from the AI companies themselves aimed at closing this gaping hole in front of us. It is the not-so-hidden obstacle to actually using Generative AI professionally, and it should be a key part of every company’s product strategy.
If it is, I can’t see it reflected in their feature sets.
What worries me is that none of the major players seem to be doing so — not Midjourney, not Stable Diffusion. Not even Adobe, the one company whose entire product line depends on the commercial use of what’s created in its suite.
It’s almost as if we’ve all decided that the product roadmap can just ignore the huge AI-swallowing pit between us and the finish line. That generative fill tool? No one knows yet what it does to the IP protections of your creation — your ability to sell it, warrant it, or promise a client that they own it.
If these companies DID see it as important — or saw it, as I do, as the single biggest threat to their future — they’d be making product decisions that support a stronger argument for human authorship… and they don’t seem to be.
These tools need to be less creative and a hell of a lot more specific.
Why Is Photoshop a Tool, But Midjourney Isn’t?
AI that “suggests” randomness — instead of enhancing or speeding up what you originally intended — weakens the claim of human authorship.
If AI-generated content is ever going to be eligible for copyright protection, almost all the arguments I can think of come down to a question of control and intent.
The copyright office has made it clear that what separates a tool like Photoshop from AI generation is the ability for a human to mentally picture what the result will look like before it’s created, and have the tool produce that thing. That exact thing. The presence of randomness or unexpected contributions — or even having four interpretations of your intent to choose from — is what keeps Generative AI from supporting a claim of human authorship.
In one of the very first attempts to copyright AI output, a comic book made with Midjourney called Zarya of the Dawn, the copyright office provided a fairly nuanced breakdown of how generative AI fails the test of human authorship.
Their official reasoning is that because the technology starts with random noise as the basis for an image, and then searches for patterns in it, there is no way a human can predict what the final result will be. Or as they say in the letter:
“[While prompts] …can influence the subsequent images, the process is not controlled by the user because it is not possible to predict what Midjourney will create ahead of time.”
The reason image generators like Midjourney present the user with multiple options from the same prompt is that the model is guessing at your original intent — presenting possible options in hopes that the details it filled in appeal to you.
But in the longer-term argument of Generative AI as an extension of human creativity, these options are bad ones. With a tool like Photoshop, clicking a pixel and telling it to fill that pixel with hex code #8985F0 will make the pixel a very specific shade of blue. If you click it 100 times, it’ll be that shade 100 times over.
Likewise with Adobe Illustrator, if I ask the computer to draw a vector line between two points, I can predict with 100% certainty that it will draw a very specific line, with a very specific weight and a very specific color. A skilled artist can imagine what they want to create and then have a very good idea of how to adjust those tools to get that outcome.
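To make that contrast concrete, here’s a rough sketch of the two behaviors in Python. The generation half assumes the open-source Hugging Face diffusers library; the model name, prompt, and file names are placeholders I picked for illustration, not anything these products actually expose.

```python
# A rough sketch of the predictability gap: the pixel fill is exact every time,
# while the generated image depends on a random seed no human pictured in advance.
# Assumes the Hugging Face "diffusers" library; the model name, prompt, and file
# names are placeholder choices for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

# Photoshop-style determinism: filling a pixel with hex #8985F0 gives the exact
# same shade of blue, every single time.
canvas = Image.new("RGB", (100, 100), "white")
canvas.putpixel((50, 50), (0x89, 0x85, 0xF0))

# Diffusion-style generation: the same prompt starts from random noise, so the
# output changes with the seed in ways the prompt never specified.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
prompt = "portrait of a woman with blue eyes, oil painting"
for seed in (1, 2, 3, 4):  # four different "interpretations" of the same intent
    generator = torch.Generator().manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"option_{seed}.png")
```

Run the first half a hundred times and you get the same pixel a hundred times; run the loop with a hundred seeds and you get a hundred images no one pictured ahead of time.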
Current Generative AI is Random by Its Nature:
For Generative AI there’s no similar level of control or predictability. Instead, the copyright office compares using Generative AI to describing what you’re imagining to a human artist and letting them paint it for you as a work-for-hire. That arrangement works, but only because it’s still a human — the artist — who visualizes what you described and then creates it using tools they can predict. In that case, the original copyright of the image is held by the human artist, who can then assign it to you as a work-for-hire.
But only a human can create copyrightable materials — so if the computer serves that role of interpreting your description, and then creates an image based on it, there’s no copyright created, and the computer couldn’t assign it to you even if it did exist. There are too many layers of unpredictability between your human intent and the final outcome to qualify as a tool for human authorship.
So, in other words, every time the AI adds something substantial to your generation that you did not expect to be there, it weakens whatever claim of human authorship will ultimately be needed for the image to qualify for copyright.
A “creative AI”, in other words, that adds elements or co-imagines something for you is a very poor commercial tool. Even though it’s half the fun of “playing with AI art.”
Most Major GenAI Tools Are Essentially Just Big Technology Showcases…?
As far as I can tell, yes — the current biggest tools are limited to very pretty, very cool technology showcases. You can create pretty things, but you essentially can’t apply them to any problem where the value is the end product itself, like a piece of art or graphic design, unless you simply don’t care about owning the output. That rules out any commercial application where you need to claim use rights, or promise them to someone else — say, a client, an artist who wants to sell and control their art, or a content distribution platform like Steam (video games), Spotify, or Amazon.
While the courts have not yet found any AI-generated content to be copyrightable, they’ve only looked at cases so far with very, very loose connections between the human intent and the output. Cases in the future will start drawing a line between Generative AI that expresses controlled human intent, and Generative AI that assumes the role of a non-human artist.
In order for AI to have a useful role in creative production — movies, music, art, etc. — it’s going to come down to how much control a human author has during creation. It seems inevitable to me that Generative AI companies will need to focus on this as part of their primary roadmap if they want to be competitive long term, arguably more so in the long run than things like having the best quality output or speed.
How Can Product Design Help AI Artists Argue Human Authorship?:
So what can a product do to help with this argument? Well, for one, integrate AI in a way that makes it arguably more like a tool (e.g., Photoshop) and less like a co-artist. There are two ways I can speculate this could be done, off the top of my head — though really each of these companies should have a team of lawyers defining their legal theories, and then build toward those.
This is just my off-the-top-of-my-head way of approaching it.
Author Intent Through Their Own Detailed Artwork as Input:
The first is to use the human artist’s drawn input as a specific set of instructions, minimizing the number of substantive deviations the AI can make from the original source. At the moment, this is the approach that I’m taking for my webcomic, Thoughts by Aaron — where I try to train the AI on my own art, and then use drawn references and editing to make it match my storyboards.
There are a few companies that I think are heading in the right direction. One I’m particularly excited to try is a new tool I stumbled on called Wand, an iPad app for training models on your own art and then selectively applying them by drawing, “morphing” your drawings with those models to match your input, but faster.
What really got me excited about Wand was a video produced by artist Tiye Pulley showing how he’s incorporating Wand into his workflow. I immediately stalked everyone on their team after watching it, because it’s one of the first concrete examples of what I mean by AI with demonstrable control. I haven’t had a chance to try incorporating it into my own flow yet, but I will soon.
The thought is that by using your own work to define your vision up front, and then very carefully and selectively applying that trained model to your mostly-done sketches, there’s a closer relationship between your vision and the output. The goal is to move it closer to telling Illustrator to draw a line between two points.
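To sketch what that might look like in practice, here’s roughly how a constrained pass over an artist’s own drawing could work using the open-source diffusers img2img pipeline. To be clear, this is not Wand’s actual API, and the model name, file names, and strength value are assumptions I’m making purely for illustration.

```python
# A rough sketch of rendering over the artist's own drawing with a tight
# "deviation budget". Assumes the Hugging Face diffusers img2img pipeline,
# NOT Wand's actual API; model name, file names, and strength are placeholders.
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

sketch = Image.open("storyboard_panel.png").convert("RGB")  # the artist's own drawing

# A low strength keeps the output close to the input drawing: the model is only
# allowed to deviate a little from what the human already put on the page.
result = pipe(
    prompt="clean ink rendering, consistent with my storyboard",
    image=sketch,
    strength=0.3,        # small deviation budget from the source drawing
    guidance_scale=7.5,
).images[0]
result.save("rendered_panel.png")
```

The lower that strength value, the closer the output stays to the drawing the human actually made, which is exactly the direction the authorship argument needs.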
The question then becomes, “How much is enough to be a tool from the court’s perspective?”
That’s a very big unknown, but if I were in charge of the product roadmap at one of these companies, a large part of our regular discussions would be about how we could support these arguments, and be the first AI tool with a real shot at overcoming a challenge to human authorship.
Not that anyone loves the idea of a roadmap drawn up by lawyers, but their requirements would be front and center. I very much wish there were already a company doing this so that I wouldn’t have to be guessing at it myself.
Intent Through Speed and Real-Time Feedback:
The second approach is speed and small iteration.
At some point, the clear line between tool and non-tool becomes fuzzy. For example, selecting a space in a graphic and asking generative fill to create a person… that falls squarely within the copyright office’s definition of non-human authorship.
But what if I carefully select the eye of a photograph and ask AI to adjust the color balance of the eye from brown to blue? What if I ask AI to feather two layers so they are seamless, instead of manually doing it with masks and the paintbrush?
In those cases, I feel there’s a pretty strong argument that it’s a tool: while the artist isn’t supplying 100% of the details of how to change the eye color, the resulting output has to be pretty well aligned with their intent — there are only so many ways to have a blue eye.
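To show how narrow that outcome space can be, here’s a rough sketch of the brown-to-blue adjustment done as nothing more than a masked color shift. The file names and channel weights are hypothetical, and a real tool would be far more sophisticated, but the point is how constrained and repeatable the result is.

```python
# A rough sketch of a tightly constrained, masked edit: shift the iris toward
# blue, touch nothing else. File names and channel weights are hypothetical.
import numpy as np
from PIL import Image

photo = np.asarray(Image.open("portrait.png").convert("RGB"), dtype=np.float32)
mask = np.asarray(Image.open("iris_mask.png").convert("L"), dtype=np.float32) / 255.0
mask = mask[..., None]  # broadcast the iris mask across the color channels

# Inside the mask, damp red and green and boost blue; outside it, every pixel
# stays exactly as the artist made it.
edited = photo.copy()
edited[..., 0:1] *= 1.0 - 0.4 * mask                                        # less red
edited[..., 1:2] *= 1.0 - 0.2 * mask                                        # slightly less green
edited[..., 2:3] = np.clip(edited[..., 2:3] * (1.0 + 0.5 * mask), 0, 255)   # more blue

Image.fromarray(edited.astype(np.uint8)).save("portrait_blue_eyes.png")
```

However the math is done under the hood, an edit like this can only land in a small neighborhood of outcomes: the rest of the image is untouched, and the masked region can only become some shade of blue.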
This second hypothesis is based on the belief that there is a line where “unpredictable” becomes “somewhat predictable,” and then “fully predictable”… or at least predictable enough.
After all, I’d argue that Photoshop’s one-to-one color placement by hex code offers more control over intent than a traditional paintbrush, where some degree of randomness exists in the exact lay of the bristles and the mixing of the paint. That doesn’t mean digital art is more worthy of copyright than traditional painting, obviously.
So the goal is not to be perfectly predictable, but to cross a line from unpredictable to predictable enough to reflect intent.
Here is another video from Martin, this one showing an AI interface that runs as you draw and updates in a small region under your pencil in real time. It will give you a sense of what I mean.
These tools continuously make small changes in response to the artist’s equally small changes, so there’s never much of a gap between where the art is and what will change when the artist puts down a mark on the page. If I make a change to a pixel, and it only changes the 10 surrounding pixels in a way I can meaningfully predict through color selection and micro-prompting, I feel this has to strengthen the claim of intent.
And at some scale, the “undo” button — either a digital undo or whiting out and redoing a section with traditional paints — is how all artists deal with unpredicted output from their brushes: mistakes.
When I select a blue color and draw a circle on a character’s head next to the nose, I do expect to get a blue eye. And I expect the line above it to be an eyebrow, and so on. Somewhere on that scale is a point — maybe down to the pixel level, I’m not sure — at which a reasonable person has to conclude the artist was able to correctly anticipate the outcome of drawing with their pen, and that the zoomed-out final product is the result of thousands of intentional human decisions over the course of its creation.
In a way, I realize as I’m writing this that it’s a bit similar to the old pinball game hearings in New York, where the game industry had to prove that pinball machines were games of skill. They did this by proving an experienced player could play reliably longer than a novice one.
At what point does the level of control become enough for the copyright office to consider it a tool?
I have no idea. Until the courts actually establish some case law, it’s all just speculation, and I’m hardly qualified to do more than that — even my speculation is pretty far out there.
But for my final thought, I have to say I wonder about Adobe, and about all the people using the existing generative fill options to make small tweaks to the graphic design in their businesses and commercial work. Are they poisoning their own work?
Adobe has chosen to make generative fill and Content-Aware Fill a substantial part of its AI offerings. If I draw a painting, select 30% of it, and ask it to generate a bird… the courts have already said that that 30% is not human authorship, and embedding it inside a larger work doesn’t necessarily change that. Does that mean I own everything but that 30% in the middle of my image? Does it mean I don’t own any of it? Does the 30% inherit the copyright of the image that surrounds it, or the other way around?
These questions are unanswered because, historically, it would have required a human to create a painting, then let a monkey come and fix part of it, and have the result challenged in court as a whole. This is a combination that’s been uncommon, to put it mildly.
Right now, anyone using Photoshop’s AI tools is taking a gamble on what the courts will ultimately decide, not just on the training side but on the generation side. Every edit we make could be contamination that we later regret.
What I do believe, though, is that products focused on control, whether through interpretation of the artist’s own drawings or through the granularity and speed of generation, are on the right path. Almost all the big players will have to come around to a strategy that addresses this at some point; otherwise it’s just heads in the sand, and they’ll keep creating cool, interesting, and ultimately limited content.