Towards a Sustainable Generative AI Revolution
Facing the growing pains: how to steer the wild new age of the super subconscious
Humanity’s creative muscles are being stretched by the unstoppable generative AI revolution. Using text and other kinds of prompts, people employ this technology to generate stunning images, videos, 3D shapes, VR environments and more. And yet, growing pains are starting to appear around various matters, from the rights of living artists to the presence of AI-generated works within art competitions, art platforms, stock libraries and the like.
I am the cofounder of one of the first generative AI platforms launched at the start of this revolution (Geniverse). I have also been a multidisciplinary artist for a long time.
As somebody who is very active in both fields (generative AI and the arts), I intend to reflect upon many of the angles and perspectives involved in these matters.
First, though, we will take a fun journey together in order to review the very essence of this exciting technology from first principles, connecting it all with human creativity and the minds of creatives and artists.
And then, we will explore the good, the tricky, and the elephant in the room of the current state of this revolution. Finally, I will reflect on how we may all contribute to move towards a more sustainable scenario beyond these fast-paced initial stages.
Buckle up, as in this article we are going to go from metaphors about AI to latent spaces, the mind of an artist, smart generative environments and other future scenarios, the rights of creatives, the content authenticity initiative (CAI) standard and way more. Let’s begin.
Coming home
Let’s use a simple metaphor to explore what the generative AI revolution is bringing to the table and what it all implies in relation to creatives, artists, and all of humanity.
Once upon a time, you fell into the ocean of life. This is quite a vast ocean, an ocean of information.
Let’s imagine that you are made of two perspectives or parts: your subconscious and your conscious one. And let’s represent your subconscious as if it were a kitchen pot, floating on that ocean of information.
Your first priority on this ocean is to survive, and hopefully, thrive. For that, you need information. So you want to bring enough quality ingredients into your pot, and combine and recombine those ingredients in order to generate knowledge and ideas that help you reach your goals.
Above your subconscious pot, there is a diffuse, mysterious, shining sphere that represents your consciousness (of which we still know so little).
And so, there you are. Floating on the ocean of life, with your mysterious consciousness sometimes providing a direction for the cooking process that takes place within your subconscious pot.
All the while, that subconscious is constantly combining, mixing and remixing all sorts of ingredients (information) that reach it through your senses.
And sometimes, those combinations may become the seeds of new ideas. Metaphorically speaking, we may imagine fragile, subtle bubbles that emerge from that cooking process, ascending from the subconscious to the conscious. And, if we have space in our minds, if they are not full of noise, we may then perceive those fragile bubbles, and: Eureka! An idea!
But, there is an issue here. There is too much information on this ocean, too much complexity. And our subconscious pot has a limited size. It is not rigid. It is somehow flexible, malleable up to a point. But its size is still limited.
So nature evolved a mechanism to solve this issue of dealing with the tremendous complexity of the ocean of life: compression and decompression processes.
Our brain is able to take the information that arrives through our senses, and compress it into a form that has less detail and more abstraction.
Let’s begin to visualize this very important axis, the detail-abstraction axis. When we compress the complexity of life, we go from high detail (and a higher dimensional space) to high abstraction (within a lower dimensional space).
And so, within our subconscious pots, we gather these compressed representations of the complexity of the world, in what we sometimes call: latent spaces.
These latent spaces hold the abstract essence of different information domains. We get rid of uninformative details and we preserve a number of reduced dimensions, each of which documents relevant and useful factors related to whichever information domain the data belongs to.
Our brain can do the opposite process as well. It can perform decompression, and go from high abstraction to high detail.
“Visualize an elephant!” We hear those words and the image of an elephant pops into our minds. We just ran the opposite process, decompressing that highly abstract representation (elephant) into a highly detailed visualization in our minds.
The processes we just explored are very similar to the ones happening within AI networks. We train AI networks to learn to compress high dimensional domains (like the domain of natural images) into latent spaces that preserve the abstract essence of those domains within a much smaller number of dimensions.
And we also train them to decompress any point within those latent spaces into a corresponding high dimensional representation that belongs to the original information domain.
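To make those two processes concrete, here is a minimal sketch of an autoencoder, the simplest architecture that learns exactly this compress-then-decompress loop. It is written in PyTorch, and every size and name in it is an illustrative choice for this example, not the internals of any production system.

```python
import torch
import torch.nn as nn

# A minimal autoencoder: compress 64x64 RGB images (12,288 raw dimensions)
# into a small latent space, then decompress them back. All sizes here are
# arbitrary choices for illustration.
class AutoEncoder(nn.Module):
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        # Encoder: high detail -> high abstraction (compression)
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64 * 3, 1024),
            nn.ReLU(),
            nn.Linear(1024, latent_dim),
        )
        # Decoder: high abstraction -> high detail (decompression)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 64 * 64 * 3),
            nn.Unflatten(1, (3, 64, 64)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)     # a point in the latent space
        return self.decoder(z)  # a reconstruction of the input

# Training pushes reconstructions to match the inputs, which forces the
# latent space to preserve the abstract essence of the image domain.
model = AutoEncoder()
images = torch.rand(8, 3, 64, 64)  # a toy batch standing in for real images
loss = nn.functional.mse_loss(model(images), images)
```

Real systems use convolutional or transformer-based encoders and far richer objectives, but the bidirectional movement between detail and abstraction is the same.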
When we explore complex generative AI systems, from DALL·E 2 (OpenAI) to Imagen (Google), Stable Diffusion (Stability.ai) and beyond, we find different intermediate stages, which, for example, may translate between modalities, perform diffusion processes, scale inputs and outputs, etc.; but the base common to all those systems is this pair of compression and decompression processes that allows us to move bidirectionally between high detail and high abstraction.
The specifics of AI systems depend on the objective we have. We may want to upscale images, or sharpen them, or to generate brand new images conditioned on text prompts, or some of those things together, or something entirely different. That will determine what sort of training objective and dataset we use, as well as the precise details of the different parts of the final architecture.
The key strategy used by the leading generative AI systems nowadays is based on what we call diffusion. Stable Diffusion, for example, uses a U-Net-like architecture that has been trained (with a large dataset) to predict the noise that has been added to an image.
Once trained, the same network is able to go from different combinations of image plus noise (including pure random noise) back to a high quality image in a number of steps.
It can also go from an image to another image, by adding some noise to the initial image and then performing the same process as before.
In order for these generations to move in the right direction, they are conditioned on the compressed representation of the text prompt we entered (which is injected into different parts of the U-Net architecture).
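At the risk of oversimplifying, that sampling loop can be sketched in a few lines. This is a toy sketch under heavy assumptions: `denoiser` stands in for a trained, text-conditioned U-Net, `text_embedding` for the compressed prompt representation, and the update rule is a caricature of the carefully derived noise schedules that real samplers use.

```python
import torch

def generate(denoiser, text_embedding, steps: int = 50,
             shape=(1, 3, 64, 64)) -> torch.Tensor:
    x = torch.randn(shape)  # text-to-image starts from pure random noise
    # (image-to-image would instead start from an input image plus some noise)
    for t in reversed(range(steps)):
        # The trained network predicts the noise present at step t,
        # guided by the compressed representation of the prompt.
        predicted_noise = denoiser(x, t, text_embedding)
        # Remove a fraction of that noise; real samplers use precise
        # schedule coefficients instead of this toy update.
        x = x - predicted_noise / steps
    return x  # a (toy) generated image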
Enough with the technical details. Let’s continue.
AI is coming home
And so, with the generative AI revolution, we are getting closer to our essence as beings capable of performing the complementary processes of convergence and divergence (compression and decompression), expressed through our analytical and creative muscles.
After a decade in which we gradually expanded and evolved the convergence capabilities of deep learning AI systems (able to predict, recommend, classify, identify, etc), the generative AI revolution completes the loop by adding superhuman divergence capabilities (able to create and generate). AI is coming home.
The magic of latent spaces
But what do we really mean when we talk about latent spaces or abstract compressed representations? We find the answer within ourselves, through a very simple example.
I take a walk in nature. When I return, my friend asks me how the walk went. I say: “Wonderful, I saw a beautiful cicada!”. And she asks me: “What did the cicada look like?”
At that point, I visualize the cicada in my mind. Let’s pretend that my visualization is expressed in a grid of 1000 x 1000 points of light. That is a 1 million dimensional space. If the points have color, then each of them will have a red, green and blue component (three times as many dimensions).
So I could start describing the cicada to my friend by saying: “Well, the first point of light at the top left of my visualization has 15 of red intensity, 25 of green intensity and 77 of blue intensity. The next point to the right of it has 145 of red intensity, 55 of green intensity… and so on, point after point”. And I could keep going like that through the 1 million points of light. The problems with this approach are obvious.
It may take me a month to describe the cicada and by that time my friend will be long gone. Zero efficiency. But the main issue is not even that one.
To know that one of those million points has 155 of red intensity is just not very useful. The fine detail often doesn’t provide relevant information. That’s why I will do something different.
I will compress all that complexity and richness of the details of the cicada into just a few dimensions, 30, 50, 100 factors (a small number anyway) that explain the essence of what I saw.
And I will tell my friend: look, it had a broad head, a stout green body and clear membraned wings. Four wings, and the wings had these kinds of patterns. And it had large compound eyes, and six legs, and the legs were like this, etc. I have compressed the high detail representation into a small number of dimensions that communicate important and relevant information.
And now, my friend hears this and she does the opposite process, decompression.
She transforms these few compressed dimensions that express the essence of what I saw and inflates them to visualize in her mind the high detail representation that would correspond to that essence, the image of a cicada (which will differ from the one I visualized, because of the compression-decompression process as well as other differences between the systems involved and the previous knowledge each of us held in relation to the relevant scenario).
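To put rough numbers on this, here is the back-of-the-envelope arithmetic behind the cicada example, using the same invented figures as above:

```python
# Back-of-the-envelope arithmetic for the cicada example (illustrative numbers).
pixels = 1000 * 1000          # points of light in the mental image
raw_dims = pixels * 3         # red, green, blue per point -> 3,000,000 dims
latent_dims = 50              # a handful of descriptive factors
ratio = raw_dims // latent_dims
print(f"{raw_dims:,} raw dimensions -> {latent_dims} factors "
      f"(a {ratio:,}x compression)")
# 3,000,000 raw dimensions -> 50 factors (a 60,000x compression)
```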
And so, in a way, every time we recall something, we are kind of rebuilding it, reimagining it, recreating it from that essence that we stored (the precision of that process depends a lot on the richness of the relevant latent space as well as the number of sensory modalities involved in its creation, in between other factors).
The following is an infographic I created months ago about how DALL·E 2 works, comparing its processes with what goes on in the human brain.
Small Pot, Giant Pot
There are many differences between what goes on within our brains and within these AI networks, but one difference that is especially relevant to this article is the size of that subconscious pot, metaphorically speaking.
Our subconscious pot is fed by the experiences we have in life. When we talk to people, when we experience the world, we enrich its contents. Eventually, its cooking processes generate within our minds new ideas, visualizations, sounds, and more.
AI networks are fed (at training time) by giant datasets. The datasets used by generative AI systems are made of information collected from all around the internet. We are talking about massive amounts of data.
So, on one side, we have us humans, with our little subconscious pots.
On the other side, we have these giant AI pots, fed with data from all around the internet. Some of that data is in the public domain. But not all. And we will discuss what that means and implies, a bit later.
The depth elevator
It’s time to connect all the previous sections with art and human artists. Now, defining what makes an artist is an impossible task. Instead, I will focus on exploring something that has been common to many of the great creatives in history.
Remember that axis (detail to abstraction) that I was discussing above? In a book I published years ago, I wrote about another metaphor I came up with, which I call “The depth elevator”.
Imagine a vertical line with an elevator moving through it. At the bottom of the line, we have the high dimensional and high detail realm. This is where the complexity of the ocean of life is fully expressed.
At the top of the line, we have the realm of the compressed low dimensional latent spaces that preserve the abstract essence of the lower realms (here lives our language, for example).
Artists are masters at navigating this depth elevator in an agile, flexible and dynamic way. Let’s go deeper into this.
When we are little babies and later kids, we spend most of our time at the bottom of the depth elevator, interacting with the richness and detail of the universe. Our analytical mental modules are still not fully developed. It is our exploration phase.
Most adults, instead, tend to focus on efficiency by reusing the mental patterns already established within their minds (which also helps us avoid wasting our precious fuel, the glucose that powers our brains). It is our exploitation phase. As such, adults spend a lot of their time in the narrow ivory towers at the top of the depth elevator.
Achieving a good balance between time spent at both halves of the depth elevator is a healthy goal. A good balance between convergence and divergence, between compression and decompression, between abstraction and detail.
A lack of balance between those poles (in whichever direction) produces different kinds of issues in adults. I have written extensively about those matters, but that is not the topic of this article. Let’s get back to the artists.
Many great artists have the following thing in common. They are able to navigate this depth elevator in an agile and flexible way. They are able to go down into the depths at the bottom of the elevator, where the richness of the universe awaits.
And, crucial point, they don’t just dip their toes and leave. Instead, they are able to spend long periods of time down below, exploring those muddy, wild and uncertain waters.
They are also able to crystallize that richness onto different interpretations and representations, which may express themselves at different levels all throughout that axis that goes from detail to abstraction.
At times the representations themselves, and at other times their explanations or the way they are communicated, are located much closer to the top of the depth elevator.
This all is in contrast to the typical adult, who spends most of the time at or close to the top of the elevator. And you can guess why.
Because sitting in the ivory tower of abstraction, at the top of the elevator, is way more comfortable (and requires less fuel) than navigating the muddy bottom of that axis, which contains the complex details of the universe (metaphorically speaking, we could also say that it is way more comfortable than dirtying our hands in the wild playground down below, at the bottom of the elevator).
Here we arrive at another crucial point. Navigating that depth elevator in the way that many of the greatest artists in history have been able to do, requires effort. It requires time. Perseverance. And, in a way, to go against the natural predisposition of our adult mind to be efficient and avoid wasting our precious fuel.
In that regard, it is relevant to point out that a number of platforms are currently banning generative AI art (or putting it in a separate category or area) because they consider it to be “low effort” art.
Yes, it takes some effort to find the right prompt to guide generative AI architectures. But the effort and time required by that process cannot be compared to the years and sometimes decades that it takes to master the process described earlier. We will go deeper into this point and other related ones a bit later in this article. At that time, we will also reflect on potential solutions to such conundrums.
So, by exercising this flexible navigation through the depth elevator, great artists and creatives are able to express the richness of the universe in novel ways.
Pick anything in life, say, wood. You may experience wood in a very detached, abstract way. Or you may explore all the intricacies of wood at a very deep and detailed level. If you are able to flexibly move between both poles, you are in a much better position to create something novel and different related to that element of the universe.
Great creatives are also able to understand different ways of interconnecting various areas of that vast ocean located at the bottom of the axis, across the various layers of those waters and also through the top layers of the depth elevator.
When, for example, a great creative experiences rhythm, she can go beyond disciplines, techniques, tools and flashy terms. A great creative sees and feels rhythm everywhere. In the lights and the shadows projected by a curtain, in the sound and movement of falling tears, in the dance of the stars, the gaps between our thoughts, and beyond.
Throughout years and decades, great creatives expand and consolidate the latent spaces of their subconscious pots.
They also refine the way they navigate their depth elevators, which allows them to connect detail with abstraction in powerful ways that enrich their creative processes.
In addition, artists and creatives often collaborate with others. By doing so, different subconscious pots may enrich each other.
So, if you study some of the greatest creatives and artists in history, you will see that they all had something to say, a message, a vision. And also that such a vision, and the way they expressed it, was inextricably connected with their capacity, cultivated over decades, to navigate these depth elevators in fluid ways, exploring both the depths of the richness of the universe and the ivory towers of abstraction, as well as many of the realms in between both poles.
Finally, regarding those depth elevators, the next step would be to visualize them not as isolated entities, but as multiple funnels that are interconnected with each other within multidimensional spaces.
The following image of an origami seeks to represent a small fragment of that extension of the metaphor.
It is time, though, to stop the elevator and move on, in order to focus on a review of the current status of the generative AI revolution, as well as ways to address its current growing pains.
So using what we have explored above, let’s consider the situation today and in the coming future, and what could be done about it all.
The good, the tricky, and the elephant in the room
Let’s explore a number of consequences derived from this initial phase of the generative AI revolution.
The good
- Generative AI won’t replace human creativity. It will enhance it.
- This technology demystifies creativity. Think of what Edison said: Genius is 99% perspiration (combination, recombination, productive work and experimentation) and 1% inspiration (establishing the seeds, polishing, etc). Thanks to this new technology, we now realize that we can automate a large percentage of the creative process, a part that takes place subconsciously in our minds.
- Studies about human decision making suggest that we make more than 30,000 decisions each day, but that we are only aware of around 0.26% of them (e.g. research by Huawei). Far more of our lives than we may think takes place subconsciously. By automating our subconscious cooking processes with AI technology, we can positively impact a large part of our existence.
- In fact, I call this new era “The age of the super subconscious”.
- Think of this technology as a series of different Iron Man suits that will amplify your subconscious pot and empower your creative muscles.
- Different Iron Man suits will have different styles, traits, and personalities.
- Prompt engineers are people who will become experts at getting the best results out of these Iron Man suits. They will know the ins and outs, the strengths and weaknesses of each.
- They will also be masters at using their human experience and intuition when interacting with these powerful amplifiers in order to achieve the desired result.
- As such, these prompt specialists will be highly regarded in coming years. Their role will become a prestigious one in the job market. And we will witness a large amount of courses, publications and systems that will educate and help people train this skill.
- Today, our prompts are natural language and images. But thanks to multimodal architectures, prompts will soon be any kind of data we want to use to guide these architectures (different systems will be designed to absorb different kinds of guiding inputs).
- The initial text to image phase has now transitioned to text to video and text to 3D capabilities. Eventually, we will be able to output all kinds of data with custom systems that will target the needs of specific verticals.
- Next, we will witness multimodal output capabilities, which will eventually allow us to produce, for example, full movies that will include visuals, dialogues, music, and more.
- This technology will inspire new forms of art that we cannot yet imagine. Multimodal generative AI is poised to trigger the emergence of novel ways of combining explored and unexplored areas of the depth elevator, which in time may become highly regarded new forms of artistic expression.
- Generative AI will impact a very large number of sectors. It will be used to expand scientific datasets with synthetic generations, revolutionize brainstorming processes, personalize branding in ways unimaginable only months ago, accelerate the rise of real time dynamic “only for you” marketing and advertising, and usher presentations of all kinds into a new era by surrounding them with media that matches their content in impressive ways, among many other examples. From stock libraries to design boutiques, whole swaths of the media landscape will rush and compete to incorporate this technology.
- Cutting edge tech like VR & AR (and in general all forms of XR) will incorporate this technology (experiments are already ongoing) and eventually we will witness real time generation of immersive spaces that regenerate in smart ways by tracking the gaze of the user (it’s interesting to consider the connections between these experiments and the theories of Donald Hoffman).
- This technology will also accelerate the exploration and experimentation phase of many creative processes. From concept design to product design, character design and prototyping stages across a wide range of fields, generative AI will allow us to do more in less time, to try all sorts of new directions and to go deeper into our explorations of every level of the depth elevator.
- The so-called “metaverse” is for many still a utopia, and a decent implementation of it appears to be pretty far in the future. If the metaverse is ever to become a useful reality, it will probably happen on the shoulders of generative AI technology, which may be the key to accelerating its implementation.
- Further in the future, we will witness the rise of smart generative environments (SGE), which will mutate according to our needs or emotional state. Houses, event venues and other environments will begin to resemble organic living entities by matching the intent and emotions of their occupants. They will do so in multimodal ways. Eventually, we will be able to converse with those environments and they will become a key support for our mental balance and health.
- The combination of generative AI with ever more powerful sensing models capable of interpreting every subtle nuance of our expressions and behaviors will allow us to produce real time multimodal interpretations of our emotional and mental state. When combined with new iterations of brain-wave reading tech (EEG, MEG, etc.), this will usher in a new kind of creative expression that will literally use our most intimate sphere as a brush to produce extraordinary renditions of the human condition.
- Although some jobs are and will be in danger, it is also highly likely that new roles that we cannot yet imagine will emerge from the need to manage and interact with this technology.
- At the same time, many of the impacted jobs and roles will survive and even thrive by embracing this new age and adapting their processes to what this new technology offers.
- A good number of people who may not be professional artists, but who have a natural predisposition to exercise their creative muscles, will thrive with this new technology. They will strengthen those muscles in faster and easier ways, and they will enjoy new opportunities to augment and amplify their creative potential.
- And we end this section as we began, reminding ourselves that generative AI won’t replace human creativity. It will enhance it. And exercising our creative muscles will continue to be a highly recommended activity. Achieving a good balance between our capacity to diverge and converge, to compress and decompress, will remain just as important for our mental and spiritual health for the foreseeable future.
The tricky
- We, humans, have a limited and relatively small subconscious pot. Generative AI systems are trained to hold massive pots that encompass a large part of the knowledge of the internet.
- Because of that, it doesn’t seem fair, nor morally correct, that human creatives should have to compete with generative AI systems.
- When machines surpassed humans at playing chess (a far less consequential event than this one), no one thought that it would be great fun to keep holding human vs machine chess competitions (beyond the ones that demonstrated that we had lost the battle). We accepted that they were better. And then we went our separate ways.
- Human chess players use AI to train themselves and become better (akin to the augmenting and amplifying capabilities of those metaphorical Iron Man suits provided by generative AI systems).
- AI systems that play chess or Go produce at times really beautiful moves that would never occur to a human. They kind of have their own special perspective (based, of course, on a tremendous capacity to look ahead in time). And yet, very few people are interested in following machine vs machine competitions. Humans prefer to watch other imperfect humans play.
- The key thing, in any case, is that the chess world keeps both domains separate. Machines help human chess players train and become better. And machines may also play among themselves. Humans, separately, play in their own competitions.
- I believe that eventually a similar thing may happen with generative AI (with a number of differences, of course, as these are very different domains).
- Another tricky point to consider is a key factor that lies behind some of the current excitement with this technology. And I will expand on this matter in the final section of this article. Let’s, for now, introduce it.
- Greg Rutkowski is, in the opinion of many people, one of the best, if not the best, illustrator of fantasy art nowadays. And his name appears in a massive amount of the prompts used to produce some of the most impressive generative AI art in recent times.
- So, after the dopamine rushes triggered by the production of amazing art that seems to have been painted by Greg Rutkowski subside, a lot of people are going to be left with hundreds or thousands of AI generated images or videos, and then they will ask themselves: “And now, what?”
- “Nothing” will be the answer in most cases. Because most of those people were not really exercising their creative muscles in relation to any deep, meaningful internal drive; they were using this technology like the person who buys a new iPhone, in a sort of compulsive way, chasing the shiny latest tech.
- And when that compulsion dies down, they will feel sort of empty. Because most of what will be left behind is not theirs, it belongs to, among others, Greg Rutkowski and his style, crafted over decades of hard work (as an example, among many other living artists whose work powers these networks).
- In any case, let’s be realistic. Things have been moving too fast and it makes sense that people need time to catch up. There may be many solutions to the current scenarios. And I will discuss some of those at the end of the following section.
The elephant in the room
- AI generative systems are only possible because of the giant datasets that are used to train them.
- AI generative architectures are trained with massive datasets composed of images, videos, text and soon other kinds of data.
- This data is typically extracted from the internet by the groups that create these datasets.
- Some of the data used in these datasets is public domain data. It seems fair to use such data for the creation of these datasets.
- But a good part of the data used in these datasets belongs to living artists who have not declared it to be in the public domain. These are artists who make their living by selling such data, that is, by selling the decades of hard work that have produced a specific style and a series of works.
- These artists are, indeed, the foundation on which this revolution supports its meteoric rise.
- And so, a growing chorus of living artists is complaining about this. Some of them state that the works of living artists should not be included in these datasets. According to some, their complaints have been falling on deaf ears. They are being mostly ignored (at least so far).
- If we ignore the complaints of these living artists, we are ignoring ourselves. For today, we are discussing visual art, but tomorrow, it may be music, novels, legal writings, or whatever our occupation or field may be.
- Let’s again contemplate it all from the perspective of what many people experience when using these systems. On whose shoulders is built the dopamine rush that a person may feel when they produce a stunning piece of digital art that so incredibly well resembles Mr. Rutkowski’s style and body of work? On Mr. Rutkowski’s, of course. More specifically, on the decades of extremely hard work and perseverance that Mr. Rutkowski applied and invested to create that style and body of work.
- A style and a body of work that is now giving that person such an intense dopamine rush when they take some time to come up with a prompt that includes Mr. Rutkowski’s name, then click a button and, with minimal effort, produce a result that so closely resembles his art.
- Some may say: “But it took me 50 hours to come up with the prompt”.
- Whether that is an inflated number or not, it does not change the fact that there is no comparison between anybody’s exploration of language prompts for a few minutes or hours and the decades of work invested by the likes of Mr. Rutkowski.
- It also does not change the fact that Mr. Rutkowski never gave explicit permission for his artworks to be included in the datasets used by these AI architectures.
- Prompt engineering is art+science. And it will gradually become a prestigious skill and discipline.
- There will be tons of books and courses on the matter. Great prompt engineers will know the ins and outs, strengths and weaknesses of many different AI architectures, while at the same time being capable of applying their human intuition to the generation of prompts that extract the best results from the human-machine interaction. Indeed.
- But that is still not an excuse to trample on the rights of fellow human beings and living artists. And in the next and final part of this article, I will address in more detail what we could do about this and other related matters.
- And let’s emphasize again the following: this revolution has moved so fast that it is understandable that people need time to catch up with it all. And that process of catching up and finding a more sustainable scenario is in its initial stages.
- I will always support generative AI, but above all, I will support and defend my fellow creatives (because people and their lives should always matter more than technology). This is a matter of ethics and morals (legal aspects are not part of this article; those will be addressed by others, and I believe that ethics and morals should be the first compass in this matter).
This is a wonderful revolution that will bring many benefits to humanity. But as we can see, there are also tricky sides to consider at these initial stages. Let’s discuss how we may address some of them.
Steering the revolution
I will address this final section from a moral and ethical stance.
It is to be expected that, eventually, a number of bodies and groups will introduce different forms of regulation related to these systems and companies will also introduce their own safeguards and controls. But these, as well as other legal perspectives, will take time to be established.
Although maybe not as much time as we would expect. At the end of the next section, I will comment on the content authenticity initiative (CAI), an open standard, founded by Adobe. CAI has been joined already by hundreds of companies, some of which are already planning to implement it in their platforms.
This will allow them to track where digital content comes from and whether generative AI has been used to produce it, as well as other factors related to misinformation and the protection of the rights of creators.
Let’s now reflect on ways to make this revolution more sustainable.
A living artist who has spent decades developing a style and a body of work, whose rights belong solely to that artist, should have a say and/or be compensated if that work is to be included in any of these massive generative AI datasets.
Otherwise, it is as if, for example, you display an artwork within a gallery, and somebody comes and grabs it, takes it away, and profits from it. There is something universally known as copyright, which did not disappear magically at the start of the generative AI revolution.
Some will bring up the example of YouTube, saying that in its beginning stages YouTube kind of hand-waved these issues away, and that otherwise it would never have gotten off the ground. As we all know, YouTube has long employed a very strict set of mechanisms for protecting copyright within its platform. The fact is that generative AI has already exploded in a massive way. So, an initial “what the heck” phase is understandable, but that phase is now behind us. Therefore, the moment to begin protecting the rights of creators, as YouTube and other similar platforms had to do, is right now.
Finally, we need to discuss a super important issue, the gray zones. To get there, let us quickly review a point that we raised in the previous sections.
It is not fair to have humans and machines competing in the same art competitions, art platforms and the like. Humans have small subconscious pots. AI systems have massive ones. Humans’ subconscious pots hold the limited experiences of one life. AI systems hold the knowledge of millions or billions of human beings. Let’s get real. It is not fair and it is not moral to have them compete with each other.
Instead, just as happened with chess, we can imagine separate sections in art competitions and on art platforms: art made by humans, and art made by AI. This is already happening on many platforms around the world. But this point takes us, finally, to the gray zones.
The gray zones.
“Wait, this thing was not fully produced by AI, you see. I used AI to produce part of the work, yes, true, but then I polished it, I built on top of it, and well, therefore, it is legit, right?”
We are going to hear a lot of this. So it is crucial to address this kind of scenario.
A recent ruling by the US Copyright Office, in relation to a request to register an AI generated artwork, states that “human authorship is a prerequisite to copyright protection in the United States and that the Work therefore cannot be registered.” An extended discussion on this ruling can be found here.
But again, what we are about to face (and it is already happening) are the gray zones. The in betweens. And I believe that the answer to those lies in the — public domain vs non public domain — discussion.
Because, in a way, everything has changed but nothing has changed at the same time. Here we go:
- Before generative AI exploded, you could go to Google search, find some public domain images, videos or whatever kind of data, incorporate those into your creative process, and all was fair and good.
- Before generative AI exploded, you could not go to Google search, find some non-public domain images, videos or whatever kind of data from some living artist, and incorporate them into your work without asking for permission (at least when trying to profit from the resulting combination of their work and yours; we are not discussing here the cases in which you just use some online artwork to experiment by yourself, privately, without seeking to make any profit from it).
Well, guess what. That’s the answer. Nothing new. The answer is that the same criteria could be applied onwards.
- As we navigate this generative AI revolution, it should be ok to use this technology when it is connected to datasets that use only public domain data (or data from living artists that have explicitly given their permission for their creations to be used within these datasets). We are again only referring to scenarios that seek to generate profit from the use of this technology.
- It should not be ok to use this technology, fully or partially, when making use of datasets that contain non-public domain data, if you intend to use the result for any commercial purpose. You may experiment with it for your own personal use, like some people may do nowadays when they download an artwork from a famous living artist, but certainly not for commercial purposes.
These are thoughts based on, I believe, common sense. But others may come up with novel ideas regarding ways of compensating the artists that may provide new avenues to solve this conundrum. And YouTube provides again a clue as to what some alternative ways to address these issues may look like (more about this below).
And so, art competitions, art platforms, stock image platforms and the like could ask participants to disclose:
- If they have used generative AI technology.
- If so, which one they have used and which datasets power that technology.
- If the datasets powering that technology contain only public domain data, then they may choose to open their doors to that work.
- If the datasets involved also contain non-public domain data, then they could decide to close their doors to those works, or to put them in a separate section.
- People may lie, of course. So we will witness as well the rise of automated systems capable of recognizing if part of your work matches parts of the creations of living artists whose copyright is protected.
And this is exactly the kind of system that platforms like YouTube use today, for example, in relation to the music in the videos people upload. There will be lots of false positives and the like, just as happens with the systems YouTube uses nowadays. It is the price to pay to protect the rights of living creatives and artists.
Extending these mechanisms to account for all sorts of data, and data that is way more complex and high dimensional than audio, won’t be easy. But there are surely already people working on these matters.
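To give a flavor of how such detection might work, here is a hedged sketch of one common approach: comparing compressed representations (embeddings, once again) of a submitted work against a catalog of protected works. Every name, size and threshold below is an illustrative assumption; real systems like YouTube’s Content ID are vastly more sophisticated.

```python
import numpy as np

# Illustrative sketch: flag a submission that closely matches protected
# works by comparing embedding vectors. A real system would use some model
# that maps a work (image, audio, ...) to a fixed-size vector; here random
# vectors stand in for such embeddings, and the 0.9 threshold is arbitrary.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_possible_matches(work, protected_catalog, threshold=0.9):
    """Return indices of protected works this submission closely resembles."""
    return [i for i, ref in enumerate(protected_catalog)
            if cosine_similarity(work, ref) >= threshold]

# Toy usage: the last catalog entry is a near-copy of the submission.
rng = np.random.default_rng(0)
submission = rng.normal(size=512)
catalog = [rng.normal(size=512) for _ in range(3)] + [submission * 1.01]
print(flag_possible_matches(submission, catalog))  # -> [3]
```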
If we look at YouTube again, we also see the variety of ways in which platforms could deal with generative AI art that is built on top of non-public domain data (and it is to be expected that platforms will eventually be able to detect this, either because the user declares it, or because their automatic systems detect it, or because technology like the one that the CAI standard proposes, helps detect it).
Platforms may add advertising to those works, and share the profits with the impacted artists. Or they may block parts or the whole of those works in the regions affected by the copyright related to the artist or creative group. Or they may put them in separate special categories (away from creations produced by humans) while these scenarios get further clarified. We may also witness a great variety of ways of dealing with creations produced by humans+AI systems powered by public domain data. In summary, once detection systems become good enough, there will be a number of ways of dealing with these gray zones.
The work on those detection systems has already begun. The CAI standard, by using smart metadata and other tools, will soon begin to be implemented by companies and platforms all around the world. Let’s briefly explore what it does.
Responsible AI and the content authenticity initiative (CAI)
A number of companies and groups have already been researching and working on designing systems that can be used to deal with gray zones as well as misinformation.
One of these systems is the content authenticity initiative (CAI) started by Adobe. CAI was actually started in 2019, as companies like Adobe anticipated the need for a standard to deal with the potential of AI tools to produce misinformation and other related issues.
In their words, CAI members are: “a community of media and tech companies, NGOs, academics, and others working to promote adoption of an open industry standard for content authenticity and provenance”. (list of current members)
The group, whose membership is free, provides open source tools that make it possible to track the provenance and attribution of digital content throughout the entire pipeline, from capture to distribution.
The ultimate goal is to ensure that creatives are recognized for their work, and that people and platforms can understand the origins of the content they are dealing with and the methods involved in its production.
The key thing to highlight is that the CAI standard is going to enable people to know if, and how, generative AI was used to create a certain piece of content.
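As a purely hypothetical illustration of what consuming that information could look like for a platform (the field names below are invented for this sketch; the actual manifest structures and cryptographic verification steps are defined by the CAI/C2PA specification and its open source tools):

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical shape of CAI-style provenance data. These field names are
# invented for illustration only; the real specification defines the
# actual manifest structure and how it is verified.
@dataclass
class ProvenanceManifest:
    tool: str                      # software that produced the content
    generative_ai_used: bool       # was a generative model involved?
    creator: Optional[str] = None  # may be withheld for creator safety
    source_assets: List[str] = field(default_factory=list)

def describe(m: ProvenanceManifest) -> str:
    who = m.creator or "an anonymous creator"
    ai = ("generative AI was used" if m.generative_ai_used
          else "no generative AI recorded")
    return f"Content by {who}, made with {m.tool}; {ai}."

manifest = ProvenanceManifest(tool="SomeImageEditor",
                              generative_ai_used=True,
                              creator=None,
                              source_assets=["original_photo.jpg"])
print(describe(manifest))
# Content by an anonymous creator, made with SomeImageEditor; generative AI was used.
```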
It is a good sign that there are large companies working to promote what they call “Responsible AI”, and that systems are being put in place that will allow us to know where each piece of digital content comes from, whether generative AI was involved in its production or not, what copyright is attached to the content, etc.
It’s important to highlight that, to protect the privacy and security of photojournalists and other creators, such creators can choose whether to preserve attribution or remain anonymous when using these systems.
The world is watching. At the recent Visual 1st conference (the premier conference for the imaging ecosystem, which takes place in San Francisco and is led by Hans Hartman and Alexis Gerard), generative AI was a big part of the conversation. I had the pleasure of having a great discussion with Hans and Alexis during the fireside chat that opened the event.
Visual tech experts like Paul Melcher are doing a great job bringing the very latest of generative AI to audiences worldwide.
Educators around the world, from organizations like fast.ai to AI master programs, YouTubers with hundreds of thousands of followers, and experts in prompt engineering, are documenting and explaining every stage of this revolution.
In the realm of datasets, we also find very interesting companies and projects like datasetshop.com, powered by vAIsual, pioneers in the generation of legally clean synthetic stock media and creators of the world’s largest licensable biometrically-released real-life dataset.
Again, it is good news that we are witnessing the rise of terms like “Responsible AI” and “Legally Clean” datasets.
And as a human who is very active in both areas, generative AI and the arts, I have tried to give you in this article a high-level overview of a number of perspectives involved in these dynamic early stages.
Let’s remind ourselves that these are indeed early times in a rapidly evolving context, so let’s all be as gentle as possible with each other, as we do our best to find the right balance between encouraging a technology that will bring many benefits to humankind, and the need to protect the rights of creatives and artists.
What the future holds
As for the coming times, in my view, and in simple terms:
- Artists will keep on being artists. As this article has strived to explain, being or not being an artist has nothing to do with specific tools or technologies. Instead, it has a lot to do with the ways we interact with those depth elevators we explored previously.
- Engineers will keep on being engineers.
- Researchers will keep on being researchers.
- Prompt engineers (a new segment) will be just that, prompt engineers.
- And artists and creatives, professional or not (the following applies equally to pro creatives and to those with a natural predisposition towards exercising their creative muscles), who incorporate generative AI tech and prompt engineering into their processes will have a better chance to lead their fields. They may become even greater artists and creatives, because they will be incubating their ideas with the help of these powerful Iron Man suits (immense subconscious pots), while also using that very same tech to accelerate their creative production processes.
- Finally, lazy people will keep on being lazy people.
Let’s make it together
AI is definitely coming home. We must all push together to bring the best out of this revolution, in order to benefit humankind as much as possible.
And to complete this article, in which we have explored pretty complicated matters, let’s end on a lighter note, with some musical tributes to this wonderful technology.
The following is a small fragment of a performance by soprano Covadonga González Bernardo, singing a song composed as a collaboration between different AI systems and myself. The GPT architecture was used for the lyrics, music transformers for the melody and chords, and VQGAN for the visuals (the visuals don’t appear in this small fragment). This was a project proposed and organized by the Instituto de Inteligencia Artificial (iia.es), where I’ve given talks a few times.
Next, a simple little piano improv dedicated to the theme of generative AI coming home, getting closer to the human potential.
Finally, a bit of time travel fun. Can we all appreciate that what we are experiencing today with generative AI would probably have been interpreted as a miracle just a few decades ago? Let’s travel back in time to the year 1950 in Spain :)
Stay well everybody, and above all, stay human.
Epilogue
In regards to my last phrase, “stay human”.
Sometimes, people ask me: what do I think will happen when AI excels at system 2 capabilities (reasoning, planning, etc.), say, 30, 40 or 50 years from now?
System 1 and 2 are different types of thinking modes in our minds.
System 1 refers to fast, subconscious, simultaneous, intuitive processes, and this is the domain where AI is reaching superhuman capabilities.
System 2 refers to the slow, logical, rational, systematic, precise and sequential kind of thinking. And mastering this second mode is still way beyond our AI systems. (See Daniel Kahneman’s book “Thinking, Fast and Slow” to expand on system 1 vs system 2 thinking.)
A discussion about system 2 capabilities in connection with AI, now and in the future, would fill a whole article of this size and larger. So I leave that for another time. Let’s get back to the question posed at the start of this epilogue.
I typically answer that the question may not make sense anymore in a few decades. Why not?
Because today there is a separation between AI and humans. AI is there. We are here.
But in some decades, that separation won’t be there anymore. Think of what the company Neuralink is working on already these days. That’s only the very beginning of what’s to come.
In some decades, our technology, including AI, and our biology, will have merged in many ways.
And then, the new question may be: “Where will we go next, now that we are together?”
Thank you for reading.