How Did AI Become Creative?

Vin Bhalerao
12 min read · Sep 9, 2022

For a long time, we have believed that creativity is a uniquely human trait with a mysterious or spiritual origin.

When creative people are asked how they come up with their original ideas, most are unable to provide a rigorous or convincing explanation. And since human beings have a tendency to mystify and glorify unexplained phenomena, this leads to the belief that creativity is an otherworldly gift that only humans, and indeed only a few humans, possess.

On the other hand, if you have been following the recent developments in the world of AI image generators, you may have started questioning this belief. I am talking about DALL-E 2 or Midjourney or Stable Diffusion. (Please see my own summary and analysis of these developments here.)

You have probably come across striking images generated by these AI systems. Everyone who comes across these images for the first time responds with a sense of awe and disbelief.

Here are some examples:

Some images created by DALL-E 2 (Source: Instagram)

After seeing these images, most people start to wonder: how are these purely mathematical systems capable of displaying such high levels of creativity? If creativity has mysterious origins and is so special to (certain) humans, then where is this artificial creativity coming from?

In this post, I am going to offer an explanation for these types of questions.

But in order to do that, we first need to understand how these AI image generators work at a high level. I will keep the explanation as non-technical as possible.

How AI Image Generators Work — In Simple Terms

If you search on the internet, you will find many articles that go into the technical details of how AI image generators work. They are typically written for highly technical readers.

But I want to keep it really simple so everyone can understand at least the basics enough to answer the above questions.

AI image generators typically work in two phases: a training phase and a generation phase.

A) The Training Phase

AI Image Generator — Training Phase — Extremely Simplified (Image: Vin Bhalerao)

In the training phase, the AI system processes hundreds of millions of images, along with their captions (i.e. text descriptions of the images). It breaks the images and the text descriptions into a huge number of “features” that describe their various aspects.

These features are basically what we ourselves would think of when asked to think of an object.

For example, when asked to think of a cup, we might imagine a cylindrical, slightly tapered, solid object that is open on one side and can contain liquids. It may have a handle, has a range of thicknesses, colors, textures and so on. Each of these is a feature of a cup.

The training phase of an AI image generator consists of learning the relationship among the features of various objects in the images and their text descriptions.

The output of the training phase is called a “model”. It is essentially a mathematical structure that represents the relationships between tons of different objects and their descriptions.
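
To make the idea of a "model" a little more concrete, here is a toy sketch: concepts reduced to feature vectors, with relatedness measured by the geometry between them. Real systems learn millions of features automatically from data; the three hand-picked features, the numbers, and the `similarity` function below are invented purely for illustration.

```python
import math

# Each concept is represented as a vector of feature strengths.
# (Feature names and values are made up for this sketch.)
features = {
    "cup":    [0.9, 0.8, 0.1],   # [cylindrical, holds-liquid, furry]
    "mug":    [0.8, 0.9, 0.0],
    "kitten": [0.1, 0.0, 0.95],
}

def similarity(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(similarity(features["cup"], features["mug"]))     # relatively high
print(similarity(features["cup"], features["kitten"]))  # relatively low
```

In this picture, "understanding the relationships between objects and their descriptions" amounts to related concepts ending up close together in the feature space and unrelated ones far apart.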

If you create such a model using a small number of images, it isn’t very smart. But if you do it over hundreds of millions of images, the model becomes capable of capturing most of the nuances of such relationships. It can essentially reach a human level of understanding of objects, their features and their descriptions.

In the world of AI, this phenomenon is usually described by the phrase “Quantity has its own quality”. Simply processing an extremely huge quantity of data in this manner can impart to the resulting model qualities that are typically associated with human-level intelligence, at least within some limited domains.
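
As a loose analogy for why sheer quantity helps, consider the simplest possible “learner”: one that just averages its samples. The setup and numbers below are invented for this sketch and have nothing to do with any real image generator; the point is only that the same simple procedure produces a far more faithful model as the amount of data grows.

```python
import random

random.seed(42)
true_value = 0.7  # the hidden quantity the "model" is trying to capture

def learn(n_samples):
    # Each sample is a noisy observation of the true value.
    samples = [random.gauss(true_value, 1.0) for _ in range(n_samples)]
    return sum(samples) / len(samples)  # the learned "model"

small_model = learn(10)        # trained on little data
large_model = learn(100_000)   # trained on lots of data

print(abs(small_model - true_value))  # typically a sizable error
print(abs(large_model - true_value))  # typically a tiny error
```

The learning rule never changes; only the quantity of data does, yet the quality of the result improves dramatically.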

So that’s a high level description of the training phase in short.

B) The Image Generation Phase

AI Image Generator — Generation Phase — Extremely Simplified (Image: Vin Bhalerao)

In the image generation phase, a user can specify a text prompt and the image generator uses the model prepared above to generate, from scratch, one or more images that it believes are best described by the prompt.

It is difficult to describe this process in layman’s terms, but let me make an attempt:

  1. To start with, the text prompt given by the user is processed using the model prepared in the training phase. The output of this process is an internal representation of the objects, along with their desired features and relationships, based on the prompt.
  2. The image generation process starts with an initial image which consists of purely random noise. Then, in a series of steps, it “shapes” that random noise progressively into the final output image. You can think of this like seeing a scene slowly emerge from thick fog.
  3. The “shaping” of the noise into the desired image is controlled by the internal representation generated in step 1 above. This is how pure noise turns into the desired image.
  4. After a specified number of such iterations, one or more images are output.
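
The four steps above can be sketched in code. This is a drastically simplified caricature: `denoise_step` below is a hypothetical stand-in for the learned neural network that real diffusion models use, and only mimics the overall shape of the loop.

```python
import random

def generate(prompt_representation, steps=50, size=16):
    """Sketch of the generation loop described above (not a real model)."""
    # Step 2: start from pure random noise.
    image = [random.gauss(0.0, 1.0) for _ in range(size)]
    for t in range(steps):
        # Step 3: each pass "shapes" the noise toward the prompt's
        # internal representation (the output of step 1).
        image = denoise_step(image, prompt_representation, t)
    # Step 4: after the specified number of iterations, output the image.
    return image

def denoise_step(image, target, t):
    # Stand-in for the learned model: blend a little of the target into
    # the current image, so the scene slowly "emerges from the fog".
    blend = 1.0 / (t + 2)  # later steps make smaller corrections
    return [(1 - blend) * px + blend * tg for px, tg in zip(image, target)]
```

In a real system, `prompt_representation` would be the model’s encoding of the user’s text prompt, and each denoising step would be computed by a trained network rather than a simple blend.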

With that high level understanding of the image generation process, let us return to the central question: All of this looks very mathematical / algorithmic. How does creativity enter the picture?

How Does Human Creativity Work?

To see where AI might be getting its creativity, it’s instructive to look at how it works in humans first.

Most of us typically learn to draw in early childhood. We learn how to hold a pencil or a marker in our hand and start making markings on a piece of paper or whiteboard (or walls and even our little brother’s faces — I’m sure you’ve seen the viral video 😊).

Our “creativity” at this stage mostly looks like random scribbling. But through that, we are slowly learning the relationship between our hand movements and the markings on the piece of paper.

Over time, our hand-eye coordination improves and we start to draw shapes that look intentional rather than random.

At this stage, our creativity might evolve to a level where we are choosing what to draw and how. We usually draw something from our real world experience — like a dog or a fish or a mountain etc.

Then, over time, we learn various techniques — painting, sketching, photography, digital tools and so on. Also, we learn about different styles, aesthetics, history, culture, and so on.

At this stage, our creativity may get to a level where we are copying some famous artist’s style, but changing it slightly here and there. Kind of like doing a “remix”.

We may also continue to demonstrate our creativity in terms of the subjects we choose and how we choose to draw them.

At some point, we may get inspired by something or simply experimentally land on something that deviates sufficiently from any existing artist’s style that we can call it our own style.

Then we may develop this style further, going deeper and deeper into the unique aspects of it.

So, at this point, our creativity may consist of our original style, possibly some original subjects or some original way of depicting them.

Long story short, it looks like what we call “creativity” consists of the following:

A) Experimentation: Constantly experimenting with various ideas. Experiments may be of three types:

  1. Remixing other people’s ideas, or
  2. Drawing on our own experiences, emotions, and inspirations, or
  3. Completely random experimentation.

B) Curation: Along with learning artistic skills, we also keep developing our “taste” — what looks or feels good, what gets appreciated by the public, what makes other artists take notice of our work, and so on.

This then informs our experimentation — we decide which experiments to perform based on this.

When the sum total of the above results in something remarkable, we call it “creativity”. (In fact, I have written a post where I describe this “formula for creativity” in more detail.)

Now let us try to see which parts of this AI is able to replicate.

How Did AI Become Creative?

Some images created by Midjourney (Source: Instagram)

Let us go over the human creative activities described above, and the description of the image generation process further above, and see where the potential for creativity comes in.

A) Experimentation: While the image generation programs of today don’t perform any experimentation on their own, they still get to experiment constantly because a large number of people keep using them to generate tons of images.

  1. Remixing: The image generation process I described above is essentially a remixing process. The final image that comes out can be thought of as a complex remix of multiple images from the model. This is why, for example, you can ask for “a tea cup with fur” and the system remixes some tea cups and furry creatures it has seen. It selects appropriate aspects of each, combines them in the desired way and generates multiple outputs that wow you. Today’s AI image generators are already simply astounding in terms of their ability to remix existing images.
  2. On the other hand, the AI image generators do not have their own experiences, emotions or inspirations (yet, anyway), so we cannot give them any points for that. They have a long way to go here.
  3. At various steps in the image generation process, the system may have multiple alternatives that fit the constraints. In that case, it picks the desired number of them at random. This can be thought of as a very simple form of random experimentation.

B) Curation: The training phase of the AI image generator can be thought of as the system “developing a taste”. This is because the AI system starts with no understanding of art or indeed the world. As it processes each image, it incrementally learns about it. For example, it learns what a cup is, what are its features, which features occur more often than not, and so on. At a higher level, it will even learn about various artistic styles and their nuances.

All of this knowledge is then used in the image generation process to “shape” the images being generated. One can say that the generator is performing curation within the constraints of the text prompt given to it.

So, as we can see, the AI image generators of today are already showing some level of creativity. They have developed a taste for art based on images they have seen, they are extremely good at remixing that existing art and curating it, and are doing some amount of random experimentation within that.

Any creativity you see beyond that really belongs to the human being coming up with good prompts. In fact, the skill of developing good prompts is now called Prompt Engineering.

So it is the prompt engineers who are imagining a picture in their head, with the composition, colors, textures, styles, and various other attributes they want, and hacking on the prompt until they get it right.

The image generator is essentially like a great apprentice who executes on the prompt engineer’s instructions really well, sometimes wowing them with their output.

Nevertheless, I would say that being able to demonstrate even this level of creativity (that too, for a V 2.0 product) is impressive. And we can absolutely imagine that this will get better in future versions.

Ok, so now that we have established that creativity isn’t all that mysterious, and even AI is capable of at least some aspects of creativity, the next natural question is, how far will it be able to go? Will it someday beat us in creativity also?

Some images created by Stable Diffusion (Source: Instagram)

Are There Limits to AI Creativity?

I already started describing some areas where today’s AI image generators are limited in their creativity. Let us look at them in a little more detail below.

A) Learning Constrained by Input Data:

As explained above, the AI image generators go through a training phase where they “learn” the associations between the images and their text descriptions that are fed to them.

This means that the “model” they learn is totally dependent upon what is in the data that was input. They cannot learn anything “on their own”.

This also means that they learn all the biases that exist in the data. They do not understand that human beings don’t want to perpetuate these historical biases, and that they would like to move towards a fairer and freer world.

This implies that many of the images generated by these systems will not be appreciated or even considered acceptable in the modern world.

B) Limited Range of Experiences:

And it’s not just that their learning is completely constrained by input data. They have a bigger problem.

Most of today’s AIs aren’t “embodied”, i.e., they do not have a physical body that moves about in the real world. (An embodied AI would basically be an intelligent robot of some sort.)

Not having an independent physical presence limits the number and variety of experiences the AI can have. All the information about the world they have is what is fed to them via the training data. They can analyze and remix that data to their heart’s content, but they can’t break out of it.

A human artist, on the other hand, exists in the real world and interacts with many aspects of it on a regular basis. The physical world is extremely complex and full of extremely rich experiences, and all an artist needs is to be present and sensitive to what is going on around them to get inspired.

Artists draw on these real life experiences while expressing their creativity. A lot of their inspiration comes from those experiences or those of other human beings, or even other living creatures or objects in the real world.

They aren’t limited to remixing existing art, but go far beyond that to remixing life itself.

So this is definitely an area where AI has a long way to go before it can truly challenge human creativity.

C) Lack of Genuine Emotions:

Some people have claimed that AIs have become sentient and are capable of feeling genuine emotions. But in my estimation, this is an exaggeration. (Here is my post about it.)

In order to feel genuine emotions, AIs would need to possess something like an “inner life”. At present, they demonstrate no such activity. They simply respond to user inputs, otherwise remaining essentially idle in their artistic capacities. (They may be performing some bookkeeping activities, but that doesn’t rise to the level of having an artistic “inner life”.)

This is not to say that it will never happen. I am in the camp that believes that some day we will have sentient AIs walking among us. But we aren’t there yet.

Needless to say, emotions are a big part of art, and human creativity has a big edge over AI in this area, at least for the time being.

D) Lack of Personal Initiative:

All the AI image generators of today simply respond to prompts given by users. While that allows them to demonstrate some amount of creativity, it stops short of being truly original.

Any “original” creativity they may appear to demonstrate comes from the originality of the person who imagined some image and created the prompt for it.

E) Limited Modality:

All of today’s AI image generators exist purely in the digital domain. They process digital images and output digital images.

But if you go to a typical art gallery, you will notice that many artists incorporate layers, textures, materials, and so on in their art. They add a third dimension to their creations.

At least for now, this is out of reach of AI.

F) Bugs and Incomplete Features

Just for completeness, let me mention that today’s AI image generators are still at a very early stage. They have bugs and inadequacies in their outputs.

For example, many generated images contain weird aspects, or swirlies or other oddities in them.

It is not uncommon for artists to use the image generators to generate some images and then further edit them to take care of such issues.

Of course, we can expect many of these issues to be incrementally addressed in future releases.

Ultimately, the way to think about today’s AI image generators is that they are like digital artists on commission or apprentices who create art based on a specification from a human artist or client.

In order for AI to claim originality, it will have to imagine its own scenes and do its own prompt generation. This, again, isn’t impossible in the future, but we are just not there yet.

Photo by Nick Fewings on Unsplash

To Summarize

  • Yes, even today’s AI systems are capable of demonstrating some amount of creativity, in terms of remixing existing art in reasonably creative ways.
  • When one looks deeper into where creativity comes from, one realizes that it is a result of constant experimentation and tasteful curation, potentially driven by other existing art, or real life experiences, or emotions, or just serendipity. There is no mysterious origin beyond that.
  • AI systems of today do have some limits to their creativity, particularly when it comes to originality or emotionality or taking inspiration from real life. But there is no reason to believe, in principle, that they couldn’t reach the same level of creativity as humans in the future.


Vin Bhalerao

I write about science / engineering and their significance and value in our lives. | My book: “An Engineer’s Search for Meaning” (https://meaning.lifevisor.ai).