AI Ways of Seeing: Vincent Creates the Starry Night

Documenting the Development of a Work Made Using AI Art Generators

David R. Smith
13 min read · Nov 14, 2022

I am excited to present the seventh installment of the AI Ways of Seeing series on medium.com, which explores art somewhat in the iconoclastic tradition of John Berger’s great book Ways of Seeing (1972, Penguin Books). That book was hugely influential in my intellectual and creative development. I encourage you to read it.

In this installment we will look at the specific processes I used to create a few works. I don’t like to reveal my “secrets,” since they carry a commercial advantage, but on the other hand, it’s always good to document a process. Here is what we will be looking at:

Article header image, Vincent the Digital Artist at work.

Before we do that, you might enjoy reading the previous installment of this series:

Opportunistic Processes

A lot of media attention has focused on the ease and speed of creating AI Art images. But this is not really correct; it is a surface observation made by those who are not in the know.

AI Art is quite easy and fast if the artist serves primarily as an operator of a system, taking the position of accepting whatever comes. This is, for lack of a better description, a kind of SaaS: “Art As A Service,” or “AAAS.”

The operator’s function in AAAS is to keep trying until enough successive attempts produce an artifact that satisfies them aesthetically. This is how many people understand AI Art: a prompt is entered, something is returned.

There is a word for this: opportunism. There is an element of opportunism in all artistic activity; it is not wrong or bad. However, opportunistic art is entirely at the mercy of the relative fruitfulness of the system. It is like walking in an apple orchard in fall: we’ll find plenty to eat then, but later in the year there’s nothing left to be found.

To go beyond this stage, there are two forms of activity that can be observed in people who are consistently productive: mastery of tools and techniques (basically, craft) and creativity.

The craft aspect of AI Art is very much in its infancy and what I will document below is very rudimentary. The programmers are certainly still working on their end, but the artists are also busy trying to figure things out. So that’s the “craft” aspect.

In our analogy of opportunism, craft increases the chances of something worthwhile coming out of the process by mastery of the tools and techniques. I can tell you we are far from that mastery.

What about the Creative? Well, to understand that we need a theory of creativity.

A Working Theory of Creativity

In order to actually be creative, the output of the AI rendering has to be about something meaningful to us, and it has to go beyond the intentions of the data scientists. In order to be “art” the output of the rendering has to be more than normative.

This is the key secret, the thing that data scientists don’t know and understandably need help to comprehend. It is like a man tasked with building a boat. The boat has to be constructed on dry land and has some rigid requirements, like the bounds of physical engineering and capability of materials. But only a sailor, who takes the boat out in rough seas, can say if the boat is seaworthy.

The objective of data scientists is, generally speaking, going to be normative. There are strong reasons for this, both financial and social. Their goal in all of this is to make output patterns that correlate positively with the word descriptions (tags, labels) that accompanied the input images in the training set. From that point of view, success can be measured if someone inputs a prompt for a dog, and what comes out on the other side looks more or less like a dog. The data scientists tick their box.

That isn’t art yet.

It is like asking for soup. If we are in an Italian Restaurant and order minestrone, chances are we might get some. But if we go to a Greek Restaurant, we will be told they don’t have it. That is expected. But what if we go to a Chinese Restaurant and ask for turkey on Christmas Eve — and they serve us Peking Duck?

That’s creativity.

We need an iterative process with a very strong level of control over all the possible parameters. That is a function of craft. It’s impossible to know exactly what a given set of parameters will yield, especially if there is a random seed involved. But we can create favorable conditions — go to the right restaurant, order the right thing on the menu.

But to get Peking Duck out of a prompt for Turkey Dinner, we have to push. Sometimes we have to cause “non-normative results” to be emitted. Results with risk. We have to get dirty and take chances. To go beyond the intentions of the data scientists.

Here are a few of the techniques that we have currently available.

Non-Normative Techniques: Image Rendering Size Foobar

The data scientists did their training using square images, typically 512x512px. This is one of the things that demonstrates that data scientists are not artists. Square is not the best shape for a canvas. If they had been well informed, they would have used something more harmonious, along the lines of a golden-section rectangle, or at least the 1:√2 ratio of the ISO A series.

But, given that the model uses a square, we can overcome the normative by asking for non-square rectangles.

The results of doing this are often quite traumatic. Heads start popping out of heads, arms and legs start flying, necks elongate, small versions of the larger image appear. But this iterative exploration often pushes the rendering algorithm. It can do unexpected things, but also remarkable things.

For me the most unexpected aspect of this manipulation is the discovery that changing the output size causes non-linear effects. Instead of a slightly larger dimension simply producing a slightly larger rendering, a small change, say from 512x512 to 512x640px, can sometimes alter the image output in quite dramatic ways. Larger changes, such as 512x1024px, will almost always cause dramatic spatial effects, such as here:

Figure 1 Dramatic spatial distortion caused by image generation in a plane not anticipated by the data modelers
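If you want to experiment with non-square canvases yourself, the main constraint to respect is that most Stable Diffusion front ends expect width and height in multiples of 64. Here is a minimal Python sketch of how one might pick valid non-square dimensions; the multiple-of-64 convention and the golden-ratio target are my assumptions, not anything from a specific tool’s API:

```python
# Sketch: choose non-square canvas sizes for a model trained on 512x512.
# Assumes the common Stable Diffusion convention that width and height
# should be multiples of 64 (the latent grid itself only needs
# multiples of 8, but most UIs enforce 64).

PHI = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.618

def snap(value, multiple=64):
    """Round a requested dimension to the nearest valid multiple."""
    return max(multiple, round(value / multiple) * multiple)

def golden_canvas(short_side=512):
    """A golden-section rectangle snapped to the model's grid."""
    return snap(short_side), snap(short_side * PHI)

print(golden_canvas())        # (512, 832)
print(snap(500), snap(700))   # 512 704
```

Whatever sizes you settle on, the point of the text above stands: moving away from the square the model was trained on is exactly what provokes the interesting distortions.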

This non-linear behavior may exhibit chaotic properties. There is a doubling and tripling effect that can be produced as one dimension is extended, and sometimes what feels like a chaotic boundary condition occurs. This is where things can start to get freaky.

Figure 2 Nonlinear distortion of form due to image generation forced onto the over-extended x-axis

Non-Normative Techniques: Trivial vs. Forbidden Melding

It has been observed, and is perhaps a design intention, that prompting for certain things in combination sometimes generates aesthetically pleasing results. Asking for a combination can lead to distinct results that remain rational, like a realistic dog wearing a realistic hat, or it can produce irrational combinations, like a dog that actually IS part dog and part hat.

I call this melding. When we combine things that are socially or conventionally not combinable (due to taboo, etc.) then we are engaged in forbidden melding.

Up until this time in history, melding was mainly achieved through storytelling and other non-visual means, because the imagination can meld things easily, whereas in the world of physical materials it is more difficult. It can be done through stream-of-consciousness writing, as in Ulysses by James Joyce. It was definitely a tool in Picasso’s toolkit. But in AI Art, it has now come into its own.

But to get to “art” we have to go beyond trivial effects like dogs that are hats and pursue more serious intentions. For example, we can reimagine a well-known white person as an African American in order to demonstrate what the world might look like if we were all Black:

Figure 3 Hilary Rodham Clinton imagined as a beautiful black woman.
Figure 4 A well-known actress transformed into a handsome young man using a custom DreamBooth model

This kind of technology could potentially be very helpful to people considering gender reassignment: they could visualize a future state in which they have a different gender.

Non-Normative Techniques: Consistent Plastic Form

This kind of plastic form permanence has important purely artistic applications, too. It can cause all the images rendered across a creative canvas to become expressive of the same plastic form. That is, it produces or enhances consistency across invocations.

In terms of the opportunistic struggle, the yield of consistent but plastic results is very useful for overcoming opportunism: we can better predict that the result will be satisfactory.

Figure 5 A collage of results from different invocations showing plastic form consistency

This type of effect usually involves customized training which is something the AAAS providers will likely not do anytime soon. Their interest is in a generic service to make generic outputs. These outputs can be used to make products like those seen at Walmart.

Right now, there is a great deal of excitement about making a custom model that can put the artist’s own face (or their dog, or whatever) into the model. (Naturally, we’ve all done it.) But this somewhat trivializes the tremendous artistic power of a tool that can create consistent rendering across time and space.

The consistency of plastic forms is, in essence, a heuristic for human memory. It allows for the experience of meeting someone who reminds us of someone else. Human memory is malleable, but it retains the main features of a memory across time and space. We continue to recognize old friends even if their bodies have aged and changed in the process.

And so on.

Non-Normative Techniques: Iterative Effects

Certain effects can only be produced by passing the result of one image-to-image translation into the next iteration, until the desired degree of “cooked” has been achieved. These include impasto effects, which, just like real paint, require a gradual buildup of the paint effect:

Figure 6 Application of Impasto by iteration
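The loop itself is trivial; the craft is in choosing the strength and the number of passes. A minimal sketch of the feed-forward pattern, where `pipe` stands in for any image-to-image callable (a hypothetical stand-in, not a specific library’s API):

```python
def cook(image, pipe, prompt, passes=5, strength=0.3):
    """Feed each img2img result back in as the next input.

    `pipe` is any callable taking (prompt, image, strength) and
    returning a new image; each pass builds up the effect, the way
    impasto builds up paint.
    """
    for _ in range(passes):
        image = pipe(prompt, image, strength)
    return image

# Toy demonstration with a stub "pipeline" that just tags the image:
thicker = lambda prompt, img, s: img + "+impasto"
print(cook("sketch", thicker, "oil painting", passes=3))
# sketch+impasto+impasto+impasto
```

In practice, stopping a pass or two early is often the difference between “cooked” and “burnt,” so it pays to save every intermediate.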

Non-Normative Techniques: Iterative and Cross-platform Workflows

There are other kinds of iteration such as cross-platform or cross-tool manipulation that amount to a workflow. This is what was used with the more detailed example I am going to show you. I’ll walk through it now with a piece I just finished called “Vincent Creating the Starry Night.” To motivate the discussion, here’s what the commercial product looks like:

Figure 7 Vincent Creating the Starry Night on His Laptop

Exploring The Idea

The theory of art I entertain amounts to accepting the supremacy of the concept over everything else involved: the tools and techniques, whether something is done for money or for free, and so on. This means that for me, in order to be art, the result has to go beyond the normative expectation of a thing: it has to repurpose, re-assign, or force a re-evaluation of the ordinary, expected result.

With van Gogh, this has been done so many times that it is a cliché to see Starry Night repurposed onto a thing. His style has been used as a meme endlessly; it is no longer something that can easily be made fresh. The van Gogh style is actually rather harsh and uncouth, which is why it is so easily memeable and so difficult to use. It reflected the artist’s inner turmoil.

So, I was avoiding it.

However, I had downloaded a custom model purporting to be trained on images from the movie Loving Vincent, and I was keen to see what it could do.

This is the first opportunistic image I made:

Figure 8 Vincent van Gogh making Pokémon on his laptop

It is supposed to represent Vincent van Gogh working on his laptop making Pokémon characters; you can see a sunflower Pokémon. Which is fun.

The idea that Vincent might actually create the Starry Night itself on a laptop occurred to me shortly after, around the time of this opportunistic image:

Figure 9 Vincent with a Starry Night backdrop

You can see that stylistically this is completely unusable.

At this stage I started to work on something more challenging. The concept requires that the laptop screen should show Starry Night. Sure, in Figure 9 we might infer that the laptop shows the Starry Night, but I wanted to actually represent it.

This is where a multi-tool approach comes into the scheme of things. Below is a concept where I tried drawing into the monitor to show Starry Night and fix the right hand. I showed this to my wife to explain the idea:

Figure 10 Vincent with laptop but blank screen and mangled right hand

Integrating Midjourney and DALL-E Into the Workflow

At this stage I thought it advisable to try on Midjourney.

Figure 11 Initial prompt on Midjourney

The prompt here is “vincent van gogh using a laptop to create the painting starry night, outside it is starry night.”

Obviously Midjourney version 4 is a remarkable stylistic achievement. It doesn’t quite get there, but for a first try, it’s pretty darn good.

The thing that I value the most about this is the absence of the “van Gogh style.” Or at least the toning down of that style and interpretation. This is Midjourney style.

Here’s the upscale of the top right:

Figure 12 Upscaled top right from Figure 11

This is pretty good. The concept is there. So, what’s the deal with the hands?

I think images like this suggest that, at bottom, Midjourney is still Stable Diffusion or is somehow using the same algorithms, because the chaotic hand rendering looks like what we often get with Stable Diffusion. Hands are one of the special rendering problems for AI. I’ll talk about this in a future installment of AI Ways of Seeing.

Back to the issue at hand, what to do? I understood that this is not fixable in Midjourney. The above is about as good as it gets. So, for hands, it is necessary to pull out the big gun.

Inpainting is a process by which a masked region of an image is re-rendered while the rest is left intact. I’m not very good at it yet, but I think the right approach will always be stepwise and iterative. It just must be that way. Nevertheless, for brevity, I am not showing intermediate steps. Here’s the best of what came out of DALL-E:

Figure 13 Inpainting phase from DALL-E

DALL-E even helpfully adds a disposable paper coffee cup at right to make the image of a working digital artist more believable. On the other hand, DALL-E attempts to brand its images with a corporate logo. It’s irritating and a good reason to avoid DALL-E. I could have done inpainting with other tools, but I knew this was the best way to get hands.

The next phase of the work involved Stable Diffusion. You can see small white specks in the image above. These are not present in the Midjourney source; they are something to do with the inpainting render. It’s possible to fix them with a “heal” tool in a paint program, but Stable Diffusion can also fix them via a pass through img2img. For this purpose, I use an ancestral sampling method and a CFG Scale of 15 with a denoising strength of 0.01 to 0.02.

Basically, these settings cause the image to be passed through the img2img routine almost like a cleaning no-op filter. The impact can be fine-tuned to be very subtle. If the result is then loaded back into the input, the process can be used to do step-by-step adjustments.
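As a sketch of how such settings might translate into code: the parameter names below follow the diffusers img2img convention (`strength` for denoising, `guidance_scale` for CFG), which is an assumption on my part; other front ends name these differently.

```python
def wash_pass_kwargs(prompt, image, strength=0.015, cfg_scale=15):
    """Build arguments for a near-no-op img2img "wash" pass.

    At a denoising strength of 0.01-0.02 the sampler barely perturbs
    the image, acting like a cleaning filter; guard against values
    strong enough to repaint the picture outright.
    """
    if not 0.0 < strength <= 0.05:
        raise ValueError("a wash pass needs a very low denoising strength")
    return {
        "prompt": prompt,
        "image": image,
        "strength": strength,         # "Denoising" in most UIs
        "guidance_scale": cfg_scale,  # CFG Scale
    }
```

Feeding the returned image back in as the next `image` argument gives the step-by-step adjustment described above.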

The below shows this “washing” process along with Stable Diffusion inpainting and upscaling, on a classic sci-fi magazine cover:

Figure 14 Before, during, and after an img2img “wash” (examples not to scale).

The above shows the initial state at 379x550px, then after inpainting to remove some things, and finally the “washed” result, which was scaled up to 1024x1472px. You can see the effect on the portrait of Isaac Asimov, which has been transformed, and in the dragon teeth, which have been enhanced, and the overall cleanliness and smoothness of the image. The lettering is starting to degrade, but that’s what I’m going for in this case. I’m not sure if this “healing” property of img2img has been documented before.

But returning to Vincent on his laptop, here is the final version and an alternate.

Figure 15 Final result ready for production upscaling.
Figure 16 A nice alternate with a truer “van Gogh” style.

This alternate is too stereotyped as a style for what I wanted, but it is nice. The hands are better here, but the coffee cup has fled.

That’s all for now. Remember that “Vincent Making Starry Night on His Laptop” is available for sale on Redbubble, just in time for Christmas. Tis the Season, ho, ho, ho.

If you enjoyed this episode, you might like to follow me to get updates as more installments of AI Ways of Seeing come out.


David R. Smith

Dave is a technology professional and the proprietor of happymeld.com, an online store for cool print-on-demand apparel.