The Origin of Creativity: Rote Adaptiveness
‘Rote Adaptiveness’, the capability of being adaptive automatically, seems like an oxymoron and has not been well investigated. But this capability is critical to General Intelligence.
How can a skill that is performed without comprehension be one that is also adaptive? This sounds counterintuitive, yet we see this all the time in the field of software development.
Decades ago, it was well understood that software development should not be executed like a factory floor. Instead, software development is more like a discovery process. Framed in this way, we invented new processes to accelerate this discovery process. Good software development involves the automation of discovery. Thus rote adaptiveness is not a vague idea. There are plenty of processes that improve navigability in spaces of the unknown. Navigation is an apt metaphor for rote adaptiveness. Effective navigation demands good tactics to avoid walking in circles.
I find rote adaptiveness interesting because of the realization that any collective intelligent whole (i.e., a being with a brain) must have adaptive parts to operate robustly and effectively. But the parts themselves cannot be intelligent; otherwise, we have an infinite regress.
So the key conclusion is that we must build a library of tactics that lead to greater adaptiveness. Tactics that can be performed without the need for comprehension. Tactics that non-intelligent parts can perform.
How does one build a system that is competent in addressing uncertainty? At a minimum, you need something that can learn (without comprehension). This learning system must then train in contexts of complex adversaries. Does this not remind one of a Generative Adversarial Network (GAN)? An adversarial system has both discriminators and generators.
Said differently, we can call them decoders and encoders. In recent months, GANs have been surpassed by Diffusion models that are orders of magnitude more efficient. Diffusion models are structured like autoencoders (i.e., an encoder followed by a decoder). The key technical piece of a Diffusion model is that its decoding step can be expressed as an Ordinary Differential Equation (ODE) that can be solved by many numerical methods developed in past decades. The solver itself requires no training! It’s one of the best examples of rote adaptiveness.
Diffusion models have triggered an explosion of AI art in the past few months of 2022. What is striking to many of its users is how the generated images reflect emotions in a way that GAN-generated images never did.
Where do the creativity and novelty of AI image generation originate? I’ve got an explanation.
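To make the ODE idea concrete, here is a minimal toy sketch rather than an actual Diffusion model. It assumes the data follows a one-dimensional Gaussian (mean 2.0, standard deviation 0.5), so the score function is known in closed form and stands in for the learned estimate a real model would supply. The names score, euler_sample, and exact_sample are hypothetical. The point is only that the decoding step is plain numerical integration, here Euler's method, carried out without any comprehension of what is being decoded.

```python
import math

def score(x, sigma, mu=2.0, s=0.5):
    """Closed-form score of N(mu, s^2 + sigma^2): the toy data distribution
    blurred by noise of level sigma. Stands in for a learned estimate."""
    return -(x - mu) / (s * s + sigma * sigma)

def euler_sample(x_start, sigma_max=10.0, steps=1000, mu=2.0, s=0.5):
    """Integrate dx/dsigma = -sigma * score(x, sigma) from sigma_max down to 0
    with fixed-step Euler updates."""
    x = x_start
    d_sigma = sigma_max / steps
    for i in range(steps):
        sigma = sigma_max - i * d_sigma
        slope = -sigma * score(x, sigma, mu, s)
        x = x - slope * d_sigma  # one step toward a lower noise level
    return x

def exact_sample(x_start, sigma_max=10.0, mu=2.0, s=0.5):
    """Exact solution of the same ODE, available only because the toy is Gaussian."""
    return mu + (x_start - mu) * s / math.sqrt(s * s + sigma_max ** 2)

if __name__ == "__main__":
    noisy = 12.0  # a point drawn far out in the noise
    print("Euler:", euler_sample(noisy))
    print("Exact:", exact_sample(noisy))
```

The two printed values should agree closely, and increasing steps tightens the match. Swapping Euler for a higher-order solver changes the accuracy per step but not the structure of the loop.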
When I first played with Dall-E 2, its creative limitations were apparent.
My skills are getting better at nudging #dalle2. Before and after:
It helps to have a good vocabulary. Expressions of the pose, shot angle, camera lens type, atmospheric effects, body shape, lighting, style of render, etc. It’s fascinating how much you can blend words. Artists with a rich art vocabulary will be good at this. It’s still a lot of trial and error, and Dall-E doesn’t make it easy to build on intermediate work. The image variations function isn’t conditioned on text, so it diverges too much from a user’s original intention.
You play around with it long enough to realize the limits of its novelty and creativity. The issue I have when working with Dall-E is that it’s often consistent when generating the entire image but imprecise when you attempt to repair images. Its limitations appear to be related to the many constraints it needs to balance. But for a whole image, it balances them well.
It’s an odd way to investigate these artificial intuition systems by having them generate images. It’s all a kind of negotiation where you nudge them while having to accommodate existing constraints in the image and your prompting.
You have to do a lot of trial and error; generating new images is like rolling the dice. Sometimes you do get lucky but it is a fascinating game that’s being played.
I estimate that you need at least 20 generations ~ $2.60 to get something that is interesting to build on. At present, there’s still a lot of skill required to conjure up AI art at the level of traditional digital artists.
Dall-E 2 uses just one kind of trained network. I can see that combining it with other neural networks (e.g., styling, repair, etc.) can lead to a powerful suite of compatible tools that work well in concert. MidJourney and StableDiffusion UI allow you to use upscaling and face repair functions.
Meanwhile, I hope you are enjoying my “artificial people of the future” collection! None of these images were created in a single generation in Dall-E 2. I have to roll the dice several times, pick from the many generated, and then outpaint and repair sections. Dall-E 2 is more reliable at outpainting than at inpainting.
Also, Dall-E doesn’t do style transfer on entire images. This is unfortunate because you often like a look rendered in one style but would prefer it in another. I like the creativity of the rendered image below, but it’s impossible to recreate it as a real photograph.
3d rendered images have greater creativity, perhaps as a consequence of the greater diversity of images generated in this style. Real-world photographs don’t, and I can’t render creative mixtures as real-world photographs. It would be wonderful if I could!
Fortunately, StableDiffusion has an image-to-image feature that is very handy for nudging images toward a certain style. Here I’ve re-rendered the above Dall-E image to look more realistic.
It seems to me that images using CGI have greater detail. Note the loss of detail on the neck. What is needed is semantic transfer across styles. I know this can be improved on, and we should get there soon.
Then I finally tried the StableDiffusion upgrade of Midjourney. Although it is a smaller network than Dall-E 2, I was stunned by the novelty and creativity that emerged out of the platform: Midjourney Showcase
How does this platform (which restricts the upload of images, does not allow inpainting, and only allows the system to generate variations of existing prompts) generate such rich diversity and novelty? How is this possible?
Perhaps we can look at a similar evolutionary system, ArtBreeder. This system combines features of existing creations to generate new ones. There’s little user control of the output: https://www.artbreeder.com/beta/browse?modelName=paintings
It’s the selective process of thousands of users that evolves the images to kinds that are richer and more intricate. People are serving as curators of the system such that images that are novel and creative from the human curators’ perspectives are those that emerge.
Every image has an evolutionary history, and each fork heads in a direction that maximizes novelty and creativity as humans judge them. It is the humans who originate this; the machines simply provide a repeatable mechanism for this creativity engine.
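The division of labor described above, a machine that varies blindly and people who select, can be sketched as a toy loop. This is an illustrative assumption, not how ArtBreeder or Midjourney is implemented. The names TARGET, mutate, curator_score, and evolve are hypothetical, and a fixed target word stands in for the taste of human curators.

```python
import random

TARGET = "novel"  # hypothetical stand-in for what the human curators prefer
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def mutate(text, rng):
    """Blind variation: replace one random character, with no notion of meaning."""
    i = rng.randrange(len(text))
    return text[:i] + rng.choice(ALPHABET) + text[i + 1:]

def curator_score(text):
    """External judgment: count the positions that match the curators' taste."""
    return sum(a == b for a, b in zip(text, TARGET))

def evolve(generations=200, population=20, seed=0):
    """Repeat: generate blind variations, then keep whatever the curators rate best."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in TARGET)
    history = [curator_score(best)]
    for _ in range(generations):
        # The current best stays in the pool, so the score can never drop.
        candidates = [best] + [mutate(best, rng) for _ in range(population)]
        best = max(candidates, key=curator_score)
        history.append(curator_score(best))
    return best, history

if __name__ == "__main__":
    best, history = evolve()
    print(best, history[0], history[-1])
```

Nothing in mutate knows what the curators want. All of the direction comes from curator_score, which sits outside the generator, and that is the sense in which the machine only supplies a repeatable mechanism.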
StableDiffusion and Midjourney are initialized with an image training set selected for aesthetic value. This is enough to serve as a good foundation for incremental improvements.
The accumulated wisdom of the crowd is further baked into the system. Midjourney incrementally improves on the engine that drives its unique style. StableDiffusion’s open-source nature drives technical innovations such as Textual Inversion, allowing users to tweak the style.
But the selective process is a consequence of humans who participate in the system (but are not part of the system’s software). Creativity and novelty always originate from the external. The other two sources are the discovery of good prompts and external images.
These systems are artificial fluent systems, good at regurgitating new language expressions. Not all expressions have meaning, however, and the kind that does originates externally. None of these systems can produce novelty and creativity without interaction with the outside.
The inescapable truth is that we are novel and creative because we interact with reality’s greater whole. AI image generators are a mirror of us. They appear to be almost living things because they reflect our human choices.
As we become immersed in working with this artificial fluent system, we realize a fundamental truth about individualism and uniqueness: neither exists absent the interaction with the collective we.
To summarize, creativity springs from two sources that work in synergy with each other. The first is rote adaptiveness. Language is an example of a rote adaptive technology. Its rendering in writing and the ubiquity of literacy expand the community of its users. Human civilization would not exist in its current form in the absence of language, writing, and other technologies that enable the widespread distribution of words, such as the printing press and the internet. Rote adaptiveness allows for the re-combination of constituent parts to create new meanings.
The second source is the community of users of the rote adaptive technology. Tools like Dall-E and MidJourney are examples of rote adaptive technologies. They automatically blend language and images without comprehension of what they are. It’s the users of these tools who select for meaning. A Darwinian process refines the fit of the medium to its users. The content that survives and propagates is the content that fills a niche for its adopting users.