Subtractive Adversarial Networks
On creativity and constraints
As I was preparing a presentation for an online meet-up the other day, a stray thought on the nature of creativity occurred to me, and while I won’t claim that it is super profound, I think it speaks to the nature of creation, and an aspect of the process that may at times be overlooked.
*For more M.C. Escher: M.C. Escher — The Graphic Work
One aspect of this blog that I’ve enjoyed is the extension of logical arguments to analogous demonstrations in music or art (albeit at times of only loose relation), sort of in the vein of Gödel Escher Bach (or at least a poor man’s attempt thereto). Pulling on this thread, for some reason the Beach Boys’ masterpiece Pet Sounds comes to mind. The work is notable not only for its creative brilliance (paving the way for the likes of Sgt. Pepper’s Lonely Hearts Club Band), but also for its incorporation of a whole range of instrumentations, layered vocals, and even at times literally the sounds of pets. Wilson took the medium of the pop album and fought the constraints of convention for a deeper and more layered work, not only an early example of the Wall of Sound recording technique, but also a progression of rock music from the realm of the concert hall to the more deliberate creation process of the studio.
The music of Pet Sounds is both deeply satisfying and yet somehow jarringly abrupt in its arrangements. The songs here are mostly under three minutes, and somehow packed into these short pieces are progressions in intensity easily on par with Otis Redding at his best.
Part of me wonders what it might look like for some future machine learning implementation to extract the essence of each of these tunes for a generative extension to some more respectable arc. After all, if some metal-heads from the ’90s can take a whole ten minutes of airplay with November Rain, doesn’t Brian Wilson’s masterpiece deserve similar treatment? Consider that it has been demonstrated that machines can extract the characteristic features of a piece of art, either to extend it beyond the original borders of display or to overlay its style onto a stylistically dissimilar piece.
I believe some of the features of image data that make these types of manipulations feasible even with current technologies include its bounded grid topology, the narrow range and dimensionality of pixel values (RGB plus x, y coordinates), the meaning often embedded in adjacent spatial configurations, and the huge corpus of data available for training convolutional neural networks. The modality of music presents some different challenges, and my expectation is that extending these types of transformations to that field may involve incorporation of recurrent neural network architectures for time series data (noting that this blog has previously addressed the potential application of convolutional architectures for time series data as well).
Creativity, unlike logic for instance, is one of those categories of thought where I think the general expectation is that humanity will continue to hold an advantage versus our algorithmic progeny for at least the foreseeable future. However, there are signs even today that implementations of machine learning are gaining the capacity to create at least on par with the artists amongst us. Generative adversarial networks (GANs) have demonstrated the potential for generation of novel, realistic images representative of characteristics found in the source images used in training.

A category of algorithm I haven’t seen yet, though, is a subtractive kind of creativity, such as a GAN that creates novel representations only via removal of features from a source image. Although I’ve never had an editor for these essays, I expect the work of editing includes the identification of passages or language that distract from the core message of a work. Similarly, in the modality of images, there certainly must be cases where some features of a photograph detract from the features of interest to the photographer or the intended message. I suspect such an algorithmic capacity might be possible with the use of adversarial techniques, building a new class of image editors. Such a Subtractive Adversarial Network (SAN) would primarily identify features of a source image that detract from the desired characteristics, whether by reframing the image to a reduced window or by adding negative space in place of features of distraction. An extension of this technique could infill such negative space with features more representative of the desired characteristics.
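To make the subtractive idea a little more concrete, here is a toy sketch in Python. Everything in it is hypothetical and of my own invention: the `subtractive_edit` step, the hand-written `critic`, and the naive random search over masks are stand-ins for the generator and discriminator a real SAN would learn adversarially from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def subtractive_edit(image, mask):
    """Remove masked features by replacing them with negative space (zeros)."""
    return np.where(mask, 0.0, image)

def critic(image, target_brightness=0.3):
    """Hand-written stand-in for a learned discriminator: scores how closely
    the edited image matches a desired characteristic (here, mean brightness).
    A real SAN would train this network adversarially against the editor."""
    return -abs(image.mean() - target_brightness)

image = rng.random((8, 8))  # stand-in for a photograph

# Naive random search over removal masks, standing in for generator training:
# propose removing a varying fraction of pixels, keep the best-scoring edit.
best_mask, best_score = None, -np.inf
for _ in range(200):
    mask = rng.random(image.shape) < rng.random()
    score = critic(subtractive_edit(image, mask))
    if score > best_score:
        best_mask, best_score = mask, score

edited = subtractive_edit(image, best_mask)
```

The point of the sketch is only the shape of the loop: an editor that can only remove, scored by a critic that knows what should remain.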
Part of the challenge of creating some new work, whether music, art, or literature, arises from the near infinite range of possible output. Just like the old marketing example of a grocery shopper entering the breakfast aisle and facing hundreds of brands of cereal, there may be a kind of paralysis by choice in an artist. In my experience a solution to this obstacle of creation is to impose some artificial constraints that pare down the range of options. Such a constraint in music may be the requirement that every great crescendo be followed by the song’s conclusion; in art, perhaps a limit to the color palette or brush strokes; in literature, maybe the rule to only publish what you can finish writing in at most one week. The point is that by limiting the range of output with some artificial constraint, you reduce the computational cost of searching your fitness landscape, and yes, while this tactic may strike the global optimum from your range of output, there will likely still exist some point that satisfies the conditions even if not optimal: better to have a good enough solution than no solution at all. There’s no way to answer this question with certainty, but I wonder if Shakespeare, writing in iambic pentameter, could have matched his output without the artificial constraint of his form. Constraints beget creativity.
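The search-cost framing above can be sketched in a few lines of Python, using a hypothetical toy "composition" space of my own invention: enumerate every four-note melody over eight pitches, then apply an artificial constraint limiting melodic leaps, and the space left to search collapses dramatically.

```python
import itertools

# Toy search space: every four-note "melody" over eight pitches.
pitches = range(8)
all_melodies = list(itertools.product(pitches, repeat=4))  # 8**4 = 4096 candidates

# Artificial constraint: consecutive notes may differ by at most two steps.
constrained = [
    m for m in all_melodies
    if all(abs(a - b) <= 2 for a, b in zip(m, m[1:]))
]

# The constraint may exclude the global optimum of whatever fitness measure
# we care about, but it leaves only a small fraction of candidates to evaluate.
print(len(all_melodies), len(constrained))
```

Any fitness function searched over `constrained` costs a fraction of the full enumeration, at the price of possibly missing the global best: exactly the trade described above.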
Considering the Shannon entropy of text: if we incorporate additional rules into the grammar, such that any statement must carry two meanings (a double entendre, say), the range of possible values for each subsequent letter or word in a statement will be reduced, which suggests that the redundancy of the language goes up. In other words, given an excerpt from a passage, it becomes easier to predict the next part of a statement with each additional constraint applied. However, if you were to take that same statement, the double entendre that is, and try to translate it into a second language, say from English to Spanish, the number of letters or bits required to sufficiently describe the entire range of meanings of the passage will have gone up, meaning the Shannon entropy will be higher than for some statement with only one meaning. Claude Shannon discussed in his paper Prediction and Entropy of Printed English that constraints such as additional grammatical rules applied to a text will increase redundancy and in parallel reduce the associated Shannon entropy; however, I think the double entendre, the double meaning of a passage or work, is an example worthy of additional consideration because it appears to increase both redundancy and entropy.
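The redundancy side of this is easy to see empirically with a back-of-the-envelope sketch (single-character frequencies only, not Shannon's full n-gram method, and the two sample strings are made up for illustration): the more a constraint forces repetition, the lower the measured entropy and the higher the redundancy.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Empirical per-character Shannon entropy of a string, in bits."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def redundancy(text, alphabet_size=27):
    """1 - H / log2(alphabet size): the predictable fraction of each symbol,
    assuming an alphabet of 26 letters plus the space character."""
    return 1 - shannon_entropy(text) / math.log2(alphabet_size)

free_text = "the quick brown fox jumps over the lazy dog"
# A stand-in for heavily rule-constrained text: forced repetition.
constrained_text = "the cat sat on the mat the cat sat on the mat"
```

With its distribution narrowed by the constraint, `constrained_text` measures lower entropy and higher redundancy than `free_text`. What makes the double entendre interesting is precisely that this single-sequence view misses the second channel of meaning riding on the same symbols.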
An author that I admire, David Mitchell, wrote his work Cloud Atlas with a kind of poetry in form that was novel to me and I think at least on par with the creativity in structure of Hofstadter’s Gödel Escher Bach. The story is told as a collection of tales in progression through the ages, each in a unique voice, with time jumps of decades or centuries between them, from pre-industrial society to a post-apocalyptic realm. The stories carry themes of the timeless dance between those in power and the underdogs under repression, with echoes of dialogue, plot points, and theme between them. Each story builds as a crescendo ending in some cliffhanger drop-off, and it’s not until a reader reaches the midpoint that they find a reverse progression of the tales back through time, each story’s counterpart a satisfying conclusion to the series.
*For further reading please check out my Table of Contents, Book Recommendations, and Music Recommendations.
Books that were referenced here or otherwise inspired this post:
Gödel Escher Bach — Douglas Hofstadter
M.C. Escher — The Graphic Work
Shakespeare Complete Works — William Shakespeare
Cloud Atlas — David Mitchell
(As an Amazon Associate I earn from qualifying purchases.)
Albums that were referenced here or otherwise inspired this post:
Pet Sounds — The Beach Boys