Nick Foster and Simone Rebaudengo
In 1929 a small guitar company by the name of Vega produced and launched a portable valve amplifier to pair with their line of banjos, releasing musicians from the large, static PA systems which preceded it. It’s hard to know exactly what happened next, but a succession of experimental combinations of valves, voltages and hardware followed until the early 1940s, when Fender launched what we now recognize as a guitar amplifier, to huge success.
Amplification is simple in principle, taking a signal produced by the vibration of a string, and increasing its amplitude (hence ‘amplifier’), resulting in a louder sound. All amplification circuits have limits defined by the performance of the components within the circuit, beyond which the system may be considered ‘overdriven’. An overdriven waveform becomes truncated or ‘clipped’ as it exceeds these limits, resulting in a sound filled with buzzing, popping and crackling, which we refer to as ‘distortion’.
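To make the idea of clipping concrete, here is a minimal sketch in Python. It assumes an idealized amplifier that multiplies each sample by a gain and simply cuts off anything beyond a fixed limit (so-called hard clipping). Real valve circuits round off their peaks more gradually, and the function names and numbers below are our own illustrative choices rather than a model of any particular amplifier.

```python
import math

def sine_wave(freq_hz, sample_rate, n_samples):
    """Return n_samples of a unit-amplitude sine wave (a stand-in for a plucked string)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]

def amplify(signal, gain, limit=1.0):
    """Scale each sample by `gain`, then hard-clip the result to [-limit, limit]."""
    return [max(-limit, min(limit, gain * s)) for s in signal]

clean = sine_wave(freq_hz=110.0, sample_rate=8000, n_samples=800)

quiet = amplify(clean, gain=0.5)   # stays inside the limit, so the wave keeps its shape
driven = amplify(clean, gain=4.0)  # exceeds the limit, so the peaks are flattened

# Fraction of samples that hit the ceiling or floor in the overdriven signal.
clipped_fraction = sum(1 for s in driven if abs(s) == 1.0) / len(driven)
```

With the gain at 0.5 the output is a smaller copy of the input. At a gain of 4.0 most of each cycle is pinned to the limit, turning the smooth sine into something close to a square wave, and that flattened shape is what the ear hears as distortion.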
In the early days of amplified guitar music, this distortion was viewed as undesirable, a symptom of poor sound engineering, but in the 1950s a few performers began experimenting with this overdriven, distorted sound during recording. In 1951 Jackie Brenston recorded his single ‘Rocket 88’, which featured a jangly four-bar blues riff played through a distorted amplifier. This distorted sound captured the rule-breaking attitude of Rock ‘n’ Roll and became a foundational sound of the era. Countless artists began to ape this distorted sound, not only with guitars but with vocals, creating wild overdriven recordings like those found on 1958’s chaotic ‘Love Me’ by Jerry Lott. The distorted guitar became such a signature of Rock ‘n’ Roll that artists began to push the sound even further. For his 1958 hit ‘Rumble’, Link Wray slashed the speakers of his amplifier to generate a raw, snarling, distorted sound, which was so effective in whipping teenagers into an uncontrolled frenzy that the song was banned on the radio, despite having no lyrics.
The rest, as they say, is history. The distorted guitar sounds of early Rock ‘n’ Roll led straight through the Velvet Underground, the Kinks and the Who, to the Sex Pistols, Minor Threat and Nirvana. A symptom of electronic failure became the defining audio landscape for generations: the ‘bug’ became a ‘feature’, and distortion settings are now commonplace on guitar amplifiers. The initial aim of amplification technology was to generate an accurate reproduction of an acoustic guitar sound, only louder. In reality the technology morphed and altered the input to create new, previously unheard sounds which captured the imagination of musicians and fans.
The intervention of technology into the creative arts typically delivers new aesthetics, as individuals push and pull at its edges and play with its shortcomings. The explosion of creativity brought about by the birth of photography was led in part by exploring the ‘errors’ in capture and processing. Likewise the synthesizers and drum machines of the 70s and 80s (which did a pretty poor job of recreating ‘real’ sounds) birthed everything from disco to Miami booty bass.
So, let’s talk about Machine Learning.
It’s hard to move anywhere within the worlds of technology or enterprise without encountering Machine Learning or ‘Artificial Intelligence’, as they’d have us say. These approaches to computing have rapidly made their way from research labs and white papers to finance and data analytics organizations, helping identify previously unknown patterns and ultimately helping make decisions. It’s no surprise that industries with raw numeric data were the first to find significant utility in this field, but in the sphere of visual culture and media, Machine Learning is also making great strides.
Image recognition and image creation have played a key role in the advancement of Machine Learning technologies, due in part to the availability of vast arrays of well indexed image sets. These large data repositories allow for the creation of advanced models, and the results of this research are impressive indeed. The artificially generated images of people found at thispersondoesntexist.com or the mushrooms, dogs and cats created via BigGAN certainly tread the uncomfortable tideline between real and synthetic.
In video too, new machine learning techniques are emerging with alarming regularity. Horses can be made to look like zebras, and movies may be reframed depending on context. Deepfakery - the act of replacing faces of individuals or actors in video - has generated hundreds of breathless headlines, in part due to the tempo with which this technology is maturing, but also the potential cultural impact of this content.
Music has become a rich playground for experimentation, perhaps due to its comparatively basic structure and comparatively small data size. Companies such as Mubert and Endel have leveraged Machine Learning techniques to produce on-demand generative music which purports to be indistinguishable from that composed by a human. (Side note: Endel has recently been acquired by Warner Music.)
In all of these examples, there is a common theme: a focus on perfection, on a pixel-perfect or note-perfect reproduction of the real world, or at least a world so believable that it feels ‘real’. As these processes are being developed, however, they stumble and fumble en route to this ‘perfect’ state, arriving with glitches, artifacts, blips and smears.
Deep Dream was an early standout project in the field of machine generated images, which (to use shorthand) is a Convolutional Neural Net run in reverse, where images may be processed to reveal strange new half creatures, hypnotic architectures or landscapes, depending on the training set. One well-known Deep Dream image, broadly referred to as the “puppy slug”, is a result of the neural network over-interpreting what it saw based on what it knew (which happened to be puppies).
For a few months this project gained significant interest. Twitter feeds pumped out fractal architecture, fantastic landscapes and melting puppies’ eyes, offering new insights into the way these systems worked. In time this interest waned, and Deep Dream is now broadly viewed as a parlor-trick distraction, albeit an important one, from the real business of recognizing or generating photorealistic images.
The work of artist and Obama speechwriter Ross Goodwin is also worthy of mention here. In 2016 he trained an LSTM Recurrent Neural Network on the screenplays of 80s and 90s science fiction movies, and asked it to write a screenplay of its own. The resulting work retained a recognizable air of the original source material yet felt mostly garbled, nonsensical and absurd. In order to fully explore this outcome, Goodwin and his collaborators followed through with the screenplay, filming a short movie, Sunspring, which delivered the text exactly as written. The film retains an Edward Lear-esque feel as the actors attempt to navigate the SciFi echoes delivered to them through the text. The piece was, however, widely used as an example of just how poor Machine Learning techniques are as creative tools, due to the absurd, non-linear and comparatively ‘primitive’ outcome.
Similarly, in 2018 Philipp Schmitt and Steffen Weiss created a set of images of iconic chairs which they used to train a generative system to create new works. The images created by their algorithm are just barely recognizable as chairs, but these half-images were used to create drawings and models of new pieces of furniture, some absurd, some un-makeable but mostly un-sittable.
It’s easy to dismiss these projects as failures, or as primitive stepping stones on the way to a more ‘perfect’ model. Advertising, cinema, graphic design and photography communities continue to develop powerful new approaches, almost all of which are focused on generating images, music or writing which is indistinguishable from the ‘real’.
But what if this focus on accuracy, detail and photorealistic trompe l’oeil is a trap?
Take another look at those chair images. There’s something about them which feels so rough, so gnarly, so distorted. They’re engaging and raw, otherworldly and new. The rough edges are part of the attraction, as if Kurt Cobain knew TensorFlow.
Thankfully, a small group of designers, artists and researchers are taking time to dwell in this distorted and malformed space rather than rush through it to the assumed perfection on the other side.
For the past two years, the German artist Mario Klingemann has been sharing his process and discoveries using Generative Adversarial Networks (GANs). Perhaps the most interesting aspect of his work is expressed through his tweet-narrative, which occasionally reads like the diary of an explorer of unknown lands and creatures. As one of the first creators to ‘domesticate’ BigGAN, he reports from his travels in this virtual and multidimensional space by showing snapshots of ‘pot-lands’ or ‘latent cul-de-sacs’. His attempts to find the Mona Lisa in latent space and to make a paper collage with a neural space of monkeys are two projects of significant note.
In one of his recent blogposts, the science fiction writer Robin Sloan described a concept of ‘expressive temperature’ in his experiments with machine learning and written language.
“In machine learning systems designed to generate text or audio, there’s often a final step between the system’s model of the training data and some concrete output that a human can evaluate or enjoy. Technically, this is the sampling of a multinomial distribution, but call it rolling a giant dice, many-many-sided, and also weighted. Weird dice.”
By playing with the weights of this ‘weird dice’, Sloan is able to control and augment the weirdness of what is generated, perhaps dynamically, to evolve and distort the output.
“The goal is not to make writing “easier”; it’s to make it harder. The goal is not to make the resulting text “better”; it’s to make it different - weirder, with effects maybe not available by other means.”
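One common way to reweight such a ‘weird dice’ is a temperature parameter, and the sketch below shows that general technique in Python. We are not claiming it is Sloan’s exact implementation; the function names, the toy vocabulary and the probabilities are all hypothetical, chosen only to show how one number can push the output towards the predictable or the strange.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Reweight a distribution so each p becomes p ** (1 / temperature), renormalized.

    Assumes every probability is greater than zero and temperature is positive.
    Values below 1 sharpen the distribution; values above 1 flatten it.
    """
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def roll_weird_dice(outcomes, probs, temperature, rng):
    """Draw one outcome from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(outcomes, weights=adjusted, k=1)[0]

# Hypothetical next-word probabilities from some trained model.
words = ["the", "a", "moon"]
model_probs = [0.7, 0.2, 0.1]

cold = apply_temperature(model_probs, 0.5)  # safer: the likeliest word dominates
hot = apply_temperature(model_probs, 2.0)   # weirder: unlikely words get more of a chance

rng = random.Random(0)
sample = roll_weird_dice(words, model_probs, temperature=2.0, rng=rng)
```

At a temperature of 1.0 the dice are left as the model weighted them. Turning the dial down makes the output more conservative, while turning it up hands more probability to the unlikely outcomes, which is where the distortion lives.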
There’s clearly a precedent here, but what we’re observing is a willful tweaking of Machine Intelligence systems to find and then break their edges, just as Jackie Brenston did with his guitar amplifier. Systems intended for perfect reproduction or synthesis are being explored, messed with and teased into generating new, undefined, and hitherto undesirable outcomes.
Limitations in available computing power, the associated cost of training a model and poor source material have all played a key role in the emergent aesthetic of Machine Learning. The downsampled, low resolution JPEGs which form the basis of many training sets typically exhibit a host of compression artifacts which we have trained ourselves to ignore. The machines have no such judgement, so train themselves on every pixel with equal weight, and as such these artifacts are often augmented in the synthesized content, producing glowing halos around faces or objects. There are evident similarities here to the Hip Hop pioneers of the Bronx, who used dusty funk recordings to produce their sampled art form; the crackles and pops present in those original recordings became a distinctive element of their work.
We perhaps have a little way to go before the visual language of this experimental work becomes codified, but we’re approaching a point of aesthetic consolidation where an experienced viewer can clearly recognize ‘StyleGAN-ish’ or ‘pix2pix-ish’ outputs. The fluid, childish watercolor look of semantic image synthesis (created by the insecurities of the neural network in interpreting human input) is rapidly becoming its own aesthetic. As viewers, we are also developing a sensitivity, and perhaps an attraction, to it: an evolution of the New Aesthetic as described by writer and artist James Bridle.
In 2019 Mario Klingemann (mentioned previously) collaborated with Albert Barque-Duran, who, in a deliciously full-circle act, painted, in oil on canvas, images generated by a trained neural network, in a project appropriately named “My Artificial Muse”. Likewise the 2018 portrait of Edmond de Belamy (recently auctioned for $432,500) exhibits the hallmarks of an undertrained or overdriven GAN to great aesthetic effect.
Music videos are often the first landing place for exploratory aesthetics, so it’s hardly surprising to find examples at the fringes of popular culture, from artists such as Lord Over or Sitraka Rakotoniaina (who worked with Tony Njoku to map and morph his face through a custom made dataset of faces of color). Till Lindemann, lead vocalist of Rammstein, used a ‘walk-through’ technique, following a vector in a GAN dataset, to create a dark, distorted and fluid video to match his uniquely Neue Deutsche Härte sound.
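The ‘walk-through’ idea can be sketched in a few lines of Python, under some loud assumptions: we treat the GAN’s latent space as a plain list of numbers, we walk in a straight line, and we leave out the generator itself. The dimension, frame count and step size are hypothetical values, not those used in any of the videos mentioned.

```python
import random

def latent_walk(start, direction, n_frames, step_size):
    """Return n_frames latent vectors spaced evenly along one direction from `start`."""
    return [
        [s + frame * step_size * d for s, d in zip(start, direction)]
        for frame in range(n_frames)
    ]

rng = random.Random(0)
LATENT_DIM = 8  # hypothetical; real GANs typically use many more dimensions

# A random starting point and a random direction to travel in.
start = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
direction = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

path = latent_walk(start, direction, n_frames=24, step_size=0.1)

# In a real pipeline each vector in `path` would be passed to a trained
# generator to render one video frame; that step is omitted here.
```

Because neighboring points in latent space tend to produce similar images, small steps give the fluid, morphing quality of these videos, while larger steps or a change of direction make the imagery lurch somewhere stranger.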
Individuals are beginning to pull at controls, add noise and generally play in the margins of these black box algorithms, using noise and distortion in Machine Learning practices as a means of inspiration, to create something unexpected rather than defined solutions. So what could this do for architecture? What is ‘overdriven’ product design? What can you ‘dial up’ to explore clipping in graphic design? What does it mean to add distortion to a chair?
We should expect an explosion in this type of grassroots experimentation in the coming months and years, as Machine Learning technologies and interfaces make their way into the hands of more users. ‘Human-comprehensible’ tools such as Runway or ml5.js and the availability of open source models and datasets on GitHub and Magenta are allowing many more people to experiment with these tools, leading to an aesthetic consolidation. Standing on the cusp of this accessibility event is exciting indeed.
Artificial Intelligence (or AI) has become the industry standard term for this work, but we increasingly feel that this term is limiting, and may guide us down a creative cul-de-sac. Think about artificial flavors and artificial flowers: those things are always poor imitations of the ‘real’ thing, and as such will always fall short. They have their own unique characteristics, but ‘artificial’ will always tend towards comparison with a superior ‘original’. We prefer the term ‘Machine Intelligence’ as it allows these technologies space to breathe, to be unencumbered by comparison, and allows the technology to show its own grain, its own vagaries and find its own direction.
As we try to focus machine learning on creating ‘perfect’ copies of the real world, or expect it to become a direct simulation of human intelligence, we open up an uncanny valley. The Roland 808 will never make sounds like a Ludwig drum kit, but that doesn’t really matter; it’s its own thing. Far from limiting guitar music, the technological boundaries of early amplifiers expanded its reach exponentially. We should embrace the strange artifacts of Machine Learning and allow them to develop, rather than aim to remove them. We should enjoy the distortion that comes from an under-trained model, just as we do from an overdriven guitar.
Where might that take us?
What might we create?
What might we learn?