From Pixels to Paragraphs

How artistic experiments with deep learning guard us against hype

In recent months, a raft of art projects concerned with deep learning [1][2] has appeared, helping to popularize a branch of machine learning that has rapidly gained traction within the research community. Characterized by more complex architectures than previous generations of machine learning, large-scale deep learning methods have only recently become practical with modern computing power. As with all labels, no one can guess how long this particular one will stick, and even some of its proponents dislike certain claims made about it. Nevertheless, a number of algorithms associated with it have set new benchmarks in image, sound, and natural language processing over the last few years [1][2], prompting a flurry of speculation about its computational, commercial, and creative potential. [1][2][3][4][5][6]

Artistic applications of scientific research often begin inside the academic institutions where the research itself is done. Such was the case for Andrej Karpathy, a graduate student at Stanford, whose blog post earlier this year, “The Unreasonable Effectiveness of Recurrent Neural Networks,” popularized a type of neural network capable of generating text reminiscent of Shakespeare, Wikipedia, the Linux source code, and anything else it happened to be trained on. Karpathy released his code, prompting a series of projects by others who repurposed RNNs for everything from generating food recipes and Bible verses to rendering fake emoji.
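The generation loop behind such character-level models is simpler than the results suggest: feed the network one character, sample the next character from its output distribution, and feed that back in. The sketch below illustrates only that sampling loop with a toy vocabulary and random, untrained weights (Karpathy's actual char-rnn is a trained Torch implementation; everything here is a made-up stand-in), so it emits gibberish rather than Shakespeare:

```python
import math, random

random.seed(0)

vocab = list("ab ")          # toy character vocabulary (assumption, not the real model's)
V, H = len(vocab), 8         # vocabulary size, hidden-state size

# Random, untrained weights -- a real char-rnn learns these by backpropagation.
def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

Wxh, Whh, Why = rand_matrix(H, V), rand_matrix(H, H), rand_matrix(V, H)

def step(h, x_idx):
    """One vanilla RNN step: h' = tanh(Wxh x + Whh h); logits = Why h'."""
    h_new = [math.tanh(sum(Whh[i][j] * h[j] for j in range(H)) + Wxh[i][x_idx])
             for i in range(H)]
    logits = [sum(Why[k][i] * h_new[i] for i in range(H)) for k in range(V)]
    return h_new, logits

def sample(seed_char, length):
    """Generate text one character at a time by sampling the softmax output."""
    h, idx, out = [0.0] * H, vocab.index(seed_char), [seed_char]
    for _ in range(length):
        h, logits = step(h, idx)
        exps = [math.exp(l) for l in logits]
        total = sum(exps)
        r, acc = random.random(), 0.0
        for k, e in enumerate(exps):
            acc += e / total
            if r <= acc:
                idx = k
                break
        out.append(vocab[idx])
    return "".join(out)

print(sample("a", 20))
```

With trained weights, the same loop is what produces text "reminiscent of" whatever corpus the network was fed; the architecture is agnostic to whether that corpus is plays, recipes, or kernel source.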

Google’s Deepdream

This was followed by the arrival of a number of software packages transforming images in novel ways, most notably Google’s Deepdream / Inceptionism project, as well as “A Neural Algorithm of Artistic Style,” a technique by which convolutional neural networks can recompose images in different styles. The unprecedented generative capabilities of these programs pushed deep learning further into public view, leaving social media awash with pictures teeming with otherworldly creatures and psychedelic visuals, reimaginings of iconic paintings, and countless other machine-made artifacts [1][2].

“A Neural Algorithm of Artistic Style” introduced the technique of recomposing images in the style of other images, such as restyling this photo of the University of Tübingen (L) with van Gogh’s Starry Night (R)
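The paper's central trick is to represent "style" as the correlations between a CNN layer's feature channels, captured in a Gram matrix, and then optimize a generated image until its Gram matrices match those of the style image. The sketch below shows only that style representation in pure Python, with tiny made-up numbers standing in for real CNN activations (the actual method also includes a content loss and runs gradient descent on the image):

```python
# Style representation from "A Neural Algorithm of Artistic Style":
# G[i][j] = sum_k F[i][k] * F[j][k], the correlation between channels i and j.
# Feature maps here are invented toy values, not real network activations.

def gram(F):
    """Gram matrix of a feature map F (channels x spatial positions)."""
    C = len(F)
    return [[sum(F[i][k] * F[j][k] for k in range(len(F[i])))
             for j in range(C)] for i in range(C)]

def style_loss(F_style, F_gen):
    """Mean squared difference between the two Gram matrices."""
    Gs, Gg = gram(F_style), gram(F_gen)
    C = len(Gs)
    return sum((Gs[i][j] - Gg[i][j]) ** 2
               for i in range(C) for j in range(C)) / C ** 2

style_features = [[1.0, 2.0, 3.0], [0.5, 0.0, 1.5]]   # 2 channels x 3 positions
generated      = [[1.0, 2.0, 3.0], [0.5, 0.0, 1.5]]
print(style_loss(style_features, generated))  # identical features -> 0.0
```

Because the Gram matrix discards where features occur and keeps only how they co-occur, the optimization can reproduce a painting's texture and palette without copying its composition, which is why the restyled Tübingen photo keeps its own layout.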

More recently, a group of researchers published code demonstrating the use of generative adversarial networks [1][2] for modeling large image corpora and “hallucinating” convincing samples that resemble the originals.

One of the authors, Alec Radford, showcased the impressive abilities of GANs by training one on a large dataset of faces and interpolating smoothly through the latent space the GAN had learned to represent. Radford’s experiments recalled the work of Heather Dewey-Hagborg, one of the first artists to examine what high-level information could be extracted from unordered atoms of low-level data. In “Stranger Visions,” she experimented with nascent DNA forensics by fabricating faces whose features were inferred from phenotypic analyses of genetic material discarded by strangers in public places, calling attention to the public’s lack of awareness of how revealing bits of personal data could be.

The preceding works illustrated the capture and reorientation of visual information from text, pixels, and other minutiae. But it turns out those same pixels can “tell” us much more than that: programs able to automatically annotate and even describe images in human language have made headway this year. A codebase by Ryan Kiros called neural-storyteller was adapted by Samim Winiger, who trained it on romance novels and Taylor Swift lyrics and let it tell stories about a series of photographs, to humorous effect.

Similarly, Andrej Karpathy released code for real-time image captioning, which Kyle McDonald subsequently took for a walk around Amsterdam, casually confronting the public with a program displaying intelligent behavior not typically associated with simple webcams. To be sure, these early attempts still produce dubious captions, but their steady improvement forces us to contemplate what we can and cannot eventually entrust them with as they increasingly exhibit competence at higher-order tasks. Simultaneously, promising approaches to the reverse process of generating images from captions are progressing rapidly and hint at a future where computers can autonomously exchange information end-to-end across different media. We need not strain hard to imagine the ramifications in a world whose accumulation of data is still accelerating.

How do such experiments help inform the public? Artistic works often suggest approachable metaphors for subjects which may otherwise be shrouded by layers of technical obfuscation, helping to illuminate the counterintuitive properties of these new techniques. Recontextualized, they show us how small bits of information in other domains become collectively meaningful. As with all successful technologies, these advances will gradually be absorbed into our existing infrastructures and institutions as they mature, forcing us to make crucial decisions along the way about how to implement them justly. The extent to which the public is aware of what these machines can — and cannot — do is the extent to which they can be regarded as trustworthy.

The number of open source deep learning libraries has increased dramatically over the past year [1][2][3], equipping amateurs with the same tools that are achieving state-of-the-art results in industry and academia. Numerous initiatives for demystifying machine learning and making it more accessible have popped up [1][2][3][4][5], including most recently OpenAI, a non-profit with over $1 billion in backing from Elon Musk, among others, and a roster of well-known practitioners who have promised to promote transparency and collective benefit in their research. With respect to AI research, the playing field has never been so level as it is today. Yet despite these encouraging trends, we must proceed cautiously and thoughtfully, for the history of computer technology follows a familiar theme: public expenditure leads to rapid improvement of an emerging technology, whose faculties then become gradually obscured from the public as it becomes commercially viable. This knowledge imbalance enables those with access to oversell the positive aspects and omit the potentially harmful use cases. As intelligent machines come to inhabit more and more of our daily lives, the stakes will only get higher. Another long stagnation should be the least of our concerns.