Machine Learning En Plein Air: Building accessible tools for artists

A short story in two parts.

As both an artist and a newcomer to the field of machine learning, I am fascinated by what the technology can achieve, yet somewhat overwhelmed by its complexity. Nevertheless, while exploring ML’s creative uses and considering its potential applications, I found a helpful historical parallel. This inspiration comes from the explosion of creative innovation unleashed in the mid-1800s with the introduction of a new tool: the collapsible paint tube. Suddenly artistic experimentation became accessible to creative individuals previously shut out by their lack of training and connection to master craftsmen, ushering in a democratic new era in art history.

Part One: Painting outside

Until the mid-seventeenth century, at least in France, painting techniques and procedures were developed and taught in an almost cult-like manner, as methods were passed down in secret through generations of elite artists. Groups of masters essentially managed access to the craft, and taught their apprentices with esoteric writings, some of them actually called “Books of Secrets”¹, which were available only to the chosen few. One of these rare texts was “Les secrets de reverend Alexis Piemontois.” It contained guidance on various topics, including the crucial instructions for how a painter should prepare their pigments.

The secrets of Reverend Alexis Piemontois — (Paris, 1557)

Creating and maintaining paint was a complicated procedure that involved grinding, mixing and drying pigment powders with linseed oil², then storing the result in a pig’s bladder sealed with a string³. Perfecting this technique often took years of training. Studio painting became the norm because painting outdoors, away from all the necessary instruments and tools, was seen as far too cumbersome. Once created, the pigments had to be carried in individual glass jars, one for every color needed³. The limited palette and tedious preparation vastly restricted what artists could depict and discouraged experimentation. A few artists braved the challenges of outdoor painting, but on-site, or en plein air, painting was simply too impractical and complicated to be widely adopted.

But in 1841 an American portrait painter and inventor named John Goffe Rand came up with a simple invention that changed how artists painted forever. After a trip to Europe, he invented the collapsible paint tube: “Made from tin and sealed with a screw cap, Rand’s collapsible tube gave paint a long shelf life, didn’t leak and could be repeatedly opened and closed.”³

This simple yet very effective invention made paints and oil pigments readily accessible to people. More colors became available, since creating pigments was no longer a time-consuming task and pigments did not dry out as quickly. All of a sudden, tools and techniques that had been difficult to access became mainstream.

John Goffe Rand collapsible paint tube patent drawings.

After Rand’s invention, outdoor painting, or en plein air, was not only possible but encouraged. Renoir once said: “Without colors in tubes, there would be no Cézanne, no Monet, no Pissarro, and no Impressionism.”³ A new breed of artists was now able to experiment with new techniques, colors, themes and depictions, in particular painting the effect of natural light outside. Artists such as Monet, Pissarro, Renoir, Constable and J.M.W. Turner advocated for en plein air painting and experimentation.

Claude Monet Painting by the Edge of a Wood (1885) by John Singer Sargent. Oil on canvas. 54.0 × 64.8 cm. Tate Gallery, London.

Better painting tools gave artists more freedom to explore new subjects, escape the constraints of the studio and take inspiration from the natural world around them.


Modern Pigment Powders

Artists before the invention of the collapsible paint tube were experimenting with novel ways of creating paintings beyond the confines of studios. Fast-forward 150 years, and we now find new generations of artists experimenting with novel ways of using digital technology in their work. I like to think of these current attempts as the outdoor paintings of the twenty-first century. But just as their colleagues in the 1800s, lacking access to extensive and exclusive training, had trouble making and using pigments before collapsible paint tubes were invented, artists today have trouble fully engaging with emerging technical fields and incorporating the latest tools and technologies in their work, because few tools have been designed with them in mind.

One such field is machine learning, which uses statistical techniques to give computers the ability to learn, or improve their performance of a given task, without being explicitly programmed with a sequence of instructions. Born from statistics and computer science, machine learning is becoming a fundamental technology in our society. But while commercial, social and political applications, such as speech recognition and image recognition, can be found everywhere, experimentation with machine learning remains, for the most part, confined to computer scientists and engineers. For an outsider with no previous computer science experience, trying to understand and use modern machine learning techniques to explore their creative potential feels a lot like trying to grind, mix and dry secret pigment powders with linseed oil just to make a painting.

The secrets of Reverend Machine Learning

R. Luke DuBois has an excellent quote about the role of artists and how they should engage with technology.

“Every civilization will use the maximum level of technology available to make art. And it’s the responsibility of the artist to ask questions about what that technology means and how it reflects our culture.” — R. Luke DuBois⁵

But to make art using the maximum level of technology requires, at the very minimum, access to that technology. Considering that machine learning will continue to grow in complexity and specialization, new ways need to be created to open up its potential to practitioners coming from other disciplines. Even beyond artists, anyone interested should be able to experiment with machine learning without having to build and compile low-level, obscure C++ code. We need the machine learning equivalent of portable zinc tubes for everyone to use for experimentation.

Imagine the kinds of creative projects that could be built with algorithms trained to detect human poses in videos and images. If the Kinect unleashed a new wave of explorations for media artists, what will research like DensePose do?

What if more people could have access to research like DensePose?

Part Two: Building the right tools

I first approached machine learning because I wanted to build creative, weird and unexpected projects with this technology. That was one of the reasons I decided to come to NYU’s Interactive Telecommunications Program (ITP) almost two years ago. My first project using machine learning was for a class I took with Dan Shiffman: an app that narrated stories based on a combination of pictures, using a pre-trained neural network that generated captions. Then, as part of another class with Sam Lavigne, I took that same model and made a tool to generate semantically similar scenes from a pair of videos: you input a video and get back a scene with a similar meaning from another video. I was fascinated with what just one machine learning model could achieve.

Scenescoop: A tool to describe and find scenes in videos [Cristobal Valenzuela, 2017] — https://github.com/cvalenzuela/scenescoop

I also had the chance to collaborate with Anastasis Germanidis. Together we built a drawing tool that allows users to interactively synthesize street images with the help of Generative Adversarial Networks (GANs). The project uses two AI research papers published last year as a starting point (Image-to-Image Translation Using Conditional Adversarial Networks by Isola et al. and High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs by Wang et al.) to explore the new kinds of human-machine collaboration that deep learning can enable.

Uncanny Rd: Using GANs to synthesize a never-ending road [Anastasis Germanidis and Cristobal Valenzuela — 2017, http://uncannyroad.com/]

But one thing I encounter over and over again while working with machine learning is something similar to the pigment problem described in Part One: the tools seem too complex to even attempt to use, and they fall under the “Just install [something]” assumption. The developers either assume that everyone using the tools comes from the same background, or they require a lot of knowledge about low-level internal functionality. Thus, in the process of trying to use machine learning models and techniques, I found myself building tools to simplify the underlying systems and, at the same time, help me comprehend the topic better. The effect has been twofold: I have been actively learning something while building a tool to abstract its complexity.

One of the tools I have helped build tries to simplify machine learning for the web. ml5.js is a JavaScript library, powered by tensorflow.js, that brings a friendlier interface to machine learning on the web. It is a project I have been helping to develop under the guidance of Dan Shiffman and with an outstanding group from ITP. The main goal of ml5.js is to further reduce the barriers between lower-level machine learning and creative coding in JavaScript. Since JavaScript is rapidly becoming the entry point for a lot of new programmers, we hope it can also become their first entry point to machine learning.
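
To give a sense of what that friendlier interface looks like, here is a minimal browser sketch using the ml5.js image classifier, following the style of the ml5.js examples. The `image` element id and the `formatResult` helper are my own illustrative assumptions, not part of the library:

```javascript
// A minimal ml5.js sketch: classify an image in the browser.
// Assumes the ml5.js script tag is loaded and the page has an
// <img id="image"> element.

// Turn one ml5-style result ({ label, confidence }) into a short string.
function formatResult(result) {
  return `${result.label} (${Math.round(result.confidence * 100)}%)`;
}

// ml5 only exists in the browser; the guard lets this file load elsewhere too.
if (typeof ml5 !== 'undefined') {
  // Load a pre-trained MobileNet image-classification model.
  const classifier = ml5.imageClassifier('MobileNet', () => {
    console.log('Model loaded!');
  });

  // Classify the image and print the top result.
  classifier.classify(document.getElementById('image'), (error, results) => {
    if (error) return console.error(error);
    console.log(formatResult(results[0])); // e.g. 'robin (93%)'
  });
}
```

That is the whole program: no pigment grinding, no compilation, just a model and a callback.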

I’ve also been experimenting with the possibility of applying some of the principles behind ml5.js to other creative frameworks and environments. A significant part of this effort has been aimed at asking questions about the uses of machine learning in other creative workflows: For example, can modern machine learning techniques be used to support, generate or understand a creative process? How can a graphic illustrator benefit from a deep neural network that learns how to describe the content of images? What will a 3D animator do with a neural network trained to generate speech? Can a musician create sound from an algorithm that creates photorealistic portraits? In which ways should or could people with little experience interact with machine learning? I’ve been exploring these kinds of questions as part of my ITP thesis project, Runway, and would like to briefly discuss some of the things I have discovered and built so far.

But first: Models.

For the most part, machine learning consists of a series of steps designed to generate models out of data points. A model is a machine-learned representation of the input data. Thus, a model created to recognize faces will first need to be trained on datasets containing many different types of faces. Once trained, it should be able to recognize faces in pictures it has never seen before. The model is a highly abstract representation of the input data that is now able to recognize faces. In reality, the process is a bit more complex: it involves collecting a large number of data points; creating training, validation and test datasets; choosing the right architecture; selecting the right hyperparameters; and then training the model using, if possible, a graphics processing unit (GPU) for faster results.
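
The train-then-generalize shape of that pipeline can be sketched with a toy example in plain JavaScript. This is nothing like the deep networks used for faces — it is a trivial nearest-centroid classifier on 2-D points, my own illustrative stand-in — but the workflow is the same: training summarizes labeled data into a model, and the model then classifies inputs it has never seen:

```javascript
// Toy "training": average the points of each label into one centroid per class.
function train(examples) {
  const sums = {};
  for (const { point, label } of examples) {
    if (!sums[label]) sums[label] = { x: 0, y: 0, n: 0 };
    sums[label].x += point[0];
    sums[label].y += point[1];
    sums[label].n += 1;
  }
  const centroids = {};
  for (const label in sums) {
    centroids[label] = [sums[label].x / sums[label].n, sums[label].y / sums[label].n];
  }
  return centroids; // this object *is* the learned model
}

// Inference: pick the label whose centroid is closest to the new point.
function predict(centroids, point) {
  let best = null;
  let bestDist = Infinity;
  for (const label in centroids) {
    const [cx, cy] = centroids[label];
    const d = (point[0] - cx) ** 2 + (point[1] - cy) ** 2;
    if (d < bestDist) {
      bestDist = d;
      best = label;
    }
  }
  return best;
}

// The "dataset": a few labeled points.
const trainingSet = [
  { point: [0, 0], label: 'a' },
  { point: [1, 0], label: 'a' },
  { point: [9, 9], label: 'b' },
  { point: [10, 9], label: 'b' },
];
const model = train(trainingSet);

// The model generalizes to points it was never trained on.
console.log(predict(model, [0.2, 0.5])); // 'a'
console.log(predict(model, [8, 10]));    // 'b'
```

A real face-recognition model follows the same arc — data in, abstract representation out — just with millions of parameters instead of two centroids.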

People have trained and built models for many different purposes, including models to recognize emotions, objects and people, body positions, hot-dogs, spam and sounds; models to generate photorealistic faces, captions, streets, facades, voices and videos of airplanes morphing into cats; models to detect cancer, malaria and HIV. Some trained models are ethically dubious and raise important questions about the social impact of algorithms and biases in our society such as models used to segregate neighborhoods or predict crime.

Progressive Growing of GANs for Improved Quality, Stability, and Variation: http://research.nvidia.com/publication/2017-10_Progressive-Growing-of

But not all models are created equal. People build models in different programming languages, with different frameworks and for different purposes. Regardless of how they are built, models are the most utilitarian aspect of machine learning, the thing that really matters in the end. Models are what you can build a company around, create contemporary-looking art with and power search engines with. Models are the painting pigments of machine learning: the tools you use to create art and business. But most models, even when open-source, are kept sealed inside a pig’s bladder: they are hard to use, and expensive and time-consuming to set up.

I’ve experienced the frustration of trying to learn about and use models ever since I first became interested in machine learning. To put things in perspective, a good example is academic research like Deep Photo Style Transfer, in which the style of one image is applied to another to generate a realistic new image. The creative potential is enormous: imagine a Kubrick scene remade in the style of Wes Anderson, or the sketch of an artist transformed in real time into any artistic style or period she or he wants.

Deep Photo Style Transfer: The first column is the input, the second is the style and the third is the generated output. — github.com/luanfujun/deep-photo-styletransfer

But since this publication is still in the realm of academic research, its “Basic Usage” requires the (for artists) draconian step of computing a matting Laplacian matrix using Matlab:

Basic Usage Instructions for Deep Photo Style Transfer — github.com/luanfujun/deep-photo-styletransfer

For anyone interested in exploring potential creative uses of models and techniques like Deep Photo Style Transfer, such a complicated starting point can be overwhelming and deeply frustrating. Machine learning plays a role in our everyday lives, but it is still closed to people who are not familiar with things like Laplacian matrices. I want to change this. I would like to see modern machine learning models made accessible to more people without requiring prior experience in high-level graph theory.

Runway

My ITP thesis project, Runway, is an invitation to explore these ideas. If models are the building blocks of machine learning, how can we create simpler tools to access them? What should be the requirements to run and use models like Deep Photo Style Transfer? If we could have modern zinc paint tubes for digital artists, what would they look like?

Runway’s main goal is to make the process of using a state-of-the-art machine learning model as easy as possible. While learning about the data behind these models and the training process is important, this project is not about creating the right training environment or deploying models to production. It is not about training an algorithm, and it’s not about hyperparameters or hardcore data science. It is a project built around the simple idea of making models accessible to people, so they can start thinking of new ways to use them and, from there, gain a better understanding of how machine learning works. A process of learning by doing.

Runway has three main components: Inputs, Models and Outputs.

In Runway, inputs are ways of triggering a pre-trained model to perform a certain operation. A model runs over the inputs and then outputs results, which you can use in any way you want. The models are open-source, and Runway provides a library of trained models to pick from. Runway also allows users to connect the inputs and outputs of those models to other software.

Runway’s integration with other software

For instance, you can run a model trained to detect human body, hand and facial key-points and send the results of a live webcam stream to a 3D Unity scene, a musical app running in Max/MSP, a sketch using Processing or a website using JavaScript. Abstracting away the correct and most efficient way of running a machine learning model allows creators to focus on using the technology rather than learning how to set it up, effectively shifting their energy and time from configuring tools to creating.
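
To make that workflow concrete, here is a sketch of the receiving end. The JSON shape below mimics common pose-model output (such as PoseNet's part/position/score key-points); the field names and the `findKeypoint` helper are my own assumptions for illustration, not Runway's actual protocol:

```javascript
// A hypothetical key-point message, shaped like the JSON a pose model
// streamed from a tool like Runway might deliver to a sketch or scene.
const message = JSON.stringify({
  poses: [
    {
      keypoints: [
        { part: 'nose', position: { x: 320, y: 140 }, score: 0.98 },
        { part: 'leftWrist', position: { x: 210, y: 400 }, score: 0.91 },
      ],
    },
  ],
});

// Pull one named key-point out of a pose message, or null if absent.
function findKeypoint(json, part) {
  const data = JSON.parse(json);
  for (const pose of data.poses) {
    const kp = pose.keypoints.find((k) => k.part === part);
    if (kp) return kp.position;
  }
  return null;
}

// A sketch would map this position onto a character, a sound or a scene.
const nose = findKeypoint(message, 'nose');
console.log(nose); // { x: 320, y: 140 }
```

The point of the abstraction is that the creative side of the pipeline only ever deals with small messages like this one, not with the model that produced them.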


A few weeks before my thesis defense, after developing a working beta version of Runway, I made an open call inviting anyone interested to try the app. One of my main concerns was understanding how people would engage with a tool like this, or whether they would find it useful at all. Fortunately, I got an overwhelmingly positive response, not just from artists but also from designers, teachers, companies and people curious about machine learning in general. To make things manageable, I selected around 200 beta testers. In just a couple of weeks, people started building interactive projects using state-of-the-art machine learning models that may have seemed inaccessible before: projects like gaze and pose estimation in Unity, or object recognition in openFrameworks and JavaScript.

Something I did not directly plan for was how useful the tool would be to machine learning educators. Since Runway manages the lower-level complexities of ML, educators can concentrate their time on explaining a model’s architecture, the importance of collecting datasets and possible use cases. Most important, they can have their students build fully functional projects using machine learning.

Gene Kogan and Andreas Refsgaard using Runway to teach at the Copenhagen Institute of Interaction Design Summer School in Costa Rica

Runway is an invitation to artists, and others, to learn about and explore machine learning through more accessible tools. Machine learning is a complex field that will likely continue to impact our society for years to come and we need more ways of giving more people access. Just as an innovative technical invention allowed the Impressionists to discover en plein air painting and begin to explore and understand new and uncharted territory, perhaps with tools like Runway, we can usher in an era of en plein machine learning and with it, unleash the creative potential of this technology for the benefit of the arts, as well as society as a whole.

Runway will be available for free and it’s in beta now. You can learn more at runwayml.com


Many thanks to Dan Shiffman, Kathleen Wilson, Hannah Davis, Patrick Presto, and Scott Reitherman for their support in writing this post.


Notes

[1] From Books of Secrets to Encyclopedias: Painting Techniques in France between 1600 and 1800 — Historical Painting Techniques, Materials, and Studio Practice

[2] En plein Air — https://en.wikipedia.org/wiki/En_plein_air

[3] Never Underestimate the Power of a Paint Tube
 https://www.smithsonianmag.com/arts-culture/never-underestimate-the-power-of-a-paint-tube-36637764/#lsrgxhpI13tUTkzg.99

[4] A Technique of the Modern Age: The History of Plein Air Painting — https://www.artelementsgallery.com/blogs/gallery-blog/104238278-a-technique-of-the-modern-age-the-history-of-plein-air-painting

[5] Insightful human portraits made from data — R. Luke DuBois — TED Talk: https://www.youtube.com/watch?v=9kBKQS7J7xI