I recently attended #alt-ai, a mini conference on art and AI organized by Gene Kogan, Lauren Gardner, and folks at the School for Poetic Computation (sfpc) in New York City. The event took place in a building that was previously occupied by Bell Labs and was the site of 9 Evenings almost 50 years ago. The building later became the Westbeth Artists Community (home to many influential and successful artists over the years) and is now home to sfpc.
All the #alt-ai talks can be watched in full here: http://livestream.com/internetsociety/alt-ai/
The first day started with a gallery opening (about 30 pieces, many shown on Openframe.io) and 4 speakers.
Gene Kogan gave an intro and a bird’s eye view of the sudden explosion of interest in this field over the last year. He gave examples of DeepDream, style transfer, and DCGAN, and also spoke about his art piece, the Cubist Mirror (a screen + camera + realtime style transfer). Gene is using a recently published algorithm that accomplishes style transfer with a feed-forward network and is about 1000x faster than other methods, making realtime possible.
Golan Levin premiered his new project with Kyle McDonald, Aman Tiwari and others: http://www.terrapattern.com/ After training a CNN on satellite imagery, they select an arbitrary map tile and search Google Maps for similar tiles using its embedding vector (so the search goes beyond the training classes). This works really well and allows you to find all sorts of interesting geographical features, e.g., “Show me all the locations in NYC of tennis courts, swimming pools, ship docks, gas tanks, etc.”
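The core trick, searching by embedding vector rather than by class label, can be sketched with plain cosine similarity. Everything below (the toy 4-d vectors, the `most_similar` helper) is hypothetical; Terrapattern’s real features come out of a trained CNN:

```python
import numpy as np

def most_similar(query_vec, tile_vecs, k=3):
    """Return indices of the k tiles whose embeddings are closest
    to the query tile, by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    T = tile_vecs / np.linalg.norm(tile_vecs, axis=1, keepdims=True)
    sims = T @ q                      # cosine similarity to every tile
    return np.argsort(-sims)[:k]     # best matches first

# Toy example: 5 hypothetical 4-d tile embeddings
rng = np.random.default_rng(0)
tiles = rng.normal(size=(5, 4))
query = tiles[2] + 0.01 * rng.normal(size=4)  # near-duplicate of tile 2
print(most_similar(query, tiles))             # tile 2 should rank first
```

Because the comparison happens in embedding space, a query tile of, say, a tennis court will retrieve other tennis courts even if “tennis court” was never a training class.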
Cassie Tarakajian presented some VR visualizations of a CNN, based on something like this: http://scs.ryerson.ca/~aharley/vis/conv/, but in a VR environment. Given that the tensors representing each layer in a convnet are usually three-dimensional (x, y, filter), a 3D visualization makes a lot of sense. At the front of the VR helmet is a Leap Motion sensor, so you can use your hands inside the VR environment.
She also had a live demo in the gallery:
Hannah Davis presented a very cool music generator based on emotion extracted from text. It looks for keywords in a book and maps them onto a roughly 10-dimensional emotion vector space. Notes are then picked with reference to the emotional arc of the prose, following rules such as “more dissonance on high emotion, more major key on happy,” etc. More info here: http://www.musicfromtext.com/about.html
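As a loose illustration (not Hannah’s actual system), the keyword-to-emotion step could look like the sketch below; the lexicon, the axes, and the `passage_emotion` helper are all made up for the example:

```python
import numpy as np

# Toy emotion lexicon: each keyword maps to a small emotion vector.
# The axes ([joy, sadness, tension]) and entries are hypothetical;
# the real system uses a richer ~10-dimensional space.
LEXICON = {
    "bright": np.array([0.9, 0.0, 0.1]),
    "grave":  np.array([0.0, 0.8, 0.3]),
    "storm":  np.array([0.1, 0.3, 0.9]),
}

def passage_emotion(words):
    """Average the emotion vectors of recognized keywords to get a
    coarse emotional reading of one passage; doing this per chapter
    would trace the emotional arc of the whole book."""
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return np.mean(hits, axis=0) if hits else np.zeros(3)

print(passage_emotion("a bright but grave morning".split()))
```

Rules like “more dissonance on high emotion” would then read off individual axes of this vector when choosing notes.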
The evening ended with a live performance by Jason Levine using Extempore, which allows you to write continuously executing code in realtime. Jason used a library of audio samples, organized in a 2D space using t-SNE (based on an idea by Kyle McDonald). He then used the live coding environment to create periodic paths through that space to pick which samples to play, generating rhythms and music.
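The sample-layout step can be sketched with scikit-learn’s t-SNE. The random feature vectors below are stand-ins for real audio descriptors (e.g., MFCCs or spectral features):

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in: one 20-d feature vector per audio sample
rng = np.random.default_rng(1)
features = rng.normal(size=(50, 20))

# Project to 2D; nearby points correspond to similar-sounding samples,
# so a path drawn through this plane yields a coherent sample sequence.
coords = TSNE(n_components=2, perplexity=10, random_state=1).fit_transform(features)
print(coords.shape)
```

A live-coded periodic function of time can then index into `coords`, picking whichever sample sits nearest the path’s current position.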
Rebecca Fiebrink spoke about her latest project, which uses interactive machine learning to map arbitrary controllers to arbitrary behavior (e.g., game controllers to a synthesizer). This system makes it really easy for artists to create new instruments without needing to learn to code. You just plug in the input and output devices, choose a behavior (say a particular synthesizer setting), and then move the controller in a way that you want to correspond to the chosen sound. Repeating this process maps the controller phase space to the output phase space in an incremental and intuitive manner. In-between states naturally interpolate, though they sometimes combine to give unexpected effects. If desired, these can be embraced and refined; if not, they can be overwritten by further training. Pretty cool, and quite practical.
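A minimal sketch of this kind of example-based mapping, assuming distance-weighted k-NN interpolation (one plausible choice for the illustration, not necessarily what her system uses):

```python
import numpy as np

class ControllerMapper:
    """Record (controller position -> synth parameter) pairs, then
    interpolate between them with distance-weighted k-nearest neighbors,
    so in-between controller states yield in-between sounds."""
    def __init__(self, k=3):
        self.k = k
        self.inputs, self.outputs = [], []

    def record(self, controller, params):
        self.inputs.append(np.asarray(controller, dtype=float))
        self.outputs.append(np.asarray(params, dtype=float))

    def map(self, controller):
        X = np.stack(self.inputs)
        Y = np.stack(self.outputs)
        d = np.linalg.norm(X - np.asarray(controller, dtype=float), axis=1)
        idx = np.argsort(d)[:min(self.k, len(d))]
        w = 1.0 / (d[idx] + 1e-9)            # closer examples weigh more
        return (w[:, None] * Y[idx]).sum(axis=0) / w.sum()

m = ControllerMapper()
m.record([0.0, 0.0], [100.0])   # e.g. low filter cutoff at rest position
m.record([1.0, 1.0], [900.0])   # high cutoff at the far corner
print(m.map([0.5, 0.5]))        # in-between positions interpolate
```

Adding more `record` calls refines the mapping incrementally, mirroring the overwrite-by-further-training workflow described above.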
Heather Dewey-Hagborg is an artist whose best-known work to date is perhaps Stranger Visions, in which she collected various artifacts that carry the DNA of their owners, such as chewing gum and cigarette butts. She extracts and PCRs DNA from these and uses it to reconstruct the likely facial features of the individual who left the item. The reconstructed faces are then 3D printed and presented alongside the item. The reconstruction process involves quite a bit of machine learning in matching known faces to known DNA profiles.
Heather’s talk delved deep into the many fallacies of this approach and the biases in the training sets concerning our preconceived notions of race, gender, and other “axes” onto which facial features are mapped. Proprietary technology like this is currently used by law enforcement agencies, and it is not hard to see how its limitations and biases go hand in hand with huge existing prejudices. Her key phrase, “algorithms are political,” stuck with me, and the generalization to many other areas of machine learning for social prediction and profiling is as straightforward as it is concerning.
Brian Whitman, currently chief scientist at Spotify and cofounder of The Echo Nest (currently the basis for most music recommendation algorithms), spoke about different aspects of recommendation systems as well as automatic music generation. He pointed out how the advent of visual generative methods last year followed closely on the heels of progress in image classification. Once again, the duality between perception and creation pops up. He showcased progress in music classification and asserted that generative music is around the corner, positing that within five years we’ll be tuning into robot-generated music stations that cater to our personalized tastes. Unsurprisingly, this is something Spotify is already working on.
Allison Parrish presented her work on text manipulation using word2vec. She uses the organized semantic space of word embeddings to change and erode text by replacing words with other words nearby in the embedding space. She also presented a neat method for taking the vector sequence of a sentence and applying a jpeg-like compression to it. Truncating down to the most important “frequencies” and then converting back from embedding vectors to words yields a sort of lossy conversion in which slight artifacts are introduced, depending on the compression factor. It’s interesting to think of a sentence as a path in embedding space, and how slight alterations of that path lead to almost-right phrases. A similar thing can happen to a patient’s speech center after brain surgery. Right after the operation they experience a similar erosion of word-choice precision. When asked where they are, they will say “school”/“institute”/“office”/… rather than “hospital”. If the injuries are minor, the brain quickly repairs the damage and after a few weeks precision returns.
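The “jpeg-like” step can be sketched as a DCT along the word axis of the sentence’s embedding sequence; the `compress_path` helper and the toy embeddings below are illustrative assumptions, not her implementation:

```python
import numpy as np
from scipy.fftpack import dct, idct

def compress_path(vectors, keep):
    """JPEG-style lossy compression of a sentence-as-path: take the DCT
    along the word (time) axis, zero out the high 'frequencies', and
    invert back to a smoothed sequence of vectors."""
    coeffs = dct(vectors, axis=0, norm='ortho')
    coeffs[keep:] = 0.0
    return idct(coeffs, axis=0, norm='ortho')

# Hypothetical 6-word sentence with 4-d word embeddings
rng = np.random.default_rng(2)
sent = rng.normal(size=(6, 4))

approx = compress_path(sent, keep=3)   # lossy: only 3 of 6 frequencies kept
# Each reconstructed vector would then be snapped back to its nearest
# word in the vocabulary, which is where the "artifacts" appear.
print(np.allclose(compress_path(sent, keep=6), sent))  # keeping all is lossless
```

Lower `keep` values smooth the path more aggressively, so the recovered words drift further from the originals.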
Mario Klingemann presented a series of tools he’s been developing to sort and organize vast amounts of book illustrations from the British Library. Once organized, he is able to find recurring themes and even cases of plagiarism or modifications applied to drawings used in other books. He uses the image data to create interesting artworks. One method involves marking universal “connection points” for each item, which can then be used to automatically generate intricate collages of these components. Depending on the recursive rule set he applies, a variety of fascinating effects are achieved:
Such a curated dataset is also perfect for training a neural network. Mario showed some examples of training a convnet on large, intricately decorated initials from old books. Once trained, techniques such as class visualization or deep dreaming can be used to create new recombinant images:
Kathryn Hume’s talk, entitled “Work of Art in the Age of Algorithmic Reproduction,” wrapped up the conference with a philosophical take on style transfer and how meaning and style are intertwined. Gatys et al. discovered that it is possible to factor out style from content in an artificial neural net. Our own visual systems have inherently learned to do the same: by definition, content recognition has to be invariant to all information that is irrelevant to the semantics of the task. This discarded information can then be regarded as “style,” orthogonal to content so to speak. She muses that our ability to create and enjoy art may be a corollary of the inference capabilities of our own visual systems. This raises some interesting questions: given a number of style-transformed images, “What’s the essence that’s preserved between different versions? What is the ‘minimal viable’ Mona Lisa?”
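The factorization Gatys et al. describe represents style as correlations between feature channels (Gram matrices), which average away spatial arrangement, i.e., the content. A small numpy sketch with hypothetical feature maps shows why this discards layout but keeps texture statistics:

```python
import numpy as np

def gram_matrix(features):
    """Style representation a la Gatys et al.: correlations between
    feature channels, with spatial positions summed out."""
    c, h, w = features.shape
    F = features.reshape(c, h * w)     # channels x spatial positions
    return F @ F.T / (h * w)           # channel-by-channel correlations

# Hypothetical conv feature map: 8 channels over a 16x16 grid
rng = np.random.default_rng(3)
fmap = rng.normal(size=(8, 16, 16))
shifted = np.roll(fmap, shift=5, axis=2)  # spatial shift changes "content"...

print(np.allclose(gram_matrix(fmap), gram_matrix(shifted)))  # ...but not "style"
```

Since any spatial permutation of the feature map leaves the Gram matrix unchanged, everything the Gram matrix cannot see is exactly the “content” that style transfer preserves.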
Alt-Ai featured a pretty fabulous gallery of various machine learning-based experiments. A few selected pieces: