Can AI make an Art Introduction?

Kinar R
YML Innovation Lab
6 min read · Jun 6, 2019

With recent advancements in Artificial Intelligence, we at YML Innovation Labs were curious whether art introductions can be automated. In this article, we explore how we went about creating such an art exhibition in NYC with the artist Damjanski.

It hasn’t been long since OpenAI cautiously revealed the capabilities of its language-generation model GPT-2, taking language modelling to the next level. Knowing that it performs well on a number of language benchmarks, and with everyone talking about it, we decided to take a look and see what we could conjure up for the art world. This was part of a kick-off with the artist Damjanski to create a solution that showcases statements of an art exhibit named Museum of Modern Art, generated in response to questions answered by people coming to the art performance. The project investigates ideas of collaboration with an AI and its integration into the artist’s practice. Exploring the subject of thinking processes is an AI trained on all the archived exhibition statements of the Museum of Modern Art (MoMA) in New York. The exhibition consists of a headset through which people can interact with the AI to create new exhibition statements in the form of printed text. These statements serve as a means for artists to think about the generated information in new ways.

The Exhibition Catalogue (source)

The exhibition statements were available on the exhibit’s portal, so we decided to scrape them and use them. Names of artists appearing in any statement had to be anonymized for legal reasons. In the end, we were able to gather 2,000 statements, which was just about enough for the model to learn from, to a certain extent, with fine-tuning.

Preprocessing

Replacing names of artists

No major preprocessing is required for GPT-2. Lowercasing, tokenization, and similar normalization steps aren’t really necessary; the authors argue that such steps restrict the model’s capability, and working on raw text is what allows it to be evaluated on any language-model benchmark. So the one part of the data that did need cleaning was the names of artists. Stanford’s CoreNLP, with its Named-Entity Recognition module, made this possible, and we also relied on its deterministic coreference resolution (dcoref) to understand the context of gendered pronouns in a text and suggest appropriate replacements matching the genders of the replaced names.
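As a minimal sketch of the name-replacement idea, here is a version using Stanza, Stanford NLP’s Python library, as a stand-in for the CoreNLP pipeline described above; the coreference-aware pronoun handling is omitted for brevity, and the replacement string is just a placeholder.

```python
# Minimal sketch: replace PERSON entities with a placeholder.
# Stanza is used here as a stand-in for the CoreNLP NER + dcoref pipeline;
# gendered-pronoun handling is omitted for brevity.
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")

def anonymize_artists(text, replacement="the artist"):
    doc = nlp(text)
    # Replace from the end so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.type == "PERSON":
            text = text[:ent.start_char] + replacement + text[ent.end_char:]
    return text

print(anonymize_artists("Jackson Pollock's drip paintings anchor the show."))
```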

Architecture

The Transformer (source)

GPT-2 is inspired by the Transformer architecture proposed by Google in the paper “Attention Is All You Need”. Essentially, a Transformer provides a generic way to detect dependencies between inputs and outputs in an encoder-decoder fashion. At its core, it reduces the number of sequential operations needed to relate two symbols from the input/output sequences to a constant. It achieves this with a multi-head attention mechanism that models dependencies regardless of their distance in the input or output sentences. Every step in the model is also auto-regressive, meaning that previously generated symbols are taken into account when generating the next.
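To make the attention idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of the Transformer (and hence GPT-2), with a causal mask for the auto-regressive setting. Shapes are toy values; a real multi-head implementation adds learned projections and splits the model dimension across heads.

```python
# Toy scaled dot-product attention with a causal (auto-regressive) mask.
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=True):
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)     # (batch, seq, seq)
    if causal:
        # Position i may only attend to positions <= i.
        seq_len = scores.shape[-1]
        mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
        scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ v

q = k = v = np.random.randn(1, 5, 64)                     # toy query/key/value
print(scaled_dot_product_attention(q, k, v).shape)        # (1, 5, 64)
```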

(source)

Surprising as it may seem to some, GPT-2 does not require any task-specific training; it uses an unsupervised learning approach to train the language model. Unlike models such as ELMo and BERT, which need two stages of training (pre-training and fine-tuning), feeding raw text to the model was sufficient, as it uses Byte Pair Encoding (BPE) for its vocabulary. In our implementation, we decided to employ the smaller 117-million-parameter model (117M), which is less than a tenth of the size of the full 1.5-billion-parameter variant.
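A rough sketch of the fine-tuning step, using the gpt-2-simple wrapper as one plausible route (not necessarily the exact scripts we used), might look like this; statements.txt stands in for the anonymized corpus.

```python
# Hedged sketch of fine-tuning the released 117M checkpoint on the scraped
# statements, via the gpt-2-simple wrapper. "statements.txt" is a placeholder.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="117M")       # fetch OpenAI's pretrained weights

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="statements.txt",     # ~2,000 anonymized statements
              model_name="117M",
              steps=1000)                   # keep it short to avoid overfitting the small corpus

# Sample a new statement from the fine-tuned model.
print(gpt2.generate(sess, temperature=0.8, return_as_list=True)[0])
```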

(source)

We decided to turn the fairly large model into a micro-service with TensorFlow Serving, thanks to its out-of-the-box integration with TensorFlow models and its support for model versioning. TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments; it makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. It requires the model to be exported to a Servable-compatible format, which in our case was a SavedModel. Mapping the signature definitions for GPT-2’s inputs, such as context tokens and temperature, and associating a tag with the SavedModel was enough to deploy the model on a GPU-accelerated instance behind TensorFlow Serving’s REST interface.
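Once the SavedModel is up behind TensorFlow Serving, clients hit the standard predict endpoint over REST. The sketch below is illustrative only: the model name gpt2_moma, the input names context and temperature, the port, and the token ids are assumptions that depend on how the signature definition was actually exported.

```python
# Hedged sketch of querying the served model over TensorFlow Serving's REST API.
# Model name, input names, port, and token ids are illustrative assumptions.
import requests

SERVER_URL = "http://localhost:8501/v1/models/gpt2_moma:predict"

payload = {
    "signature_name": "serving_default",
    "inputs": {
        "context": [[318, 1241, 286, 3303]],   # BPE token ids of the transcribed question
        "temperature": 0.8,
    },
}

response = requests.post(SERVER_URL, json=payload)
response.raise_for_status()
generated_tokens = response.json()["outputs"]   # decode back to text with the BPE vocab
print(generated_tokens)
```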

You could give it a headline or an excerpt of a composition and it would write the rest of the article, complete with quotations. In our case, the prompt was the transcribed audio spoken into the headset’s microphone. As you can see, although the grammar and spelling are generally correct, the writing is fairly coherent yet usually still identifiable as non-human.

A sample exhibition statement generated by the system

This micro-service was consumed by another service hosted by Damjanski, where the AI-generated text took the place of the body in a fixed exhibition-statement template that already came with a title, a medium, and, not to forget, a closing line at the end.
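Conceptually, that templating step amounts to simple string substitution. The sketch below is purely illustrative; the title, medium, and closing line here are placeholders, and the real template lived in Damjanski’s service.

```python
# Illustrative only: slot the generated body into a fixed statement template.
# Title, medium, and closure values are placeholders.
STATEMENT_TEMPLATE = """{title}
{medium}

{body}

{closure}"""

def render_statement(body):
    return STATEMENT_TEMPLATE.format(
        title="Untitled",
        medium="Generated text on paper",
        body=body,
        closure="The exhibition is organized in collaboration with an AI.",
    )
```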

The statement generated before post-processing (left), the fixed statement (right)

The generated text varied considerably in length, from about 50 to 700 words. Despite the good results, it had its fair share of issues, most notably incomplete openings and endings. To address this, we leveraged the Punkt Sentence Tokenizer in NLTK to extract the sentences appearing in the paragraphs of the text and then reconstructed the output by cropping out the leading and trailing sentences, giving the generated text a sense of completeness.
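A minimal sketch of that cropping step with NLTK’s Punkt tokenizer (applied here to the whole text for simplicity):

```python
# Split generated text into sentences with NLTK's Punkt tokenizer and drop
# the (often incomplete) leading and trailing sentences.
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt")  # Punkt sentence tokenizer models

def crop_incomplete_sentences(generated_text):
    sentences = sent_tokenize(generated_text)
    if len(sentences) <= 2:
        return generated_text          # too short to crop safely
    return " ".join(sentences[1:-1])   # drop leading and trailing sentences
```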

On a side note, OpenAI decided to give temporary access to the original large-scale model of over a billion parameters to a small number of media outlets, such as The Verge and The Register, so they could evaluate the generated text. While OpenAI released part of the code along with the smaller pretrained model to see what people could come up with, it decided not to share the dataset or the code used to train the model at full scale. This was done in the hope of preventing actors, good or bad, from inadvertently or deliberately building AIs for malicious use. It could certainly prove disastrous if the model fell into the wrong hands, as it could very well be used to generate fake news.

I believe this is a step forward in automated text generation, while also spelling trouble in ways that could be devastating if misused.
