Generating “art” by Learning About Styles and Deviating from Style Norms
If we teach the machine about art and art styles and force it to generate novel images that do not follow established styles, what would it generate? Would it generate something that is aesthetically appealing to humans? Would that be considered “art”?
At Rutgers’ Art and Artificial Intelligence Laboratory we investigated a computational creative system for art generation that does not involve a human artist in the creative process, but does involve human creative products in the learning process. An essential component in art-generating algorithms is relating their creative process to art that has been produced by human artists throughout time. We believe this is relevant because a human’s creative process utilizes prior experience of and exposure to art. A human artist is continuously exposed to other artists’ work, and has been exposed to a wide variety of art for all of his/her life. What remains largely unknown is how human artists integrate their knowledge of past art with their ability to generate new forms. A theory is needed to model how to integrate exposure to art with the creation of art.
Our creative system is inspired by the psychological theory of art evolution proposed by Colin Martindale (1943–2008). He hypothesized that at any point in time, creative artists try to increase the arousal potential of their art to push against habituation. However, this increase has to be minimal to avoid a negative reaction from observers (the principle of least effort). Martindale also hypothesized that style breaks happen as a way of increasing the arousal potential of art, but only after artists have exhausted the possibilities within the rules of a style.
Specifically, the system builds on Martindale’s principle of least effort and his explanation of style breaks. Our proposed agent tries to generate art with an increased level of arousal potential, but in a constrained way, without activating the aversion system and falling into the negative hedonic range. There are several ways to increase arousal potential. Our system focuses on increasing stylistic ambiguity and deviation from style norms.
Deep neural networks have recently played a transformative role in advancing artificial intelligence across various application domains. Several generative deep networks have been proposed that can generate novel images emulating a given collection. Generative Adversarial Networks (GANs) have been quite successful in achieving this goal. A GAN learns to generate images through a game between two players. The first (called the discriminator) has access to a collection of images (the training images). The second (called the generator) generates images starting from random input. The discriminator tries to excel at distinguishing real images from generated ones, while the generator tries to excel at generating images that fool the discriminator into believing that they are real. Several variants of GANs have been suggested and have been successful in generating images that imitate the training distribution.
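The two-player game described above can be sketched with a pair of loss functions. This is a minimal illustration rather than the training code used in the study: the function names are hypothetical, the binary cross-entropy form is the commonly used GAN formulation and is assumed here, and the inputs are simply the discriminator's estimated probabilities that each image is real.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail assumed for this sketch


def discriminator_loss(p_real, p_fake):
    """Low when real images get probabilities near 1 and generated ones near 0.

    p_real: discriminator outputs P(real) for training images.
    p_fake: discriminator outputs P(real) for generated images.
    """
    real_term = -sum(math.log(max(p, EPS)) for p in p_real) / len(p_real)
    fake_term = -sum(math.log(max(1.0 - p, EPS)) for p in p_fake) / len(p_fake)
    return real_term + fake_term


def generator_loss(p_fake):
    """Low when the discriminator assigns generated images a high P(real)."""
    return -sum(math.log(max(p, EPS)) for p in p_fake) / len(p_fake)


if __name__ == "__main__":
    # A discriminator that separates the two sets well has a lower loss
    # than one that cannot tell them apart.
    print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    # The generator is rewarded when its images are judged to be real.
    print(generator_loss([0.9]), generator_loss([0.1]))
```

The two losses pull in opposite directions on the generated images, which is what makes the setup a game: whatever lowers the generator's loss raises the discriminator's.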
However, GANs have no motivation to generate anything creative. Since the generator is trained to produce images that fool the discriminator into believing they come from the training distribution, ultimately the generator will just generate images that look like already existing art. There is no force that pushes the generator to explore the creative space.
Like a Generative Adversarial Network (GAN), the proposed system has two adversarial networks, a discriminator and a generator. The discriminator has access to a large set of art associated with style labels (Renaissance, Baroque, Impressionism, Expressionism, etc.) and uses it to learn to identify styles. The generator does not have access to any art. It generates art starting from a random input, but unlike in a GAN, it receives two signals from the discriminator for any work it generates, and these signals are contradictory by design. The first signal is the discriminator’s classification of “art or not art”. The second is a “style ambiguity” signal that measures how confused the discriminator is when it tries to identify the style of the generated art as one of the known styles. The generator uses this second signal to improve its ability to generate art that does not follow any of the established styles and has an increased level of style ambiguity. On one hand it tries to fool the discriminator into thinking its output is “art,” and on the other hand it tries to confuse the discriminator about the style of the work generated.
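One way to picture the two signals is as two terms in the generator's objective. The sketch below is an illustration under stated assumptions, not the exact formulation from the study: it measures style confusion as the cross-entropy between a uniform distribution over styles and the discriminator's predicted style distribution, which is smallest when the discriminator considers every style equally likely. The function names and the `weight` parameter are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0); assumed for this sketch


def softmax(logits):
    """Turn raw style scores into a probability distribution over styles."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def style_ambiguity_loss(style_logits):
    """Cross-entropy between a uniform target and the predicted style distribution.

    Reaches its minimum, log(K) for K styles, when the discriminator is
    maximally confused, i.e. it assigns every style the same probability.
    """
    probs = softmax(style_logits)
    k = len(probs)
    return -sum(math.log(max(p, EPS)) for p in probs) / k


def generator_objective(p_art, style_logits, weight=1.0):
    """Combine both signals: be judged 'art', yet fit no known style.

    p_art: discriminator's probability that the generated image is art.
    style_logits: discriminator's raw scores for each known style.
    weight: hypothetical trade-off between the two signals.
    """
    art_term = -math.log(max(p_art, EPS))
    return art_term + weight * style_ambiguity_loss(style_logits)


if __name__ == "__main__":
    ambiguous = [0.0, 0.0, 0.0, 0.0]   # discriminator cannot pick a style
    clear_cut = [5.0, 0.0, 0.0, 0.0]   # image reads as one known style
    print(generator_objective(0.9, ambiguous))
    print(generator_objective(0.9, clear_cut))
```

With the same "art" score, the image whose style the discriminator cannot pin down gets the lower (better) objective, which is the pressure that steers the generator away from established styles.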
We trained the model on 80K digitized images of Western paintings ranging from the 15th century to the end of the 20th century. The model was successful in generating aesthetically appealing novel art. The generated images do not show typical figures, genres, styles, or subject matter. However, this is not because the model is unable to produce them. If we remove the “style ambiguity” signal, the model can in fact generate images that look like portraits, landscapes, architecture, religious subject matter, etc. In that case the model is trying to emulate traditional art. Adding the style ambiguity signal forces the model to explore the creative space and generate novel images that differ from what it has seen in art history.
We approached the assessment of the generated images from a visual Turing-like test point of view. We tested the degree to which human subjects would be able to distinguish whether a work of art was generated by a human artist or by a computer system. The critical issue is which human art we should use for such a test. Since the goal of this study is to evaluate the creativity of the artifacts produced by the proposed system, we need to compare them to art that is considered novel and creative at this point in time. If we compared the produced artifacts to Impressionist art, for example, we would be testing the ability of the system to emulate such art, and not the creativity of the system.
Therefore we chose two sets of works by real artists. The first set is a collection of works by Abstract Expressionist masters made between 1945 and 2007. This set was used in recent studies comparing human and machine ability to distinguish between abstract art created by artists, children, or animals. The second set is a collection of paintings shown at Art Basel 2016, the flagship art fair of contemporary art.
Human subjects thought that the generated images were art made by an artist 75% of the time, compared to 85% of the time for the Abstract Expressionist collection, and 48% of the time for the Art Basel collection.
In another experiment we asked the subjects to rate the degree to which they found the works of art to be intentional, visually structured, communicative, and inspirational. The goal was to judge aspects related to whether the generated images could be considered art. We hypothesized that human subjects would rate art by real artists higher on these scales than images generated by the proposed system. To our surprise, the results showed that our hypothesis was not true! Human subjects rated the images generated by the proposed system higher than those created by real artists, whether in the Abstract Expressionist set or in the Art Basel set.
It might be debatable what a higher score on each of these scales actually means. However, the fact that subjects found the images generated by the machine intentional, visually structured, communicative, and inspiring, at levels similar to or even higher than actual human art, indicates that subjects see these images as art!
The outcome of this research was presented at the 8th International Conference on Computational Creativity, Atlanta, Georgia, June 19th to June 23rd, 2017. The paper can be accessed at http://arxiv.org/abs/1706.07068