Computational Aesthetics: Shall We Let Computers Measure Beauty?
As we all know, tastes differ and change over time. Nevertheless, each epoch has tried to define its own criteria for beauty and aesthetics, and as science developed, so did the urge to measure beauty quantitatively. Not surprisingly, recent advances in artificial intelligence have brought renewed attention to the question of whether intelligent models can overcome what seems to be human subjectivity.
A separate subfield of artificial intelligence (AI), called ‘computational aesthetics’, was created to assess beauty in domains of human creative expression such as music, visual art, poetry, and chess problems. Typically, it combines mathematical formulas that represent aesthetic features or principles with specialized algorithms and statistical techniques to produce numerical aesthetic assessments. In this way, computational aesthetics merges the study of art appreciation with the analytic and synthetic methods of computing, bringing the artistic outcomes of computational thinking into view.
Brief History of Computational Aesthetics
Though we are used to thinking of artificial intelligence as a recent development, computational aesthetics can be traced back as far as 1933, when the American mathematician George David Birkhoff proposed, in “Aesthetic Measure”, the formula M = O/C, where M is the “aesthetic measure,” O is order, and C is complexity. The formula implies that orderly and simple objects appear more beautiful than chaotic or complex ones. Order and complexity are often regarded as opposite aspects: order plays a positive role in aesthetics, while complexity plays a negative one. Birkhoff applied the formula to polygons and to artworks as different as vases and poetry, and he is considered the forefather of modern computational aesthetics.
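Birkhoff's formula is trivial to evaluate once order and complexity have been quantified; the hard part is choosing those quantities. The sketch below uses deliberately crude proxies that are assumptions, not Birkhoff's own polygon definitions (which are more elaborate): order is the number of rotational symmetries of a polygon, and complexity is its number of sides. The helper names are hypothetical.

```python
import math


def aesthetic_measure(order: float, complexity: float) -> float:
    """Birkhoff's aesthetic measure M = O / C."""
    return order / complexity


def rotational_order(vertices, tol=1e-6):
    """Toy 'order' proxy (an assumption, not Birkhoff's definition): the number
    of rotations about the centroid that map the vertex set onto itself,
    the identity included."""
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    pts = [(x - cx, y - cy) for x, y in vertices]
    x0, y0 = pts[0]
    count = 0
    for xj, yj in pts:
        # A candidate rotation carries the first vertex onto vertex j,
        # so both must lie at the same distance from the centroid.
        if not math.isclose(math.hypot(xj, yj), math.hypot(x0, y0), abs_tol=tol):
            continue
        theta = math.atan2(yj, xj) - math.atan2(y0, x0)
        c, s = math.cos(theta), math.sin(theta)
        rotated = [(c * x - s * y, s * x + c * y) for x, y in pts]
        # Keep the rotation only if every rotated vertex lands on an original one.
        if all(any(math.isclose(rx, px, abs_tol=tol) and math.isclose(ry, py, abs_tol=tol)
                   for px, py in pts)
               for rx, ry in rotated):
            count += 1
    return count


def polygon_measure(vertices):
    # Toy 'complexity' proxy (also an assumption): the number of sides.
    return aesthetic_measure(rotational_order(vertices), len(vertices))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
irregular = [(0, 0), (3, 0), (2, 1), (0, 2)]
print(polygon_measure(square))     # 1.0
print(polygon_measure(irregular))  # 0.25
```

Under these proxies the square scores four times higher than the irregular quadrilateral, matching the intuition that orderly, simple shapes rate higher.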
In the 1950s, the German philosopher Max Bense and the French engineer Abraham Moles independently combined Birkhoff’s work with Claude Shannon’s information theory to develop a scientific means of grasping aesthetics. These ideas found their niche in the first computer-generated art, but they remained far removed from human perception.
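One way to see how information theory plugs into Birkhoff's ratio is to replace order and complexity with quantities computed from symbol statistics. The pairing used below (redundancy as order, Shannon entropy as complexity) is an assumption in the spirit of information aesthetics, not a reconstruction of Bense's or Moles's exact formulas; the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Shannon entropy H, in bits per symbol, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def informational_measure(symbols, alphabet_size: int) -> float:
    """Birkhoff-style ratio with information-theoretic stand-ins (an assumption):
    order = redundancy (H_max - H), complexity = entropy H."""
    h = shannon_entropy(symbols)
    h_max = math.log2(alphabet_size)
    if h == 0:
        return math.inf  # one repeated symbol: maximal order, zero complexity
    return (h_max - h) / h


print(informational_measure("AABB", 4))  # 1.0
print(informational_measure("ABCD", 4))  # 0.0
```

A sequence that uses every symbol equally often has no redundancy and scores zero, while a more repetitive one scores higher.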
In the early 1990s, the International Society for Mathematical and Computational Aesthetics (IS-MCA) was founded. The organization specializes in design with an emphasis on functionality and aesthetics and aims to be a bridge between science and art.
In the 21st century, computational aesthetics is an established field with its own specialized conferences, workshops, and special issues of journals uniting researchers from diverse backgrounds, particularly AI and computer graphics.
Objectives of Computational Aesthetics
The ultimate goal of computational aesthetics is to develop fully independent systems that match or exceed the aesthetic “sensitivity” and objectivity of human experts. Ideally, machine assessments should correlate with human experts’ assessments and even go beyond them, overcoming human biases and personal preferences.
Additionally, those systems should be able to explain their evaluations, inspire humans with new ideas, and generate new art that could lie beyond typical human imagination.
Finally, computational aesthetics can also provide a deeper understanding of our own aesthetic perception.
In practical terms, computational aesthetics can be applied in various fields and for various purposes. To name a few, aesthetic assessment can be used:
- as one of the ranking criteria in image retrieval systems;
- in image enhancement systems;
- for managing image or music collections;
- for improving the quality of amateur art;
- for distinguishing between videos shot by professionals and by amateurs;
- for helping human judges avoid controversies, etc.
The backbone of every such classifier is a robust selection of features that can be associated with the perception of a certain form of art. In the search for correlation with human perception, aesthetic systems apply specific sets of features for visual art and music that have been developed by art theorists and domain experts.
Image aesthetic features can be broadly categorized as low-level, high-level, or composition-based, although some research also relies on features related to saliency (Zhang and Sclaroff, 2013), objects (Roy et al., 2018), and information theory (Rigau, 1998). The selection of features largely depends on the type of art and the level of abstraction, as well as on the algorithm applied. For instance, photography assessment relies heavily on compositional aspects, while measuring the beauty of abstract art requires a different approach, such as assessing color harmony or symmetry (Nishiyama et al., 2011).
Low-level features try to describe an image objectively and intuitively with relatively low time and space complexity. They include color, luminance and exposure, contrast, intensity, edges, and sharpness.
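To make the low-level category concrete, the sketch below computes a few global descriptors of the kind listed above with NumPy. The specific formulas (Rec. 709 luminance weights, a Laplacian-variance sharpness proxy, and Hasler and Süsstrunk's colorfulness metric) are common choices assumed here for illustration; they are not the features of any particular study cited in this article, and the function name is hypothetical.

```python
import numpy as np


def low_level_features(rgb: np.ndarray) -> dict:
    """Simple global low-level features of an H x W x 3 RGB image in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Luminance with Rec. 709 weights.
    lum = 0.2126 * r + 0.7152 * g + 0.0722 * b
    # Sharpness proxy: variance of a 4-neighbour Laplacian on interior pixels.
    lap = (lum[:-2, 1:-1] + lum[2:, 1:-1] + lum[1:-1, :-2] + lum[1:-1, 2:]
           - 4.0 * lum[1:-1, 1:-1])
    # Colorfulness from opponent channels (Hasler and Süsstrunk's metric).
    rg = r - g
    yb = 0.5 * (r + g) - b
    colorfulness = np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())
    return {
        "brightness": float(lum.mean()),  # exposure proxy
        "contrast": float(lum.std()),     # RMS contrast
        "sharpness": float(lap.var()),
        "colorfulness": float(colorfulness),
    }


flat = np.full((8, 8, 3), 0.5)
checker = np.repeat((np.indices((8, 8)).sum(axis=0) % 2)[..., None], 3, axis=2).astype(float)
print(low_level_features(flat))
print(low_level_features(checker))
```

Each feature is a single pass over the pixels, which reflects the low time and space cost mentioned above.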
High-level features describe regions and content, aspects that contribute greatly to overall human aesthetic judgment. They try to identify the regions of an image that seem most important for human judgment and to find correlations between content and human reactions.
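A region-based feature needs to know which part of the image matters. Assuming a subject mask is already available (here it is hand-made; in practice it might come from a saliency or object detector), one can compare a low-level measurement inside and outside that region. The sketch below, with hypothetical function names, compares the local sharpness of subject and background, on the assumption that a crisp subject against a softer background is a desirable trait.

```python
import numpy as np


def sharpness_map(gray: np.ndarray) -> np.ndarray:
    """Absolute 4-neighbour Laplacian response (left at zero on the border)."""
    lap = np.zeros_like(gray, dtype=float)
    lap[1:-1, 1:-1] = np.abs(gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2]
                             + gray[1:-1, 2:] - 4.0 * gray[1:-1, 1:-1])
    return lap


def subject_background_contrast(gray, subject_mask, eps=1e-6) -> float:
    """Mean sharpness inside the subject region divided by that of the background.
    Values well above 1 suggest the subject stands out from the background."""
    s = sharpness_map(gray)
    return float((s[subject_mask].mean() + eps) / (s[~subject_mask].mean() + eps))


gray = np.full((16, 16), 0.5)                  # smooth background
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True                        # hypothetical subject region
gray[mask] = (np.indices((16, 16)).sum(axis=0) % 2)[mask]  # detailed subject
print(subject_background_contrast(gray, mask))
```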
Composition-based features differ between photography and artwork and, depending on the art form, may include the Rule of Thirds, the Golden Ratio (visual weight balance), focus and focal length, ISO speed rating, geometric composition, and shutter speed (Aber et al., 2010).
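Many composition features are geometric. As one example, a Rule of Thirds feature can score how close the main subject lies to one of the four intersections of the lines that divide the frame into thirds. The subject position is assumed to be given, and the linear fall-off and normalization are arbitrary choices for illustration; the function name is hypothetical.

```python
import math


def rule_of_thirds_score(x, y, width, height):
    """Return a score in [0, 1] for a subject centred at (x, y).

    1.0 means the subject sits exactly on a rule-of-thirds intersection;
    0.0 means it is as far from every intersection as possible (a corner).
    """
    intersections = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    nearest = min(math.hypot(x - px, y - py) for px, py in intersections)
    # The largest possible nearest-intersection distance is from a corner,
    # i.e. one third of the image diagonal.
    max_dist = math.hypot(width, height) / 3
    return 1.0 - nearest / max_dist


print(rule_of_thirds_score(100, 100, 300, 300))  # 1.0 (on an intersection)
print(rule_of_thirds_score(150, 150, 300, 300))  # about 0.5 (dead centre)
```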
Similarly to image analysis, music aesthetics assessment combines research on human perception and cognition of basic dimensions of sound, such as loudness or pitch, with research on higher-level musical concepts, including the perception of emotive content (Juslin and Laukka, 2004) and performance-specific traits (Palmer, 1997), in order to develop a comprehensive set of features for assessing a piece of music.
In 2008, Gouyon et al. proposed a hierarchy of three levels of abstraction, starting from the most fundamental acoustic features, which are extracted directly from the signal, and progressively building on them to model more complex concepts derived from music theory and even from cognitive and social phenomena:
- Low-level features relate to the physical aspects of the signal and include loudness, pitch, timbre, onsets, and rhythm (see, e.g., Justus and Bharucha, 2002).
- Mid-level features move to a higher level of abstraction within music theory and cover tempo, tonality, modality, etc.
- High-level features try to establish a correlation between abstract music descriptors, such as genre, mood, and instrumentation, and human perception.
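The low-level layer can be illustrated with a few signal-domain measurements on a synthetic tone: an RMS level as a loudness proxy, the zero-crossing rate as a crude timbre cue, and an autocorrelation pitch estimate. These are simplified textbook descriptors assumed for illustration, not Gouyon et al.'s feature set; perceived loudness and pitch are more complex than these proxies, and the function names are hypothetical.

```python
import numpy as np


def rms_db(frame):
    """Loudness proxy: RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = np.sqrt(np.mean(frame ** 2))
    return 20 * np.log10(max(rms, 1e-12))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign changes."""
    return float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))


def autocorr_pitch(frame, sr, fmin=50.0, fmax=1000.0):
    """Crude pitch estimate: lag of the autocorrelation peak within [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0, 1, 2, ...
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag


sr = 16000
t = np.arange(2048) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # synthetic 440 Hz test tone
print(rms_db(tone))                # roughly -9 dB
print(zero_crossing_rate(tone))
print(autocorr_pitch(tone, sr))    # near 440 Hz; integer lags limit precision
```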
Methods and Algorithms
Broadly speaking, computational aesthetics serves both as a tool for assessing aesthetics in visual art or music and as a means of generating new art.
For aesthetic assessment, various algorithms have been proposed over the past few years, based on either classification or clustering.
A number of algorithms are extensively used to assess image aesthetics by means of classification. Among the most popular are AdaBoost, Naïve Bayes, and Support Vector Machines (SVMs); substantial work has also been conducted using Random Forests and Artificial Neural Networks (ANNs).
AdaBoost is widely used in computational aesthetics and is believed to yield the best results. It was first applied in this field in 2008 by Luo and Tang, whose study on photo quality evaluation was distinctive in focusing on the subject of the photo. They used Gentle AdaBoost (Torralba et al., 2004), a variant of AdaBoost that weights its data in a way that gives less weight to outliers, and obtained a success rate of 96%. However, when Khan and Vogel (2012) applied their proposed set of features to the aesthetic classification of photographic portraiture, the accuracy of the multiboosting (multi-class) variant of AdaBoost (Benbouzid et al., 2012) fell to 59.14%.
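To show what "weighting its data" means in practice, the sketch below implements Gentle AdaBoost with regression stumps, following Friedman, Hastie and Tibshirani's formulation: each round fits a weighted least-squares stump f and multiplies every sample weight by exp(-y * f(x)). Because f(x) is a bounded weighted average of labels rather than a log-odds estimate, the weights of persistently misclassified outliers grow more slowly than in discrete AdaBoost. The toy data and function names are hypothetical; this is not Luo and Tang's feature set or code.

```python
import numpy as np


def fit_stump(X, y, w):
    """Weighted least-squares regression stump: f(x) = a if x[j] > t else b."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            right = X[:, j] > t
            wr, wl = w[right].sum(), w[~right].sum()
            # a and b are the weighted mean labels on each side of the split.
            a = (w[right] * y[right]).sum() / wr if wr > 0 else 0.0
            b = (w[~right] * y[~right]).sum() / wl if wl > 0 else 0.0
            err = (w * (y - np.where(right, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, a, b)
    return best[1:]


def gentle_adaboost(X, y, rounds=10):
    """Gentle AdaBoost with stumps; labels y must be -1 or +1."""
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(rounds):
        j, t, a, b = fit_stump(X, y, w)
        f = np.where(X[:, j] > t, a, b)
        w = w * np.exp(-y * f)  # bounded step: misclassified points gain weight gently
        w /= w.sum()
        stumps.append((j, t, a, b))
    return stumps


def boosted_predict(stumps, X):
    F = sum(np.where(X[:, j] > t, a, b) for j, t, a, b in stumps)
    return np.where(F >= 0, 1, -1)


rng = np.random.default_rng(0)
X = rng.random((200, 2))  # two hypothetical aesthetic features in [0, 1]
y = np.where((X[:, 0] > 0.5) & (X[:, 1] > 0.3), 1, -1)
model = gentle_adaboost(X, y)
print((boosted_predict(model, X) == y).mean())  # training accuracy
```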
Naïve Bayes is another popular method, and it was used in the same study by Luo and Tang (2008). In 2009, Li and Chen used a Naïve Bayes classifier to classify paintings aesthetically and described the results as robust. The success rate achieved with the Bayesian classifier was 94%.
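The "naïve" assumption is that features are independent given the class, so a class score is just a prior plus a sum of per-feature log-likelihoods. The sketch below is a minimal Gaussian naïve Bayes classifier, one common variant assumed here (the article does not say which variant the cited studies used), trained on synthetic stand-ins for aesthetic features.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naïve Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small variance floor avoids division by zero for constant features.
        self.vars = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, X):
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj), for every sample and class.
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None, :, :]
                          + (X[:, None, :] - self.means[None, :, :]) ** 2
                          / self.vars[None, :, :]).sum(axis=2)
        scores = np.log(self.priors)[None, :] + log_lik
        return self.classes[np.argmax(scores, axis=1)]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.3, 0.1, (100, 2)), rng.normal(0.7, 0.1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)  # hypothetical "low" vs "high" aesthetic class
nb = GaussianNaiveBayes().fit(X, y)
print(nb.predict(np.array([[0.25, 0.3], [0.75, 0.7]])))  # [0 1]
```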
The Support Vector Machine is probably the most widespread algorithm for binary classification in computational aesthetics. It has been used since 2006, when Datta et al. studied the correlation between a defined set of features and aesthetic value using a previously rated set of photographs, achieving up to 76% accuracy. Other studies relying on the same classifier include Li and Chen (2009), who classified paintings aesthetically; Wong and Low (2009), who built a system for classifying professional photos and snapshots; Nishiyama et al. (2011), who studied the aesthetic classification of photographs based on color harmony; and others, with average accuracy rates of about 75% and higher.
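For a linear kernel, the SVM idea (a maximum-margin separating hyperplane, with a hinge-loss penalty for points inside the margin) fits in a few lines. The sketch below trains it with Pegasos-style stochastic sub-gradient descent, which is one training method among several and an assumption here; the kernels, features and solvers of the cited studies are not specified in this article, and the data are synthetic.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear soft-margin SVM via Pegasos-style stochastic sub-gradient descent
    on the regularized hinge loss. Labels y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    # A constant column acts as the bias (regularized too, a simplification).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)           # decreasing step size
            margin = y[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam            # shrink: gradient of the L2 penalty
            if margin < 1.0:                # inside the margin: hinge loss is active
                w += eta * y[i] * Xb[i]
    return w


def svm_predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)), rng.normal(1.0, 0.5, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)  # e.g. snapshot (-1) vs professional (+1)
w = train_linear_svm(X, y)
print((svm_predict(w, X) == y).mean())  # training accuracy
```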
Random Forests, though they usually perform worse than Bayesian classifiers or AdaBoost, have been used in a number of studies of photograph aesthetics. For instance, Ciesielski et al. (2013) achieved 73% accuracy in assessing photograph aesthetics, and Khan and Vogel (2012), using their proposed set of features for the aesthetic classification of photographic portraiture, achieved an accuracy of 59.79% with random forests (Breiman, 2001).
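A random forest combines many decision trees, each grown on a bootstrap sample of the training data and allowed to consider only a random subset of features at each split; predictions are made by majority vote. The from-scratch sketch below illustrates those two sources of randomness with Gini-impurity splits. Tree depth, forest size and the toy data are arbitrary assumptions, the function names are hypothetical, and a real system would normally use a library implementation.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a classification tree; each split looks at a random subset of features.
    Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    values, counts = np.unique(y, return_counts=True)
    majority = values[np.argmax(counts)]
    if depth == max_depth or len(values) == 1:
        return majority
    best = None
    for j in rng.choice(X.shape[1], n_feats, replace=False):
        # Excluding the largest value keeps both children non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # the sampled features are constant in this node
        return majority
    _, j, t = best
    left = X[:, j] <= t
    return (j, t,
            build_tree(X[left], y[left], depth + 1, max_depth, n_feats, rng),
            build_tree(X[~left], y[~left], depth + 1, max_depth, n_feats, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def random_forest(X, y, n_trees=15, max_depth=4, seed=0):
    rng = np.random.default_rng(seed)
    n_feats = max(1, int(np.sqrt(X.shape[1])))  # features tried per split
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))  # bootstrap sample, with replacement
        trees.append(build_tree(X[idx], y[idx], 0, max_depth, n_feats, rng))
    return trees


def forest_predict(trees, X):
    votes = np.array([[predict_tree(tree, x) for tree in trees] for x in X])
    return (votes.mean(axis=1) >= 0.5).astype(int)  # majority vote for 0/1 labels


rng = np.random.default_rng(42)
X = rng.random((150, 2))  # feature 0 is informative, feature 1 is noise
y = (X[:, 0] > 0.5).astype(int)
forest = random_forest(X, y)
print((forest_predict(forest, X) == y).mean())  # training accuracy
```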
Artificial Neural Networks (ANNs) produced very good results when used with compression-based features by Machado et al. (2007) and Romero et al. (2012). The former study aimed to identify the author of a set of paintings and reported success rates from 90.9% to 96.7%; the latter used an ANN classifier to predict the aesthetic merit of photographs with a success rate of 73.27%.
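Compression-based features rest on the intuition that an image which compresses well is highly redundant (ordered), while one that barely compresses is complex. The cited studies derived their features from image compression schemes not detailed in this article; the sketch below swaps in Python's lossless `zlib` as a stand-in, so it illustrates the idea rather than reproducing their features. The images and the function name are hypothetical.

```python
import random
import zlib


def compression_complexity(pixels: bytes, level: int = 9) -> float:
    """Complexity estimate: compressed size / raw size (lower = more redundant)."""
    return len(zlib.compress(pixels, level)) / len(pixels)


# Three hypothetical 64 x 64 8-bit grayscale images, flattened row by row.
flat = bytes([128]) * (64 * 64)
stripes = bytes((255 if (i % 64) // 8 % 2 else 0) for i in range(64 * 64))
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(64 * 64))

for name, img in [("flat", flat), ("stripes", stripes), ("noise", noise)]:
    print(name, round(compression_complexity(img), 3))
```

Ratios like these, possibly computed for several compression levels or image channels, could then be collected into the input vector of a neural-network classifier.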
Convolutional Neural Networks (CNNs) are state-of-the-art deep learning models for rating image aesthetics and have been used extensively in recent years. A CNN learns a hierarchy of filters that are applied to an input image to extract meaningful information from it. For example, Denzler et al. (2016) applied the AlexNet model (Krizhevsky et al., 2012) to different datasets to evaluate experimentally how well the pre-learned features of different layers are suited to distinguishing art from non-art images using an SVM classifier. They report the highest discriminatory power for a network trained on the ImageNet dataset, which outperforms a network trained solely on natural scenes.
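What "applying filters to extract information" means can be shown with a single layer: each filter slides over the image, a ReLU keeps the positive responses, and pooling turns the response maps into a fixed-length feature vector, which is the kind of representation that can be handed to an SVM. The sketch uses two hand-set edge filters as an assumption; a CNN such as AlexNet learns many filters across several stacked layers, and nothing here reproduces it. Function names are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """2-D cross-correlation (what CNN 'convolution' layers compute), no padding."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def conv_layer(image, kernels):
    """One convolutional layer: filter with each kernel, then apply ReLU."""
    return np.stack([np.maximum(conv2d_valid(image, k), 0.0) for k in kernels])


def global_max_pool(feature_maps):
    """Collapse each feature map to one number, giving a fixed-length feature vector."""
    return feature_maps.reshape(len(feature_maps), -1).max(axis=1)


# Hand-set filters for vertical and horizontal edges; a trained CNN learns its own.
kernels = [np.array([[-1.0, 0.0, 1.0]] * 3), np.array([[-1.0, 0.0, 1.0]] * 3).T]
image = np.zeros((8, 8))
image[:, 4:] = 1.0  # a single vertical edge
print(global_max_pool(conv_layer(image, kernels)))  # [3. 0.]
```

Only the vertical-edge filter responds, so the pooled vector records which kind of structure is present in the image.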
Image clustering is a popular unsupervised learning technique. It groups image data so as to maximize similarity within each cluster while minimizing similarity between clusters. In computational aesthetics, researchers use K-Means, Fuzzy Clustering, and Spectral Clustering for image analysis.
K-Means Clustering is widely used to analyze the color scheme of an image. For instance, Datta et al. (2006) used k-means to compute two features to measure the number of distinct color blobs and disconnected large regions in a photograph. Lo et al. (2012) utilized this method to find dominant colors in an image.
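Dominant-color extraction with k-means can be sketched as follows: pixels are treated as points in RGB space, clustered, and each cluster center is reported with the fraction of pixels it covers. The farthest-point initialization, the use of RGB rather than a perceptual color space, and the helper names are assumptions for illustration, not details taken from Datta et al. or Lo et al.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means with farthest-point initialization. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from a random pixel, then repeatedly add the pixel farthest from
    # all centers chosen so far.
    centroids = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d = ((points[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its points (skip empty clusters).
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = points[labels == c].mean(axis=0)
    return centroids, labels


def dominant_colors(rgb_image, k=3):
    """Cluster pixel colors; return (color, share of pixels) pairs, largest first."""
    pixels = rgb_image.reshape(-1, 3).astype(float)
    centroids, labels = kmeans(pixels, k)
    shares = np.bincount(labels, minlength=k) / len(labels)
    order = np.argsort(shares)[::-1]
    return [(centroids[i].round(1), float(shares[i])) for i in order]


img = np.zeros((10, 10, 3))
img[:6] = [200, 30, 30]     # red block, 60% of the pixels
img[6:9] = [30, 30, 200]    # blue block, 30%
img[9:] = [250, 250, 250]   # white row, 10%
for color, share in dominant_colors(img):
    print(color, share)
```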
Fuzzy Clustering is a form of clustering in which each data point can belong to more than one cluster; it is therefore used in multi-class classification (see, for example, Felci Rajam and Valli (2011)). Celia and Felci Rajam (2012) used fuzzy c-means (FCM) clustering for effective image categorization and retrieval.
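The core of fuzzy c-means is that every point keeps a graded membership in every cluster instead of a single label. The sketch below implements the standard alternating updates (membership-weighted centers, then memberships from relative distances) with fuzzifier m = 2, a common default assumed here; it is not the code of the studies cited above, and the data are a toy one-dimensional example.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means. Returns (centers, U), where U[i, k] is the degree to which
    point i belongs to cluster k; each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every point to every center, floored to avoid division by zero.
        d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2], [0.6]])
centers, U = fuzzy_c_means(X, 2)
print(U.round(2))  # the last point, halfway between the groups, splits its membership
```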
Spectral Clustering is used to identify communities of nodes in a graph based on the edges connecting them. In computational aesthetics, a spectral clustering technique named normalized cuts (Ncut) was used to organize images with similar feature values (Zakariya et al., 2010).
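The normalized-cuts idea can be sketched for a two-way split: build a graph whose edge weights are feature similarities, form the symmetric normalized graph Laplacian, and split the items by the sign of its second eigenvector. The Gaussian affinity, the value of sigma, and sign thresholding (one of several possible splitting rules) are assumptions here; the feature vectors are made up, and the function name is hypothetical.

```python
import numpy as np


def ncut_bipartition(features, sigma=1.0):
    """Two-way normalized-cut split of items described by feature vectors.
    Returns a boolean array marking the items on one side of the cut."""
    # Gaussian affinity between every pair of items (no self-loops).
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L = I - D^-1/2 W D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; use the second eigenvector,
    # mapped back to the generalized problem (D - W) y = lambda D y.
    _, vecs = np.linalg.eigh(L)
    return d_inv_sqrt * vecs[:, 1] > 0


# Hypothetical per-image feature vectors, e.g. (brightness, colorfulness).
features = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
                     [0.8, 0.9], [0.9, 0.8], [0.85, 0.85]])
print(ncut_bipartition(features, sigma=0.2))
```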
A separate task of computational aesthetics is to generate artwork independently of human experts. At present, the best-known algorithm for directly learning transformations between images from training data is the Generative Adversarial Network (GAN). Because GANs learn the appropriate operations automatically from training data, they have been widely adopted for many image-enhancement applications, such as image super-resolution and image denoising. Machado et al. (2015) also used GANs to enhance image aesthetics automatically, mainly by performing tone adjustment.
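The "adversarial" part of a GAN can be stated through its two losses: a discriminator D is trained to tell real images from generated ones, and a generator G is trained to fool it. The sketch computes the standard binary cross-entropy discriminator loss and the commonly used non-saturating generator loss from D's output probabilities. Networks, optimizers and any image-enhancement setup are omitted, the function names are hypothetical, and nothing here reproduces the cited work.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: D should output 1 for real images and 0 for generated ones."""
    return float(-np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)))


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: G is rewarded when D judges its output real."""
    return float(-np.mean(np.log(d_fake + eps)))


# D's probabilities that each image is real (hypothetical values).
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.2])
print(discriminator_loss(d_real, d_fake))  # small: D separates real from fake well
print(generator_loss(d_fake))              # large: G is not fooling D yet
```

In training, the two losses are minimized in alternation, so each network's improvement raises the bar for the other.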
Conclusion: Restrictions and Limitations
Aspiring to objectivity, research in computational aesthetics tends to narrow its focus to form rather than to content and its associations in a person’s mind and memories. However, from a psychophysiological viewpoint, it is not clear whether such a dichotomy holds or whether aesthetics is intrinsically subjective.
Besides, it is difficult to ascertain whether a system that performs on the same level as a human expert is actually using similar mechanisms as the human brain and, therefore, whether it reveals something about human intelligence.
It may be that in the future we will rely on machines for our artistic preferences, but for now, human experts dictate their opinions and try to get machines to simulate their choices.