Automating Aesthetic Judgement
N.B: This is a conference paper I wrote back in April. Some of the references to a fast-moving area of computer science are therefore already somewhat dated, but the broader theorisation may still be of interest. It’s part of a broader project on the “realistic” that I aim to pick up next year once I submit my PhD thesis.

You’re likely wondering if you’re supposed to know who the woman above is, since I’m presenting her photo with no clear context. You may already be uncomfortable about the ethics of this presentation: does she know I’m using her image? Am I tacitly inviting you to make some kind of judgement about her appearance? However, this is an image of a person who does not exist and has never existed. It’s neither a photograph nor a photoshop, but an original image produced by a “Generative Adversarial Network”, a relatively recent technique in machine learning.
If you’re judging this face at all, it’s now probably about the degree to which it’s “realistic” — a term that would not come to mind regarding a “real” photo. Even if you do judge it realistic, looking this “fake” person in the eyes can be unsettling. But why?
Approaching media technologies from a humanities/literary studies background, I’m interested in how they register on ostensibly non-technical levels of experience. In this paper, I’ll consider the ways Generative Adversarial Networks, or “GANs”, might affect how and why we judge things “realistic”. I’ll begin by defining that term, “realistic”, as a category of “aesthetic judgement”. I’ll then outline what GANs are and what they do. Finally, I’ll suggest that the growing use of GANs could open up a new meaning to the term “realistic”.
Aesthetic Judgement

Eighteenth-century German philosopher Alexander Baumgarten coined the term “aesthetics” as the study of aistheta — or “objects of sense” — rather than of noeta — or “objects of thought”.¹ In his work on aesthetics, Immanuel Kant observes that descriptions of things as “beautiful” or “sublime” derive from one’s subjective experience, yet nonetheless masquerade as universal. So, if I call a painting “beautiful”, then I assume you also find it beautiful. Kant argues that this is because such judgements — if made sincerely — are “disinterested”.² I don’t fit the painting into some pre-existing conceptual framework in order to appreciate it. It already seems to be unified by itself. Aesthetic judgements are therefore ways we conceptualise and communicate about objects that seem to affect us from outside our conceptual thought.
Aesthetic philosophy has long betrayed somewhat pompous, Eurocentric assumptions about the “beautiful” and “sublime”. More recent work has documented how these judgements are structured and restructured through social institutions and customs (and the power relations embedded within these). There has also been greater focus on more “minor” categories of aesthetic judgement, such as “cool”, “cute” and “interesting”. Lacking the haughtiness of the “beautiful”, these categories locate aesthetics beyond the gallery and, as Sianne Ngai writes, highlight the “continuousness and everydayness of our aesthetic relation to the often artfully designed, packaged, and advertised merchandise that surrounds us in our homes, in our workplaces, and on the street”.³
The Realistic
One prominent aesthetic category today is the “realistic”. This word drifts from its etymological relation to philosophical realism in late 18th-century Germany, when the Romantic poets Schiller and Goethe used realistisch to criticise someone too caught up in the unimportant details of reality to grasp ideal truth and beauty. However, with the ascendance of bourgeois art and politics across Europe, “realistic” became a positive term to judge both artworks and people as variously humble, pragmatic and revolutionary.⁴

In today’s Western culture, we judge artworks “realistic” in various ways. In my ongoing research, I have come to group these under three loose, overlapping subcategories. Very briefly: “experientially” realistic artworks look or feel like something real, but need not represent real things; “representatively” realistic artworks appear to adequately reflect the diverse details of reality, avoiding stereotypes to “reveal” something often overlooked; finally, “intuitively” realistic artworks depict series of events that seem plausible, whether in reality or according to the implied laws of a fictional world.

In each case, “realistic” marks a representation’s ability to affect us from outside rational thought. The events in a film “seem” realistic, like they “could” happen. But I can’t test that in the moment. I can’t know for certain. I judge something realistic in the absence of certainty, disguising my vulnerability and the limits of my knowledge as agreement. I judge something “unrealistic”, meanwhile, to assert that I remain unaffected by it.
From the camera obscura to the photograph, artists have long turned to technologies to make their work more “realistic”. In the early twentieth century, Walter Benjamin wrote that photography produces in us an “optical unconscious”, allowing us to dissect and reproduce the more subtle, minute details of the visual world.⁵ Today’s “machine learning” algorithms similarly extend our capabilities. They identify patterns in large quantities of recorded data that have not yet been “structured” by a human. They therefore make expanses of “big data” useful without a human — or a team of humans — attempting the impossible task of sifting through them.
Operating at speeds and scales beyond the capabilities of human cognition, these algorithms can appear to the layman as exhibiting a mysterious agency, watching and manipulating the invisible structure of reality in unseen ways.
Generative Adversarial Networks (GANs)

Introduced by Ian Goodfellow and colleagues in 2014, Generative Adversarial Networks have rapidly improved in their capacity to produce realistic images. Consider, for example, the GAN-generated images of cars above.
That’s not to say all GAN-generated images are perfect, however.

GANs consist of two “neural networks”: a “generator” and a “discriminator”. Initially, the generator transforms random noise into a candidate sample and sends it to the discriminator. The discriminator compares this with an existing data-set to produce a probability score as to whether the generated data is “real” or “fake”. The generator is then optimised to produce data that returns a better probability score. Meanwhile, the discriminator is also tweaked to become more discerning. The two therefore compete: the generator produces better images of cats to fool a discriminator that is constantly improving its general knowledge of what a cat looks like.
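To make this loop concrete, here is a deliberately tiny sketch of adversarial training in plain NumPy. It is my own toy illustration, not a practical GAN: the “data” are single numbers drawn from a bell curve rather than images, the generator is a linear function of random noise, and the discriminator is a one-variable logistic classifier. All function names and hyperparameters are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_toy_gan(steps=3000, batch=64, lr=0.05, seed=0):
    """Toy 1-D GAN: the generator g(z) = a*z + b tries to mimic samples
    from a 'real' distribution N(4, 1.25), while the discriminator
    D(x) = sigmoid(w*x + c) scores how 'real' a sample looks."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.1, 0.0   # discriminator parameters
    for _ in range(steps):
        real = rng.normal(4.0, 1.25, batch)
        noise = rng.normal(0.0, 1.0, batch)
        fake = a * noise + b

        # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
        d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
        w -= lr * (-np.mean((1 - d_real) * real) + np.mean(d_fake * fake))
        c -= lr * (-np.mean(1 - d_real) + np.mean(d_fake))

        # Generator step: adjust (a, b) so that D(fake) rises towards 1.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1 - d_fake) * w   # gradient of -log D(fake) w.r.t. fake
        a -= lr * np.mean(grad_fake * noise)
        b -= lr * np.mean(grad_fake)
    return a, b

a, b = train_toy_gan()
samples = np.random.default_rng(1).normal(0.0, 1.0, 1000) * a + b
# After training, the generated samples cluster near the "real" mean of 4.
```

The same competitive structure scales up: replace the linear generator and logistic classifier with deep convolutional networks, and the scalar samples with images, and you have the architecture described above.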

N.B: You can see this at work in a basic form using Georgia Tech’s GANLab (the green dots represent existing data from a training set; the purple represents data produced by the generator; “epoch” refers to the number of complete passes the networks have made through the training data).
Reporting on GANs has often speculated on their potential to mislead an unsuspecting public. The emergence of “deepfakes” in 2017 sparked wider concern about the potential use of such generative techniques to create “fake news”. A 2018 video by BuzzFeed used deepfake software to depict President Obama calling President Trump a “total and complete dipshit”, before warning viewers to “rely on trusted news sources”.⁶
This concern derives from the oft-mentioned fact that algorithms — and particularly neural networks — tend to be “black boxes”. Both networks in a GAN have “hidden layers”, whose interrelating activities are only graspable by humans through their eventual results. When they generate realistic images, the two networks identify and reproduce patterns in visual data that we can’t ourselves. This dynamic, in which machine learning is seen to operate within a “beyond” we can’t access, is what I’m struggling towards with my title: “Automating Aesthetic Judgement”.
As I’ve stated, aesthetic judgement is a subjective claim masquerading as an objective one. Acknowledging this can be problematic for engineers competing under technocapitalism to produce the most realistic GAN-generated images.⁷ They often square this circle by touting a percentage of human test subjects “fooled” by their generated images.
Where this is too slow or expensive, however, many have employed automatic metrics to measure how “realistic” their images are. A common example is the “Fréchet Inception Distance”, which uses a pre-trained machine-learning algorithm to judge generated images against an existing data set.⁸ However, such algorithms often identify discrepancies in image-data that humans can’t, opening up the question of what “realistic” means as a goal. Are these engineers catering for human audiences, or mining data for a reality beyond human perception?
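To give a sense of what such a metric computes: the Fréchet Inception Distance fits a Gaussian to the feature statistics of real images and of generated images, then measures the distance between the two Gaussians. Below is my own minimal sketch of that calculation, with arbitrary feature vectors standing in for the Inception-network embeddings the published metric uses.

```python
import numpy as np

def sqrtm_psd(mat):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)   # clip tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((cov_a cov_b)^1/2) computed via the symmetric form a^1/2 b a^1/2
    half = sqrtm_psd(cov_a)
    covmean = sqrtm_psd(half @ cov_b @ half)
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
feats = rng.normal(0.0, 1.0, (2000, 4))   # stand-in for image embeddings
# Identical sets score ~0; shifting every feature by 1 scores ~4 (1^2 x 4 dims).
```

The lower the score, the closer the generated distribution is to the real one: a judgement of the “realistic” made entirely between algorithms.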

Outlining a GAN for predicting future video frames based on an existing sequence, a 2016 paper by Mathieu, Couprie and LeCun claims that a sequence it produces “does not look like anything close to the ground truth on the long term, but it remains realistic”.⁹ The assertion has echoes of Jean Cocteau’s early description of film as ‘a realistic documentary of unreal events’ in Le Sang d’un Poète.¹⁰ However, Mathieu, Couprie and LeCun make this claim because their sequence receives high scores on the “Peak Signal-to-Noise Ratio” and “Structural Similarity Index Measure” metrics. They thereby judge the GAN’s product realistic because it passes the criteria of other algorithms, rather than humans.
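Of these two measures, the Peak Signal-to-Noise Ratio is the simpler: a logarithmic rescaling of the mean squared error between a predicted frame and the ground-truth frame, where higher scores mean a closer pixel-wise match. A minimal sketch (my own code, assuming standard 8-bit pixel values):

```python
import numpy as np

def psnr(reference, prediction, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two images of equal shape."""
    diff = reference.astype(np.float64) - prediction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

The Structural Similarity Index works analogously but compares local patterns of luminance, contrast and structure rather than raw pixel differences; both remain criteria applied by algorithms rather than by human viewers.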
Hyperrealism
How might we non-specialists judge GAN-generated images when we encounter them outside the laboratory? It’s perhaps more accurate to describe these images as “hyper-realistic” than “realistic”. Since the GAN’s discriminator is usually fed photos, the generated images approximate the conventions of the photographic medium, rather than human visual experience itself. Their effect could therefore be most keenly felt in our everyday encounters with photos.

Photographs exemplify what Charles S. Peirce calls “indexical signs” — signs we interpret as evidence of that which they signify.¹¹ A photograph is traditionally understood as evidence that what it depicts happened, even if that happening is a performance. As Roland Barthes puts it, “Every photograph is a certificate of presence”.¹²

More recently, Stephen Prince has pointed to improvements in CGI to argue that digital images have a different, more “flexible” ontology.¹³ They’re not indexical, but they can nonetheless be judged experientially realistic because they carefully approximate the spatial information of reality in their lighting and shading.
In the moment of looking at a good-quality GAN-generated face, there is no measure we can apply to know that we are not looking at a photo of a real human.

If, as Barthes writes, photos strike us through their immediate and self-evidential claim of capturing reality — what he calls the punctum of a photo — then GAN-generated images do so too.¹⁴ Even when we know them to be “fake”, these faces affect us in ways at least similar to photos of real humans. It remains difficult to wholly separate the photograph from its referent: we’re haunted by someone who is visually present yet existentially absent.
However, were we to be inundated with GAN-generated faces — as cheap alternatives to models in advertising, for example — would we not become hyper-aware of the potential non-indexicality of every photo we see? Would we not experience a general waning of the punctum?
The Eerily Realistic
It can be an unsettling experience to look a GAN-generated face in the eyes. We judge it “realistic” but might also reach for another aesthetic category: the “eerie”. Mark Fisher writes that we identify eeriness in landscapes emptied of the human, when we struggle to locate agents and find ourselves “caught in the rhythms, pulsions and patternings of non-human forces”.¹⁵ We experience such eeriness whenever Facebook adverts recommend something we mentioned only once in a message to a friend. The eerie highlights obscured forces at play in mundane reality.

We might then add a fourth type of realistic to our list: the “eerily realistic”. We judge GAN-generated photos as realistic — yet eerily so. Were all photos to be encountered in such a way, we would become aware of the vulnerability at the centre of our aesthetic judgements.
The eerily realistic makes us aware of the ways that fictional representations — underpinned by emergent technological innovations — can bypass our “rational” defences and deposit what feels like something we already “know”. Rather than fooling us, therefore, eerily realistic images foreground their own deployment, and could inadvertently subvert the invocation of the “realistic” as one of technocapitalism’s ever-shifting end goals.
[1] Paul Guyer, ‘18th Century German Aesthetics’, Stanford Encyclopedia of Philosophy, 16 January 2007, revised 3 March 2014 <https://plato.stanford.edu/entries/aesthetics-18th-german/> [accessed 30 March 2019]
[2] Immanuel Kant, Critique of Judgement, trans. by James Creed Meredith (Oxford: Clarendon Press, 1952), p. 314.
[3] Sianne Ngai, Our Aesthetic Categories: Zany, Interesting, Cute (Cambridge, MA: Harvard University Press, 2012), p. 58.
[4] Wolfgang Klein, ‘Realismus / Realistisch’, Ästhetische Grundbegriffe, Vol. 5, ed. by Karlheinz Barck (Stuttgart/Weimar: Metzler, 2003), pp. 149–197.
[5] Walter Benjamin, ‘Little History of Photography’, in Walter Benjamin: Selected Writings, vol. 2, pt. 2, 1931–1934, trans. by Rodney Livingstone et al., ed. by Michael W. Jennings, Howard Eiland, and Gary Smith (Cambridge, MA: Belknap, 1999), pp. 507–530 (pp. 510–11)
[6] BuzzFeedVideo, ‘You Won’t Believe What Obama Says In This Video 😉’, YouTube, 17 April 2018 <https://www.youtube.com/watch?v=cQ54GDm1eL0> [accessed 30 March 2019]
[7] Luis Suarez-Villa defines technocapitalism as a new era of capitalism in which the dominant mode of production is the commodification of the intangible quality of human creativity through the research and development of new technologies. See: Luis Suarez-Villa, Technocapitalism: A Critical Perspective on Technological Innovation and Corporatism (Philadelphia: Temple University Press, 2012), pp. 3–4.
[8] Martin Heusel, et al., ‘GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium’, arXiv, 12 January 2018 <https://arxiv.org/abs/1706.08500> [accessed 30 March 2019], p. 6.
[9] Michael Mathieu, Camille Couprie and Yann LeCun, ‘Deep Multi-Scale Video Prediction Beyond Mean Square Error’, arXiv, 26 February 2016 <https://arxiv.org/pdf/1511.05440> [accessed 30 March 2019] (p. 13)
[10] Le Sang d’un Poète, dir. by Jean Cocteau (Charles de Noailles, 1930)
[11] Charles S. Peirce, ‘Prolegomena to an Apology for Pragmaticism’, in Peirce on Signs: Writings on Semiotic by Charles Sanders Peirce, ed. by James Hoopes (Chapel Hill, NC: University of North Carolina Press, 1991), pp. 249–252 (p. 251)
[12] Roland Barthes, Camera Lucida: Reflections on Photography, trans. by Richard Howard (New York: Hill and Wang, 1981), p. 87.
[13] Stephen Prince, ‘True Lies: Perceptual Realism, Digital Images, and Film Theory’, Film Quarterly, 49.3 (Spring 1996), 27–37 (pp. 29–30)
[14] Barthes, Camera Lucida, pp. 26–27.
[15] Mark Fisher, The Weird and the Eerie, 3rd edn (London: Repeater Books, 2016), p. 11.
