Masako

Bruce Gilchrist
8 min read · Oct 2, 2023


Image generated in Midjourney using the testimony of Masako in Akira Kurosawa’s “Rashomon” as a source for the prompt text.

The emergence of AI-art from a Google laboratory¹ can be read as a gothic story, with its eerie, exotic atmosphere and depictions of horrible events: a metaphorical expression of psychological or social conflict. Like plot devices in gothic literature, the productions of AI-art are composed from stolen manuscripts or interpolated narratives of uncertain authorship. Similar to Akira Kurosawa’s storytelling method of an unstable system of contradictory flashbacks and testimony in his film Rashomon (1950), the origin story of AI-art incorporates tales within tales and shape-shifting engineer-artist narrators — AI-whisperers animating the gloom. At the core of much of the hype has been the suggestion of a networked autonomous intelligent machinery solving the intractable mystery of consciousness in an act of singularity. This scenario has been relentlessly mined by AI-artists to conjure dark, spooky settings freighted with mystery, fear and dread, and, like all gothic stories, it lies beyond the understanding of science.

Rashomon plays out in a forest, a courthouse, and a crumbling, derelict city gate, the latter being a potent symbol of a post-WW2 Japan ruined by weapons of mass destruction. According to Maya Barzilai, it’s in post-war symbolism with the metaphorical promise of rebuilding, immigration, and assimilation, that reinventions of the Golem myth are to be found. Originally a Jewish story about an anthropoid figure of clay that gains a mind of its own and turns against its human masters, the Golem is often cited as one of the earliest AI prototypes. This narrative serves those who develop AI by contributing to the historical determinism inherent in the AI creation myth, that we inevitably make machines that will prove to be smarter than us and will eventually subjugate humanity. Functioning as a kind of smoke screen, this idea ascribes essential agency to algorithms, allowing the AI builders to abdicate responsibility.

Image generated in Midjourney using the testimony of Masako in Akira Kurosawa’s “Rashomon” as a source for the prompt text.

Much of Gothic literature’s fascination comes from the suggestion of supernatural or inexplicable events, such as inanimate objects coming to life. The core of the Golem myth appeared throughout the early twentieth century in descriptions of unsouled human creations as alchemical combinations of soil and language. In these early stories the clay anthropoid is magically brought to life through writing, which is often engraved directly onto its body, an event that foreshadows the roughly rendered ersatz human running amok and destroying its surroundings. Ultimately the Golem allows for speculation about the relationship between humans and machines, and has been used by Barzilai in her book, Golem: Modern Wars and Their Monsters, to examine, among other things, Norbert Wiener’s cybernetics and the science fiction of Stanislaw Lem. She also develops the idea of “ethical golems” that point to the ethical weaknesses of their creators. Perhaps we can consider some kind of correspondence between the sacred code that animates the Golem and generative AI that mindlessly produces images when prompted by linguistic cues.

As a highly mutable metaphor and parable of humanity’s shortcomings, the Golem can also be read as a crude robot at the service of Technic as described by Federico Campagna in his book, “Technic and Magic”. Both Technic and Magic are described as reality-systems, specific to historical eras, that determine what is possible to do and think. Campagna describes the contemporary age as fully immersed in the Technic system of reality, which is catastrophic and continues to capture the world through “absolute language”, outside of which nothing is possible. Here, existence is categorised to the point where “nothing remained of the human and the forest and the waterfall but the linguistic sign of their value as standing reserves […] the whole of the existent and the possible reduced to a closed sphere of language.” If we understand Technic’s “closed sphere of language” as the large language model (LLM) component of multimodal generative AI, it provides a frame for the ‘worldview’ of a dataset — a manufactory for synthetic products. Production is achieved through a statistical ‘understanding’ of distance between the attributes of things within the latent space of the model. Attributes, also conceptualised as ‘dimensions’, are where connections are made as an ‘ontology of positions’. Within LLMs there is no linguistic meaning as we understand it associated with these elements. Instead there is an abstract space full of non-human-readable things that have been automatically assigned semantic ‘neighbourliness’ based on the proximity of words within the corpora of the training data.
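This ‘neighbourliness by proximity’ can be made concrete with a toy sketch. The following is not any production LLM (which uses learned dense embeddings, not raw counts), but a minimal co-occurrence model showing how two words can end up ‘close’ purely because they appear in similar contexts, with no meaning attached to either. The miniature corpus and the window size are illustrative assumptions.

```python
# Toy sketch: semantic 'neighbourliness' from word proximity alone.
# Each word's vector is just a count of its neighbours in the corpus.
from collections import Counter, defaultdict
from math import sqrt

corpus = ("the samurai walked into the forest . "
          "the bandit walked into the forest . "
          "the priest sat at the gate .").split()

window = 2  # words within this distance count as neighbours
vectors = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            vectors[word][corpus[j]] += 1

def cosine(a, b):
    # Cosine similarity between two co-occurrence vectors.
    dot = sum(vectors[a][k] * vectors[b][k] for k in vectors[a])
    na = sqrt(sum(v * v for v in vectors[a].values()))
    nb = sqrt(sum(v * v for v in vectors[b].values()))
    return dot / (na * nb)

# 'samurai' and 'bandit' occur in near-identical contexts, so their
# vectors are close -- without the model 'knowing' what either word means.
print(cosine("samurai", "bandit"))  # high
print(cosine("samurai", "gate"))    # lower
```

The model has no referents, only positions: ‘samurai’ and ‘bandit’ are neighbours because of where they sit in the text, which is the ‘ontology of positions’ in miniature.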

Concern has shifted from worrying about AI’s ‘otherness’ to worrying about our tendency to be fooled by mimicry and attributing human-like qualities to computer behaviours. To counter the tendency to anthropomorphise these machine operations, computational linguist Emily Bender points out how different this is to the way language is processed by human minds, which rely on referents as actual things and ideas in the world in order to generate meaning. She argues that computer models are only trained on the form of language without any attempt to connect with the meaning.

Image generated in Midjourney using the testimony of Masako in Akira Kurosawa’s “Rashomon” as a source for the prompt text.

Midjourney text2image software, which is made from a coalition of different algorithmic models, is based on neural network architecture that maps text descriptions to image features. When prompted, the model invokes images from latent space that in some way resonate with the essence of the input text description. As a creative exercise to explore the interplay between textual cues and the model, I have used the conflicting testimonies of Masako and her samurai husband from the Rashomon narrative as the basis for text prompts. Because Midjourney draws a fresh random seed value on each iteration, it produced different images from exactly the same text prompt. I’ve adopted these images as counterparts to Rashomon’s conflicting stories narrating the same event — the death of the samurai in the forest.
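The role of the seed can be sketched in a few lines. This is emphatically not Midjourney’s pipeline; it is a stand-in illustrating the general mechanism, in which the prompt fixes a region of latent space while the seed fixes which point in that region is sampled. The `sample_latent` function and its prompt hashing are hypothetical.

```python
# Stand-in (not Midjourney's actual code) for seeded text-to-image sampling:
# same prompt + same seed reproduces an image; a new seed gives a new image.
import random

def sample_latent(prompt: str, seed: int, dim: int = 4) -> list[float]:
    # Hash the prompt into a crude base offset, then let the seeded RNG
    # supply the noise that makes each generation unique.
    rng = random.Random(seed)
    base = sum(ord(c) for c in prompt) % 100
    return [base + rng.gauss(0, 1) for _ in range(dim)]

prompt = "a woman's testimony, a forest, a dagger"
a = sample_latent(prompt, seed=1)
b = sample_latent(prompt, seed=2)
c = sample_latent(prompt, seed=1)

print(a == c)  # True: identical prompt and seed, identical latent point
print(a == b)  # False: identical prompt, different seed, different point
```

This is why the same Masako testimony yields a different image on every run: each iteration’s seed lands on a different point in the region of latent space the prompt picks out.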

Kurosawa employed the cinematic convention of the flashback as a storytelling device. However, Rashomon was the first film whose flashbacks disagreed about the action they were flashing back to, supplying first-person eyewitness accounts that were dramatically conflicted. The audience becomes absorbed by what it trusts is an unfolding story, but since the story is told within a shadow play of truth and memory — denoted through Kurosawa’s jarring lighting aesthetic — the audience is ultimately left to form its own interpretation of the unresolved events. Each character in the story, acting on some personal motive, has a conflicting version of the incident and there is a veiled suggestion that Masako might have murdered her samurai husband. Her description of events is contradicted by the other characters, the bandit Tajōmaru and the woodcutter. Another figure in the story, the Shinto priest, is in effect an outlier in that he largely forms opinions by listening to the testimony of others without contradicting them, but in the process loses his faith in humanity. Even the deceased samurai has his testimony heard in court — a horror story articulated from beyond the grave through the mediumship of a Shinto psychic — where he claims to have committed ritual suicide using Masako’s blade.

Image generated in Midjourney by the author using the testimony of the samurai in Akira Kurosawa’s “Rashomon” as a source for the prompt text.

In the making of AI technology there are hubristic assumptions made that through ‘Big Data’, in Campagna’s words, “the language of information technology is capable of grasping the whole of existence”, and that the whole of existence falls within “the reach of the language of information technology”. There is a romantic idea of AI somehow containing precursors for human consciousness from which sentience will spontaneously emerge by virtue of the vast scale of language being trawled and ingested. In reality, while there are over 7,000 living languages in the world, the training corpora of LLMs represent a narrow aspect of the internet that reflects white, male, English-speaking authors more than anyone else. Over time we can envision these language models starting to homogenise as their next generations are trained on synthetic data, leading to something now referred to as ‘model collapse’. At this point the software loses an already tenuous traction with how authentic human language functions, and therefore the ability to mimic it. One can imagine this leading to an increasingly incoherent production disconnected from the real world, as the technology progressively ingests itself.
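The self-ingestion dynamic can be demonstrated with a deliberately minimal simulation. The ‘model’ below is nothing like an LLM: it is just a Gaussian fitted to a handful of samples, where each generation is trained purely on the previous generation’s synthetic output. The small sample size and generation count are illustrative choices; the point is that estimation error compounds until the diversity of the data collapses.

```python
# Toy illustration of 'model collapse': a model repeatedly trained on
# its own synthetic output. The 'model' is a Gaussian estimated from
# samples; diversity (the fitted standard deviation) decays over
# generations as estimation error compounds.
import random
from statistics import mean, stdev

random.seed(0)
data = [random.gauss(0, 1) for _ in range(10)]  # small 'human' corpus
history = []

for generation in range(300):
    mu, sigma = mean(data), stdev(data)   # fit the 'model' to the data
    history.append(sigma)
    # The next generation trains only on synthetic samples.
    data = [random.gauss(mu, sigma) for _ in range(10)]

print(f"initial diversity (stdev): {history[0]:.3f}")
print(f"after 300 generations:     {history[-1]:.6f}")
```

Each refit slightly misestimates the distribution, and with no fresh ‘human’ data to correct it, the errors compound: the simulated diversity shrinks towards nothing, a crude analogue of the technology progressively ingesting itself.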

¹ AI-art was born in a Google laboratory as an experiment by Mordvintsev et al. concerned with reverse engineering an algorithm trained to ‘recognise’ or classify specific images. The visual outputs of these experiments were brought to the attention of the public via an exhibition at Gray Area in San Francisco, DeepDream: the art of neural networks (2016), curated by Joshua To, a User Experience (UX) designer. Like most of the eleven contributors to the exhibition, the curator was affiliated with or in the employ of Google. In an opening address and accompanying online essay to the exhibition, Blaise Agüera y Arcas, a Google machine-intelligence developer, likened the artistic use of neural networks to photography or the employment of optical instruments by Renaissance artists. “Like the invention of applied pigments, the printing press, photography, and computers, we believe machine intelligence is an innovation that will profoundly affect art”. The historical scale and cultural vision of Agüera y Arcas was at odds with the pedestrian-sounding mission statement of the curator, for whom the primary objective of the exhibition was to mitigate a problem of information design: how to communicate the complex ideas behind neural networks to the public in a comprehensible format. The different algorithmic techniques employed by each contributor — DeepDream; Fractal DeepDream; Class Visualisation; Style Transfer — were explained in highly reductive terms for non-specialists and were represented as icons used as part of wall signage connecting contributors with dates and techniques. According to Gray Area, the exhibition and accompanying Google symposium were significant in attempting to disseminate a solid understanding of neural networks and DeepDream among the public.
As a science communication and corporate public relations exercise, this inaugural exhibition placed its emphasis on the workings of the technology itself: the operations of the technology, not art, were its apex focus. This perspective was aptly summed up by media theorist Dieter Mersch when he said, concerning the instrumentalism of DeepDream in general, “The designs they let loose refer to nothing more than the parameters of their own technical conditions; they contain neither social engagement nor historical impact, nor do they intervene in their environment: they are what molds the machines.”

References

‘Golem: Modern Wars and Their Monsters’, by Maya Barzilai (2020). Published by NYU Press.

‘Technic and Magic: The Reconstruction of Reality’, by Federico Campagna (2018). Published by Bloomsbury Academic.
