The Transfiguration of Bandit Tajōmaru
The bandit Tajōmaru is a central character in Akira Kurosawa’s film Rashomon (1950), which studies human nature through a series of flashbacks and unreliable accounts of a deadly encounter in a forest. Tajōmaru’s testimony contributes to the film’s overarching frame narrative, an unstable system of stories within stories, and supplies the first version of the samurai’s death. Yet Tajōmaru sensationalises his account of events to promote a flattering image of himself. Because the audience sees the events through flashbacks, there is an implicit assumption that what is shown accurately represents what objectively happened; however, each flashback merely reflects a point of view. This perspective is simultaneously true and false: each time a character narrates their version of events, it is juxtaposed with imagery that conveys their distorted, subjective perception of reality. Through the ‘Rashomon effect’ the way the audience imagines the relations between things is constantly displaced and reframed, so that determining what is real involves a confusion of realities revolving around self-interest.
All of the images included in this text were generated in Midjourney using Tajōmaru’s testimony as the basis for the text prompt. Even though the prompt remained the same, it produced different images on each iteration, which I understand as corresponding to the Rashomon effect and to Rashomon’s unstable narrative system.
Midjourney is text2image software that gives language priority in the making of things through the text prompt. However, the relationship between the initial prompt and the final image is sometimes difficult, if not impossible, to fathom. Some clarity might be gained by considering the problems of language translation. As an example of interlingual translation, in which verbal signs are interpreted by means of another language, Roman Jakobson cites Chukchi, an indigenous language of northeastern Siberia, in which the noun ‘watch’ is rendered as ‘hammering heart’. Through this translation, one can imagine how someone unfamiliar with wristwatches might perceive a ‘flow of time’ being pumped through the device. The perception is enabled by a conceptual blending of the mechanical movements of a watch and the felt contractions of the heart. More generally, Jakobson argues in his 1959 text On linguistic aspects of translation that where one language is deficient relative to another, meaning is carried across through loanwords, neologisms and semantic shifts. This can invite what Judith Hoad, in her chapter The machinery of language, describes as a process-based way of appreciating language, steeped in imaginative metaphor with an emphasis on inter-relationships. When considering text2image software like Midjourney, which converts textual prompts into images, and with process-based approaches in mind, it makes more sense to turn to Jakobson’s concept of intersemiotic translation, in which the language of one modality is transformed into another, as when a novel is adapted into a film: “[i]ntersemiotic translation or transmutation is an interpretation of verbal signs by means of signs of nonverbal sign systems.”
Jakobson drew on the work of the American philosopher C. S. Peirce, whose contributions to the scientific study of signs preceded his own. Peirce defined an index as a category of sign that maintains a physical tie to the thing it refers to: “[p]hotographs, especially instantaneous photographs, are very instructive because we know that they are in certain respects exactly like the objects that they represent.” In Logic as Semiotic: The Theory of Signs (1902), Peirce emphasised that “the index is physically connected with its object; they make an organic pair”, giving the example of smoke as an indication of fire. The influential art critic and historian Rosalind Krauss was largely responsible for migrating Peirce’s concept of the index from semiotics into art discourse. Writing about the index in art in 1977, she suggested that a cast shadow could serve as the indexical sign of an object, “establishing meaning along the axis of a physical relationship to its referent.” She continues the theme of light with the example of the analogue photographic print, which she considers an index because it verifies something as “having-been-there”, an echo of an object’s former presence. According to Krauss, “[t]ruth is understood as a matter of evidence, rather than a function of logic.”
In Jakobson’s understanding, “[e]quivalence in difference is the cardinal problem of language”. To better comprehend the operations of multi-modal generative AI, we need ways of conceptualising how the intrinsic difference between the text prompt and the resulting image finds its equivalence through the latent space of the model. “Language is considered the metaphorical machinery that delivers cultural messages”, according to Judith Hoad, who describes how poets, in an attempt to convey meaning by sharpening the tool of expression, pare down the machinery of language to “arrive at the energy source that they then spark up […] in new and sometimes startling ways.” Creatives working with generative AI tools generally seem to accept that the “metaphorical machinery” alluded to by Hoad has become largely occluded. Sparking up has become a form of arcane prompt engineering, a new kind of chance operation that offers only a tenuous and diminishing influence over outcomes.
An AI-generated image has a form of proximity to the real world through its training dataset, both in the dataset’s distributed form before aggregation and in its abstracted, networked form in the latent space of the model. Krauss’s notion of the index as a trace of former presence, developed in the context of photography, can be related to the way collected information, once ingested and processed by AI software, is used to synthesise representations of former things. Can the relationship between a generated image and an AI model be characterised as indexical because of a causal relationship between them? In a 2015 conversation between Isabelle Graw and Benjamin Buchloh about indexicality in analogue and digital photography, Graw mentioned an argument that there can no longer be any indices in the world because the index is predicated on analogue recording devices. At the same time, others call for the data traces of digital imaging to be characterised as indexical. In the case of the AI-generated image, can the data trace be understood to include the training dataset from which the image was constituted?
Meanwhile, the way AI is being developed is changing: it is becoming more profit-driven and less open as a scientific discipline. As AI becomes more powerful, technology companies are increasingly secretive about how they train their systems. Transparency researchers score ‘openness’ against several criteria, including how data is collected and annotated, whether it includes copyrighted material, the kind of hardware used to train and run the model, and its energy consumption. Recent research from Stanford University found that no model scored more than 54% on the transparency scale, suggesting that as AI becomes more influential it is also becoming more inscrutable.
The unresolved cause of death at the centre of Rashomon gave rise to the idea of presenting the forest image used in the prompting process¹ and foregrounding it as a repository for evidence. To this end, a seemingly empty forest scene is visualised here as the final event. Tajōmaru’s inconclusive and contradictory testimony has been encrypted² and buried within the image data of the scene using steganography,³ a technique for concealing messages inside other media. This presents a version of events hiding in plain sight, reminiscent of the vast quantities of unacknowledged images, annotations, and human labour that underpin generative AI technology. The obscure relationship between language, data collected from the tangible world, and its remixed, synthetic depiction becomes even more skewed as Tajōmaru’s contradictory testimony is transformed into the scene of a crime.
Notes
¹ As a component of this project, I created a collection of artificial Japanese forest scenes from detailed prompts that specified the type of film stock, perspective, and various other parameters. Including images within a Midjourney prompt can affect the resulting output, shaping aspects such as environmental context, composition, lighting, and colour.
² A gif of Tajōmaru’s encrypted testimony:
³ A seminal event in the timeline of photography’s technical development is the 15th-century alchemists’ discovery that combining silver with marine salts produces a substance that turns from off-white to black when exposed to light. This became the foundation of a process that William Henry Fox Talbot would refine some 300 years later into the photographic print; he published his results as The Pencil of Nature in 1844. The alchemists’ breakthrough coincides chronologically with the first recorded use of the term steganography, in 1499, by Johannes Trithemius, a Benedictine abbot from Germany. Trithemius devised a method for covertly transmitting information by disguising a treatise on concealment techniques as a book on magic. As a clandestine means of communication, modern steganographic methods can be used by black hat hackers to obscure unlawful data, cloak malicious code, or transmit instructions to command-and-control servers.
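To give a concrete sense of how a message can be buried within image data, here is a minimal sketch of one common family of image steganography, least-significant-bit (LSB) embedding, written in Python. It is not the tool or method used for this project, which is not specified here: it assumes the carrier is a plain sequence of 8-bit pixel values rather than a decoded image file, it omits the encryption step mentioned in the text, and the function names `embed` and `extract` are hypothetical.

```python
def embed(carrier: bytes, message: bytes) -> bytearray:
    """Hide `message` in the least significant bits of `carrier` (hypothetical helper).

    A 4-byte big-endian length header is written first so the message
    can be recovered without knowing its size in advance.
    """
    payload = len(message).to_bytes(4, "big") + message
    # Expand the payload into single bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of this carrier byte, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract(stego: bytes) -> bytes:
    """Recover a message hidden by `embed` (hypothetical helper)."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                # Rebuild each byte from eight consecutive lowest bits.
                value = (value << 1) | (stego[start_bit + b * 8 + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


if __name__ == "__main__":
    # Hypothetical carrier: 1024 bytes standing in for raw 8-bit pixel values.
    carrier = bytes(range(256)) * 4
    secret = "A version of events".encode("utf-8")
    stego = embed(carrier, secret)
    print(extract(stego).decode("utf-8"))
```

Run as a script, this prints the hidden text back out. Because only the lowest bit of each byte changes, no value shifts by more than 1, which is why the alteration is invisible to the eye. Practical tools generally operate on the decoded pixel channels of a lossless format such as PNG, since lossy compression like JPEG would destroy the hidden bits.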
References
‘The machinery of language’, by Judith Hoad (2001) in What machines? Published by Mermaid Turbulence.
‘On linguistic aspects of translation’, by Roman Jakobson (1959) in On Translation. Published by Harvard University Press.
‘Logic as Semiotic: The Theory of Signs’, by C.S. Peirce (1955) in Philosophical Writings of Peirce. Published by Dover Publications.
‘Notes on the Index: Seventies Art in America. Part 2’, by Rosalind Krauss (1977) in October, Vol. 4. Published by The MIT Press.
‘Lost traces of life: a conversation about indexicality in analog and digital photography between Isabelle Graw and Benjamin Buchloh’ (2015) in Texte zur Kunst, 99. Available at: https://www.textezurkunst.de/en/99/verlorene-lebensspuren/