The brief was to create a piece using data from Radio 4’s programmes on the day of the Summer Solstice (June 21, 2018). As the second most popular domestic radio station in the UK, BBC Radio 4 broadcasts a wide variety of programmes including news, drama, comedy, science and history. Radio programmes are interesting because, being audio only, they leave us freer to imagine the context at hand. We interpret and visualise the content according to our own knowledge and past experiences, so the way we “see” the content probably varies significantly from listener to listener.
While Radio 4 is notable for its news programmes such as Today and The World at One, we found the Four Seasons on the solstice day particularly intriguing, as it presented a collection of poems that featured throughout the day celebrating the Summer Solstice.
Over my head, I see the bronze butterfly,
Asleep on the black trunk,
Blowing like a leaf in green shadow.
Down the ravine behind the empty house,
The cowbells follow one another
Into the distances of the afternoon.
— James Wright, Lying in a Hammock at William Duffy’s Farm
These poems are rich in imagery, using figurative language to evoke the senses and emotions of summer. Listening to them was an imaginative experience for us, and it made us wonder how a machine would perceive and visualise them, which led us to challenges regarding memory, vision, and imagination.
Memory and Imagination
We as humans acquire our memories in a complex way, mixing visual, sensory, and emotional inputs. These memories often play an important role in the process of imagination. Hume considered imagination a bundling of associations, and memories provide the material from which we make these associations.
A machine's memory can be formed in very different ways, depending on the dataset and the model it is trained with. For example, a cycle-consistent adversarial network (CycleGAN) enables a machine to see the world in a particular visual style, and a machine trained with a recurrent neural network (RNN) can generate text. Whatever the model, however, machines are trained by humans on curated datasets, so they are embedded with human biases and values and see the world much as humans do. An interesting question is what a machine’s interpretation would be, given its human-fed memory. Would learning machines, imbued with power and history, develop their own computational imagination based on these interpretations?
Language and Vision
Language is the interface for human communication. As Ed Finn (2018) puts it, “language itself, particularly written language, serves as the original ‘outside’ to human thought, the first machine for processing culture across time and space”.
Reading and listening to the poems in Four Seasons convey personal and often obscure aesthetic experiences. From the poets to the audience, from then to now, these experiences are shared through language and vision, which are interwoven throughout. If a machine could perceive and imagine such a subjective aesthetic experience, how could multiple machines share their perception and imagination?
A Computational Conversation
These questions led us to experiment with a conversation between two computational entities: one reads a poem from the Four Seasons and searches for images to interpret it; the other looks at these images, describes what it sees, and generates its own poem accordingly.
We opted for Runway as the main tool, as it includes prepackaged machine learning models and can be connected to a range of open-source plugins. Translating images into text is an essential part of this loop. We used the im2txt model, a neural network that describes the content of images in natural language. The model is pre-trained on the Common Objects in Context (COCO) dataset.
To set up the conversation, we used a monitor connected to a Mac mini as the reader, which reads out the poem and displays associated images, and a laptop with a webcam as the listener, which watches the images and translates them into text. The reader plays the audio of the poem and searches for pictures to display. The listener translates the webcam livestream into text in Runway and passes the result to Processing through the Open Sound Control (OSC) protocol. Based on this initial descriptive interpretation, the listener then generates its own poem using a long short-term memory (LSTM) network and reads it out.
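To make the hand-off between the caption model and Processing more concrete, here is a minimal sketch of what a caption looks like as an OSC message sent over UDP. It is an illustration under stated assumptions, not the code used in the project: the address `/caption`, the port 12000, and the function names are all hypothetical, and in the installation described above Runway itself does the sending. The sketch uses only the Python standard library and follows the OSC 1.0 string encoding (null-terminated, padded to a multiple of four bytes).

```python
import socket


def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes (OSC 1.0 string rule)."""
    padded_len = (len(data) // 4 + 1) * 4
    return data.ljust(padded_len, b"\x00")


def build_osc_message(address: str, text: str) -> bytes:
    """Encode an OSC message carrying a single string argument."""
    return (
        osc_pad(address.encode("ascii"))                     # address pattern
        + osc_pad(b",s")                                     # type tags: one string
        + osc_pad(text.encode("ascii", errors="replace"))    # the caption itself
    )


def send_caption(caption: str, host: str = "127.0.0.1", port: int = 12000,
                 address: str = "/caption") -> bytes:
    """Send one caption as a UDP datagram; host, port and address are assumptions."""
    packet = build_osc_message(address, caption)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return packet


if __name__ == "__main__":
    # Example caption in the style of the misreadings discussed later in the text.
    send_caption("a person holding a nintendo wii controller")
```

On the receiving side, a Processing sketch listening on the same port (for instance with an OSC library such as oscP5) would read the string argument and hand it to the poem generator. OSC runs over UDP here, so a dropped caption is simply skipped, which suits a live loop where the next frame arrives moments later.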
This project is an exploration of computational imagination. We approached the challenge by communicating interpretations of poetry. Although the training data (COCO) is highly human-curated, the results tend to be unpredictable and are sometimes amusing. Inevitably, however, they expose the issue of dataset bias. For example, the machine often recognises natural entities as man-made objects, particularly electronic artifacts such as a phone or a Nintendo Wii controller. This, as a reflection of culture, knowledge, and expertise, made us ask how machines are trained, and by whom. In the future, given sufficient time, it would be interesting to create our own dataset while keeping the tension between control and autonomy.
Finn, E. (2018). What Algorithms Want. MIT Press.