Artifacts of Utterance: Interpretation and Translation in the age of AI
Environments Studio IV | Dan Lockton | Spring 2018
Ongoing collaboration with Scott Leinweiber
Project Brief: Regarding the question of human intelligences and artificial intelligences (of various kinds) together, in environments — what dimensions could there be to these interactions, and what issues do they highlight, now and in the future? What is the role of designers in these situations?
Preface
Driven by a whirlwind of emerging technologies and the social paradigms shifting in their wake, the integration of AI into creative tools has revolutionized the way creators synthesize ideas through digital media.
- Early explorations of AI positioned as an autonomous collaborator in the creative process
- Emergent businesses and trades: AI systems that help monitor, quantify, and visualize complex patterns of user behavior (e.g. autonomous vehicles, cybersecurity)
- Open-source, end-user tools for 2D and 3D synthesis of sound, visuals, and form
As our modes and vehicles of personal expression in this digital age become increasingly complex, how do we attribute meaning to things in our lives, consciously and innately? What are the ways in which we let algorithms attribute meaning for us? How will recognizing patterns in our meaning-making help us better understand ourselves/others?
Voice Intelligence
Our voice is one of our first creative and collaborative tools. By positioning this innate instrument as a touchpoint for human-computer interaction, people across age groups and levels of physical ability can access complex information systems and technologies. In these contexts, artificial intelligence plays a critical role in how meaning is derived from human speech.
Proposal
This project positions an AI system as a tool that uses properties of natural speech (acoustic qualities + socio-cultural and personal context + emotional intent) as a generative framework for interpretation and translation.
I’m interested in investigating the affordances of AI in real-time interpretation and translation, as well as in generating new models of “meaning-finding” in linguistic-to-visual translation. I am also curious how truly “intelligent” these systems can become at context-switching across conversations and across cultures. With their ability to process complex linguistic structures in tandem with their users’ socio-cultural digital footprints, AI-assisted voice interfaces can revolutionize not only the way we speak to computers but also how we speak to each other. By representing two speakers’ unique utterances in the form of personalized visual artifacts, an AI can construct a visual scenario that bridges an understanding between distinct oral cultures.
Breaking Down Natural Speech
Modern, widely used frameworks for speech analysis break natural speech down along three dimensions (sketched in code after the list):
1. What you mean (semantic analysis → acoustic, prosodic, linguistic)
2. Who you are (digital footprint → social and visual media, digital habits)
3. Where you are (regional, geographic)
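As a rough sketch of how these three dimensions could travel together through a pipeline, the hypothetical data structure below groups them into one record. All class and field names here are illustrative assumptions, not part of the original project:

```python
from dataclasses import dataclass, field

@dataclass
class UtteranceContext:
    """Hypothetical container for the three dimensions of speech analysis."""
    # 1. What you mean: features extracted from the audio itself
    transcript: str = ""
    pitch_hz: list = field(default_factory=list)      # fundamental frequency per frame
    loudness_db: list = field(default_factory=list)   # acoustic intensity per frame
    # 2. Who you are: the speaker's digital footprint
    social_media_topics: list = field(default_factory=list)
    # 3. Where you are: regional / geographic context
    region: str = ""
    language: str = ""
```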
What You Mean
Prosody is the study of the tune and rhythm of speech and how these features contribute to meaning. The study usually applies to a level above that of the individual phoneme and very often to sequences of words (in prosodic phrases).
At the phonetic level, prosody is characterized by the following features (a rough extraction sketch follows the list):
- vocal pitch (fundamental frequency) (other: intonation)
- loudness (acoustic intensity) (other: gain, total power, amplitude)
- rhythm (phoneme and syllable duration)
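These three properties can be estimated directly from a recorded waveform. Below is a minimal sketch using the librosa library; the file path and parameter values are illustrative placeholders:

```python
import librosa
import numpy as np

# Load a speech recording (placeholder path), resampled to 16 kHz
y, sr = librosa.load("utterance.wav", sr=16000)

# Vocal pitch: fundamental frequency (f0) per frame via the pYIN tracker
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Loudness: root-mean-square energy per frame (a proxy for acoustic intensity)
rms = librosa.feature.rms(y=y)[0]

# Rhythm: onset strength gives a rough view of syllable-scale timing
onset_env = librosa.onset.onset_strength(y=y, sr=sr)

print("median pitch (Hz):", np.nanmedian(f0))
print("mean RMS energy:", rms.mean())
```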
Speech contains various levels of information that can be described as:
- Linguistic — direct expression of meaning
- Paralinguistic — may indicate attitude or membership of a speech community
- Non-linguistic — may indicate something about a speaker’s vocal physiology, state of health or emotional state
Michael Halliday describes five simple and two compound primary tones for English. They are:
- Tone 1 — falling
- Tone 2 — high rising
- Tone 3 — low rising
- Tone 4 — falling-rising
- Tone 5 — rising-falling
- Tone 13 — falling plus low rising
- Tone 53 — rising-falling plus low rising
Phonetic Profiling
It is also possible to extract meaning from individual phonemes:
“In linguistics, sound symbolism, phonesthesia or phonosemantics is the idea that vocal sounds or phonemes carry meaning in and of themselves.”
Vowel Transcription Systems
This reflects a regression model capable of recognizing input features and mapping them to specific outputs: in this case, mapping phonemes onto a cartographic 2D space, much as vowel charts plot vowels by their first and second formant frequencies.
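As an illustration of what such a regression could look like, the sketch below maps (F1, F2) vowel formant pairs onto 2D chart coordinates with a k-nearest-neighbors regressor. The formant values are classic reference measurements, but the chart coordinates and model choice are assumptions for demonstration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training data: (F1, F2) formant pairs in Hz for a few vowels,
# paired with hand-chosen 2D chart positions (illustrative values only)
formants = np.array([
    [270, 2290],   # /i/ as in "beet"
    [660, 1720],   # /ae/ as in "bat"
    [730, 1090],   # /a/ as in "father"
    [300, 870],    # /u/ as in "boot"
])
chart_xy = np.array([
    [0.0, 1.0],    # front, close
    [0.2, 0.0],    # front, open
    [0.8, 0.0],    # back, open
    [1.0, 1.0],    # back, close
])

model = KNeighborsRegressor(n_neighbors=2).fit(formants, chart_xy)

# Map an unseen formant pair onto the 2D vowel space
print(model.predict([[500, 1500]]))
```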
Emotive Modeling
The Circumplex Model of Affect
“Factor-analytic evidence has led most psychologists to describe affect as a set of dimensions, such as displeasure, distress, depression, excitement, and so on, with each dimension varying independently of the others. However, there is other evidence that rather than being independent, these affective dimensions are interrelated in a highly systematic fashion. The evidence suggests that these interrelationships can be represented by a spatial model in which affective concepts fall in a circle…” — Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.
This model is used by many developers and designers of voice-interaction and conversational interfaces; see, for example, MIT’s EmotiveModeler CAD tool.
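As a small sketch of how the circumplex can be used computationally, the function below places an estimated (valence, arousal) point on the circle and returns the nearest affect label. The labels and their angles are rough placements inspired by the model, not Russell’s exact coordinates:

```python
import math

# Illustrative affect labels placed around the circumplex, as (label, angle in degrees).
# 0 deg = high valence (pleasure), 90 deg = high arousal, per the model's two axes.
AFFECTS = [
    ("pleased", 0), ("excited", 45), ("aroused", 90), ("distressed", 135),
    ("displeased", 180), ("depressed", 225), ("sleepy", 270), ("relaxed", 315),
]

def nearest_affect(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point to the closest label on the circle."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    # Compare angles circularly, so 350 deg is close to 0 deg
    return min(AFFECTS, key=lambda a: min(abs(angle - a[1]), 360 - abs(angle - a[1])))[0]

print(nearest_affect(0.7, 0.6))   # -> "excited"
```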
Where You Are
Google Advanced Image Search
Design for Meaning-Finding & Visualizing
Technical Framework
MFCC Extractor in Max/MSP
Patcher by cososc: https://github.com/cososc/mfcc_extractor/blob/master/mfcc_extractor.maxpat
For every 10 ms hop, with a frame size of 25 ms (a minimal sketch in code follows this list):
- Do a Fast Fourier Transform (FFT) to convert the frame to the frequency domain
- Apply a mel-scale filterbank and take the log of each band’s energy (the mel scale and log compression both approximate human auditory perception)
- Do a Discrete Cosine Transform (DCT) to decorrelate the log band energies into a small set of real-valued coefficients
- Create a feature vector, which consists of:
  - 12 MFCC features (how much energy falls in each cepstral dimension right now); MFCCs are representations of the short-term power spectrum of a sound
  - The “total energy” (how loud the sound is right now)
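The same pipeline can be reproduced outside of Max/MSP. A minimal sketch using librosa, assuming a 16 kHz mono recording (the file path is a placeholder):

```python
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)

frame = int(0.025 * sr)   # 25 ms analysis window
hop = int(0.010 * sr)     # 10 ms hop between frames

# 12 MFCCs per frame: FFT -> mel filterbank -> log -> DCT, handled internally
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12, n_fft=frame, hop_length=hop)

# "Total energy" per frame, computed as root-mean-square amplitude
energy = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)

# Stack into a 13-dimensional feature vector per 10 ms frame
features = np.vstack([mfcc, energy])
print(features.shape)  # (13, number_of_frames)
```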
Bark Extractor in Max/MSP
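The Bark scale is a related psychoacoustic scale that divides the audible range into roughly 24 critical bands. A sketch of the standard Zwicker & Terhardt approximation for converting frequency to Bark:

```python
import math

def hz_to_bark(f: float) -> float:
    """Zwicker & Terhardt approximation: frequency in Hz -> critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

# The audible range spans roughly 24 critical bands
for f in (100, 1000, 10000):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")
```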
Form Manipulation with Voice
Technical Pipeline
Thank you :)
Research + References
On Creativity + Intelligences
Work by Harold Cohen
“Driving the Creative Machine”, Harold Cohen, Orcas Center, Crossroads Lecture Series, September 2010
“One thing we know about creativity is that it typically occurs when people who have mastered two or more quite different fields use the framework in one to think afresh about the other. Intuitively, you know this is true. Leonardo da Vinci was a great artist, scientist and inventor, and each specialty nourished the other. He was a great lateral thinker. But if you spend your whole life in one silo, you will never have either the knowledge or mental agility to do the synthesis, connect the dots, which is usually where the next great breakthrough is found.” — Marc Tucker, president of the National Center on Education and the Economy
http://www.aaronshome.com/aaron/aaron/publications/orcastalk2s.pdf
“Towards a Diaper-Free Autonomy”, Harold Cohen, Museum of Contemporary Art, San Diego, August 4, 2007
“The Art of Self-Assembly: the Self-Assembly of Art”, Harold Cohen, Dagstuhl Seminar on Computational Creativity, July 2009
Work by Chris Schmandt
Misc Projects
On Ethics + AI
Work by Madeleine Elish
On Natural Language Processing
Unrelated But Related
On the importance of space and sound
More on Cybernetics
“Designing Freedom”, Stafford Beer
“Actor-Network Theory”, Bruno Latour
Cognitive Architectures: Models of Perceiving and Interpreting the World