Artifacts of Utterance: Interpretation and Translation in the Age of AI

Anna Gusman
11 min read · Apr 3, 2018

--

Environments Studio IV | Dan Lockton | Spring 2018

Ongoing collaboration with Scott Leinweiber

Project Brief: Regarding the question of human intelligences and artificial intelligences (of various kinds) together, in environments — what dimensions could there be to these interactions, and what issues do they highlight, now and in the future? What is the role of designers in these situations?

Preface

Facilitated by a whirlwind of emerging technology and the consequent shifts in social paradigms, the convergence of AI and creative tools has revolutionized the way creators synthesize ideas through digital media.

  • Early explorations of AI positioned as an autonomous collaborator in the creative process (e.g. Harold Cohen’s AARON, Chris Schmandt’s “Put That There” (1979) from the MIT Media Lab Speech Interface Group)
  • Emergent businesses and trades in which AI systems help monitor, quantify and visualize complex patterns of user behavior (e.g. autonomous vehicles such as Tesla’s computer vision, cyber security, Deck.gl by Uber http://uber.github.io/deck.gl/#/)
  • Open-source, end-user tools for 2D and 3D synthesis of sound, visuals and form (e.g. Google Magenta’s NSynth for music synthesis, the Lobe visual interface for deep learning, Autodesk’s “Project Dreamcatcher” for form interpolation)

As our modes and vehicles of personal expression in this digital age become increasingly complex, how do we attribute meaning to the things in our lives, consciously and innately? What are the ways in which we let algorithms attribute meaning for us? How will recognizing patterns in our meaning-making help us better understand ourselves and others?

Voice Intelligence

Our voice is one of our first creative and collaborative tools. By positioning this innate instrument as a touchpoint for human-computer interaction, people across age demographics and degrees of bodily agency can access complex information systems and technologies. In these contexts, artificial intelligence plays a critical role in how meaning is derived from human speech.

Proposal

This project positions an AI system as a tool that uses properties of natural speech (acoustic qualities + socio-cultural and personal context + emotional intent) as a generative framework for interpretation and translation.

I’m interested in investigating the affordances of AI in the context of real-time interpretation and translation, as well as in generating new models of “meaning-finding” in linguistic-to-visual translation. I am also curious how truly “intelligent” these systems can become at context-switching across conversations and cultures. With their ability to process complex linguistic structures in tandem with the socio-cultural digital footprint of their users, AI-assisted voice interfaces can revolutionize not only the way we speak to computers but also how we speak to each other. By representing two speakers’ unique utterances in the form of personalized visual artifacts, an AI can construct a visual scenario that bridges an understanding between distinct oral cultures.

Breaking Down Natural Speech

Illustration by Bee Johnson

Modern, widely used frameworks for speech analysis break an utterance down along three dimensions:

  1. What you mean (semantic analysis → acoustic, prosodic, linguistic)
  2. Who you are (digital footprint → social and visual media, digital habits)
  3. Where you are (regional, geographic)

What You Mean

Prosody is the study of the tune and rhythm of speech and how these features contribute to meaning. The study usually applies to a level above that of the individual phoneme and very often to sequences of words (in prosodic phrases).

At the phonetic level, prosody is characterized by three measurable correlates (a rough extraction sketch follows the list):

  • vocal pitch (fundamental frequency, also described as intonation)
  • loudness (acoustic intensity; also described as gain, total power or amplitude)
  • rhythm (phoneme and syllable duration)
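
As a concrete illustration, here is a minimal sketch of extracting these three correlates in Python with librosa. This is my own substitution for illustration purposes (the project’s actual extraction runs in MAX/MSP), and the input file name is hypothetical:

```python
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=None)  # hypothetical input file

# Vocal pitch: a fundamental frequency (F0) track via the pYIN estimator
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)

# Loudness: root-mean-square acoustic intensity per frame
rms = librosa.feature.rms(y=y)[0]

# Rhythm: onset times as a coarse proxy for syllable timing
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
durations = np.diff(onsets)  # gaps between onsets approximate durations

print(np.nanmean(f0), rms.mean(), durations.mean())
```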

Speech contains various levels of information that can be described as:

  • Linguistic — direct expression of meaning
  • Paralinguistic — may indicate attitude or membership of a speech community
  • Non-linguistic — may indicate something about a speaker’s vocal physiology, state of health or emotional state
“Good for YOU” vs. “Good FOR you” vs. “GOOD for you”

Michael Halliday describes five simple and two compound primary tones for English (a toy contour-matching sketch follows the list). They are:

  • Tone 1 — falling
  • Tone 2 — high rising
  • Tone 3 — low rising
  • Tone 4 — falling-rising
  • Tone 5 — rising-falling
  • Tone 13 — falling plus low rising
  • Tone 53 — rising-falling plus low rising
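
To make the categories concrete, here is a toy operationalization of the five simple tones; the templates are entirely my own assumption, not Halliday’s data. Each tone is encoded as normalized start/mid/end pitch targets, and an observed contour is matched to the nearest template:

```python
# Toy templates: each tone as normalized (start, mid, end) pitch targets
TONES = {
    "tone 1 (falling)":        (1.0, 0.5, 0.0),
    "tone 2 (high rising)":    (0.5, 0.8, 1.0),
    "tone 3 (low rising)":     (0.0, 0.15, 0.3),
    "tone 4 (falling-rising)": (1.0, 0.0, 1.0),
    "tone 5 (rising-falling)": (0.0, 1.0, 0.0),
}

def classify(contour):
    """Return the tone whose template is closest to the observed contour."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TONES, key=lambda name: dist(TONES[name], contour))

print(classify((0.9, 0.4, 0.1)))  # -> "tone 1 (falling)"
```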

Phonetic Profiling

“Fig. 6. Schematic illustrations of the phonetic profiles of positive and negative intensification that emerged from the key words with (a) short vowels (V_S) and (b) long vowels (V_L) in the accented target syllables. The shapes of the polygons in the lower panels represent the acoustic energy (E) courses. The upper panels sketch the characteristic F0 peak contours. Broken lines point to the possibility of voiceless-onset consonants. The different shades of the segment polygons refer to the differences in voice quality (i.e. lighter = breathier). All illustrations are based on the means of table 1. F0 ranges are oriented towards actually found values.” — Oliver Niebuhr, On the Phonetics of Intensifying Emphasis in German https://www.researchgate.net/figure/Schematic-illustrations-of-the-phonetic-profiles-of-positive-and-negative-intensification_fig4_47357730

It is also possible to extract meaning from individual phonemes:

“In linguistics, sound symbolism, phonesthesia or phonosemantics is the idea that vocal sounds or phonemes carry meaning in and of themselves.”

Vowel Transcription Systems

A vowel transcription system reflects a regression model capable of recognizing input features and mapping them to specific outputs: in this case, phonemes mapped onto a 2D cartographic space (sketched in code below the figure).

Figure 6. Australian English diphthong schematic trajectories superimposed onto the traditional vowel map with IPA cardinal vowels indicated (International Phonetic Association, 1999). https://www.researchgate.net/figure/Australian-English-diphthong-schematic-trajectories-superimposed-onto-the-traditional_fig6_46271828
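
A minimal sketch of that mapping, using assumed (not measured) formant values: the first two formants F1 and F2 locate a vowel on the traditional chart, with F2 giving backness and F1 giving height.

```python
# Illustrative formant values in Hz (assumptions, not measured data)
VOWELS = {
    "i": (280, 2250),  # close front
    "u": (310, 870),   # close back
    "a": (850, 1610),  # open
}

def to_vowel_chart(f1, f2, f1_max=900.0, f2_max=2400.0):
    """Map (F1, F2) to chart coordinates: x = backness, y = height."""
    x = 1.0 - f2 / f2_max  # high F2 (front vowels) plots to the left
    y = f1 / f1_max        # high F1 (open vowels) plots toward the bottom
    return round(x, 2), round(y, 2)

for symbol, (f1, f2) in VOWELS.items():
    print(symbol, to_vowel_chart(f1, f2))
```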

Emotive Modeling

The Circumplex Model of Affect

“Factor-analytic evidence has led most psychologists to describe affect as a set of dimensions, such as displeasure, distress, depression, excitement, and so on, with each dimension varying independently of the others. However, there is other evidence that rather than being independent, these affective dimensions are interrelated in a highly systematic fashion. The evidence suggests that these interrelationships can be represented by a spatial model in which affective concepts fall in a circle…” — Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.

This model is used by many developers and designers of voice interaction and conversational interfaces; see, for example, MIT’s EmotiveModeler CAD tool.
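
As a sketch of the circumplex idea, the snippet below places a few affect terms on the circle by converting them to an angle and an intensity. The valence/arousal coordinates are assumptions for illustration, not values from Russell’s paper:

```python
import math

# Hypothetical (valence, arousal) pairs in [-1, 1]
AFFECTS = {
    "excitement":  (0.7, 0.7),
    "distress":    (-0.7, 0.7),
    "depression":  (-0.7, -0.7),
    "contentment": (0.7, -0.7),
}

for name, (valence, arousal) in AFFECTS.items():
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    intensity = math.hypot(valence, arousal)
    print(f"{name:12s} angle={angle:6.1f} deg  intensity={intensity:.2f}")
```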

Where You Are

Google Advanced Image Search

Design for Meaning-Finding & Visualizing

Technical Framework

MFCC Extractor in MAX/MSP

Patcher by cososc: https://github.com/cososc/mfcc_extractor/blob/master/mfcc_extractor.maxpat

For every 10 ms hop, with a frame size of 25 ms (a code sketch follows the list):

  • Do a Fast Fourier Transform (FFT) to convert the frame to the frequency domain
  • Apply Mel scaling (warp the frequencies onto the roughly logarithmic Mel scale, approximating human perception of frequency) and take the log of the band energies
  • Do a Discrete Cosine Transform (DCT) to decorrelate the log energies into one real value per coefficient
  • Create a feature vector, which consists of:
      • 12 MFCC features (MFCCs are representations of the short-term power spectrum of a sound)
      • the “total energy” (how loud the sound is right now)
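
Here is a minimal sketch of the same pipeline in Python with librosa, which wraps the FFT → Mel → log → DCT chain in a single call. This is my substitution for the MAX/MSP patch, and the input file name is hypothetical:

```python
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file
frame = int(0.025 * sr)  # 25 ms frame size
hop = int(0.010 * sr)    # 10 ms hop

# FFT -> Mel filterbank -> log -> DCT, wrapped by librosa.feature.mfcc
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12,
                            n_fft=frame, hop_length=hop)

# "Total energy" per frame: RMS loudness over the same windows
energy = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)

# One feature vector per 10 ms hop: 12 MFCCs + energy = 13 values
features = np.vstack([mfcc, energy])
print(features.shape)  # (13, n_frames)
```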

Bark Extractor in MAX/MSP
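
The Bark scale, like the Mel scale, is a perceptual frequency scale that divides hearing into critical bands. As a point of reference outside MAX/MSP, a common Hz-to-Bark conversion (the Zwicker and Terhardt approximation) can be sketched as:

```python
import math

def hz_to_bark(f):
    """Zwicker & Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

for f in (100, 500, 1000, 4000, 10000):
    print(f"{f:5d} Hz -> {hz_to_bark(f):5.2f} Bark")
```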

Form Manipulation with Voice

Technical Pipeline

Bark Extractor → OSC → Wekinator → OSC → Unity
UnityOSC connection (GitHub), noise shader by Char Stiles, animator + scaling by Scott Leinweiber
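
One hop of that pipeline can be sketched in Python with the python-osc package (my substitution; the project sends from MAX/MSP). By default, Wekinator listens for a /wek/inputs message on port 6448, with one flat list of floats per frame; the band count below is an assumption:

```python
from pythonosc.udp_client import SimpleUDPClient

# Wekinator's default listening address and port
client = SimpleUDPClient("127.0.0.1", 6448)

def send_features(bark_bands):
    """Send one frame of Bark band energies as a flat list of floats."""
    client.send_message("/wek/inputs", [float(b) for b in bark_bands])

# e.g. 25 Bark band energies from the extractor (dummy values here)
send_features([0.0] * 25)
```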

Thank you :)

Research + References

On Creativity + Intelligences

Work by Harold Cohen

“Driving the Creative Machine”, Harold Cohen, Orcas Center, Crossroads Lecture Series September 2010

“One thing we know about creativity is that it typically occurs when people who have mastered two or more quite different fields use the framework in one to think afresh about the other. Intuitively, you know this is true. Leonardo da Vinci was a great artist, scientist and inventor, and each specialty nourished the other. He was a great lateral thinker. But if you spend your whole life in one silo, you will never have either the knowledge or mental agility to do the synthesis, connect the dots, which is usually where the next great breakthrough is found.” — Marc Tucker, president of the National Center on Education and the Economy

http://www.aaronshome.com/aaron/aaron/publications/orcastalk2s.pdf

“TOWARDS A DIAPER-FREE AUTONOMY”, Harold Cohen, Museum of Contemporary Art, San Diego, August 4th 2007

“The Art of Self-Assembly: the Self-Assembly of Art”, Harold Cohen, Dagstuhl Seminar on Computational Creativity, July 2009

Work by Chris Schmandt

Early project by Chris Schmandt (1979); MIT Media Lab Speech Interface group video collection

Organizations/People

AI Learning Techniques

Documentation + Dev Tools + Educational
