Marshall McLuhan, meet Alexa …
Or: Our Complex Relationship to Speaking AI Software Is Going to Change Us!
This is the second article in a series that explores human relationships with non-living objects through the lens of voice assistants.
Part 2 of 6, read part 1 here
In 1999, I was still getting my MFA in Design and Technology when The Medium is the Massage by Marshall McLuhan was mandatory reading. To be honest, at the time I didn’t understand everything. But one message has come back to me time and again throughout my digital career: every new medium is going to change us profoundly, in ways that we find impossible to predict (“I beg to differ on the latter”, said Cambridge Analytica when submitting their app to Facebook).
McLuhan also said we try to understand a new medium by “living in the rear-view mirror”, which I take to mean that we view changes in the present through the eyes of the past. We convince ourselves that we grasp what speech assistants are: we might think that using Alexa is just like operating a computer by voice command, or that it is just like a radio, a speaker box that snatches content out of thin air on demand.
“The medium does things to people. And they’re always completely unaware of this. They don’t really notice the new medium that is roughing them up. They think of the old medium because the old medium is always the content of the new medium. As movies tend to be the content of TV, and as books used to be the content, novels used to be the content, of movies.”
— Marshall McLuhan
A quick look at the Alexa Skills catalogue confirms the rear-view mirror idea. Most Alexa Skills are currently “speechified” versions of games, apps or content that we already know from other media. They range from quiz games to fart generators (very popular) to skills that simply play pre-recorded audio files like ocean sounds or news.
But our relationship to speech assistants is not about the content itself; it is about the way that relationship is going to change us. This is about our complex relationship with speaking AI software.
At our first encounter, I remember thinking that I knew precisely what talking to “Alexa” would feel like. I imagined it to be like Apple’s speech assistant “Siri”, which I had used sporadically for a couple of years.
Siri, particularly German Siri, had often annoyed me: its speech recognition frequently failed, and the answers displayed on the iPhone’s screen distracted me from what I was doing (like driving my car). In that sense, “Alexa” was going to be my first ear-only computing experience. No screen, just a plastic box with an attentive ear (mic) and a mouth (speaker).
After the initial setup of the Echo device I leaned (unnecessarily close) towards the speaker: “Alexa, what time is it?” I asked timidly.
“Her” smooth-as-a-pebble voice answered instantly. First contact.
Not being able to rely on my eyes, I was posing questions into a darkness, a space without borders or horizon. This rudimentary dialog effectively happened inside my head — not on a screen. I was talking to a computer!
Although there was this physical speaker in my room, Alexa’s voice seemed to come from beyond. “She” was a mythical voice in the cloud, an otherworldly oracle.
Since then, the magic effects of these first tender moments have worn off quite a bit. My infatuation gave way to more critical inquiry.
Navigating Auditory Space in the Dark
I started my research with the Deutsche Bahn (German train system) skill.
“Alexa, ask Deutsche Bahn how to get from Hamburg to Lübeck on January 15th so I will get there before 10:00 am.”
The skill didn’t understand that I wanted to arrive before 10:00 am and picked a train departing at 10:00 am instead. Not bad, and yet a slightly dissonant conversation nonetheless…
I noticed that, in anticipation of using the skill, I had adjusted my behavior. Instead of talking naturally, I had formed a sentence that included date, place of departure, destination and time.
I do the same thing with Siri: Once I notice inadequacies, I adjust — dumb down — my approach. Am I avoiding the disappointment of a failed “relationship”?
Next, I tried shopping. How do you shop for items that you can’t see?
“Alexa, buy a clothes drying rack.” (Don’t ask…)
“Leifheit floor standing dryer Pegasus 150 Solid Slim, stable laundry stand with wings also for long garments, particularly narrow wing tumble dryer fits through narrow doors. Would-you-like-to-buy-it?”
Alexa literally read the laundry list to me, or rather whatever words an Amazon marketplace seller had stuffed into the product’s title to rank first in search results. This was not working. Logorrhea and search-word-stuffing clearly were a turn-off to my mind’s ear.
Was Alexa’s recommendation of this particular item an acoustic equivalent of the top position of a Google Search? “If so, who paid for that spot?” I wondered, with a growing mistrust towards this endorsement. The next interaction did not help: after I rejected the first item, she recommended a second drying rack which cost one third of the price of the first one.😡
“What else was on that list that I would never hear about?” I thought (and DID NOT buy a clothes rack).
I am in darkness here! I want my eyes back. All of a sudden I realised that instead of a digital friend looking out for my best interests, I am merely speaking to speech recognition software owned by the largest public company in the world. The magic (and trust) dissolved back into the space it came from.
Alexa, What Does My Voice Tell You?
I do have trust issues, but I can also imagine the tremendous potential. For now, her AI brain might not seem that smart. But that will change. Fast.
To become smarter through machine learning, AI software needs to analyse loads of data from people like you and me. As we post, comment, like, shop, watch, talk, read, measure heartbeats per minute, and count steps and burned calories, we leave a trove of valuable data to mine.
Patterns might emerge that humans would not have been able to identify before. At least not in the same amount of time. The question is, what patterns reside inside of voice data?
It’s exciting and scary to think about that. For example, will it be possible to identify diseases by analysing the sound waves of our voices? There are quite a few initiatives working on that. The Parkinson’s Voice Initiative, for instance, aims to use speech signal algorithms to diagnose Parkinson’s disease.
Maybe there is a distinct wave pattern in the voice of people who are about to suffer a heart attack?
What about emotional diagnostics? Anybody who has tried to sing or speak in front of an audience will know what anxiety does to the voice. So I have no doubt that the voice betrays our emotional states. What pitiful vibrations did I produce when I begged my lover to rethink his decision to leave me? How do I sound when I am tired and miserable because a cold clogs up my head and I couldn’t sleep?
“Alexa, *cough* I’m hungry *sniffle*”
The imagination starts running wild when thinking about the great and frightening ways in which voice interaction can change us. My emotions are seriously mixed on this. My gut tells me to be careful. At the same time, my designer heart wants to give in to curiosity and the opportunity to build compelling products.
As she finishes reading the draft of this article for the first time, Karile turns to ask me, “What do you want this relationship to be?” As I tell her of a vision of how a future Alexa (or safer variants of talking AIs) could progressively become my close confidante, my empathic medical advisor or a pal to laugh and cry with, I realize that the answer I just discussed in this article is, in fact, a question:
What do you, dear reader, want this relationship to be?
About me: Dancer | Human Centered Designer | Expert speaker on empathy and design | www.wunschfeld.net
Together with Karile Klug and Stefan Kollmeier, I organise a two-day training in voice assistant application design. Our workshop is driven by the observation that designing convincing voice-first experiences is closer to the practice of creating theatre plays or movie scripts than it is to the usual user interface design process.