Do we really need to “talk” to machines?

Stefano Zorzi
7 min read · Jan 28, 2017

--

“We treat computers like indentured servitude right now, and we need to actually take them as pieces of society and treat them that way.” (Richard Sutton)

Two years ago I wrote a blog post titled “talking to software”. The idea of “conversational commerce” was getting increasingly popular and more and more people were looking at it as a new global pattern in human-computer interaction.

Fast forward to today and the notion has gone mainstream. Scrolling through the multitude of prediction posts for the year that has just started, it is difficult to miss references to voice computing as the next big thing.

There is something I find unsatisfactory about all this enthusiasm. It goes beyond the still primitive intelligence of bots and the resulting clunky conversations. More and more, I have been asking myself whether our infatuation with speech as the ultimate human-machine interaction model isn’t more a projection of our own intelligence and, in fact, a limitation that prevents us from exploring new possibilities.

Does talking to machines have to look like “talking”?

A primitive means of communication

Typewriters were (allegedly) made slower to avoid jamming

Human language is capable of carrying a large quantity of information, more than any other form of communication observed in other animal species.

While this has allowed humans to dominate the planet, it doesn’t mean we are immune to the constraints imposed by the biology of our bodies and the physics of our environment [1].

In fact, comparative studies have demonstrated a roughly fixed trade-off between the speed and the information density of different languages [2]: languages spoken with faster syllables tend to pack less information into each syllable. In other words, there is a finite amount of information per second that speech can carry. Translating this finding into a familiar analogy, we can say that our language is similar to a QWERTY keyboard, artificially designed to slow us down and prevent “jamming”.
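To make the trade-off concrete, here is a minimal sketch in Python. The figures are made up for illustration, in the spirit of the study cited in note [2] rather than its actual measurements: however a language splits the work between speaking rate and information per syllable, the resulting bits per second land in roughly the same band.

```python
# Illustrative figures only (not measurements from the cited study):
# a rough picture of the speed/density trade-off across languages.

languages = {
    # name: (syllables per second, bits of information per syllable)
    "fast, low-density":  (8.0, 5.0),
    "medium":             (6.5, 6.2),
    "slow, high-density": (5.2, 7.8),
}

for name, (syll_per_sec, bits_per_syll) in languages.items():
    info_rate = syll_per_sec * bits_per_syll  # bits per second of speech
    print(f"{name:>20}: {info_rate:.0f} bits/s")

# All three land around ~40 bits/s: speech itself is the bottleneck,
# whatever the language.
```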

Exploring options

Dr. Louise Banks translates heptapod writing in the movie Arrival, adapted from “Story of Your Life” by Ted Chiang

For them, speech was a bottleneck because it required that one word follow another sequentially. With writing, on the other hand, every mark on a page was visible simultaneously. Why constrain writing with a glottographic straitjacket, demanding that it be just as sequential as speech? It would never occur to them. (Dr. Louise Banks)

The first realm of possibilities is in the way we communicate. Just like the heptapods in Arrival, computers have no reason to be bound to the same constraints that shape our language. The analogy between the alien way of writing and a computer program is not accidental. Both need to be thought out in their entirety before being communicated. Forcing us to think in advance creates a new set of constraints, particularly on the sender, but has the obvious benefit of improving clarity and reducing room for misunderstanding. Imagine being able to convey an entire complex thought, at once, to another person.

Moreover, recent developments in computer vision have widened the scope of possible communication forms. From the “basic” use of QR codes, and now stickers, to the ability to recognise and process everyday objects, machines will soon see and interpret things around them well beyond specific commands. When our own vocabulary is being augmented by the use of memes, why shouldn’t computers be allowed to do the same?
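At the simpler end of that spectrum, here is a short sketch of a machine reading a QR code from a photo, using OpenCV’s built-in detector; the file name is an assumption for illustration.

```python
import cv2  # requires the opencv-python package

# Read a photo that (we assume) contains a QR code; "label.png" is illustrative.
img = cv2.imread("label.png")
if img is None:
    raise SystemExit("could not read label.png")

detector = cv2.QRCodeDetector()
text, points, _ = detector.detectAndDecode(img)

if points is not None and text:
    print("decoded:", text)      # the machine has "read" the object
else:
    print("no QR code found")
```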

There is then the subject of our communication: the “what” we talk about. Most, if not all, examples of voice-based interfaces confine communication to giving instructions or asking questions. But this is just a subset of the many things we tell (and learn from) machines.

One of the best examples of human-machine communication is reCAPTCHA. Through massive-scale online collaboration, humans help machines digitise (and then translate) human knowledge. The project exploits the complementary skills of humans and computers to achieve an outcome that neither could reach alone. Typing a street number or a blurry word, or translating an ambiguous sentence, is in many ways a more native form of human-machine communication than asking Alexa about the weather.

reCAPTCHA: a native form of human-machine communication
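A minimal sketch of that mechanism, with names and thresholds invented for illustration rather than taken from Google’s actual implementation: each challenge pairs a word the system already knows with one it could not read, and users who pass the control word cast votes on the unknown one.

```python
from collections import Counter

known_word = "celestial"      # control word, with an answer verified in advance
unknown_votes = Counter()     # crowd transcriptions of the word OCR failed on

def submit_challenge(control_answer: str, unknown_answer: str) -> bool:
    """Record one user's answers; only users who pass the control get a vote."""
    if control_answer.strip().lower() != known_word:
        return False          # failed the control: ignore the other answer
    unknown_votes[unknown_answer.strip().lower()] += 1
    return True

# Simulated users transcribing the same blurry scan
for control, unknown in [("celestial", "harbour"), ("celestial", "harbour"),
                         ("celeztial", "haxbour"), ("celestial", "harbour")]:
    submit_challenge(control, unknown)

best, count = unknown_votes.most_common(1)[0]
if count >= 3:                # illustrative agreement threshold
    print(f"digitised word: {best!r} (agreed by {count} users)")
```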

In a similar vein, Ines Montani from explosion.ai has been advocating a more thoughtful (and deliberately designed) approach to how we interact with computers in the context of AI. She focuses on three categories: data collection and training, demonstration and education, and debugging and iteration.

Creating training datasets, in particular, is becoming more and more important. However, we tend to look down on it as a low-level task that poorly paid, unmotivated workers need to perform. We would rather impart commands to a machine (our servant) than spend proper resources on training it (our partner).

“Ultimately, what we’re trying to do is have a human teach things to the computer.” (Ines Montani)

If teaching a machine is our goal, is talking really the best way to do that? Ines instead gives the example of creating games as a form of human-machine communication, taking the idea of reCAPTCHA one step further.
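A hypothetical sketch of what that teaching interaction could look like (this is not explosion.ai’s actual tooling): rather than dictating commands, the human teaches by quickly confirming or rejecting the machine’s own guesses.

```python
def model_guess(text: str) -> str:
    """Stand-in for a real classifier: a crude keyword heuristic."""
    return "complaint" if "broken" in text or "refund" in text else "other"

# Each pair is (text, does the human accept the machine's guess?)
examples = [
    ("My order arrived broken", True),
    ("Where is my refund?",     True),
    ("Great service, thanks!",  False),   # rejected: the guess was wrong
]

training_data = []
for text, accepted in examples:
    label = model_guess(text)
    if accepted:
        training_data.append((text, label))  # confirmed example, keep it
    # rejected guesses could be queued for correction rather than discarded

print(f"collected {len(training_data)} confirmed training examples")
```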

Thinking more broadly about the ways and whys of our communication with machines should stimulate us to develop better languages, interfaces and syntaxes. It will push us to take into account what’s possible (and even desirable) from a machine’s point of view, instead of focusing on “imparting orders”, which inevitably puts us, and our priorities, at the centre of the communication process.

Beyond anthropomorphism

There is nothing new in our obsession with talking to machines. Since the beginning of our technological dreams, robots have taken human (or humanoid) forms, and only recently have we started departing from that vision (even though it persists in our collective imagination).

Robots: mostly good-looking girls

We should try not to fall into the same trap when it comes to communication. Alexa, Siri and Cortana are now cylinders, speakers and basically invisible capabilities embedded in every possible object. Rather than looking at voice as a frontier in technology, we could look at it as the remnant of a primitive approach that we are hopefully overcoming.

So far, I have focused on the limitations that we might be imposing on ourselves by concentrating so much on “talking”. But there is more. Even looking beyond the “personal assistant”, there are already plenty of examples where talking to machines assumes a human form, and they don’t seem to be very good.

Take, for example, automatic (or programmatic) emails, like the dozens of sales or engagement emails we receive every day:

“Hey, I am John, founder of xyz, I know what it feels like to be a hairdresser, and it would be great to help you etc etc.”

Well, obviously I know it is not John writing to me. No, it is not the “human touch” of a CEO bothering to write to all of his customers. By “faking” humanity in a machine (an email dispatcher, in this case) we degrade both. I now have no trust in John: he is not sincere, and I basically feel “tricked” every time I receive a mail like this. I also have no respect for his product [3]. The software which I have chosen to use (maybe even to perform a critical task in my life) is not allowed to have a voice of its own, to have its own personality, but is “degraded” to having to talk through its master.

Machines that “pretend” to be human end up feeling fake (and even damage our feelings about the humans behind them); they are not likeable. It’s just not who they are. The same happens with the bots we are all building these days. We are all trying hard to give them a personality, to make them feel and talk like humans. The result is clumsy; you almost don’t feel like respecting those bots. I have a feeling that we don’t respect them because we don’t allow them to be who they are. By forcing them to speak in our own language we deny their identity and reduce them to slaves. It is no surprise that bots are widely abused.

It might be an exaggeration, but I cannot help thinking that talking to machines in our language is a form of autocratic behaviour that assumes ontological superiority. If we want to take machine intelligence seriously, we need to start thinking beyond anthropomorphism.

Conclusions

There is an obvious advantage in talking to machines through our own human language: it is easy.

From the early days, advances in speech recognition have been pushed by the desire to make the power of complex computers accessible to users without burdening them with the need to acquire new knowledge.

Convenience is a powerful force. But it is not convenience that brought us to where we are. I would rather see us pushing our boundaries and starting to explore what talking to machines can really bring us.

Notes

[1] Modern linguistics has rejected the strict hypothesis that our way of thinking is shaped by how we speak. Nevertheless, speech is still the best way we know to transfer our thoughts (our “mentalese”) to other people. This means our thoughts need to bend themselves to the rules of acoustics, the physics of sound waves and the biology of our vocal cords.

[2] “It seems that humans may be naturally and universally self-regulating when it comes to communicating through speech. There is a balance that cannot be disturbed: fast syllables are not allowed to carry too much meaning, and syllables with lots of information must be spoken slowly.” A cross-language perspective on speech information rate

[3] This can also extend to entire platforms or networks. John Borthwick noticed recently, about political bots: “As people understand that accounts aren’t necessarily human, they will start to trust platforms and networks less.”

--

Stefano Zorzi

Partner at Founders.as — building the future and trying to make sense of it. Artifici iucundius pingere est quam pinxisse