Why Google’s Pixel Buds Can Be Problematic

Yosuke Ushigome
6 min read · Oct 9, 2017


This auto-translating headset might create a new disparity in communication.


Last week Google announced a range of new products: Google Home Max, a home speaker with Google Assistant built in; its smaller sibling, Google Home Mini; Pixelbook, a Chrome OS laptop/tablet; Pixel 2/XL, the second generation of Google-branded smartphones; Google Clips, an always-on, AI-equipped camera that automatically captures the best shots of your family (!); and Google Pixel Buds, a Bluetooth headset with real-time translation functionality.

Google Pixel Buds

Google Pixel Buds are similar to Apple's AirPods: both connect to your smartphone via Bluetooth and let you talk to the phone's AI assistant, which is Siri if you're in the Apple world, or Google Assistant if you pair this new headset with an Android phone. But what drew applause at the Pixel Buds announcement was not the Google Assistant demo but the real-time translation demo. The video below shows how it works (the translation part starts at around 1:00):

Pixel Buds Real-time Translation Demo

On the left is a woman wearing Pixel Buds who speaks Swedish; on the right, a man without Pixel Buds who speaks English. When she speaks to him in Swedish, or rather to the Pixel Buds, the Pixel 2 in her hand plays back an English translation of what she just said. The man listens and answers in English. The Pixel 2 captures his speech, automatically translates it into Swedish, and plays the result back to the woman through her Pixel Buds. The translation itself is, of course, handled by Google's AI-backed translation engine.
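To make the asymmetry of this flow explicit, here is a minimal sketch of the two directions in Python. Every function and name in it is a hypothetical stand-in, not Google's actual API; the point is simply where each direction's output surfaces.

```python
# Conceptual sketch of the Pixel Buds translation loop from the demo.
# All functions are hypothetical placeholders, not a real Google API.

def recognise(audio: bytes, language: str) -> str:
    """Stand-in for speech recognition on the Pixel 2."""
    return f"<text recognised from {language} speech>"

def translate(text: str, source: str, target: str) -> str:
    """Stand-in for Google's AI-backed translation engine."""
    return f"<{text} rendered {source}->{target}>"

def wearer_speaks(swedish_audio: bytes) -> None:
    # The wearer watches recognition happen on her own phone screen,
    # and the translation is played out loud for the other person.
    text = recognise(swedish_audio, language="sv")
    english = translate(text, source="sv", target="en")
    print("Phone speaker (audible to both):", english)

def non_wearer_speaks(english_audio: bytes) -> None:
    # The non-wearer gets no feedback: recognition happens on a phone
    # he cannot see, and playback goes only into the wearer's ears.
    text = recognise(english_audio, language="en")
    swedish = translate(text, source="en", target="sv")
    print("Pixel Buds (audible only to the wearer):", swedish)

wearer_speaks(b"hej!")
non_wearer_speaks(b"hello!")
```

Note how, in this sketch, every observable output of the second direction lands on the wearer's side. That imbalance is the subject of the rest of this piece.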

On stage, this feature was advertised as "letting you connect with the world around you in a more natural way," and, as you can see in the video, the audience at the event welcomed it. With support for 40 languages, including Japanese, my Twitter timeline seemed welcoming, too.

Well, while it may be a step closer to the universal translation device many have dreamed about, I have a problem with how this product works and how it is being advertised.

It's about the absence of a visible translator.

Let’s start with the situation where we have a human interpreter.

Here we have a Swedish speaker (S), an English speaker (E), and an interpreter who speaks both languages. The conversation between S and the interpreter is obviously in Swedish, and the one between E and the interpreter is in English. With the interpreter in between, S and E are positioned symmetrically. Whenever either of them suspects a misunderstanding, they can talk to the interpreter in their own language and try to resolve it. All communication here is bilateral.

Today you may witness a situation where the interpreter is replaced with the Google Translate app running on a smartphone. The accuracy of its AI-backed translation is simply impressive (though it carries inherently complex issues around quality and bias). S and E put the smartphone on a table between them and keep swapping the app's target language, or use two smartphones, one per target language. Both can see and hear their speech being recognised, translated, and spoken in a machine voice that is getting more and more natural. It's still bilateral communication in that both have equal access to the interpreter, in this case a smartphone app.

Now let's introduce Pixel Buds. S wears Pixel Buds; E doesn't. Between them is S's smartphone, a Pixel 2. S can still see and hear her speech being recognised, translated, and spoken, just as in the demo video. But what about E?

In the demo, the English-to-Swedish translation was played back over loudspeakers for the event audience, so we all knew exactly when the translation happened and when the playback ended. But that won't happen in real-world situations. The English-to-Swedish translation can only be heard by S, who wears the Pixel Buds, and the Pixel 2, which would presumably be showing the voice recognition result of E's English speech, is in S's hand and not visible to E. E, and all of us who don't wear Pixel Buds, can never know when, or whether, our own speech is recognised, translated, and played back. You see the other person wearing Pixel Buds but cannot know if they hear you. Here the communication becomes unilateral. This non-wearer's experience is problematic.

No such problem exists in situations where communication is bilateral, as with a human interpreter or a translator app on a table. The two people can equally see and hear how their speech gets processed, and can gauge the quality of the translation. Pixel Buds non-wearers, however, have no visibility into this translator; in fact, they have no means to verify that such a translator even exists.

The non-wearer's experience is crucial in communication.

Speaking from personal experience, in cross-language communication, the moment you find out the other person isn't understanding what you mean is the most communicative moment. You have to cope with the misunderstanding. And the way it resolves, or doesn't, often tells you how well each of you understands the language, how focused you both are, and sometimes even how respectful you both are being in the conversation.

In Pixel Buds-enabled communication, the non-wearer cannot gain this kind of insight, because the non-wearer has no access to the translator, which plays a huge role in most of the process from one person's speech to the other's understanding. Moreover, this disparity in access arises automatically the moment a Pixel Buds wearer starts talking to a non-wearer. The non-wearer is then forced into a conversation where the door to understanding the other's language and cultural background is shut (and might also have to speak in a voice-recognition-friendly, machine-like way). It might feel like talking to a machine, which is in fact what it is. This design shows little regard for non-wearers.

If you limit the scope to the mere exchange of meaning, Pixel Buds might well do the job. However, I can't imagine a face-to-face, cross-language situation where I don't have to care about the other person. You can't make friends with people while depriving them of a chance to understand you; the same applies to workplace communication, such as with overseas clients. Ordering food in a restaurant or talking to a cab driver in a foreign country, a tourist asking you a question on the street… I tend to believe these mundane conversations often give you a glimpse into another culture. But I also understand that there is a demand for real-time translation in exactly such situations, among those who see them as mere annoyances.

Pixel Buds can be a very pragmatic tool on such specific occasions. But does it really "let you connect with the world around you in a more natural way," let alone liberate you from learning foreign languages, a promise often heard when people talk about the dream of a perfect universal translation machine? What is "natural" about a conversation where you have no clue how you are being understood?

Interestingly, another Google product spurred a similar discussion around the non-wearer's experience before: Google Glass. Though it was obviously a product for which face-to-face concerns such as aesthetics and privacy were crucial, the non-wearer's experience was entirely neglected. You saw people wearing Glass but could never know whether it was capturing you. Eventually those "Glassholes" disappeared, and the consumer-facing project is now shelved.

Pixel Buds and Glass have a lot in common when it comes to the non-wearer's experience, despite the difference in the type of data they capture. In fact, Pixel Buds should consider the non-wearer's experience even more, since its real-time translation feature claims to be helpful in face-to-face communication, in which non-wearers inevitably play an important role.

At the end of the day, we need to wait until Pixel Buds come out and people start using them before judging how useful they become or whether they will spread widely. In the meantime, I hope the UX design team realises that the word user refers not only to the owner of a product, but also to the people around it.

About myself

I’m a creative technologist based in London, currently working at Takram.
I demystify emerging technologies through prototyping.

To find out more, visit my website.
