How to prepare for the Voice Revolution?

David Biger
Published in Wild Wild Web
May 23, 2018

On Tuesday, May 8, Google brought together developers from around the world at its annual conference, Google I/O (@GoogleIO2018), to explore the next generation of technology. Among the biggest AI announcements was a remarkable breakthrough: Google Duplex, a system that can hold real-life conversations with strangers. How does it work, and how could it revolutionize the way we do business?

From the Shoreline Amphitheatre in Mountain View, California, CEO Sundar Pichai, betting on AI and research, presented a major new feature: "Google Duplex". This update will strengthen Google Assistant and make talking to it more natural. Among other things, you can now ask Google to phone a restaurant or a hairdresser and make a reservation for you! The tech giant is clearly taking the lead in the voice-service industry, and the feature will be tested this summer on Google's smart speakers.

A personal secretary in the palm of your hands

How could it improve our quality of life? The idea behind Google Duplex is to create a lifelike AI that talks like us and reacts like us, so convincingly that you think you are talking to a real person! In the video from the conference, the AI sounds nothing like the robotic, clear-cut voice we usually hear from Siri, Alexa or Cortana. The future of voice assistants seems to have arrived.
According to Nick Fox (VP of design for Google Assistant): “We don’t want to force people into, this is what an assistant should sound like”.

The first thing that comes to mind is that it raises practical and ethical questions. How can we tell the AI apart from a human in a conversation? The developers and designers who build AI "have the obligation to disclose to anyone who interacts with it that they're talking to a machine", said Paul Saffo of Stanford University. On social media, many users worried about how such bots could be used: "Those machines could be used for political purpose and to give voting instructions" (Kay Firth-Butterfield on Twitter).

Google assistant is astoundingly realistic

The demo at the Google I/O conference showed a conversation between the AI and an employee at a hair salon. Google Assistant is astoundingly realistic and even mumbles "hemm…" while the employee checks the schedule. The conversation sounds so natural that the employee does not even realize she is speaking to a machine! According to Google, the system is useful to customers because it saves them time, and to small businesses that do not have an online booking system. The goal is to help users complete their tasks.

At the heart of Google Duplex is a recurrent neural network (RNN) trained on a large corpus of phone conversations. Each call is broken down into different tasks: handling pauses and interruptions, giving detailed information, and staying in sync with the other speaker. The AI also adapts its answers according to their perceived importance, and the result is simply stunning.
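To make the idea of cutting a call into tasks more concrete, here is a toy sketch in Python. It is not how Duplex works: Duplex relies on a trained neural network, whereas this stand-in uses hypothetical keyword rules. The `Booking` record and the `classify` and `respond` functions are made up for illustration; they simply route each thing the other person says to a subtask such as waiting during a pause, giving details, repeating after an interruption, or acknowledging to stay in sync.

```python
from dataclasses import dataclass, field


@dataclass
class Booking:
    """Hypothetical record of what the assistant is trying to book."""
    day: str
    time: str
    service: str
    confirmed: bool = False
    log: list = field(default_factory=list)  # subtasks chosen so far


def classify(utterance: str) -> str:
    """Map what the other person says to a subtask.

    Keyword rules only: a toy stand-in for a learned model.
    """
    u = utterance.lower()
    if "hold" in u or "one second" in u or "let me check" in u:
        return "wait"
    if "what time" in u or "which day" in u:
        return "give_details"
    if "sorry" in u or "pardon" in u:
        return "repeat"
    if "booked" in u or "see you" in u:
        return "confirm"
    return "acknowledge"


def respond(booking: Booking, utterance: str) -> str:
    """Pick a reply for the current subtask and record which one was used."""
    task = classify(utterance)
    booking.log.append(task)
    if task == "wait":
        return "Sure, take your time."  # handle a pause
    if task == "give_details":
        return f"{booking.service} on {booking.day} at {booking.time}, please."
    if task == "repeat":
        return f"I said {booking.day} at {booking.time}."  # recover from an interruption
    if task == "confirm":
        booking.confirmed = True
        return "Great, thank you!"
    return "Mm-hmm."  # short acknowledgement to stay in sync


if __name__ == "__main__":
    b = Booking("Tuesday", "10 am", "A haircut")
    for line in ["Sure, what time are you looking for?",
                 "Give me one second.",
                 "Okay, you're booked."]:
        print(line, "->", respond(b, line))
```

A real system would replace the keyword rules with a learned model, and that learned part is precisely what makes Duplex sound natural rather than scripted.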

What technology is behind Duplex AI?

How can a machine understand the complexities of human language and draw insights from it? Google Duplex is a system that understands the nuances of conversation. It brings together natural language understanding, deep learning and text-to-speech:

· Natural Language Understanding (NLU) is also used by IBM for advanced text analysis. It extracts data from content (keywords, concepts, relations, etc.) and identifies sentiment and emotion. You can find out whether the sentiment of an article is positive or negative and gain insight into the emotions the writer is feeling. You can even determine where in the article the writer is expressing anger, sadness, fear or joy!

· Deep learning is part of a broader family of machine learning methods based on learning data representations. Today, AI helps computers achieve superhuman capabilities, for example in image recognition. Deep learning lets scientists save our most precious resource, time, by analyzing in one month what used to take 10 years. The devices we use every day translate even the most complex languages, turn voice into text and describe images in words. In 2015, Google's DeepMind created AlphaGo, a program that used self-learning to beat human players at the board game Go.

· Text-to-Speech (TTS) technology is a speech engine that produces spoken words from your device. For example, if you navigate somewhere with Google Maps, TTS tells you out loud where to go. It works on all kinds of digital devices (computers, tablets, smartphones). The computer-generated voice can read not only text but also images, through scanning and real-time optical character recognition (OCR). This technology can also help children develop their reading skills.
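The sentiment and emotion analysis described in the NLU bullet can be illustrated with a deliberately simple, lexicon-based sketch in Python. This is not IBM's service or Google's model: the `LEXICON` table and its polarity and emotion labels are hypothetical, and real NLU systems learn these signals from data rather than looking words up in a fixed list.

```python
# Hypothetical toy lexicon: word -> (polarity, emotion). Illustration only.
LEXICON = {
    "love": (1, "joy"),
    "great": (1, "joy"),
    "happy": (1, "joy"),
    "hate": (-1, "anger"),
    "furious": (-1, "anger"),
    "sad": (-1, "sadness"),
    "afraid": (-1, "fear"),
}


def analyze(text):
    """Return an overall sentiment label plus the emotions found in each sentence."""
    score = 0
    emotions_by_sentence = []
    for sentence in text.split("."):
        # Lowercase and strip common punctuation so "great!" matches "great".
        words = [w.strip(",!?;:").lower() for w in sentence.split()]
        emotions = []
        for w in words:
            if w in LEXICON:
                polarity, emotion = LEXICON[w]
                score += polarity
                emotions.append(emotion)
        if words:  # skip the empty piece after a trailing period
            emotions_by_sentence.append((sentence.strip(), emotions))
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, emotions_by_sentence


if __name__ == "__main__":
    label, parts = analyze("I love this phone. The battery makes me sad. It is great!")
    print(label)
    for sentence, emotions in parts:
        print(sentence, emotions)
```

Keeping a per-sentence list mirrors the idea of locating *where* in a text the writer expresses a given emotion, while the summed score gives the overall positive or negative verdict.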

Well, it seems the future has already arrived!

Originally published at wildwildweb.es on May 23, 2018.
