Part 1: Does Amazon’s Vesta Signal a New Start to Social Robots?

Amy Stapleton · Published in Social Robots
Jul 18, 2019 · 5 min read


Are Social Robots Back?

This publication has been dormant for a long time. Why? Social robots have been dead. Anyone following the industry already knows the sad stories of Jibo, Kuri, and others.

Voicebot.ai, our source for all things related to the voice-first industry, pointed us earlier this week to an article by Bloomberg. Seems Amazon may actually be working on a social robot they plan to bring to market soon.

We heard vague rumors that Amazon might be considering a project to “put eyes on Alexa.” What would that even mean? The Bloomberg scoop suggests the project might be real.

What happened to social robots?

Smart speakers happened.

Leor Grebler had a brilliant idea back around 2012 or 2013 to voice-enable a hockey-puck-style device so people could send it voice requests. Perhaps great minds think alike, because while Leor was talking about his idea to people and getting only head scratches in response, Amazon was apparently incubating a similar concept.

(Side note: Apart from being at the forefront of the voice industry, Leor has always been a super nice person. He sent me a first-generation voice-enabled UBI device free of charge back in 2014, and I experimented with writing stories for the voice assistant to tell in its own voice. That was my first foray into storytelling for voice-enabled talking devices, and while it had limitations, it was amazingly fun and rewarding.)

With the limited release of Amazon’s Alexa-enabled Echo device in mid-2015, Amazon quickly realized it had captured lightning in a bottle. (Too bad for Leor, who had his sights on the same incredible bolt. But his startup had no way of competing with the huge commercial and technical muscle of one of the most valuable companies in the world.)

Amazon quickly followed the Echo beta release with a no-holds-barred sales campaign aimed at getting Amazon Alexa into as many consumer homes as possible. That push continues.

What Did Smart Speakers Train Us to Expect?

With the success of smart speakers, consumers have come to know voice-enabled AI as a ghost in a box. Yes, there is a personality in there, but it isn’t a robot. It doesn’t act like a robot. Instead, it performs the functions of a smart radio. If you get the commands right, it (she) will play your favorite music, tell you the weather, or set a timer.

Alexa was designed around a “request and fulfill” paradigm: an interaction model handles incoming customer requests, or intents, and a fulfillment engine determines how to perform the requested action.
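For the curious, here’s a rough sketch of that paradigm in Python. To be clear, the intent names, slots, and fulfillment functions below are invented for illustration; a real Alexa skill is built with the Alexa Skills Kit, but the basic shape is the same: match the recognized intent, then hand it off to a function that fulfills it.

```python
# A toy "request and fulfill" loop. Intent names, slots, and fulfillment
# functions are invented for illustration; this is not Amazon's code.

def fulfill_weather(slots):
    city = slots.get("city", "your area")
    return f"Here is today's forecast for {city}."

def fulfill_timer(slots):
    minutes = slots.get("minutes", "30")
    return f"Setting a timer for {minutes} minutes."

# The "interaction model" side: recognized intent -> fulfillment engine.
FULFILLMENT = {
    "GetWeatherIntent": fulfill_weather,
    "SetTimerIntent": fulfill_timer,
}

def handle_request(intent_name, slots):
    handler = FULFILLMENT.get(intent_name)
    if handler is None:
        return "Sorry, I can't do that yet."
    return handler(slots)

# "What's the weather in Newark?" would be resolved to GetWeatherIntent.
print(handle_request("GetWeatherIntent", {"city": "Newark"}))
```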

To communicate with the Alexa voice assistant, a customer can use a small set of fairly intuitive commands or queries, such as “play Baby Shark,” “what’s the weather?,” “set a timer for 30 minutes,” or “who won the Super Bowl last year?”

To reach more complex features, such as voice apps developed by third parties, a customer has to learn special “invocation phrases” to trigger those services. This “explicit invocation” paradigm is widely seen as undesirable and will presumably be phased out over time.
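To see why that’s a burden, here’s a hypothetical sketch of explicit invocation. The skill name “Trivia Master” and the parsing logic are made up; the point is simply that the customer has to supply the routing information (“ask such-and-such to …”) themselves.

```python
import re

# Explicit invocation: the customer must name the third-party skill before
# asking for anything. The pattern and skill name are illustrative only.
EXPLICIT_INVOCATION = re.compile(
    r"ask (?P<skill>.+?) (?:to|for) (?P<request>.+)", re.IGNORECASE
)

def route(utterance):
    match = EXPLICIT_INVOCATION.search(utterance)
    if match:
        return match.group("skill"), match.group("request")
    # No invocation phrase, so fall back to the built-in features.
    return "built-in", utterance

print(route("ask Trivia Master to give me a question"))
# -> ('Trivia Master', 'give me a question')
print(route("what's the weather?"))
# -> ('built-in', "what's the weather?")
```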

The latest version of Samsung’s Bixby, built on Viv technology, leverages a different model that doesn’t rely on the customer remembering special invocation phrases. Even Amazon has recently announced new technology that enables Alexa services to be implicitly invoked from inside other services.
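Roughly speaking, implicit invocation flips the responsibility: the platform, not the customer, figures out which service can best handle an utterance. Here’s a toy sketch of that idea; the service names and scoring are invented, and this is not how Bixby or Alexa actually implement it.

```python
# Toy implicit invocation: each registered service reports how well it
# could fulfill an utterance, and the platform routes to the best match.
# Service names and scoring are invented for illustration.

SERVICES = {
    "trivia_game": lambda text: 0.9 if "trivia" in text.lower() else 0.0,
    "recipe_helper": lambda text: 0.8 if "recipe" in text.lower() else 0.0,
}

def implicit_route(utterance):
    scores = {name: score(utterance) for name, score in SERVICES.items()}
    best, confidence = max(scores.items(), key=lambda item: item[1])
    return best if confidence > 0 else "built-in"

# No "ask <skill name>" required; the platform resolves the service.
print(implicit_route("give me a trivia question"))  # trivia_game
print(implicit_route("find me a lasagna recipe"))   # recipe_helper
```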

What happened next?

Smart displays happened.

In tech parlance, we talk about multimodal devices. We just mean voice-enabled speakers with integrated screens.

While voice-first seemed very compelling for a year or so, the platform providers (Amazon, Google) quickly realized the potential of adding displays into the mix.

Smart displays have yet to take off in the same way that relatively inexpensive Echo Dots and Google Home Minis have. But the cost of voice-enabled screens is coming down, and the platform providers seem to think the future lies in multimodal.

What does all this mean for the future of social robots?

Smart speakers have taught us a lot.

1. It’s bad to overpromise. History hasn’t been kind to companies that baited people with videos of an endearingly lifelike robot, then switched that promise for a buggy piece of hardware that couldn’t do much. Lowering expectations seems to be the way to go.

2. Far-field microphones: We’ve learned that far-field microphone technology is a surprisingly significant factor in the success of anything humans are told they can talk to. If we expect to be able to talk to a device, it darn well better understand what we’re saying. We don’t want to have to stand right over it and yell.

3. Reliability of speech recognition: We now know that people have a low tolerance for being misunderstood. Customers who get high error rates in their initial interactions with voice devices are likely to give up and not return.

4. The current request and fulfill paradigm isn’t working well. People have only very vague notions about what voice assistants can do. The means to discover new features and capabilities are rudimentary and hit or miss. When the best way for customers to find out what their state-of-the-art voice-enabled AI can do is by reading a weekly newsletter, that’s not a good sign.

5. A human’s desire for conversation can’t be ignored. People want to converse with anything they can talk to. Delivering human-like conversation is the hardest technical challenge of all. Voice assistant platform providers do their best to steer customers away from open-ended conversation, nudging them back to the request and fulfill model of interaction. But people are people. They want to talk, feel like they’re being heard, and have the sense that the conversation matters.

6. People don’t want to feel like they’re being spied on. Yes, smart speakers are selling well. But there is a techlash underway, and it will impact all things tech, especially technologies that are viewed as invasions of privacy.

There’s a lot more that we’ve learned, and all of these lessons will influence the future of social robotics, but this post is already way too long. Let’s keep exploring the future of social robots, and what the Amazon announcement might mean, in a second part (probably out in August).

In the meantime, comments welcome! Hope to see some of you at VOICE Summit 2019 next week in Newark.

Top Photo by inanc avadit on Unsplash. Drawings, regrettably, by the author.

Note: Original version edited to include point #1 in last section.


Amy Stapleton

Chatables - CEO & Co-founder - Building conversational experiences powered by virtual characters to mitigate isolation in older adults.