The Hardest Problem With Alexa Skills

Vasili Shynkarenka
4 min read · Dec 5, 2017

Voice has been around for millions of years. When our ancestors invented the first language, they couldn’t have imagined how powerful it would become. With voice, you could both discuss war plans and express love for your kids. You could do all sorts of things.

One of the first languages.

However, when computers showed up, we ran away from it. We started typing strange symbols and called them programming languages. And it was great because we were able to tell machines what to do and how to do it.

Later, with the GUI revolution, we simplified things. Or at least we thought we did. We added the mouse to the keyboard and started giving commands to the computer in a new way: by clicking elements on a screen. It wasn’t as flexible as programming, but it was neat and open to more people, even non-techies.

With mobile, things became even more straightforward. Apple made a huge leap forward in UI design; they made it simple and easy to use for everyone, from a 7-year-old to a grandma.

Even grandma can use an iPhone.

Now we’re standing at the edge of another interface revolution. We’re almost able to control everything with our voice. It’s interesting because it seems like we’re going back to the beginning. But that’s not true.

We are using everything we have learned and built since then to power voice interfaces.

Much of the software built over the past 30 years can now be controlled by voice. And it’s not only software: we see voice interfaces in our cars, hotels, workplaces, and, most importantly, in our homes. Controlling all these things by voice is super convenient, and few people would disagree with that.

The problem is we don’t know what we can do with it.

It’s like making friends. Do you remember the moment when you meet a new person? You cautiously ask them a few questions, trying to guess what they’re about and how smart they are.

The thing is, when you talk to humans, you already know something about them. You can tell whether they understand your language or not. You share some predefined social norms. If you’re from the same country, you probably have the same traditions for greeting each other.

When you talk to your Amazon Echo, it’s different. You know nothing about it, and you probably have no previous experience communicating with this kind of thing. What you hear is all there is. And you don’t know what it can do.

The answer seems simple: you can just ask. But if you have no information and context, what will you ask?

The way we solve that problem in person-to-person communication is simple. We just talk to each other and discuss things. We don’t ask what other human beings can do for us; we ask them about their personality, their hobbies, and the things they like.

We don’t use humans the way we use machines.

And that’s fine because machines were created to serve humans. In most situations, we don’t need something specific from the person we’re talking to. In most cases, we’re not giving commands. We talk about human things.

When you design voice applications, you’re trying to figure out the job your user is hiring your product for (read more about JTBD, or jobs to be done, in voice here). But you have to keep in mind that your higher-level task is to use the foundation that voice platforms provide to keep the experience personal.

I think we’ll see a massive leap in Alexa skills adoption only when platforms become mature enough to provide us with that core foundation: when natural language understanding is good enough to guess the right answer in 99% of situations, and when the algorithms can understand every request correctly and suggest different apps that can help. Until then, we need to be more flexible and add more synonyms and workarounds.
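To make "more synonyms and workarounds" a bit more concrete, here is a minimal sketch in plain Python. It is not Alexa's API: the `SYNONYMS` table, the `resolve_action` and `respond` functions, and all the phrases are hypothetical names made up for illustration. On a real platform, this kind of mapping would typically live in the skill's interaction model rather than in your own code.

```python
import re
from typing import Optional

# Hypothetical synonym table: canonical action -> phrases a user might say.
SYNONYMS = {
    "play": ["play", "start", "put on", "listen to"],
    "stop": ["stop", "pause", "cancel", "be quiet"],
    "help": ["help", "what can you do", "what can i say"],
}


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and pad with spaces for whole-word matching."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    return " " + " ".join(words) + " "


def resolve_action(utterance: str) -> Optional[str]:
    """Return the action whose longest synonym appears in the utterance, if any."""
    text = normalize(utterance)
    best_action, best_length = None, 0
    for action, phrases in SYNONYMS.items():
        for phrase in phrases:
            if f" {phrase} " in text and len(phrase) > best_length:
                best_action, best_length = action, len(phrase)
    return best_action


def respond(utterance: str) -> str:
    """Workaround for unknown requests: say what is possible instead of failing."""
    action = resolve_action(utterance)
    if action is None:
        return "Sorry, I didn't get that. You can ask me to play, stop, or help."
    return f"OK, action: {action}"


print(respond("Could you put on some jazz?"))
print(respond("Tell me a joke"))
```

The fallback branch is the "workaround" part: when nothing matches, the skill tells the user what it can do, which is exactly the information they were missing.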

That’s why voice design is so hard. You’re not designing voice; you’re designing humans.

Thanks for reading! 🙌

Let’s keep the conversation going: you can connect with me on Facebook or join our community for Alexa skills developers.

Vasili Shynkarenka

Builder, athlete, YC alum. If I lived in 1492, I’d be the first to join Columbus on his quest.