Alexa, open TED Talks
The adventures and misadventures of teaching Alexa the entire corpus of ideas worth spreading.
You don’t have to know a lot about technology to see the potential in voice platforms. Just put a 7-yr-old in front of an Echo. It’s fascinating to see how natural it is for a child to communicate with a machine, and how easily flustered we all become when the machine doesn’t do exactly as we wish (ahem… first few years of Siri).
Designing for voice is not an exercise in graphic precision. It’s a bit more like linguistic homework. We often forget the nuances of how our brains subconsciously process spoken language differently than written language. Artificial intelligence that grew up ingesting written text alone often falls short of our expectations when it attempts to make a pass at voice. I learned this over a short project where we explored using IBM Watson to determine talk sentiments from transcripts. Watson reported predominantly negative sentiment across the majority of talks due to the extensive time most speakers spend setting up their narrative arc.
When we (@andymerryman and I) first embarked on this voice project, the idea was to start small. We wanted to create a way for people to ask for TED Talks, perhaps find talks via a few simple commands, or just build out a daily habit by asking for today’s talk or catching up on yesterday’s. Amazon offers a pretty well-documented process for building out skills. In a nutshell, we’re simply teaching Alexa what to listen for, and what to do when she hears it.
This is a quick map of how the TED Skill works. It follows Alexa’s voice design best practices. The big legwork is to anticipate anything someone might say (utterance) > map each one to what to do (intent) > and pull up the appropriate matching content (audio file). For example, here I show different words as permutations of an utterance.
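In code terms, that utterance > intent > content pipeline boils down to something like the sketch below. The intent names and sample utterances here are invented for illustration; the real skill defines these in Alexa’s interaction model rather than hand-rolled Python.

```python
# Illustrative sketch of utterance -> intent mapping, not the actual skill code.
# Each intent lists the sample utterances that should trigger it; an incoming
# phrase is normalized and checked against every intent's samples.

UTTERANCE_MAP = {
    "PlayTodaysTalkIntent": [
        "play today's talk",
        "what's today's talk",
        "play the talk of the day",
    ],
    "PlayYesterdaysTalkIntent": [
        "play yesterday's talk",
        "catch me up on yesterday's talk",
    ],
}

def resolve_intent(utterance):
    """Return the matching intent name, or None to trigger the error flow."""
    phrase = utterance.strip().lower()
    for intent, samples in UTTERANCE_MAP.items():
        if phrase in samples:
            return intent
    return None  # no match -> route to the error/exception flow
```

When `resolve_intent` returns an intent, the skill fires it and pulls up the matching audio; when it returns `None`, the phrase falls through to the error handling described next.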
When there is a match, the intent is fired. When there is no match, @andymerryman directs it to the appropriate error flow.
But how can we anticipate everything everyone can possibly say? This is where disappointment comes in. Novice voice users tend to start with something akin to decision paralysis: “what should I ask this thing?” Once we get past the first-impression stage, we tend to fall into one of two categories: we either find ourselves using one single utterance all the time, a la “play today’s talk,” or we attempt something Alexa doesn’t understand one time, become disappointed, and give up altogether. It’s no wonder early skill developers are finding it difficult to get their work regularly used. The challenge with TED Talks is that while we’re totally cool with giving you a fresh new talk daily, browsing the archive via voice in any other way can get quite cumbersome.
Take speaker names, for example. TED speakers range from easy-to-pronounce names like Ken Robinson to tongue-twisters like Chimamanda Ngozi Adichie. The other day I was looking at our error logs and saw numerous references to the utterance “It’s brown.” It took me a while to guess that this was probably supposed to be Brené Brown, with Alexa possibly choking on the é. Alexa currently understands phonemes (pronunciation variants of words) but it doesn’t quite know what to do with foreign speaker names. With over 2,000 speaker names, all we can do right now is add the list to the custom slot and, well, pray.
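One way to catch a near-miss like “It’s brown” is fuzzy matching against the known speaker list after accent-folding. This is just a sketch using Python’s standard-library `difflib`; it is not how Alexa’s custom slots resolve names internally, and the speaker list and cutoff are illustrative.

```python
import difflib
import unicodedata

# A tiny stand-in for the 2,000+ name custom slot list.
SPEAKERS = ["Ken Robinson", "Brené Brown", "Chimamanda Ngozi Adichie",
            "Jill Bolte Taylor"]

def strip_accents(text):
    # Fold "Brené" to "Brene" so accented names can match ASCII transcriptions.
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))

def guess_speaker(heard, cutoff=0.5):
    """Map a (possibly misheard) phrase to the closest known speaker name."""
    normalized = {strip_accents(name).lower(): name for name in SPEAKERS}
    matches = difflib.get_close_matches(strip_accents(heard).lower(),
                                        normalized.keys(), n=1, cutoff=cutoff)
    return normalized[matches[0]] if matches else None
```

With a loose cutoff, “it’s brown” lands closest to “brene brown” and resolves to Brené Brown; anything too dissimilar returns `None` and falls through to the error flow.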
Then there’s the challenge of verbally uttered searches. Somehow, when we are presented with a rectangular box, we find a way to formulate our thoughts into more readily indexed terms like keywords or fully formed sentences. But when it comes to voice, our natural tendency is to have an unstructured, phrase-based conversation, like “oh you know, that talk about the lady that held a brain on stage…” Nowhere in the TED Talk tag system or text-based metadata does this show up. But those of us who watch an excessive amount of TED Talks know this is Jill Bolte Taylor’s talk. How should we index content based on commonly understood summaries that can range from a visual reference on stage to a pop culture anecdote? This is a fascinating problem, not specific to Alexa per se but to artificial intelligence, machine learning, and natural language processing at large, that I hope someone is working diligently on.
The struggles with fuzzy search don’t end there. Let’s say today you want to hear some funny talks. Everyone loves funny talks. We’ve got lots of funny talks. Do we play the top funny talk right away, or do we find matches and give you choices? After much debate, we decided to play the best match and hint at how to skip to the next result. You see, in a search bar, I can type something and auto-suggest gives me clues as to what to do next. I can browse many results and scan hundreds of things that might be what I want. Over voice, you hit the cognitive-overload wall at about three choices. If your search term is vague and common, like “play some funny talks,” you are likely to come away with lots of possible results. While we’re able to cruise through and play the best match in this use case, other use cases like “find speaker name Mike” create a lot more complexity. (Please don’t go looking for Mikes. It doesn’t work yet.)
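The “play the best match, hint at how to skip” behavior can be sketched as a cursor over ranked results. The talk titles and ranking below are made up for illustration; the point is only the interaction shape.

```python
# Illustrative sketch: play the top-ranked match first, let "next" skip ahead
# instead of reading out a long list of choices over voice.

class SearchSession:
    def __init__(self, ranked_results):
        self.results = ranked_results  # best match first
        self.index = 0

    def current(self):
        """The best remaining match, or None when results are exhausted."""
        if self.index < len(self.results):
            return self.results[self.index]
        return None

    def next(self):
        """Handle 'Alexa, next': advance to the following match."""
        self.index += 1
        return self.current()

# "play some funny talks" might return many matches; play the top one first,
# then hint: "say next to skip to another result."
session = SearchSession(["Funny talk A", "Funny talk B", "Funny talk C"])
first = session.current()
second = session.next()
```

Keeping a cursor rather than listing options is a direct response to that roughly-three-choice cognitive wall: the user only ever has to hold one result in mind at a time.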
So here we are, having just launched the first version of the TED Talks Skill. Give it a try. We’re continuously learning and evolving. We find that the most entertaining and eye-opening feature we built into the skill is the error-exception log: essentially, a list of what people say that doesn’t result in an intent. We don’t see these strictly as errors, but rather as invaluable insight into the minds of the masses, and a treasure trove of improvement possibilities.
Here’s a breakdown by category from a sample of 100 errors. It’s not terribly surprising that people interrupt midway through media playback to access other skills. What’s somewhat surprising are the exceptions thrown for basic commands like “quit” or “resume” that went unhandled. Also… jokes! :) Just you wait for the next build.
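Producing that kind of breakdown is straightforward once every unmatched utterance is being logged. A minimal sketch, with category keywords invented for illustration:

```python
# Illustrative sketch of the error-exception log: record every utterance that
# failed to resolve to an intent, then tally rough categories for review.
from collections import Counter

unmatched_log = []

def log_unmatched(utterance):
    """Record what the user said when no intent fired."""
    unmatched_log.append(utterance.strip().lower())

def categorize(log):
    """Rough tally of unmatched utterances, e.g. to spot unhandled basics."""
    counts = Counter()
    for phrase in log:
        if phrase in ("quit", "resume", "stop"):
            counts["unhandled basic command"] += 1
        elif "joke" in phrase:
            counts["jokes"] += 1
        else:
            counts["other"] += 1
    return counts
```

Sampling the tally periodically is what surfaced both the unhandled “quit”/“resume” commands and the demand for jokes.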
Huge thanks to the team that made this possible: @andymerryman on code, Saba Dilawari on marketing, Yitao Wang on design production, and the team @ Amazon (Jenn & Liz) and the 100+ Amazonians who helped test and dogfood the skill prior to launch.
I have a dream that one day the TED Talks skill will be an audio journey, an interactive narrative that stirs your curiosity like a choose-your-own-idea-adventure game. We’re still at the very first step toward what’s possible, and I’m super excited to see it grow. In the meantime though, “play me yesterday’s talk” is my jam.