Getting started with Alexa and Alexa Custom Skills

P1. The beauty of Voice-first technology


This post is written not only for developers but also for newcomers who want to learn more about Alexa. The first part therefore covers some history and general background on Alexa and virtual assistants in general. If you want to dive straight into the technical material, feel free to skip it. But isn’t it more fascinating to understand how big a disruption can be before figuring out how to take part in it? If so, let’s start by going back a few years…

Alexa, tell me about yourself

In late 2014, Amazon introduced a speaker named Echo to the world. At a glance its design is nothing special: a tall, black, cylindrical speaker capable of producing 360-degree sound. What made Echo such an attention magnet in the technology world is what sits inside it: Alexa, a voice-based virtual assistant built into the speaker. At that time, a virtual assistant was not really a groundbreaking invention; three years earlier, Apple had already shaken up the world by showing how Siri, its own virtual assistant, helped users set alarms and reminders, check weather conditions and stock prices, and even navigate.

Alexa, how are you different from other assistants?

When it was first released, the biggest difference between Alexa and Siri was perhaps the way it was invoked. The Amazon Echo featured a seven-microphone array on top of the device, which means users can invoke, or “call”, Alexa from a distance. This might not seem like a big deal today, especially if you’ve watched how far Google has come with Duplex. However, at a time when you had to hold down the Home button on the iPhone to invoke Siri, the way Alexa let users start an interaction was pretty impressive for a low-priced device.

It has been more than 6 years since average consumers first got a chance to truly talk with their devices. The voice-first game has welcomed a number of newcomers: two giants, Google and Microsoft, joined with Google Assistant and Cortana respectively. Nonetheless, apart from the overlapping core functionality of solving users’ problems through voice commands and serving media, each voice-first platform has its own advantages and focus. Alexa is shifting online shopping behavior from text-based search queries to voice commands. Backed by Amazon’s huge range of services, along with growing support from consumer goods brands and smart-home manufacturers, Alexa, and voice-first technology in general, is expected to climb higher and will no doubt draw more customer attention to these platforms.

Where is the future of voice-first technology?

Let’s first reconsider why we are adopting this unconventional way of interacting with electronic devices. First and foremost is perhaps our curiosity and excitement about a new way to interact with computers, one that previously existed only in science fiction. Next, it is faster, more convenient and more intuitive to get certain tasks done by just asking an assistant. Last but not least, given the amount of data accumulated by tech giants and fed to AI systems, customers can expect a better experience over time: the more time they spend interacting with their devices, the better the personalization they get.

So, with all this potential, what should we expect next? Even though the adoption rate is unlikely to rocket over the course of 1 or 2 years, it is optimistically forecast to reach 55% of the US market by 2022. For Alexa in particular, RBC Capital Markets stated in a note that the Echo smart speaker could help Amazon reap $10 billion in revenue by 2020. The sources of revenue fall into the following primary channels: sales of Alexa-enabled devices; voice-driven shopping sales; platform revenue, as Amazon could charge voice-enabled skill providers for prominent placement on the platform; and AWS, which provides cloud solutions for customers.

It is more beautiful than just selling

Commercial goals are undoubtedly the primary focus of every voice-first platform. However, paying attention solely to boosting sales might make people overlook the core reason voice-enabled systems are designed in the first place: solving users’ problems in a natural, intuitive way, by voice. The beauty of voice-first technology, in my view, lies in the intuitive way it interacts with people.

However, this is just one side of the story

As we have seen, voice-first technology has been improving continuously and therefore shows huge potential in the long run. Nonetheless, a recent PwC report on voice assistants shows that while younger users are leading the adoption of voice-enabled applications, they are actually using virtual assistants less often. This could be explained by a number of reasons, including privacy and security concerns, lack of trust, the limited capabilities of the assistants themselves and, more importantly, the complicated experience users have when interacting with voice assistants.

So, who is responsible for this? Regarding the loss of trust, we are probably all aware of how tech giants use our personal data for commercial purposes. However, whether this negatively impacts our lives is, in my opinion, a subjective matter. Personally, and I guess the same goes for many others, I am willing to hand over personal data such as my email or online activity to get better products and services and a better purchasing experience. If Alexa uses my search history to recommend a good product at a reasonable price and, more importantly, at the right time, I will certainly consider that suggestion and may even buy the product. Next, concerning the ability to handle user requests, virtual assistants are getting better and better. You know I’m telling the truth if you have heard or watched how Google Duplex uses an almost human-like voice to arrange reservations with a human counterpart.

Finally, there is a big issue that developers need to figure out when developing “skills” for a virtual assistant: how can we make the user experience as easy, intuitive and smooth as a voice conversation is supposed to be? This might sound simple, but it calls for a deep understanding not only of the technical side but also of what users expect when interacting with a device. In other words, how can we, as developers, facilitate the user’s journey from the moment they invoke the assistant until they get what they want and, more essentially, feel satisfied with what they get and how they get it? That satisfaction also gives users more incentive to interact further. Consequently, the more comfortable users feel when interacting with a device, the more often and more willingly they will use it.

Thanks for reading the first part of “Getting started with Alexa and Alexa Custom Skills”. In the next part, I will briefly walk you through how to build a simple Alexa Skill after getting to know the interaction model, the Amazon Developer Console and Lambda functions. Stay tuned!

