How to Rise to the Top of Amazon’s Alexa Leaderboard with Feedback Loops

An Alexa Development Champion offers insights and lessons on the importance of using a feedback loop with custom skills

Published in A Cloud Guru, Jan 2, 2017

Instructions on how to use logging to establish a feedback loop that will improve your custom skill

The Amazon Alexa platform has taken the lead in the upstart market of interactive voice tools. The number of skills (custom apps) grew 7x during the last seven months of 2016, and Amazon finished the year by selling 9x the volume of the prior holiday season. I’ve had a front-row seat as the platform took off, publishing eight skills so far and being named an Alexa Champion by Amazon.

During Hurricane Matthew last fall, a skill I created trended to the top of the usage charts, helping thousands of people get timely information as the storm disrupted the Southeastern United States. While that sounds like good timing, the initial version of the skill had been published six months prior, which gave me the opportunity to adjust features based on usage behavior and enabled it to take off when demand surged.

Alexa skill developers need to establish a feedback loop based on usage patterns

While the voice application market remains novel, existing software engineering practices driven by analytics will dictate which apps make their way to the top and which languish at the bottom of search results.

If you’re just getting started, here’s an article covering the native analytics tools that are built into the Alexa Skills Kit and Console and require no extra setup.

This is a great way to begin tracking skill usage, but it just scratches the surface: it quantifies how many users invoked the skill, and says nothing specific about which features within it were used.

In this article, I will give step-by-step instructions on how to get to a granular level by logging to a NoSQL database service, establishing a feedback loop that enables improvement of the skill over time. Given the prospects of voice-controlled devices being the next great computing platform, it’s a great area in which to build expertise and get to the top of the charts.

Getting Started with 15 LOC

An easy way to add metrics to your Alexa skill is to save pertinent session data for each utterance for later analytic processing. Given that the skills are written using AWS Lambda, the natural choice is to store this session data in a DynamoDB table (see architecture below).

Serverless architecture for Alexa skill analytics

This takes about 15 lines of code and adds minimal overhead to the response time experienced by the user. The data can then be queried within the AWS DynamoDB (DDB) console, pipelined out to S3 for large datasets, or dumped to a .csv extract for analytics.
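The logging step described above can be sketched as follows. This is a minimal illustration in Python, not the code running in the published skills. The item attribute names (`logId`, `intentName`, `loggedAt`, and so on) are assumptions made for this example, and the table object is passed in so the function works with anything that exposes `put_item(Item=...)`, such as a boto3 DynamoDB `Table` resource. Only interaction data is stored, with no user identifier.

```python
import time
import uuid


def build_log_item(event):
    """Pull interaction-only fields out of an Alexa request envelope."""
    request = event["request"]
    intent = request.get("intent", {})
    # Flatten slots to {name: spoken value}, skipping slots the user left empty.
    slots = {
        name: slot["value"]
        for name, slot in (intent.get("slots") or {}).items()
        if "value" in slot
    }
    return {
        "logId": str(uuid.uuid4()),            # hypothetical partition key
        "sessionId": event["session"]["sessionId"],
        "requestType": request["type"],        # LaunchRequest, IntentRequest, ...
        "intentName": intent.get("name", "none"),
        "slots": slots,
        "loggedAt": int(time.time()),          # epoch seconds
    }


def log_utterance(table, event):
    """Write one item per utterance; `table` must offer put_item(Item=...)."""
    item = build_log_item(event)
    table.put_item(Item=item)
    return item
```

Inside a Lambda handler, `table` could be created once outside the handler with `boto3.resource("dynamodb").Table("SkillUsageLog")`, where the table name is a placeholder, and `log_utterance(table, event)` called before building the spoken response.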

Once you have this loop established, it enables great depth to the voice-interface design and reinforces the “user first” mindset critical to quality application development. Here are a few examples from skills currently live with Alexa. (Note: the session detail being stored tracks user interaction with the application; no detail about the user is saved, thus avoiding privacy issues.)

Track Effectiveness of Quiz or Survey Questions

The first skill I authored with the launch of the Alexa device in the UK is called British History Trivia. Having fond memories of living in the UK earlier in my career, particularly around the wit of my co-workers, I thought a trivia skill would be a hit. One objective I set was to make the quiz difficult, so I spent plenty of time researching detailed questions on centuries of history. The challenge is how to quantify what “difficult” means.

Now the ASK and Console can tell us how much traffic the skill is getting, including breakout by country (usage in US vs. UK or Germany), as well as how many invocations per hour and by how many unique users. That’s useful to track given that the UK was a new market, but it doesn’t provide any insight into the quality of the skill that could provide the starting point for a feedback loop.

For example, it can’t answer questions about how difficult the quiz is based on average user score, or about the relative complexity of each question based on how many users answer it correctly. By adding callouts within the skill to DDB, I’ve collected this level of detailed information, and it provides the foundation needed for establishing a feedback loop. Once this level of granularity is obtained, adjustments can be made to improve the user experience, backed by metrics.

Here’s an example of how it is applied to the British History Quiz.

Success Rate for Quiz Questions asked in British History Quiz

The chart shows the relative success rate for each question in a quiz made up of four-option, multiple-choice questions. The sample size is 1,000 question responses from a single day. From this data, we could assert that questions 18 & 19 are the easiest, whereas questions 6 & 24 are the most difficult.

Now hopefully I didn’t overdo the difficulty level, but by adding metrics to the skill, I can track how the questions are performing, and potentially target a given success range for user accuracy. This serves as an excellent quality check for the content, and as a way to quantify difficulty vs. solely relying on user feedback in the skill store.
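The per-question success rate and the idea of a target range can be computed from the logged responses along these lines. This is a sketch under assumptions: each logged response is taken to be a record with a `questionId` and a boolean `correct` field, names chosen for this example rather than taken from the skill.

```python
from collections import defaultdict


def success_rates(responses):
    """Return {question id: fraction of responses answered correctly}."""
    asked = defaultdict(int)
    correct = defaultdict(int)
    for r in responses:
        asked[r["questionId"]] += 1
        if r["correct"]:
            correct[r["questionId"]] += 1
    return {q: correct[q] / asked[q] for q in asked}


def outside_target(rates, low, high):
    """List question ids whose success rate falls outside [low, high]."""
    return sorted(q for q, rate in rates.items() if rate < low or rate > high)
```

With four answer options, random guessing lands near 25%, so a question scoring at or below that level is a candidate for rewording, while one scoring near 100% may be too easy for a quiz that aims to be difficult.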

The Value in being Direct

Another skill that I launched over the summer covers US Colonial History. It can read the Bill of Rights as well as biographies of leading revolutionary figures.

When building a voice user interface for the biographic section of the skill, I modeled the conversational flow to emulate the visual model of the dropdown box, creating custom slots with all of the biographies available, and then offering to read the list. The user can then ask for a specific biography to be read. I also provided the ability for the user to just have Alexa randomly pick one to read (“Read a Biography”).

Voice User Interaction options within Colonial History Skill

This meets the best practices of voice user interface design and provides the most options for the user. I also captured metrics around how it’s getting used, and here are the results.

User behavior measured from December 23–26 of the Colonial History Skill

The majority of requests came through reading a random biography versus using the list. Almost half (47%) of users go directly to reading a biography from the wake message (i.e. “Alexa, ask Colonial History to Read me a Biography”), and another significant group (27%) hears the welcome message, then asks for a biography to be randomly read.

The next most frequent selection is the “Other” category, which includes several other paths, including reading the Bill of Rights and providing information about the original 13 colonies. The drill-down feature for a particular individual is rarely used.
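A breakdown like this one can be derived from the per-utterance log by grouping items by session and counting the resulting paths. The sketch below assumes each logged item carries `sessionId`, `intentName`, and `loggedAt` attributes; these names, and the intent names a real log would contain, are illustrative rather than taken from the skill.

```python
from collections import Counter, defaultdict


def session_paths(log_items):
    """Group logged items by session and join intent names in time order."""
    by_session = defaultdict(list)
    for item in sorted(log_items, key=lambda i: i["loggedAt"]):
        by_session[item["sessionId"]].append(item["intentName"])
    return [" > ".join(names) for names in by_session.values()]


def path_shares(paths):
    """Return {path: percent of sessions}, most common path first."""
    counts = Counter(paths)
    total = sum(counts.values())
    return {path: round(100 * n / total, 1) for path, n in counts.most_common()}
```

Rare paths can then be folded into a single “Other” bucket before charting, which is how the long tail of less-used features ends up grouped together.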

Part of the driver for this path being so popular may be that it is “advertised” as the first option from the Skill store (see below).

Another insight is the importance of being direct in providing information to the user, and not requiring a bunch of options and menus. Perhaps this is the equivalent of requiring “as few clicks as possible” to give users the information that they are looking for. What is certain is that being able to quantify this type of real user behavior is critical to learning and improving the skill over time.

A/B Testing the Welcome Message

In the skills that I’ve written, the most popular request/response is the Welcome message. It takes the most traffic, making it almost the equivalent of the home page on a web application.

One of the most recent skills that I’ve released is a Piano Teacher Chatbot. The welcome message has a “hint” dropped in to get the user started. The line says:

Welcome to Music Teacher. <plays a short clip of a song>. If you’re just getting started, say Beginner Note Drills. If you want to learn a song say something like Teach me how to play {songName}.

A/B Test — Change highlighted song in Welcome Message in Music Teacher Skill

Unsurprisingly, the most requested song is whatever gets put in this message, but the recorded data quantifies the linkage. In test A, “Twinkle Twinkle Little Star” was featured (yielding 49% of all requests), and in test B it was switched to “Jingle Bells” (yielding 48% of all requests).

Given that this is in the skill code (i.e. in the Lambda function), and not in the configuration file that gets pushed into Alexa, it can be changed dynamically without re-approval of the skill. This enables constant experimentation, and potential optimization based on user behavior.
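Because the welcome text is built in code, switching the featured song is a small change. The tests described above ran one after the other; the sketch below shows a variation that splits sessions between the two variants at the same time by hashing the session id. The function names and the hashing approach are assumptions for illustration, and the short song clip from the real welcome message is left out.

```python
import hashlib

# The two songs featured in tests A and B.
VARIANTS = {"A": "Twinkle Twinkle Little Star", "B": "Jingle Bells"}


def pick_variant(session_id):
    """Assign a session to A or B deterministically from a hash of its id."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def welcome_message(variant):
    """Build the welcome text with the featured song for this variant."""
    song = VARIANTS[variant]
    return (
        "Welcome to Music Teacher. "
        "If you're just getting started, say Beginner Note Drills. "
        f"If you want to learn a song say something like Teach me how to play {song}."
    )


def featured_share(requested_songs, featured):
    """Percent of song requests that asked for the featured song."""
    if not requested_songs:
        return 0.0
    hits = sum(1 for song in requested_songs if song == featured)
    return round(100 * hits / len(requested_songs), 1)
```

Logging the chosen variant alongside each song request makes it possible to compute `featured_share` per variant and compare the two directly.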

Make it a goal to add a Feedback Loop in 2017!

It’s great to see the explosion of skills in 2016 for the Alexa platform, expanding by approximately 10x from the beginning of the year to today. In 2017, we can continue to improve what’s out there, adding feedback loops to pull in usage data to drive better results.

If you are interested in developing your own Alexa custom skill, A Cloud Guru has courses available covering all AWS technologies, including Alexa.

Terren Peterson is an experienced technology executive with over twenty years of experience in consulting, start-up, and large corporate environments. He is currently the VP of Cloud Engineering for the Retail and Direct Bank Business at Capital One.

Terren is currently developing interactive voice applications using the Alexa platform. He has created multiple Alexa skills. Most recently, he integrated Alexa Voice Service with a Raspberry Pi to create Roxie, the voice-activated pitching machine that won first place in the Best ASK with Raspberry Pi segment of Alexa’s Internet of Voice Challenge on Hackster.io. Terren is now experimenting with the analytics capabilities of Alexa to understand and improve skill usage.

Terren holds a Bachelor of Science in Electrical Engineering from the University of Illinois at Urbana-Champaign. He was the founder of the Digital Campus Lab for Capital One at the UIUC Research Park, and serves on the board of the Hoeft Technology & Management Program. Terren also holds both Architect and Developer AWS Certifications.