Published in Voice Tech Podcast

What I learned developing my first voice app

I love new technologies. I really love them and I always feel excited when something new is presented.

I live in Italy, where Alexa and Google Assistant are still pretty new, but I know the voice-app market is already crowded (mostly with crap, but crowded nonetheless; we will discuss that later).

Anyway, as I was saying, I love to understand new technologies, play with them and try to figure out what I can do with them (and maybe earn some income from them?).

I’m not a designer or a content creator; I’m a full-stack developer, and that greatly limits the type of voice application I can create. Nothing fancy, nothing incredibly new, but still something useful built on top of an existing API.

For my first voice app (ok, ok, that’s not true, I’ve already built a Crypto Manager for both Google Assistant and Alexa), I wanted to create something that could be extended to other topics.

Which voice app should I create?

I decided to build an Anime Assistant with some basic features:

  • Search for anime by keywords/names
  • Get an anime overview (title, category, description and so on)
  • Know when an anime will air next
  • Get some suggestions based on an anime (which similar anime can I watch?)
  • Track / Untrack an anime from my calendar
  • Get a daily calendar reminder to know when my favorite anime will air next
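The calendar and reminder features both come down to filtering the tracked anime by their next airing time. Here is a minimal sketch of that idea. It assumes each tracked show stores a hypothetical `nextAiringAt` Unix timestamp in seconds (AniList exposes a similar value as `nextAiringEpisode.airingAt`). `TrackedShow`, `airingWithin` and `calendarSpeech` are illustrative names, not part of Anime Helper.

```typescript
// Hypothetical shape of a tracked show; not the actual Anime Helper data model.
interface TrackedShow {
  title: string;
  episode: number; // number of the next episode
  nextAiringAt: number; // Unix timestamp in seconds
}

const DAY_SECONDS = 24 * 60 * 60;

// Returns the shows airing within `days` days from `now`, soonest first.
function airingWithin(shows: TrackedShow[], now: number, days: number): TrackedShow[] {
  const limit = now + days * DAY_SECONDS;
  return shows
    .filter((s) => s.nextAiringAt >= now && s.nextAiringAt < limit)
    .sort((a, b) => a.nextAiringAt - b.nextAiringAt);
}

// Turns the filtered list into a short spoken summary.
function calendarSpeech(shows: TrackedShow[]): string {
  if (shows.length === 0) {
    return "None of your tracked anime are airing in this period.";
  }
  const lines = shows.map((s) => `${s.title}, episode ${s.episode}`);
  return `Coming up: ${lines.join("; ")}.`;
}

// Sample data, made up for illustration.
const now = 1700000000;
const tracked: TrackedShow[] = [
  { title: "Show B", episode: 5, nextAiringAt: now + 3 * DAY_SECONDS },
  { title: "Show A", episode: 12, nextAiringAt: now + 2 * 60 * 60 },
  { title: "Show C", episode: 1, nextAiringAt: now + 10 * DAY_SECONDS },
];

console.log(calendarSpeech(airingWithin(tracked, now, 1))); // daily reminder: "Coming up: Show A, episode 12."
console.log(calendarSpeech(airingWithin(tracked, now, 7))); // weekly calendar: Show A, then Show B
```

The same filter serves both features. Only the window changes: one day for the reminder and seven days for the calendar.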

I like anime very much; that’s why I wanted to create a skill that would give me these features.

If everything goes well and people use it, I would like to create other voice apps of this type for Manga, TV Series, and Movies. The core part of the voice app/backend service would be the same.
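If the core is meant to be shared across Anime, Manga, TV Series and Movies, one way to keep it reusable is to hide the content API behind a small interface. The sketch below is based on that assumption. `ShowSummary`, `ContentProvider`, `InMemoryProvider` and `describeSearch` are hypothetical names, not part of Anime Helper or Jovo.

```typescript
// Minimal shape every content type (anime, manga, TV series, movies) can fill in.
interface ShowSummary {
  id: number;
  title: string;
}

// Hypothetical abstraction: the shared voice logic depends only on this interface.
interface ContentProvider {
  search(query: string): Promise<ShowSummary[]>;
  related(id: number): Promise<ShowSummary[]>;
}

// Stand-in provider backed by in-memory data; a real one would call AniList or another API.
class InMemoryProvider implements ContentProvider {
  constructor(
    private readonly items: ShowSummary[],
    private readonly links: Map<number, number[]>,
  ) {}

  async search(query: string): Promise<ShowSummary[]> {
    const q = query.toLowerCase();
    return this.items.filter((item) => item.title.toLowerCase().includes(q));
  }

  async related(id: number): Promise<ShowSummary[]> {
    const ids = this.links.get(id) ?? [];
    return this.items.filter((item) => ids.includes(item.id));
  }
}

// Shared logic: unchanged whichever provider (anime, manga, ...) is plugged in.
async function describeSearch(provider: ContentProvider, query: string): Promise<string> {
  const results = await provider.search(query);
  if (results.length === 0) return `I couldn't find anything for ${query}.`;
  return `I found ${results.length} results. The first one is ${results[0].title}.`;
}

const provider = new InMemoryProvider(
  [
    { id: 1, title: "Sword Art Online" },
    { id: 2, title: "Sword Art Online: Alicization" },
  ],
  new Map<number, number[]>([[1, [2]]]),
);

describeSearch(provider, "sword art").then(console.log);
// prints "I found 2 results. The first one is Sword Art Online."
```

Adding Manga or Movies would then mean writing another `ContentProvider`, while the intent handlers stay untouched.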

So, OK, the topic has been chosen.

Which technologies should I use?

I have my own backend stack, but I’m always keen to learn new services and frameworks in order to improve myself and the performance of my applications.

Here’s the list of them:

  • Digital Ocean: all my apps run on my own server. I would love to use AWS Lambda, but I haven’t had time to learn how to deploy there (would you like to give me a hand?)
  • NodeJS: my main development language. Easy to use, powerful, and with a huge community around it
  • MongoDB: easy to use, hosted on my own server. I’m planning to learn Amazon DynamoDB (another NoSQL DB). It has a free tier, but there is a learning curve before you can start using it. It’s on my todo list!
  • Jovo: a Node.js framework that lets you easily build multimodal apps for Amazon Alexa and Google Assistant. It’s open source and actively maintained, and the core developers are active on their Slack channel. I always like to be part of an open community and contribute to it
  • AniList: the anime/manga API I get the content from
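AniList exposes a public GraphQL API at `https://graphql.anilist.co`. The function below gives a rough idea of what a keyword search could look like, and it only builds the JSON body of the POST request. The field selection is an illustrative assumption based on the public AniList schema, so verify it against the AniList docs. `buildSearchBody` is a hypothetical helper.

```typescript
// GraphQL query for a paginated anime search (fields chosen for illustration).
const SEARCH_ANIME = `
  query ($search: String, $perPage: Int) {
    Page(page: 1, perPage: $perPage) {
      media(search: $search, type: ANIME) {
        id
        title { romaji english }
        status
        nextAiringEpisode { airingAt episode }
      }
    }
  }
`;

// Builds the JSON body to POST to https://graphql.anilist.co
// with the header "Content-Type: application/json".
function buildSearchBody(search: string, perPage = 10): string {
  return JSON.stringify({ query: SEARCH_ANIME, variables: { search, perPage } });
}

const body = buildSearchBody("Sword Art Online", 5);
console.log(JSON.parse(body).variables); // { search: 'Sword Art Online', perPage: 5 }
```

On Node.js 18 or later, you can send the body with the built-in `fetch`. On older versions, any HTTP client works.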

Building the Alexa skill

I will not cover the Google Assistant part. I will focus only on the Alexa platform, for a couple of specific reasons:

  • At the moment, in my personal opinion, Alexa is far ahead compared to Google. It has more features and a better development platform
  • I want to talk about APL (Alexa Presentation Language), which has no equivalent in Dialogflow

Intents and interaction

So, first things first: intents. I will not cover the basics, since many people have already created well-done content about how to develop a skill; this is more about my journey building a skill with complex interactions.

Sixteen intents. Yes, that sounds like a lot, but it really isn’t, I can assure you!

Here are my custom intents, five Show* intents and six Context* intents:

  • ShowInfo: it allows you to get basic information about a specific anime
  • ShowNextEpisode: it allows you to find out when a specific anime will air next
  • ShowRelated: it allows you to get a list of recommendations starting from a specific anime
  • ShowCalendar: it allows you to get your airing calendar for the next week
  • ShowFilling: a special intent I use to fill in the anime name when it is missing from the original intent. You can think of it as a wildcard.
  • ContextNextEpisode: once you have selected an anime, it allows you to get the next airing episode info without specifying the anime name.
  • ContextOverview: once you have selected an anime, it allows you to hear Alexa read an overview of it.
  • ContextAddTrack: once you have selected an anime, it allows you to add that anime to your tracking list.
  • ContextRemoveTrack: once you have selected an anime, it allows you to remove that anime from your tracking list.
  • ContextShowRelated: it allows you to get a list of anime recommended starting from the one you selected.
  • ContextShowChoice: it allows the user to select an anime from a list just by saying its ordinal number (I will discuss this solution later). This is the best way I found to do it; if you have another workaround, please send me a message ;)
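The Context* intents only make sense once an anime has been selected, so the skill has to remember the selection between turns. The sketch below is framework-agnostic and assumes the selected anime id is kept in session attributes. The `Session` type, `requireSelection` and `route` are hypothetical and are not Jovo's actual API.

```typescript
// Hypothetical per-conversation state (Jovo and the ASK SDK both offer session attributes).
interface Session {
  selectedAnimeId?: number;
}

type Handler = (session: Session) => string;

// Wraps a Context* action: without a selected anime, ask for one instead.
function requireSelection(action: (id: number) => string): Handler {
  return (session) =>
    session.selectedAnimeId === undefined
      ? "Which anime are you talking about?"
      : action(session.selectedAnimeId);
}

// A few of the Context* intents from the list above; replies are placeholders.
const handlers: Record<string, Handler> = {
  ContextNextEpisode: requireSelection((id) => `Looking up the next episode of anime ${id}.`),
  ContextAddTrack: requireSelection((id) => `Anime ${id} added to your tracking list.`),
  ContextRemoveTrack: requireSelection((id) => `Anime ${id} removed from your tracking list.`),
};

function route(intent: string, session: Session): string {
  const handler = handlers[intent];
  return handler ? handler(session) : "Sorry, I didn't get that.";
}

const session: Session = {};
console.log(route("ContextAddTrack", session)); // "Which anime are you talking about?"
session.selectedAnimeId = 42; // e.g. set after ShowInfo or ContextShowChoice
console.log(route("ContextAddTrack", session)); // "Anime 42 added to your tracking list."
```

Keeping the "is something selected?" check in one wrapper means every Context* intent degrades the same way, much like the ShowFilling fallback described above.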


Entities

This specific voice app has only two entities, and both are built-in types rather than custom ones.

  • show_name: it’s an AMAZON.TVSeries, perfect for what I needed. It always catches the anime name (and I’m truly amazed by that!), even difficult ones or ones written in Japanese (or at least, the ones I tried)
  • show_choice: it’s an AMAZON.Ordinal used to get which item from the list the user has chosen

Now, I will dive deep into a specific intent, because I would also like to talk about APL (Alexa Presentation Language). I really love it and I think it will evolve into something really beautiful, although that depends on how many screen-enabled devices people end up buying.

Intent ShowInfo

Let’s start with this intent. As we said before, this intent allows the user to get information about a specific anime.

The user would usually invoke it by saying:

“Alexa, I would like to know more about {anime_name}”

In a perfect world, if the user said the correct anime name, the API would return just one record and I could show just the details of that anime, like this:

But we don’t live in a perfect world, so the API returns a lot of results, mostly because an anime often has several seasons with different names. I could have filtered them by airing status (hiding anime that are no longer airing), but I wanted the user to be able to look up old anime too.

So when a user asks Anime Helper

“I would like to know more about Sword Art Online”

they get back a long list of results. This is far from an optimal experience. How can I give the user an easy way to choose the correct one? There are two options here:

  1. The user can say the perfect match name like “Sword Art Online: Alicization”
  2. The user can simply say: “The third one”

I support both options, but I think (I still need to validate it) that users will prefer the second one: it’s much faster, and you don’t need to say a complex Japanese name in full.
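Supporting both options means resolving either an ordinal or a name against the list the user was just shown. The sketch below assumes the AMAZON.Ordinal slot (show_choice) arrives as a digit string such as "3" and the AMAZON.TVSeries slot (show_name) arrives as free text. `resolveChoice` is a hypothetical helper, and the exact-then-partial matching rule is an assumption, not the skill's actual logic.

```typescript
interface ListItem {
  id: number;
  title: string;
}

// Picks an item from the list shown to the user.
// ordinal: the show_choice value, e.g. "3" for "the third one" (1-based).
// name: the show_name value, used when the user says a title instead.
function resolveChoice(items: ListItem[], ordinal?: string, name?: string): ListItem | undefined {
  if (ordinal !== undefined) {
    const position = Number.parseInt(ordinal, 10);
    // Reject non-numbers and positions outside the list.
    if (Number.isInteger(position) && position >= 1 && position <= items.length) {
      return items[position - 1];
    }
    return undefined;
  }
  if (name !== undefined) {
    const wanted = name.trim().toLowerCase();
    // Prefer an exact title match, then fall back to the first partial match.
    return (
      items.find((item) => item.title.toLowerCase() === wanted) ??
      items.find((item) => item.title.toLowerCase().includes(wanted))
    );
  }
  return undefined;
}

const results: ListItem[] = [
  { id: 1, title: "Sword Art Online" },
  { id: 2, title: "Sword Art Online II" },
  { id: 3, title: "Sword Art Online: Alicization" },
];

console.log(resolveChoice(results, "3")?.title); // "Sword Art Online: Alicization"
console.log(resolveChoice(results, undefined, "sword art online")?.title); // "Sword Art Online"
```

Returning `undefined` for an out-of-range ordinal gives the skill a clear signal to re-prompt, for example when "the fifth one" is said for a three-item list.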

Lists are a huge pain for users (and developers) because voice apps are voice-first (and often voice-only) applications. We always need to find a good balance: enough results with enough information to let the user make a choice, without overwhelming them.
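A common way to keep a spoken list short is to read a few items at a time and offer to continue. Here is a minimal sketch that assumes a page size of three, which is an arbitrary choice. `listPrompt` is a hypothetical helper. The numbering continues across pages, so "the fourth one" stays valid after the user says "next".

```typescript
// Builds the spoken prompt for one page of results.
// page is 0-based; numbering continues across pages.
function listPrompt(titles: string[], page: number, pageSize = 3): string {
  const start = page * pageSize;
  const slice = titles.slice(start, start + pageSize);
  if (slice.length === 0) return "There are no more results.";

  const spoken = slice.map((title, i) => `Number ${start + i + 1}, ${title}`).join(". ");
  const hasMore = start + pageSize < titles.length;
  const tail = hasMore
    ? " Which one would you like? Or say next to hear more."
    : " Which one would you like?";
  return `${spoken}.${tail}`;
}

// Placeholder titles for illustration.
const titles = ["Title A", "Title B", "Title C", "Title D"];
console.log(listPrompt(titles, 0));
// "Number 1, Title A. Number 2, Title B. Number 3, Title C. Which one would you like? Or say next to hear more."
console.log(listPrompt(titles, 1)); // "Number 4, Title D. Which one would you like?"
```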

Have you found a better way to handle this situation? Let me know on my Twitter!

Intent ShowInfo: the APL version

If you don’t know it yet, here’s a brief explanation of APL:

With Alexa Presentation Language (APL), you can create visual experiences to accompany your skill. Users can see and interact with your visuals on supported devices such as the Echo Show, Fire TV, some Fire tablets, and other devices. You can include animations, graphics, images, slideshows, and videos in your visual experience.

This is one of the main differences between Google Assistant and Amazon Alexa, and I really love where Alexa is going, even if I think (and there’s no public number to validate it) that many users still don’t own a screen-enabled device because of the high price. We will see.

Anyway, even for non-designers like me, creating a simple APL template is pretty easy: Amazon offers a WYSIWYG tool in the skill builder. You can start from a pre-made template and customize it to your preferences.

Let’s see the same search query in APL. We were looking for the anime “Sword Art Online”:

Visual is better: you can display images, rearrange text and important information, and show more metadata like ratings. Users can also easily scroll through the results and tap any item to open its details. You still need to support voice interaction (voice-first, voice-first, voice-first: this must be your mantra!), but at least you can give your users another way to interact with your skill.
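In practice, voice-first means the spoken response is always built, and the visual layer is added only when the device reports APL support. The sketch below works on a raw Alexa request/response JSON rather than Jovo's helpers. The `supportedInterfaces` check and the `Alexa.Presentation.APL.RenderDocument` directive follow the public Alexa docs. The tiny document, the `AnimeListTemplate` token and the `selectAnime` event argument are illustrative assumptions, and the APL version string may need to match what your console generates.

```typescript
// Only the fields this sketch reads from the Alexa request envelope.
interface AlexaRequestEnvelope {
  context: {
    System: {
      device: { supportedInterfaces: Record<string, unknown> };
    };
  };
}

function supportsApl(request: AlexaRequestEnvelope): boolean {
  return "Alexa.Presentation.APL" in request.context.System.device.supportedInterfaces;
}

// Illustrative APL document: a scrollable list where tapping a row sends a UserEvent.
// Plain (non-template) strings keep the ${...} APL data bindings literal.
const listDocument = {
  type: "APL",
  version: "1.0",
  mainTemplate: {
    parameters: ["payload"],
    items: [
      {
        type: "Sequence",
        data: "${payload.results.items}",
        items: [
          {
            type: "TouchWrapper",
            onPress: { type: "SendEvent", arguments: ["selectAnime", "${index}"] },
            item: { type: "Text", text: "${data.title}" },
          },
        ],
      },
    ],
  },
};

function buildSearchResponse(request: AlexaRequestEnvelope, titles: string[]) {
  const directives: object[] = [];
  if (supportsApl(request)) {
    directives.push({
      type: "Alexa.Presentation.APL.RenderDocument",
      token: "AnimeListTemplate",
      document: listDocument,
      datasources: { results: { items: titles.map((title) => ({ title })) } },
    });
  }
  return {
    version: "1.0",
    response: {
      // Speech is always present, with or without a screen: voice-first.
      outputSpeech: { type: "PlainText", text: `I found ${titles.length} results. Which one would you like?` },
      directives,
      shouldEndSession: false,
    },
  };
}

const withScreen: AlexaRequestEnvelope = { context: { System: { device: { supportedInterfaces: { "Alexa.Presentation.APL": {} } } } } };
const voiceOnly: AlexaRequestEnvelope = { context: { System: { device: { supportedInterfaces: {} } } } };
console.log(buildSearchResponse(withScreen, ["A", "B"]).response.directives.length); // 1
console.log(buildSearchResponse(voiceOnly, ["A", "B"]).response.directives.length); // 0
```

When a row is tapped, the skill receives an `Alexa.Presentation.APL.UserEvent` request whose `arguments` carry the values from `SendEvent`. It can then be handled like a spoken choice.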

So when users tap a result item (or choose it by voice), you can easily display its details.

At this point, users can interact by voice with the selected anime using these contextual commands:

  • Track it
  • Untrack it
  • Overview
  • Similar anime

Something I really miss from Google Assistant is the ability to add “Suggestion Chips” to your app. Please, Amazon, copy/paste that concept into Alexa!

Where can you try Anime Helper?

What’s next in my projects?

As I said, I really love these new technologies and how they could change the way we interact with new interfaces. I feel that we’re still at the start of a journey (yes, I know I’m a little late to the party, but here I am!)

If this skill gains some traction, I will try to replicate it for Manga, TV Series, and Movies. I love keeping things organized and scheduled, so for me it’s important to have a calendar, updated daily, with things I can watch at night when I come back home.

But I’m also working on two or three other projects in my free time:

  • An escape room framework to easily build and manage escape rooms. Its first alpha version is almost finished. It is split into two projects: one handles the user interaction with the escape room, and the other lets us easily build an escape room with a WYSIWYG web application. I’m looking for content creators for this project, so if you’re interested, send me a DM!
  • A Relaxing / Meditation skill that will help users relax, sleep or meditate. I love meditating and doing Yoga, and I would like to explore the Audio Player feature and the In-Skill Purchase system of both Alexa and Google Assistant.
  • Create more content for the community! I like to share what I learn and the problems I struggled with during development. I would like to find some spare time to write more blog posts like this one or help people on Twitter / Slack (yeah, I’ll be honest, I would love to get the Alexa Champion achievement; you always need to have a goal in mind!)
