Getting Started with Voice: FAQs from Our Partners

Published in TribalScale · 8 min read · Dec 18, 2017

By TribalScale’s Voice Experts

Voice platforms are growing quickly. According to Strategy Analytics, smart speaker sales in 2017 are estimated at 24 million devices, up from 6 million in 2016.

Amazon recently launched Amazon Alexa and Echo devices in Canada! Since early 2016, TribalScale has been exploring, testing, and developing voice apps for Amazon Alexa enabled products, and we have helped our partners, such as CBC News, build their own Alexa skills ahead of the official launch.

Now that Alexa has finally launched in Canada, there is no better time for brands in Canada to jump on board the voice train. In this post, we’ll be answering the top 15 frequently asked questions (FAQs) from our clients who were interested in voice, or were developing voice projects. Whether you are creating an Alexa Skill or a Google Action, our answers address both major platforms.

For a full list of FAQs beyond the top 15 listed below, fill out the form here to receive your copy.

First, let’s start with some basic terminology.

Voice User Interface (VUI): A flow chart or diagram of all the possible inputs and outputs between a user and the voice assistant. A VUI allows users to make voice commands to activate responses from voice-enabled devices, such as the Amazon Echo or Google Home.

Amazon Web Services (AWS) Lambda: A compute service that provides on-demand cloud computing, allowing developers to run code for virtually any type of application without the need to provision or manage servers.

Voice Integration Dashboard: A platform enabling developers to build and test brand-unique, natural language interactions.

• Amazon Developer Console (ADC) for Amazon Alexa

• Dialogflow for Google Assistant

Wake Word: A phrase that turns on your voice-enabled device.

Invocation: A phrase spoken by the user to open a specific voice app.

Utterance: A phrase spoken by the user that maps to an intent or action by the voice assistant.

One-Shot Utterance: A command or request with complete variables or slot value information to fulfill an intent.

Partial Utterance: A command or request with incomplete variables or slot value information to fulfill an intent.

Slot Value: A variable that relates to an intent allowing the voice assistant to understand the utterance or request.

Intent: An action a voice app can perform for the user, fulfilled based on the user’s spoken request or utterance.
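To make these terms concrete, here is a minimal sketch in Python of how an intent and its slot values might arrive at an AWS Lambda function backing an Alexa skill. The request and response shapes follow Alexa's JSON format as we understand it; the intent name OrderIntent, the slot names product and recipient, and the spoken replies are illustrative assumptions rather than part of any real skill.

```python
def extract_slots(event):
    """Return {slot_name: value} for the slots the user actually filled."""
    intent = event["request"].get("intent", {})
    slots = intent.get("slots") or {}
    return {name: s["value"] for name, s in slots.items() if s.get("value")}


def build_response(text, end_session=True):
    """Wrap plain text in an Alexa-style JSON response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls for each incoming request."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user spoke only the invocation phrase, with no intent yet.
        return build_response("Welcome. What would you like to order?", end_session=False)
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "OrderIntent":
        slots = extract_slots(event)
        return build_response("Ordering " + slots.get("product", "your item") + ".")
    return build_response("Sorry, I didn't catch that.")


# Hypothetical request, roughly as it might look after "order milk chocolate".
sample_event = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "OrderIntent",
            "slots": {
                "product": {"name": "product", "value": "milk chocolate"},
                "recipient": {"name": "recipient"},  # slot left unfilled
            },
        },
    },
}

reply = lambda_handler(sample_event, None)
print(reply["response"]["outputSpeech"]["text"])
```

Here the utterance has already been mapped to an intent by the platform; the function only sees the intent name and whichever slot values were recognized.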

FAQs

1. What is a Voice User Interface (VUI)?

A VUI makes conversation flow and interaction between a user and a voice assistant platform possible. It allows a user to navigate a voice app through speech commands. For comparison, with a typical Graphical User Interface (GUI), a user navigates an app by clicking with a mouse or tapping a selection on a screen.

The VUI defines the capabilities of a voice app by laying out the possible interactions between the user and the voice assistant within the app. At TribalScale, our design team builds visual wireframes that dictate the app flow on mobile or web platforms to allow us to iterate on the user experience. For voice applications, we came up with a similar process by building out VUI diagrams of intents.

Here’s an example of a flow for an intent to order milk chocolate:

Possible Partial Utterances:

“Order milk chocolate.”

“I would like to order milk chocolate.”

“I want to order milk chocolate.”

“I want to make an order.”

One-Shot Utterance:

“Order {milk chocolate} for {Judith} on {January 2nd}.”

[Figure: OrderIntent VUI flow]
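The difference between the one-shot and partial utterances above can be sketched in a few lines of Python. This is only an illustration of the idea, not how either platform implements it: the slot names (product, recipient, date) and the prompt wording are our assumptions.

```python
# Slots the hypothetical OrderIntent needs before it can be fulfilled.
REQUIRED_SLOTS = ["product", "recipient", "date"]

# Follow-up question to ask for each missing slot (wording is illustrative).
PROMPTS = {
    "product": "What would you like to order?",
    "recipient": "Who is the order for?",
    "date": "For which date?",
}


def next_step(slots):
    """Return ("prompt", question) for the first missing slot,
    or ("fulfill", summary) once every required slot has a value."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            return ("prompt", PROMPTS[name])
    summary = "Ordering {product} for {recipient} on {date}.".format(**slots)
    return ("fulfill", summary)


# One-shot utterance: "Order milk chocolate for Judith on January 2nd."
one_shot = {"product": "milk chocolate", "recipient": "Judith", "date": "January 2nd"}

# Partial utterance: "Order milk chocolate."
partial = {"product": "milk chocolate"}

print(next_step(one_shot))  # all slots present, so the intent can be fulfilled
print(next_step(partial))   # recipient is missing, so the assistant asks for it
```

A one-shot utterance goes straight to fulfillment, while a partial utterance triggers follow-up questions until every slot has a value.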

2. How do I test my voice app?

During the engineering phase, our TribalScale teams utilize unit testing. Unit testing is a software development process in which we test individual units of code to ensure that each one performs correctly alongside the rest of the codebase. This way, the engineers can test the application as they are developing it.
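As a rough illustration of what such a unit test can look like, the sketch below uses Python's built-in unittest module against a hypothetical intent handler. The handler name handle_order_intent and its replies are invented for this example and do not come from a real project.

```python
import unittest


def handle_order_intent(slots):
    """Hypothetical handler: returns the text the voice assistant should speak."""
    product = slots.get("product")
    if not product:
        return "What would you like to order?"
    return "Okay, ordering " + product + "."


class OrderIntentTest(unittest.TestCase):
    def test_confirms_order_when_product_is_given(self):
        self.assertEqual(
            handle_order_intent({"product": "milk chocolate"}),
            "Okay, ordering milk chocolate.",
        )

    def test_asks_for_product_when_slot_is_missing(self):
        self.assertEqual(handle_order_intent({}), "What would you like to order?")


# Run the two tests and keep the result object so it can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderIntentTest)
result = unittest.TextTestRunner().run(suite)
```

Because the handler is a plain function that takes slot values and returns text, it can be tested without a device or a deployed skill.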

By using a proxy tool, we are also able to develop rapidly while our engineers test in real time. Our Agile process enables our QA team to manually test the application, while our engineers produce testable builds on a weekly basis. Staging or development builds can also be accessed on voice-enabled devices (e.g., Amazon Echo) after the appropriate permissions are set.

3. How can I conduct user testing for a voice app?

User testing can be done even before code is written. At TribalScale, we perform usability testing with the VUI flow. This process involves script reading to ensure the user experience is smooth and the voice application’s flow is logical. We also document any utterances and words that were not included in the initial VUI flow.

Our design team creates a controlled test scenario. In one room, we have the user and facilitator, and in a separate room we have the moderator and the observers. The moderator takes on the role of a ‘fake’ voice assistant and has a predefined script based on the VUI. The facilitator asks the user to state certain utterances to the ‘fake’ voice assistant so we can test user interaction with the voice app in a controlled environment.

4. What do I need to be aware of when I want to recreate an existing desktop or mobile app as a voice app?

Similar to moving desktop experiences to mobile, where feature parity is neither a requirement nor a guarantee, moving over to voice will also require product transformation. Users are already familiar with paradigms from GUIs, the interfaces we see on our mobile phones, laptops, or tablets. Having a simple and easily digestible feature-set on a familiar device will allow users to navigate your entire application with little difficulty. When moving over to a voice platform and utilizing a VUI, our design team builds a smooth user experience that is more human-like and conversational.

5. Are notifications and payments available through voice apps?

Notifications are available on Amazon and Google platforms. Payments on Amazon platforms are available through a private beta, but TribalScale can work with your team to see if we can get you early access.

6. Can you access all the records of what users say in voice apps?

Yes, Amazon Alexa records and stores all voice requests and responses in the Alexa application. Google Home also stores its data in the Google Home app under ‘My Activity’.

However, Alexa is a bit different from a development standpoint. It’s harder for Alexa to convert all speech to text; everything it can “truly” interpret comes from predefined mappings of the utterances you’ve defined in your VUI, or from the native platform’s built-in utterances (e.g., start, stop, and cancel). Since Alexa does not collect all information said by a user, the privacy and security of what a user says is maintained, even after the skill is activated.

7. Can users link personal accounts (e.g., bank accounts) to voice apps?

Yes, Amazon Alexa allows your voice app to link up with custom user profiles from your web services. For example, Domino’s Pizza allows users to link their account with its Alexa skill so they can easily reorder or recall past orders.
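For the Alexa case, a skill typically learns that an account is linked because the incoming request carries an access token for the user. The Python sketch below assumes the token appears under session.user.accessToken and that a card of type LinkAccount prompts the user to link an account in the Alexa app; the handler name, user IDs, token value, and spoken replies are placeholders made up for this example.

```python
def get_access_token(event):
    """Return the linked-account token from an Alexa-style request, or None."""
    user = event.get("session", {}).get("user", {})
    return user.get("accessToken")


def handle_reorder(event):
    """Hypothetical handler for a 'reorder my last order' intent."""
    token = get_access_token(event)
    if token is None:
        # No linked account yet: ask the user to link one in the companion app.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "Please link your account in the Alexa app to reorder.",
                },
                "card": {"type": "LinkAccount"},
                "shouldEndSession": True,
            },
        }
    # With a token, the skill would call its own web service on the user's
    # behalf. That call is omitted here; the reply below is a placeholder.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": "Reordering your last order."},
            "shouldEndSession": True,
        },
    }


# Two made-up requests: one without and one with a linked account.
unlinked = {"session": {"user": {"userId": "example-user"}}}
linked = {"session": {"user": {"userId": "example-user", "accessToken": "example-token"}}}

print(handle_reorder(unlinked)["response"]["card"]["type"])
print(handle_reorder(linked)["response"]["outputSpeech"]["text"])
```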

With Google Home, you can link many different user accounts, which can only be accessed with the owner’s unique voice print. For example, you can ask for your calendar and the Google Home will only list your appointments and events using voice recognition technology. This allows multiple people to use the same Google Home device, while having an added layer of security and personalization.

8. Can you make phone calls through voice-activated devices?

Currently, Amazon Alexa supports Alexa-to-Alexa voice calls, as well as mobile and landline calls to select countries (Canada, U.S., and Mexico). You cannot build a custom application that allows phone calls, but you can ‘hack’ this by using third-party integrations. However, in almost all cases, building in a phone call feature will result in a poorer user experience, and we do not recommend doing so.

Google Home supports outgoing voice calls to any number, but it’s only available in the U.S. and Canada. Unlike the Alexa-to-Alexa calling feature, you cannot make a direct call to another Google Home. And similar to Alexa, you cannot interact with a voice app while on a call.

9. How does Alexa or Google Assistant handle user errors, insufficient information, and invalid responses?

Both Alexa and Google Assistant have built-in error-handling. If a user asks a question or gives a response that is not relevant or doesn’t exist in your VUI flow, the voice assistant will automatically give the user an error message. If the user doesn’t provide enough information, the voice assistant will ask for specific details until it has all the information needed to fulfill the intent.
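To illustrate the first case, a request that falls outside the VUI flow, here is a deliberately simplified Python sketch. Real platforms match utterances with far more flexible language understanding than the exact-match lookup used here, and the intent names, sample utterances, and replies are all assumptions made for the example.

```python
# Utterances the hypothetical voice app knows about, mapped to intents.
SAMPLE_UTTERANCES = {
    "order milk chocolate": "OrderIntent",
    "i want to make an order": "OrderIntent",
    "stop": "StopIntent",
    "cancel": "CancelIntent",
}


def resolve_intent(spoken_text):
    """Exact-match lookup after light normalization; returns None if unknown."""
    normalized = spoken_text.strip().lower().rstrip(".!?")
    return SAMPLE_UTTERANCES.get(normalized)


def respond(spoken_text):
    """Return what the assistant would say back to the user."""
    intent = resolve_intent(spoken_text)
    if intent is None:
        # Nothing in the VUI flow matches: fall back to an error message
        # that also reminds the user what they can say.
        return "Sorry, I didn't understand that. You can say, order milk chocolate."
    if intent in ("StopIntent", "CancelIntent"):
        return "Goodbye."
    return "Okay, let's place your order."


print(respond("Order milk chocolate."))
print(respond("What's the weather?"))
```

A helpful fallback message tells the user what the app can do instead of only reporting a failure.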

10. How long does it take to build a voice app?

The length of time depends heavily on the feature-set and complexity of the app you are trying to build. A very simple app can take 2–4 weeks. It also depends on external factors, such as whether you have APIs, an existing backend, or third-party integrations.

11. How do I submit my voice application?

Amazon Alexa and Google Home platforms have individual ‘stores’ that house voice apps (skills or actions). At TribalScale, we work with our clients to set up accounts and walk them through the submission process. Each platform requires a verification process before a voice app is released to the public. For more details, see question 12.

12. What’s the verification process like for a voice app?

With both Amazon and Google, all submissions must be tested before users can access the skill or action to ensure that quality applications are accessible to users. Both Amazon and Google will provide feedback if an application does not meet their standards.

On Amazon Alexa, for a basic skill, this process could take 1–2 weeks. If you utilize Amazon Pay integrations, it could take as long as 3–4 weeks. However, on a case-by-case basis we can work with an Amazon representative to escalate the process and get your app released to market faster.

13. Can I integrate analytics into my voice application?

Yes. Both Amazon and Google have basic analytics that are built-in by default, and at no extra cost. There are other third-party analytics platforms that may provide further insights, including VoiceLabs and Adobe Analytics.

14. How do you maintain/update a voice app?

Voice applications are cloud-hosted, meaning they can be updated at any point in time. However, configuration changes, which are made on the Amazon Alexa and Google Home web platforms, first require a re-submission through the platform’s certification process. For minor changes, the certification process is faster than the initial certification.

15. Can I access voice apps on my mobile devices or desktop?

Amazon only allows Alexa interactions through Alexa-enabled devices. Google allows Actions to be accessed on iOS, Android, and Google Home devices via Google Assistant. However, the experience of Google Assistant on mobile and desktop is much richer than the voice-only Google Action app, as users can also interact with screens on mobile and desktop. For an equivalent experience with mobile and desktop, it’s up to the voice app developer to design and implement it accordingly.

Want to learn more? Fill out the form here to receive the full copy of our FAQs on Voice Guide.

Have a question that we didn’t answer, or want to learn more? Leave us a comment below or contact us at: contact@tribalscale.com.
