If we want to do better than Siri, we need the tools to get us there

The future of voice-based computing isn’t going to build itself. It’s going to take pioneering designers and developers pushing the boundaries of the latest technologies to create experiences we soon realize we can’t live without.

Yet while voice-enabling technologies such as the Amazon Echo and Microsoft’s Language Understanding Intelligent Service (LUIS) have advanced tremendously in power and sophistication, the tools designers and developers need to take advantage of them are virtually non-existent.

Powerful tools are essential to unlocking the creative potential of voice-based applications. Compare where we are today with the state of virtual reality, a technology at a similar point of development. In VR we have cameras that record in 360 degrees and software that stitches the footage into an immersive environment. We can photograph real places and turn them into interactive spaces, and we have applications like Tilt Brush and Medium, which let you create 3D works of art as if they were sitting in front of you and then have them 3D printed. VR’s tools are still basic, yet they unlock far more creativity than we see in voice-based computing, where comparable tools do not exist. Without them, we are left with experiences like Dom, the pizza-ordering bot.

An example of a photorealistic virtual space created using photogrammetry. Image courtesy of Road to VR.

If we are going to push the usefulness of voice-based applications beyond setting timers and asking for our horoscopes, we need tools that help us build them.

The unique challenges of voice computing

Today, the hardest part of building a voice-based application is understanding what the user is trying to do. In a graphical user interface, when a user clicks or taps the “search” button, we can be 100% sure they want to search for the words they typed into the search box. In a conversational interface, the user might instead ask “how do I make macaroni and cheese?”, and we are perhaps 86% sure they meant to search for a recipe called “macaroni and cheese.”

Voice-based interfaces rely on machine learning to interpret the natural language we use to express ourselves, and they need a lot of training data to accurately predict what we mean by our words (known as the “intent”).
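To make the link between training data, intent and confidence concrete, here is a toy sketch in Python. It is not how LUIS, the Alexa Skills Kit or TinCan.ai work internally; it swaps in a tiny multinomial naive Bayes classifier trained on a handful of hand-written utterances. The intent names (`search_recipe`, `set_timer`, `get_horoscope`) and the example phrases are hypothetical. Given a new utterance, the sketch returns the most likely intent along with a probability, the same kind of “we’re perhaps 86% sure” number described above.

```python
import math
import re
from collections import Counter

# Hypothetical training data: a few example utterances per intent.
# A real app would need many more phrasings per intent, collected from users.
TRAINING_DATA = {
    "search_recipe": [
        "how do i make macaroni and cheese",
        "find me a recipe for lasagna",
        "what can i cook with chicken",
    ],
    "set_timer": [
        "set a timer for ten minutes",
        "start a five minute timer",
        "remind me in twenty minutes",
    ],
    "get_horoscope": [
        "what is my horoscope today",
        "read me the horoscope for leo",
        "how does my horoscope look",
    ],
}


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def train(data):
    """Build a tiny multinomial naive Bayes model from labeled utterances."""
    word_counts = {
        intent: Counter(word for utt in utts for word in tokenize(utt))
        for intent, utts in data.items()
    }
    vocab = {word for counts in word_counts.values() for word in counts}
    priors = {intent: len(utts) for intent, utts in data.items()}
    return word_counts, vocab, priors


def predict(model, utterance):
    """Return (most likely intent, its probability) for an utterance."""
    word_counts, vocab, priors = model
    n_examples = sum(priors.values())
    log_scores = {}
    for intent, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(priors[intent] / n_examples)
        for word in tokenize(utterance):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        log_scores[intent] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exps = {intent: math.exp(s - top) for intent, s in log_scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / total


if __name__ == "__main__":
    model = train(TRAINING_DATA)
    intent, confidence = predict(model, "How do I make macaroni and cheese?")
    print(f"{intent} ({confidence:.0%} confident)")
```

With only three examples per intent, the scores are fragile: a phrasing that shares few words with the training data can easily be misclassified. Each additional real-world phrasing collected from users gives the model more evidence to work with, which is why gathering training data matters so much.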

Collecting the myriad ways users might express a given intent is the fundamental challenge in creating a voice-based application. Yet to collect this data, you need not only a clearly defined set of actions the user can take but also something for users to interact with while the data is being collected.

And here is where having the right tools can be extremely helpful.

How do you wireframe a non-visual experience?

Image courtesy of Martijn van Exel.

When we prototype a graphical application, we usually start with rough sketches of how the experience will look, collect feedback from users, and gradually increase the fidelity of the design until we have a picture-perfect version that can be built and released to the public.

Without a prototyping tool, voice-based apps have to be built before any training data can be collected. This leads to poor user experiences and is more expensive and time-consuming (it is generally considered poor form to build a website or mobile app before testing it with users).

While voice-based apps do not easily lend themselves to being sketched out, we can still build low-fidelity prototypes that inform the design of the final product and, more importantly, collect the training data needed to create a responsive application.

Lowering the barriers to creating voice-based apps

By creating an interactive prototype earlier in the design process, we can launch a more meaningful and responsive application, one that feels much more intelligent than it otherwise would.

Such a tool also expands the ranks of who can build a voice-based application. In the early days, only experts in speech recognition and natural language processing could build voice-based apps. With the release of Microsoft LUIS and Amazon’s Alexa Skills Kit, general developers began creating rudimentary voice experiences. With access to a prototyping tool, designers and even enthusiasts can explore the possibilities of voice-based computing.

TinCan.ai is one such tool. With TinCan, designers and developers can build a prototype of their app, test it remotely with beta users, and collect the data required to power it. They can then use that data to train their natural language understanding (NLU) service of choice.

The Future

Designing and building more responsive apps will go a long way toward genuinely useful voice-based experiences, but it is only a start. Many more tools and technologies are needed before we can hope to create truly smart virtual assistants like Samantha in the movie Her. First, we will need stronger artificial intelligence with a much better understanding of natural language than we have today. Second, we will need better ways to track data about individual users so that experiences can be tailored to their preferences. Finally, voice will need to expand beyond our mobile devices and products like the Amazon Echo: as virtual and augmented reality mature, voice needs to be included as a meaningful interface.

Voice-based computing is still in its early days, and with the right tools, we are going to see an explosion of creative applications that open up new possibilities for what we can do with computers and how we interact with them.