Prototyping with Voice Technology
“Hey Google, please do something”
Learnings from Voice Tech Hackathons
Last year, we started talking about experimenting with voice technology. We discussed the opportunities offered by different technologies and, in the end, decided to focus on building our knowledge of voice. We arranged a voice-tech hackathon to acquire the knowledge we would need to benefit from it in our day-to-day operations. After the hackathon, we were in love. The possibilities were endless. The technology was new.
We used the knowledge acquired in that short timespan to lay the foundations for a working flow from the app, through voice, to the backend, in order to get more dynamic responses.
We then wondered how the findings from these few nights of hacking could benefit our clients. Where could we implement voice to enhance the experience of their services? It was time to get creative with this new technology, which perhaps sounds as if it would come naturally to us, given that we’re in the creative tech business. We started exploring the opportunities around us, turned over everything that seemed plausible, and talked about ideas bordering on the impossible. A few concrete things came up.
Voice Tech Prototyping
We started prototyping voice for in-app actions, using an existing product that we had developed in partnership with one of our clients. The first prototype was a media service that we wanted to take further, so that it could be used without having the app installed on your phone or watch. Imagine, for example, starting a podcast from your Google Home or in the car (Android Auto/CarPlay). However, this was not possible at the time, so we had to put it on hold.
“We realized that it could be both easy and difficult to create an agent dependent on the purpose of the agent. What would be the goal of the agent? What should it do? Who would be the target audience? We were faced with challenges that we hadn’t met before.”
— Johannes Svensson, Android Lead Developer
One difficulty we encountered was the variety of languages and accents: the agent had trouble understanding what some users were actually saying. Another challenge was to really understand the user's sentiment so that it could be matched with the correct intent.
Hey, it actually works (and this is how)!
We realized that, with enough data for machine learning, it is quite easy to start building a simple conversation. For example, one of the powerful features of Dialogflow, a human-computer interaction programme based on natural language conversations that we use to match sentiments with intents, is the ability to pass important data (e.g. first name, age, place or a number) during the conversation to improve the result. It can even help us guide the conversation. Say you ask the agent to order a pizza with extra onions but forget to mention a drink: the agent could then ask which drink you would like with the pizza before placing the order.
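The pizza example can be sketched in a few lines of Python. This is only an illustration of the idea of asking for missing data before completing an order, not Dialogflow's own implementation; the parameter names, the prompts and the next_prompt helper are all hypothetical.

```python
from typing import Optional

# Hypothetical required parameters for an "order pizza" intent, each paired
# with the question the agent would ask while the value is still missing.
REQUIRED_PARAMETERS = {
    "topping": "Which topping would you like on your pizza?",
    "drink": "What would you like to drink with your pizza?",
}


def next_prompt(collected: dict) -> Optional[str]:
    """Return the question for the first missing parameter, or None if complete."""
    for name, prompt in REQUIRED_PARAMETERS.items():
        if not collected.get(name):
            return prompt
    return None


# The user asked for extra onions but did not mention a drink.
print(next_prompt({"topping": "extra onions"}))
```

Once every required parameter has a value, next_prompt returns None and the order can be placed.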
To add smartness to these questions, we request information from something called fulfillment: code deployed as a webhook that lets your Dialogflow agent call business logic on an intent-by-intent basis. Dialogflow sends the user's requests to the fulfillment, where you can process the data and make the responses much more dynamic. With smartness comes the possibility of personalizing the conversation further (if you have set up an account and shared your information). The answers become more personalized, more fluent, and more human.
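As a rough sketch of what such a fulfillment might look like, the Python function below takes the JSON body of a webhook request and builds a response. It assumes the Dialogflow ES (v2) webhook format, where the matched intent and parameters arrive under queryResult and the reply text is returned as fulfillmentText. The intent name order.pizza, the parameter names and the handle_webhook function are hypothetical, and the HTTP server that would wrap this function is left out.

```python
def handle_webhook(request_body: dict) -> dict:
    """Build a webhook response from a Dialogflow ES style request body."""
    query_result = request_body.get("queryResult", {})
    intent_name = query_result.get("intent", {}).get("displayName", "")
    parameters = query_result.get("parameters", {})

    if intent_name == "order.pizza":  # hypothetical intent name
        topping = parameters.get("topping") or "no extra toppings"
        drink = parameters.get("drink")
        if drink:
            text = f"Got it: one pizza with {topping} and a {drink}."
        else:
            # Business logic decides to ask a follow-up instead of ordering.
            text = f"One pizza with {topping}. What would you like to drink?"
    else:
        text = "Sorry, I can't help with that yet."

    # Dialogflow ES reads the reply from the "fulfillmentText" field.
    return {"fulfillmentText": text}


example_request = {
    "queryResult": {
        "intent": {"displayName": "order.pizza"},
        "parameters": {"topping": "extra onions", "drink": ""},
    }
}
print(handle_webhook(example_request)["fulfillmentText"])
```

Keeping the logic in a plain function that maps one dictionary to another makes it easy to test without a running agent; a personalized version could look up the user's account inside the same function.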
Being in the creative-tech business means that we continuously explore and innovate with emerging technologies such as voice, so if you have a project in mind that would suit this narrative, please drop us a line. We might be just the right innovation partner for you.