6 Questions for Ahmed Bouzid

Dain Fitzgerald

Published in Voice Tech Podcast, June 6, 2019

The CEO of Witlingo shares with us the #VoiceFirst voices in his head…

1. Hi Ahmed. Thanks for taking the time to chat with me (in text form, lol). You’re the founder and CEO of Witlingo, which furnishes #voicefirst solutions for businesses and organizations, including political ones, I was surprised to discover. What high-level best practices for #voicefirst engagement have you picked up that you can share with us?

Ahmed Bouzid: Hi Dain. Thanks for inviting me to chat!

Regarding best practices, if I were to pick one best practice above all others, it would be the imperative of brevity. If the voice first experience you are delivering is one where the delivery of information is the key value, then get to the point quickly. Don’t ramble on with long greetings and elaborate responses. In the world of voice first, patience gets easily taxed, so you need to always keep that in mind.

2. What do you think the biggest difference is between traditional copyright and copyright for voice-based experiences?

Ahmed Bouzid: Interesting question. I think in terms of the basic principles guiding copyright, there shouldn’t be much of a difference. One has created a thing that expresses something, and so the same copyright principles should apply. Where things get interesting is the fact that voice is a layer on top of language. So what is one copyrighting: the language (what is said) or the rendering of that language (how it is said)? For instance, say I publish a podcast. What is copyrighted? The content the way it is rendered or the underlying content? From where I stand now, it feels like I own both the language and its rendering, so that I could license the audio (the content as rendered) or I could license the content only (for instance, a transcription of the content). As the voice first space evolves, we will see interesting new problems emerge when it comes to copyrighting.

3. Does voice lend itself to some industries and spaces more readily than others in your experience? E-commerce has been held back to a large degree by the lack of a visual interface for voice (with exceptions such as the Echo Show), but something like interactive podcasts is perfectly poised to take advantage of voice. Thoughts?

Ahmed Bouzid: Yes, it’s all about the use case, isn’t it? Where the user’s hands and/or eyes are busy (for instance, while preparing food, gluing together a model airplane, or driving), voice first (in this case, actually voice only) is going to be the best medium for consuming information or entertainment. Not a poor substitute for any other medium, but the best medium, period. In scenarios where you want to communicate without making noise (in a meeting, for instance), texting is going to be the best medium. So, my point is that it’s not about industries or verticals as such, but about the user’s use case [emphasis added].

Having said that, I think in the case of purchasing, a multi-modal experience is going to be best: you can navigate and ask for things just by speaking, which keeps your thinking flowing, and then see something visual, along with a spoken reply, in response to your request. In the case where you are listening to podcasts, being able to do things just by speaking, without needing to look at or touch anything, is the way to go: increase volume, pause, go back 10 seconds, skip ahead 10 seconds, start over, etc. Where the voice itself is what matters (the voice of your parents, your grandparents, your children, your siblings, your friends, etc.), the sound itself is what carries the value. I always go back to the use case. If you are designing for voice first, do your homework: design for the target users and their primary use cases, and pick the media mix that delivers the best possible experience. Let technology follow, not lead.

4. I mentioned #voicefirst for politics before. Can you tell us about this experience, working with political organizations and individuals? I’ve heard little about voice applied to this particular use case. Sounds interesting!

Ahmed Bouzid: Politics is all about messaging, action, and personality, so voice first is going to be an important medium for politicians to reach their supporters and constituents effectively. We have launched Alexa skills and Google Actions for political campaigns that let candidates keep their supporters apprised of things like their next rallies, meetings, media appearances, fundraising events, etc. Voice is also great for letting you hear the candidate’s actual voice speaking to you, which gives the user a sense of their personality. In fact, that led us to launch a new offering called Voice First Avatar. Check it out at www.voicefirstavatar.com.

5. Bixby, Samsung’s voice AI, is a Witlingo platform partner. Bixby is later to the #voicefirst game than Siri, Google Assistant or Alexa, forcing it to play catch-up in both name recognition and integration with third-party developers. In your estimation, what advantages might Bixby have to offset this (such as in IoT, given Samsung’s edge in TVs and appliances)?

Ahmed Bouzid: Bixby is indeed coming late to the dance, but Samsung does have a major hardware footprint, so the potential for Bixby to become a major player in voice first is real. The idea is to voice-enable their appliances so that users can interact with them just by speaking. Imagine a world where you didn’t have to squint to read the cryptic label on a tiny button and then look up a manual to figure out what it meant, but could instead just say something like “Do a bulk wash” or “Steam the rice.” That would be a better world. And so Bixby has a real chance to go where even an Amazon or a Google can’t go. But it depends on execution. They have to deliver great experiences and align their developer partner strategy with their core competency differentiator. And from early indications, I’d say that they are doing great and seem to have hunkered down for the long game, which is the smart thing to do.

6. What is the future of #voicefirst in your opinion? Where are things bright, and perhaps not so bright?

Ahmed Bouzid: I think the next big thing in this current big thing of voice first is increasing the circle of participation. The core compelling thing about voice is that more people than ever can participate, and participate with minimal effort. If you can speak and you can hear, then you can use a smart speaker. I think the next thing is having lay people create content and contribute their creativity, knowledge, and experiences with minimal effort. Imagine you want to listen to some authentic Irish limericks spoken to you by native Irish speakers. Wouldn’t it be great to just say, “I want to listen to some Irish limericks” and then hear back limericks by people with an Irish accent, posted by regular lovers of limericks? So, I think that’s where we are going next: the emergence of a voice first web created by regular people, the way Wikipedia was created by regular people. Exciting times ahead…
