Jakub Jedryszek, a software engineer on the Azure Portal, talks about his side project voiceCmdr as well as late-night Coca-Cola and ninja stickers.
What do you currently do at Microsoft?
I’m a software engineer on the Azure Portal. I work on the front-end and, more precisely, work on the Azure Portal Framework on top of which other teams build extensions that are Azure features. For example, websites, virtual machines, SQL and so on.
How did you get into open source?
It just happened — I mean when I do something after work, I just put [my projects] on GitHub because why not? And then maybe somebody will find it useful, maybe not. So that’s sort of how I got started. These days, most successful projects are open source. Everybody is on GitHub — there’s a great community there. So we can find a lot of useful stuff out there.
What do you like about open source?
The fact that we don’t have to reinvent the wheel all over again. Like when somebody does something once, we don’t have to do it over and over again. Back in the day, people were implementing sort algorithms all over before they were implemented in standard libraries, for example. I think that was a waste of time because there are great experts in sorting, and they could have just implemented it once. So I think that’s the main advantage of open source. It also helps us to innovate faster because it helps to share knowledge across people. The third reason is to be able to connect across people that you probably would never meet if it weren’t for the projects out there.
Let’s talk about your project, voiceCmdr. What’s the story behind it?
We get a lot of feedback that it’s hard to find things in the Azure Portal. So one time I joked that maybe we should add voice commands so people can just talk to the portal and the portal can give them what they want. I decided to check whether this was possible, then realized that Google Hangouts works in the browser and thought that this project should be possible too. I found the Web Speech API and discovered that it’s available in Chrome. It is a browser API that does not require any libraries. Hopefully it’s coming soon to Edge — they have it in the backlog now because of UserVoice, where you can vote for features. Half a year ago, for instance, this feature only had a few votes, but now it has over a thousand. Anyway, I prototyped what the voice commands could look like. I thought that using the raw Web Speech API in the browser would be painful, so I decided to create this library to make it easier to use — specifically for voice commands. The sample can be found on this website and the code is also on GitHub.
How long did it take to get the project up and running?
A few nights.
It’s not necessarily rocket science. It’s a simple wrapper around this API. What took me a lot of time was capturing the behavior of the Web Speech API, which was a bit painful to figure out because it’s not very well documented and it is still experimental technology.
How does this work?
For example, I discovered that if you run speech recognition in two tabs, it just crashes — and stuff like that. So those are small things, but once you use voiceCmdr, you don’t have to worry about these bugs. You just call voiceCmdr.addCommand, providing a string as the first parameter (e.g. “hello” or “home”) and a callback function as the second parameter that does whatever you want to do. Usually that means navigating to a different page.
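To make that shape concrete, here is a minimal sketch of a voiceCmdr-style command registry. Only the addCommand(command, callback) signature comes from the interview; the Map-based matching and the dispatch helper (standing in for the browser’s speech-recognition layer, which is omitted) are my own illustration, not the library’s actual code.

```javascript
// Registry mapping recognized phrases to callbacks, modeled on the
// addCommand(command, callback) shape described above.
const commands = new Map();

function addCommand(command, callback) {
  // Commands are matched case-insensitively on the recognized phrase.
  commands.set(command.toLowerCase(), callback);
}

// dispatch() stands in for the point where the browser's speech
// recognition would hand us a transcript of what the user said.
function dispatch(transcript) {
  const callback = commands.get(transcript.trim().toLowerCase());
  if (callback) {
    callback();
    return true; // command recognized and handled
  }
  return false; // no matching command registered
}

// Example: a "home" voice command that navigates to a different page.
let page = "start";
addCommand("home", () => { page = "home"; });
dispatch("Home"); // page is now "home"
```

In the real library the callback would typically update window.location or trigger client-side routing; the point is that the library hides the flaky recognition layer behind this simple two-argument API.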
So it’s out there on GitHub… are you still working on it?
To be honest, since I put it there I’ve only made a few small fixes here and there. The Web Speech API seems to be very stable — they didn’t introduce any breaking changes that would affect voiceCmdr. Now I am waiting until Microsoft Edge adds it. And from there, I’ll be able to make it something like jQuery for the Web Speech API. There are a lot of other opportunities to extend it. For example, you could add language support, because for now there is only support for English, which is the default language. But for now it’s a very simple library with fewer than two hundred lines of code. So even if people don’t want to use it, they can just copy and paste the code into their projects.
What are some good use cases for voice commands?
So there’s Cortana on Windows — you can talk to Cortana and Cortana navigates for you. I think that would be a very good use case. And sometimes you have search, but search doesn’t always let you find “profile settings” or whatever else you’re looking for. With voice commands, you can add a little bit of intelligence to that. It would be pretty cool if you could add some machine learning to analyze commands from users and guess what they are looking for. I believe it would be sort of like Google or Bing search, but a little bit different, because we usually talk differently from how we write. Voice commands would also be good for people with disabilities — for people who might be blind or vision-impaired. This could help them a lot.
Favorite Late-Night Coding Snack: Coca-Cola! I usually drink Coke when coding late at night.
Favorite Swag: Ninjacat stickers!
Role Model: Steve Sanderson. He created Knockout.js and used to work with us on the Azure Portal. He’s an outstanding developer, awesome technical speaker and has great communication skills (which is not common among developers).