One Voice, One Internet!

Pratik Parmar
Aug 19 · 4 min read

I believe everyone has tried a Speech To Text (STT) system at least once in their lifetime, and most of us have failed miserably, right?

Frustration after trying so hard to get the sentence corrected

However, how many of us have ever wondered how this amazing piece of technology actually works under the hood? Moreover, why doesn’t it work properly with an Indian accent? Let’s dive deeper today!

How does a Speech To Text system work?

Typically, most STT systems record audio on the local device and send it to the cloud, where Machine Learning models analyze the audio and generate the text. The text is then sent back to the local device, where it can be used for many tasks, from accessing Google Assistant to playing your favorite PewDiePie video (and don’t forget to subscribe to PewDiePie). STT APIs are provided by Google Cloud, Azure, AWS, IBM Watson, Speechmatics, etc.
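To make that round trip concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `fake_cloud_recognize` is a hypothetical stand-in for a cloud STT service (it is not the API of any provider listed above), and the byte strings are placeholders for recorded audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Transcript:
    text: str
    confidence: float


def fake_cloud_recognize(audio: bytes) -> Transcript:
    """Hypothetical stand-in for a cloud STT service (not a real API).

    A real service would run a machine learning model on the audio.
    Here we return canned results so the example runs offline.
    """
    canned = {
        b"\x01\x02": Transcript("play my favorite video", 0.92),
        b"\x03\x04": Transcript("open the assistant", 0.88),
    }
    return canned.get(audio, Transcript("", 0.0))


def handle_voice_command(
    audio: bytes,
    recognize: Callable[[bytes], Transcript],
    actions: Dict[str, Callable[[], str]],
) -> str:
    # Step 1: the device sends the recorded audio off for recognition.
    transcript = recognize(audio)
    # Step 2: the text that comes back is used locally to pick a task.
    for keyword, action in actions.items():
        if keyword in transcript.text:
            return action()
    return "Sorry, I did not catch that."


# Hypothetical tasks the device can perform once it has the text.
actions = {
    "play": lambda: "Playing video",
    "assistant": lambda: "Assistant opened",
}

print(handle_voice_command(b"\x01\x02", fake_cloud_recognize, actions))
```

Because the recognizer is passed in as a function, the same device-side code could in principle be pointed at an offline engine instead of a cloud one.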

Then why Project Common Voice?

  • Privacy

How the machine learning model works is out of scope for this article, so we’re not going to discuss it here. But you should know that audio sent from your device to the cloud can be kept there indefinitely, and it is used to make the STT system more accurate.

Most users don’t care about this since they’re getting free services, but there have been a number of reports of STT systems recording audio even when they’re not being used. If you’re a company that cares about your customers’ privacy, you’d definitely like to avoid that.

  • Internet Connectivity

Even though we’re living in an era where 5G technology is knocking at our doors, 52% of the world’s population doesn’t have access to the internet, mainly due to “lack of infrastructure”. So, there should be a way to use cutting-edge technology offline as well.

Project Common Voice by Mozilla

Project Common Voice

Common Voice is a project by Mozilla to help machines learn how humans speak, across variations in accent and language.

Mozilla is already working on another project called “DeepSpeech”, where we’re creating an open source Speech-To-Text engine. More importantly, this engine works offline as well.

Working of Speech to Text

There’s a roadblock though: training a Speech-To-Text engine requires a huge amount of voice data, along with its text annotation. Even if we’re just talking about English, there are 160 distinct dialects of English throughout the world. That’s where Common Voice comes into the picture: anyone can donate their voice or even just validate small audio clips (1–3 seconds). To make it accessible to everyone, we’ve launched 35 languages on the Common Voice portal, as of now.
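Here is a rough Python sketch of how donated clips and listener votes could turn into training data. The `Clip` fields and the two-vote threshold are hypothetical choices for illustration, not a description of how the Common Voice portal actually decides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    sentence: str       # the text the speaker was asked to read
    audio_path: str     # where the recording is stored
    yes_votes: int = 0  # listeners who said the audio matches the text
    no_votes: int = 0   # listeners who said it does not


def is_validated(clip: Clip, min_yes: int = 2) -> bool:
    # Hypothetical rule: enough 'Yes' votes, and more 'Yes' than 'No'.
    return clip.yes_votes >= min_yes and clip.yes_votes > clip.no_votes


def training_pairs(clips: List[Clip]) -> List[Tuple[str, str]]:
    # Keep only the (audio, text) pairs that listeners have confirmed.
    return [(c.audio_path, c.sentence) for c in clips if is_validated(c)]


clips = [
    Clip("Hello world", "clip_0001.mp3", yes_votes=2, no_votes=0),
    Clip("Good morning", "clip_0002.mp3", yes_votes=1, no_votes=0),
    Clip("How are you", "clip_0003.mp3", yes_votes=2, no_votes=3),
]

print(training_pairs(clips))
```

The point is that a recording is only useful for training once its text annotation has been confirmed, which is why validating clips matters as much as recording them.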

How can you contribute?

Ways to contribute
  • Speak: If you’d like to donate your voice (only one or two sentences per clip), just click the microphone icon and speak the given text (don’t forget to give permission to use the microphone, if prompted).
Contribute: Speak
  • Listen: Listen to the audio by clicking the play icon and check whether it matches the given text. Click ‘Yes’ if it does, or ‘No’ if it doesn’t. While listening, also check that there isn’t too much background noise and that the words are pronounced correctly.
Contribute: Listen

Congratulations!

You just made a contribution to open source, so you may pat yourself on the back. All of the recorded voice data is publicly available for free. If you’re a developer who’d like to use this data, just go to this link and you can download the voice data for your own speech recognition projects. Don’t forget to share it with the community 😉.
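If you do download the data, a first step could be to see how many clips you have per accent. The Python sketch below assumes the clip metadata comes as a tab-separated file with `path`, `sentence`, `up_votes`, `down_votes` and `accent` columns. Treat those column names as an assumption and check the header of the file you actually download. The sample rows are made up so the example runs on its own.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for a downloaded metadata file.
SAMPLE_TSV = (
    "path\tsentence\tup_votes\tdown_votes\taccent\n"
    "clip_0001.mp3\tHello world\t2\t0\tindian\n"
    "clip_0002.mp3\tGood morning\t2\t1\tus\n"
    "clip_0003.mp3\tHow are you\t3\t0\tindian\n"
    "clip_0004.mp3\tSee you soon\t2\t0\t\n"
)


def clips_per_accent(tsv_file) -> Counter:
    # QUOTE_NONE keeps quotation marks inside sentences from confusing the parser.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # Speakers may leave the accent blank, so group those separately.
        accent = row.get("accent") or "unspecified"
        counts[accent] += 1
    return counts


counts = clips_per_accent(io.StringIO(SAMPLE_TSV))
for accent, n in counts.most_common():
    print(accent, n)
```

With a real file you would pass `open("your_file.tsv", encoding="utf-8", newline="")` instead of the `io.StringIO` sample. A count like this quickly shows which accents are under-represented and need more voice donations.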

Common Voice Night

We at Mozilla Gujarat frequently host night-long meetups where we contribute to Common Voice for 8 hours straight. Isn’t that amazing?



August 17, 2019, 22:00 to August 18, 2019, 06:00


Alka Society, Vadodara


About me

If you like my work, please follow me on Medium @Pratik Parmar or add me on LinkedIn. Feel free to reach out to me on Twitter or comment below if you need any help.

Apart from Open-source contributions at Mozilla, I’m a Microsoft Student Partner and community member at GDG Baroda. I would like to thank Mozilla and the MozillaIN community for providing me a chance and the resources to learn about VR/AR and Open Source.

This is me, Pratik Parmar, signing off till the next tech adventure. Over and Out…
