More Common Voices

Michael Henretty
Mozilla Open Innovation
Jun 7, 2018

Today we are excited to announce that Common Voice, Mozilla’s initiative to crowdsource a large dataset of human voices for use in speech technology, is going multilingual! Thanks to the tremendous efforts of Mozilla’s communities and our deeply engaged language partners, you can now donate your voice in German, French and Welsh, and we are working to launch 40+ more as we speak. But this is just the beginning. We want Common Voice to be a tool for any community to make speech technology available in their own language.

Since we launched Common Voice last July, we have collected hundreds of thousands of voice samples in English through our website and iOS app. Last November, we published the first version of the Common Voice dataset. This data has been downloaded thousands of times, and we have seen the data being used in commercial voice products as well as open-source software like Kaldi and our very own speech recognition engine, project Deep Speech.

Up until now, Common Voice has only been available for voice contributions in English. But the goal of Common Voice has always been to support many languages so that we may fulfill our vision of making speech technology more open, accessible, and inclusive for everyone. That is why our main effort these last few months has been around growing and empowering individual language communities to launch Common Voice in their parts of the world, in their local languages and dialects.

In addition to localizing the website, these communities are populating Common Voice with copyright-free sentences for people to read, chosen to have the characteristics required for a high-quality dataset. They are also helping promote the site in their countries and building a community of contributors, with the goal of growing the total number of hours of data available in each language.

In addition to English, we are now collecting voice samples in French, German and Welsh. And there are already more than 40 other languages on the way — not only big languages like Spanish, Chinese or Russian, but also smaller ones like Frisian, Norwegian or Chuvash. For us, these smaller languages are important because they are often under-served by existing commercial speech recognition services. And so by making this data available, we can empower entrepreneurs and communities to address this gap on their own.

Going multilingual marks a big step for Common Voice, and we hope that it’s also a big step for speech technology in general. Democratizing voice technology will lower the barrier not only for global innovation but also for access to information. This matters especially for people who have traditionally had less of this access: for example, people with visual impairments, people who never learned to read, children, the elderly and many others.

We are thrilled to see the growing support we are getting to build the world’s largest public, multi-language voice dataset. You can help us grow it right now by donating your voice. You can also use the iOS app. If you would like to help bring Common Voice and speech technology to your language, visit our language page. And if you are part of an organization and have an idea for participating in this project, please get in touch (dchinniah@mozilla.com).

Our Forum gives more details on how to help and is a great place to ask questions and meet the communities.

Special Thanks

We would like to thank our Speech Advisory Group, people who have been expert advisors and contributors to the Common Voice project:

  • Francis Tyers — Assistant Professor of Computational Linguistics at Higher School of Economics in Moscow
  • Gilles Adda — Speech scientist
  • Thomas Griffiths — Digital Services Officer, Office of the Legislative Assembly, Australia
  • Joshua Meyer — PhD candidate in Speech Recognition
  • Delyth Prys — Language technologies at Bangor University research center
  • Dewi Bryn Jones — Language technologies at Bangor University research center
  • Wael Farhan — MS in Machine Learning from UCSD, currently doing research for Arabic NLP at Mawdoo3.com
  • Eren Gölge — Machine learning scientist currently working on TTS for Mozilla
  • Alaa Saade — Senior Machine Learning Scientist @ Snips (Paris)
  • Laurent Besacier — Professor at Université Grenoble Alpes, NLP, speech processing, low resource languages
  • David van Leeuwen — Speech Technologist
  • Benjamin Milde — PhD candidate in NLP/speech processing
  • Shay Palachy — M.Sc. in Computer Science, Lead Data Scientist in a startup

***

Common Voice complements Mozilla’s work in the field of speech recognition, which runs under the project name Deep Speech: an open-source speech recognition engine that approaches human accuracy and was released in November 2017. Together with the growing Common Voice dataset, we believe this technology can and will enable a wave of innovative products and services, and that it should be available to everyone.
