Sustainable tech development needs local solutions: Voice tech ideation in Kigali
Mozilla and GIZ co-host ideation hackathon in Kigali to create a speech corpus for Kinyarwanda and to lay the foundation for local voice-recognition applications.
Developers, researchers and startups around the globe working on voice-recognition technology face the same problem: a lack of freely available voice data in their respective languages to train AI-powered Speech-to-Text engines.
Although machine-learning algorithms like Mozilla’s Deep Speech are openly available, training data is limited. Most of the voice data used by large corporations is not available to the majority of people, is expensive to obtain, or simply does not exist for languages that are not widely spoken around the world. The innovative potential of this technology is largely untapped. By providing open datasets, we aim to take away the onerous tasks of collecting and annotating data, which removes one of the main barriers to voice-based technologies and makes front-runner innovations accessible to more entrepreneurs. This is one of the major drivers behind our project Common Voice.
Common Voice is our crowdsourcing initiative and platform to collect and verify voice data and to make it publicly available. But to get more people from around the world involved and to speed up the process of building data sets large enough for training purposes, we rely on partners: like-minded commercial and non-commercial organizations with an interest in making technology available and useful to all.
Complementary expertise and shared innovation goals
In GIZ (Deutsche Gesellschaft für Internationale Zusammenarbeit) we are fortunate to have found an ally who, like us, believes that having access to voice data opens up a space for an infinite number of new applications. Voice recognition is well suited to reach people living in oral cultures and those who have not mastered a widespread language such as English or French. With voice interaction available in their own language, we may provide millions of people with access to information and ultimately make technology more inclusive.
When we learned about GIZ’s “Team V” which brings together digital enthusiasts from GIZ and Mainlevel Consulting to explore voice interaction and mechanisms for collecting voice data in local languages — an effort supported by GIZ’s internal innovation fund — the opportunity to leverage complementary strengths became just too obvious.
Eventually we started working on a concrete collaboration that would combine Mozilla’s expertise in voice-enabled technology and data collection with GIZ’s immense regional experience and reach in working with local organizations, public authorities and private businesses across various sectors. This resulted in an initial hackathon in Kigali, Rwanda, with the goal of unleashing the participants’ creativity to unlock novel means of collecting speech corpora for Kinyarwanda, a language spoken by at least 12 million people in Rwanda and surrounding regions.
Sustainable technology development needs local solutions
The hackathon took place on 12–13 February at kLab, a local innovation hub supported by the Rwandan government. Forty teams had applied with their novel incentive mechanisms for voice data collection, proving that AI and machine learning are of great interest to the Rwandan tech community. We invited the five teams with the most promising approaches, which took into account local opportunities not foreseen by the Common Voice team.
The event began with a rousing call to action for the participants by Antoine Sebera, Chief Government Innovation Officer of the Rwanda Information Society Authority, a governmental agency responsible for putting Rwanda’s ambitious digital strategy into practice. GIZ then outlined the goals and evaluation criteria* of the hackathon, which was critical in setting the direction of the entire process. (*The developed solutions were evaluated against the following criteria: user centricity, incentive mechanism, feasibility, ease of use, potential to scale and sustainability.)
Kelly Davis, Head of Mozilla’s Machine Learning Group, followed with an overview of the design of and motivations behind Deep Speech and Common Voice, both of which could quickly be adapted to Kinyarwanda.
During the two-day event, the selected teams refined their initial ideas and took them to the street, fine-tuning them through interviews with potential contributors and partners. By visiting universities, language institutions, and even the city’s public transit providers (really!), they put their solutions to the test.
The winner of the hackathon was a uniquely Rwandan idea: with Umuganda 2.0, the team brought the concept of “Umuganda”, a regular national community work day taking place on the last Saturday of every month, into the digital age. Building on the Common Voice website, the participants would collect voice data during monthly Umuganda sessions at universities, tech hubs or community spaces. The idea also taps into the language pride of Rwandans. User research led by GIZ with students, help workers and young Rwandans working on language or technology has shown that speaking and preserving Kinyarwanda in a digital context is seen as very important and is a great motivation to contribute to the collection of voice data.
For jury members Olaf Seidel, Head of the GIZ project “Digital Solutions for Sustainable Development” in Rwanda, George Roter, Director of Mozilla Open Innovation Programs, Kelly Davis, and Gilles Richard Mulihano, Chief Innovation Officer at the local software developer ComzAfrica, the idea also resonated because it could easily be scaled throughout Rwanda. Moreover, it could be adapted to other projects and regions that rely on collective efforts to build common infrastructures of the digital world, something GIZ is keenly interested in. Umuganda 2.0 shows that we need culturally appropriate solutions to lower barriers and make front-runner innovations accessible to more entrepreneurs.
GIZ and the winning team are now working towards a first real-life test at a local university during next month’s Umuganda on March 30. The aim of this session is to test whether the spirit of Umuganda and the collection of voice data really go well together, what motivates people to take part, and how we can make voice data collection during the community event fun and interesting. Last but not least, it should show how many hours of voice data can be collected during such an event, to determine whether the outcome justifies the effort.
GIZ, with its deep connections to local communities in numerous countries, was a perfect partner for Mozilla in this endeavor, and we hope to repeat this success elsewhere; in fact, we look forward to it. In the long term, Mozilla and GIZ aim to continue this promising cooperation, building on our shared visions and objectives for a positive digital future. Allowing access to a wide range of services no matter which language you speak is no doubt a powerful first step.
Alex Klepel, Kelly Davis (Mozilla) and Lea Gimpel (GIZ)