At Mozilla, our Open Innovation team is driven by the guiding principle of being Open by Design. We are intentionally designing how we work with external collaborators and contributors, both at the individual and organizational level, for the greatest impact and shared value. This includes foundational strategic questions, from business objectives to licensing through to overall project governance. But importantly, it also applies to how we design experiences for our communities, including how we think about creating interactions, from onboarding to contribution.
In a series of articles we will share deeper insight into why, and how, we're applying experience design practices throughout our open innovation projects. Our goal in sharing these learnings is that other Open Source projects and experiments may benefit from applying them, along with a holistic Service Design approach. As a relevant example, throughout the series we'll often point to the Common Voice project, where we've used these practices from its inception.
Starting with a Question
What is now Common Voice, a multi-language voice collection experience, started merely as an identified need. Since early 2016, Mozilla's Machine Learning Group has been working on an Open Source speech recognition engine and model, project "Deep Speech". Any high-quality speech-to-text engine requires thousands of hours of voice data to train it, but publicly available voice data is very limited and the cost of commercial datasets is exorbitant. This prompted the question: how might we collect large quantities of voice data for Open Source machine learning?
We hypothesized that creating an Open Source voice dataset could lead to more diverse and accurate machine learning capabilities. But how to do this? The best way to ideate and capture multiple potential solutions is to bring in additional minds and organize a design sprint. In the case of Common Voice, our team gathered in Taipei to lead a group of Mozilla community members through various design thinking exercises. Multiple ideas emerged around crowdsourcing voice data, ultimately resulting in testable paper prototypes.
Engaging with Actual Humans
At this point we could have gone immediately to a build phase, and in the past we might have. Instead, we chose to pursue further human interaction by gathering in-person feedback. The purpose of this human-centered research was both to understand which ideas resonated with people and to narrow in on which design concepts we should move forward with. Our test audience consisted of the people we hoped to ultimately engage with our data collection efforts: everyday internet citizens. We tested concepts by taking to the streets of Taipei and using guerrilla research methods. These concepts were quite varied and included everything from a voice-only dating app to a simple sentence read-back mechanism.
We went into this research phase fully expecting the more robust app concepts to win out. Our strongly held belief was that people wanted to be entertained, or needed an ulterior motive, in order to contribute this level of voice data. What resulted was surprisingly intriguing (and heartening): it was the experience of voice donation itself that resonated most with people. Instead of using a shiny app that collects data as a side effect of its main features, people were more interested in the voice data problem itself and wanted to help. People wanted to understand why we were doing this type of voice collection at all. This research showed us that our initial assumptions about the need to build an app were wrong. Our team had to let go of their first ideas in order to make way for something more human-centered, resonant and effective.
This is why we built Common Voice: to tell the story of voice data and how it relates to the need for diversity and inclusivity in speech technology. To better enable this storytelling, we created a robot that users on our website would "teach" to understand human speech by reading sentences aloud to it. This interaction model has proved effective and has already evolved significantly. The robot is still a mainstay, but the focus has shifted. True to experience design practices, we are consistently iterating, currently with a focus on building the largest multi-language voice dataset to date.
As we continue our series, we'll break down the subsequent phases of our Common Voice work, highlighting where we put our experience design practice of prototyping with intention into action. We'll take learnings from the human interaction research and walk through how the project has moved from an early MVP prototype to its current multi-language contribution model, all with the help of our brilliant communities.
If you’d like to learn more in the meantime, share thoughts or news about your projects, please reach out to the Mozilla Open Innovation team at firstname.lastname@example.org.