What We’re Building Next
In 2015, when the software architecture of YorubaName.com began to take shape, we ran into an unexpected problem. If we wanted the dictionary to have audio pronunciations of all its names, we would have to employ one human being indefinitely to pronounce hundreds and hundreds of names in a dictionary that hopes to keep growing ad infinitum. We would also need unlimited storage space for all the new audio recorded every day. Both presented real logistical and financial nightmares. We didn’t have unlimited resources to accomplish both, or either.
So we started thinking: isn’t there a way to reduce Yorùbá to its component phonemes and then have software match them when appropriately queried? This is, after all, the basic principle behind speech synthesis. By teaching the computer the basic phonology of a language, and getting a human being to pronounce a limited number of segments in that language, the computer can then, technically, produce an unlimited number of intelligible sentences in the language. This is how Siri was made, and Google Voice, and Amazon Echo, and many more. If we could achieve this for Yorùbá as well, we could solve the problem at once without spending much.
We eventually hacked a solution. If all the consonants in Yorùbá were recorded with all the vowels in the language in all their tonal (register and contour), nasal, and individual variations, one could create an unlimited number of words. My part was the linguistics: calculating just how many iterations this amounted to, and getting a voiceover talent to pronounce them. Dadépọ̀, my partner, would train the computer to perform the concatenation necessary to achieve a competent voice.
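The iteration count we had to calculate is really a small combinatorial budget: every consonant paired with every vowel in every tone, plus the bare vowels, and each recorded syllable then becomes a clip the software strings together. Here is a rough Python sketch of that idea; the inventory sizes and the `clip_library` lookup are illustrative assumptions for this post, not our actual phonological analysis or code:

```python
# Illustrative sketch of the syllable arithmetic behind concatenative
# synthesis. The counts below are simplified assumptions.

CONSONANTS = 18    # assumed consonant inventory
ORAL_VOWELS = 7    # a, e, ẹ, i, o, ọ, u
NASAL_VOWELS = 5   # assumed nasal vowel set
TONES = 3          # high, mid, low register tones

def syllable_count(consonants, oral, nasal, tones):
    """Upper bound on syllables to record: every consonant with every
    vowel (oral or nasal) in every tone, plus bare-vowel syllables."""
    vowels = oral + nasal
    cv_syllables = consonants * vowels * tones
    v_syllables = vowels * tones
    return cv_syllables + v_syllables

def synthesize(word_syllables, clip_library):
    """Concatenate pre-recorded syllable clips into one utterance.
    `clip_library` maps a syllable such as 'bá' to its audio samples;
    a missing key means that syllable was never recorded."""
    audio = []
    for syllable in word_syllables:
        audio.extend(clip_library[syllable])
    return audio

total = syllable_count(CONSONANTS, ORAL_VOWELS, NASAL_VOWELS, TONES)
```

Under these toy numbers the budget comes to a few hundred recordings, which is exactly why a single voiceover session felt feasible, and why every unrecorded syllable (a missing key in `clip_library`) left an audible gap in our first attempt.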
The result was not perfect. For one, we never managed to record all the vowel iterations necessary for a complete set, so many syllables were omitted from the pronunciations. Some of our audio segments were also mistakenly recorded in mono rather than stereo, which added a drag that sometimes made the output sound like a combination of a male and a female voice. On top of that, our voiceover talent was a female relative who returned home to write her SSCE exams before the recordings were complete. And because of the time demands on the schedules of the team members working on it, it was impossible to start over and finish in record time.
That was in 2015. Yesterday, I set up a fundraising drive to return to the project. With a new team of two linguists and a software developer, we’re hoping to finally get something concrete out. We would like to create an application that can turn written (properly tone-marked) Yorùbá text into a comprehensible, coherent voice.
What would it sound like to have the computer speak to us in a language we understand? This is no longer a strange question, because we have seen it in action with Apple’s Siri, Amazon Echo, Google Voice, ATMs, and many other automated systems like GPS controls. With the ones we use every day, we can customize the voices to suit our preferences: young, female, American, or old, male, British. One capability I haven’t seen in all my interactions with these devices, however, is the ability to speak my own language: Yorùbá. Not even local ATMs.
I’ve interacted with enough electronic devices to begin to feel like asking for such a capability is a waste of time, and I know that I’m not alone. Even while reading this, there will be many already asking “Why do you think you even need artificial intelligence in a Nigerian language? You should focus on ending hunger instead!” Or, those to whom the idea itself is risible. “Siri in Yorùbá? Ha ha. There is nothing I won’t hear in this world. Who will use it? Should oyinbos have to make everything for you?” There is, one realizes, a certain anglonormativity in the world (and this spills naturally into the tech fields as well) which makes any deviation or challenge appear, even to the questioner, like a strange ask. But it shouldn’t have to be that way.
So as much as this is a fun endeavour for us, an attempt to solve an old problem, it is also a push towards the proper integration of African languages into the information technology age. Imagine being able to voice-control an electronic device in a Nigerian language. Now that would be a true leap in innovation. I looked at the languages Siri supports on my iPad and realized that there is not one African language among them. That is a shame. Not even Nigerian English!
Most people who worry about language endangerment speak mostly of parents not passing the language on to their children, which is a valid fear. But they forget an equally important limitation: having those languages left behind in this brave new world controlled by machines. Finnish has 4 million speakers, Norwegian has 5 million, and Danish has 6 million. Yorùbá has over 30 million speakers, more than twice those three combined, yet Siri has no Yorùbá functionality. One of these languages will be limited in its interaction with technology in this new century!
So let me not hear any excuse that this comes down to audience size. We have the numbers; the interest just isn’t there. Speech synthesis is only one step, but it is the basis of most voice-driven artificial intelligence systems. The absence of African languages from this new field makes me wonder whether our so-called “tech space” ecosystem hasn’t fallen into the same pit from which most of the continent’s problems stem: a desire to solve problems facing other places and other people while ignoring our own.
Now, since you’ve read this far, we need your help. If you care about creating solutions to real local problems, solutions capable of unleashing true innovation for a wide number of people, and with real profit potential besides, please donate to our cause.