Uliza is a graduate of the Solution Space Venture Incubation Programme summer school 2018. At one level, the start-up’s mission sounds relatively simple: to transcribe and translate audio recordings of business communication into African and Asian languages. It does this using the language expertise of 300 human freelance translators based all over the world, who currently translate into Kiswahili, Marathi, Hindi, Twi, English, isiXhosa and isiZulu. Uliza’s clients are primarily consulting firms, research entities and call centre-based businesses.
But the translation service belies a much bigger aim, which will, in time, enable people all over Africa and Asia to interact with the internet in their own home language.
Uliza CEO Grant Bridgman explains: “We want to accelerate improvements in human-computer interaction for African and Asian language speakers, so that they can access information and internet-based services from businesses and governments. We think that the internet should be searchable in people’s local languages.”
Cellphones don’t always mean access
In 2016, while working in Sierra Leone, Bridgman was struck by a contradiction: “People have to walk for half a day to get to a flush toilet or to draw water, but at the same time, they’re taking pictures on their cellphone. The cellphone infrastructure is incredible. That’s only going to accelerate.”
However, despite the widespread connectivity, most people were excluded from many services that English speakers take for granted, explains Bridgman: “While English is the dominant language on the internet, people’s interaction with business services in the world was limited by the human factor: they couldn’t read English. The bottleneck isn’t the technology, it’s the interface between humans and technology. We wanted to develop a company that would open up that bottleneck.”
Existing technology has another built-in limitation, says Bridgman: “The keyboard is a relatively new invention. It’s clunky on smaller devices, and to use it, you still need to learn to communicate using Latin script. If people are unused to typing in Latin script, they tend to avoid typing.”
“We are hard-wired to communicate through the spoken word,” Bridgman continues. “Now that most people have internet-enabled devices, they will want to communicate in a more natural, human way through speech. They’d prefer to use WhatsApp voice notes because it’s easier to do so in their own home language.”
Voice recognition wasn’t the answer
Bridgman and his co-founders embarked on a project to build a voice recognition platform for search engines. Soon, they learned that they needed to start a few steps further back. He explains: “We realised we needed to address much narrower verticals like vocabulary and dialect. We needed to build an accurate dataset. And to get the accuracy, we need humans.”
That’s where the translation services come in. Human translators transcribe and translate a client’s audio recordings in short segments, using prescribed formats and types of transcription. The transcriptions feed a database used to build speech-to-text capabilities, which will ultimately form a machine-learning dataset. “If the human gets it right, the machine will get it right,” says Bridgman.
Building a language corpus
“We’re building a language corpus,” says Bridgman, describing a dataset that pairs audio with transcripts, yielding millions of matches between spoken sounds and written words. As the dataset grows to hundreds of thousands of audio hours of these matches, a machine can learn from it, and automated speech-to-text in local languages becomes possible.
Uliza’s end goal is to create a translation and voice recognition layer for larger tech companies to provide their services. Bridgman envisages a future scenario in which someone can ask a question in Google Voice or Siri, for example, and receive the answer they need — all in their own language.
A grand, but complex vision
It’s a grand vision, but the application is complex, primarily because of prevailing mindsets. Bridgman explains that, thanks to software such as Google Translate, Dragon and Nuance, people generally understand what speech-to-text is, but they are generally not aware of the constraints of building an accurate dataset. Languages such as English, Spanish and German have huge datasets available, but those resources aren’t available for many other languages, such as Kiswahili or Afrikaans.
Price perception is one barrier, explains Bridgman: “There’s a perception that voice recognition should be free — breaking the pricing mindset has been very difficult. That free service doesn’t exist yet. I hope it will one day.”
Another significant issue is the task of building the right team, says Bridgman: “We’re a remote team of three building a team of 300 freelancers located around the world.”
Demand will drive development
Ultimately, however, Bridgman considers a more profound challenge to be the most important. It’s a much bigger topic than whether corporates will buy into Uliza’s product. The question is how to accelerate the process of building a local language machine-learning dataset. On this, Bridgman is clear: “The majority of the technologies developed in the western world — particularly things like the internet — was initiated and delivered with government and military spending. If African countries want their people to be able to access these public services, they should have access to technology that enables that access. I’d like to see more work and more funding committed at government level.”
That said, Bridgman continues: “I’d like to see more African language speakers defend their languages. If people want to access services in Sotho or Zulu, or any other African or Asian language, they need to motivate for it; and demand it. Because, unless Google thinks it can make enough ad revenue out of it, it won’t happen.
“It’s unfair to demand that people learn another language simply because they want to access critical services, such as health care or legal advice. We want a more inclusive information and business environment,” says Bridgman.