Recognizing the Speech of Taiwan

by ColdSheep

Taiwan AI Labs
Jun 11, 2019

2017-12-06

Image credits: Illustration by Ana Beverin

We are exploring new ways for people to interact with technology in the age of AI, and speech is one of the most common and natural means of communication. In this post, we introduce our core recipes for building an automatic speech recognition system for Taiwan.

Cornerstone of Natural Human-Computer Interaction

Mobile phones, IoT, wearable devices and robots: our daily lives are increasingly likely to be surrounded by smart devices in the future. To interact with them as naturally as we do with other human beings, we need to develop related AI techniques such as machine learning, computer vision, natural language processing and speech processing.

Automatic speech recognition (ASR) is one of the cornerstones that link all these interactions together. With deep-learning-based models and graph-based decoders, ASR is becoming more reliable in both accuracy and speed.

Unique Language Habits in Taiwan

New word usages, phrases and sentence structures emerge every day in modern society and across cultures. This is especially true in Taiwan, where people's language habits differ from those of other Mandarin speakers.

For these reasons, current ASR solutions in the Mandarin-speaking space have limitations when it comes to supporting everyday usage in Taiwan. For example, PTT, the biggest forum and Internet community in Taiwan, invents hundreds of words and phrases every month. These newly created words may be used repeatedly and spread quickly by millions of users in online chats and posts.

Therefore, the challenge of building a localized ASR system is not only training a local neural network model, but also making the system update and adapt rapidly to a constantly evolving language.

With a Taiwan-specific language model, our ASR can be much friendlier to speech-related applications in Taiwan.
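To make the idea of rapid adaptation concrete, here is a minimal sketch of a language model that absorbs newly collected text incrementally. It is a toy add-one-smoothed unigram model, not our production system; the class name `UnigramLM`, the pre-tokenized sample sentences and the sample word are all illustrative assumptions.

```python
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram model over whitespace-separated tokens.

    Hypothetical illustration only; a real ASR language model is far richer.
    """

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, sentences):
        # Absorb new text incrementally instead of rebuilding from scratch.
        for sentence in sentences:
            tokens = sentence.split()
            self.counts.update(tokens)
            self.total += len(tokens)

    def prob(self, token):
        # Add-one smoothing; one extra vocabulary slot covers unseen tokens.
        vocab_size = len(self.counts) + 1
        return (self.counts[token] + 1) / (self.total + vocab_size)


lm = UnigramLM()
# Made-up, pre-tokenized sample sentences standing in for general text.
lm.update(["我 想 看 新聞", "今天 天氣 很 好"])
before = lm.prob("鄉民")  # the word has not been seen yet

# Made-up sample sentences standing in for freshly collected forum text.
lm.update(["鄉民 都 在 討論 這 件 事", "鄉民 說 今天 很 熱"])
after = lm.prob("鄉民")

print(before < after)  # the new word becomes more likely after the update
```

The point of the sketch is the `update` step: when new text can be folded in cheaply, the recognizer's vocabulary can follow the community's vocabulary without a full retraining cycle.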

Multi-Language Speech Recognition

Although Mandarin is the official language in Taiwan, a Mandarin-only ASR system cannot satisfy our goals. Taiwan is a place with many different cultures. In addition to Mandarin, other languages such as English, Taiwanese, Hakka and Indigenous languages are also used frequently. To deal with this, Ailabs.tw gathered experts in linguistics, phonetics and machine learning to set up a standard process for ASR when it faces cross-language requirements.

This process includes enriching the language model with multiple languages and handling mixed-language words and sentences. Our early ASR experiments on Taiwanese are working, and we are now bringing the system up to production level.

ASR Applications in Ailabs.tw

The ASR system already powers the front-desk system at Ailabs.tw. When employees arrive at the office, they interact with the ASR system for door access and no longer need ID cards or badges.

An employee asking the ASR system for door access

Another application is generating automatic transcripts or captions. Videos of news, conferences and interviews can be converted to text in real time using ASR.

Live captions can now be generated for news videos with ASR

Our ASR API is ready to be opened up. Contact us if you are interested in further cooperation.

Looking Forward

Speed, accuracy, multi-language support and rapid updates are the core aspects of an easy-to-use ASR system. We are continuously improving these and trying different deep learning algorithms to reach the point where AI does a better job than humans in this field. If you are interested in working on this problem, please contact us. We are actively hiring!
