Diarization: Fano Labs Ranked in the Global Top 5 of the DIHARD III Competition!
Diarization is the process of partitioning audio by speaker. It is required when Automatic Speech Recognition (ASR) is used to transcribe conversations from business phone calls (e.g. customer service hotlines) and meetings into text, where all speakers’ voices are recorded on the same track in the recording system. To determine precisely who said what and when, good speech diarization technology is crucial.
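To make the "who said what and when" idea concrete, here is a minimal sketch of how diarization output could be combined with ASR output. It assumes, purely for illustration, that the diarization system returns speaker-labelled time segments and the ASR system returns words with timestamps; the `Segment` and `Word` types, the `assign_speakers` helper, and the sample data are all hypothetical and do not describe Fano Labs' actual system.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical diarization output: one speaker turn, in seconds."""
    speaker: str
    start: float
    end: float


@dataclass
class Word:
    """Hypothetical ASR output: one recognised word with timestamps."""
    text: str
    start: float
    end: float


def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it the most.

    A word that overlaps no segment is labelled None.
    """
    labelled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for s in segments:
            # Length of the time interval shared by the word and the segment
            overlap = min(w.end, s.end) - max(w.start, s.start)
            if overlap > best_overlap:
                best_speaker, best_overlap = s.speaker, overlap
        labelled.append((best_speaker, w.text))
    return labelled


# Made-up example data for a two-party hotline call
segments = [Segment("agent", 0.0, 2.0), Segment("customer", 2.0, 4.5)]
words = [Word("hello", 0.2, 0.6), Word("how", 0.7, 0.9), Word("fine", 2.3, 2.8)]

result = assign_speakers(words, segments)
print(result)  # [('agent', 'hello'), ('agent', 'how'), ('customer', 'fine')]
```

Picking the segment with the largest time overlap is only one simple rule; a production system would also need to handle words that straddle a speaker change or fall in overlapped speech.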
Diarization also takes speech analytics to a new level. Enterprises want to run Big Data analysis on conversations to learn more about customer behavior and to identify business insights or ways to improve their business and services. In addition, regulators have imposed many compliance policies on enterprises, especially financial institutions, which must ensure that their staff follow these policies to avoid penalties. Knowing precisely what customers and staff have said therefore becomes even more important, which also drives the need for better diarization technology.
Building good diarization technology is not only a matter of separating the different speakers in the speech. In practice, we also have to deal with severe background noise, side speech, overlapping speech, short sentences, and similar difficulties.
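Overlapping speech is one of the difficulties listed above that is easy to illustrate. The sketch below finds the time regions where two or more speakers are active at once, given speaker-labelled segments. The `(speaker, start, end)` tuple format and the `overlap_regions` helper are assumptions made for this example, and the sketch further assumes that segments belonging to the same speaker do not overlap each other.

```python
def overlap_regions(segments):
    """Return (start, end) intervals where two or more speakers are active.

    `segments` is a list of (speaker, start, end) tuples, times in seconds.
    """
    events = []
    for _speaker, start, end in segments:
        events.append((start, 1))   # a speaker becomes active
        events.append((end, -1))    # a speaker becomes inactive
    # Sort by time; at equal times, ends (-1) are processed before starts (+1)
    # so that segments which merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    active = 0
    overlap_start = None
    for time, delta in events:
        previous = active
        active += delta
        if previous < 2 <= active:
            # Second speaker just joined: an overlap region begins
            overlap_start = time
        elif previous >= 2 > active:
            # Back to at most one speaker: the overlap region ends
            if time > overlap_start:
                regions.append((overlap_start, time))
    return regions


# Made-up example: three speakers in a meeting recording
regions = overlap_regions([("A", 0.0, 3.0), ("B", 2.5, 5.0), ("C", 4.0, 6.0)])
print(regions)  # [(2.5, 3.0), (4.0, 5.0)]
```

Regions like these are where a system that assigns exactly one speaker per moment will necessarily miss someone, which is part of why overlapping speech is hard.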
Fano Labs’ research engineer, Mr. Leung Tsun Yat (“TY”), with the assistance of Lead Speech Scientist Dr. Lahiru Thilina Samarakoon, recently took part for the first time in the Third DIHARD Speech Diarization Challenge (DIHARD III), which attracted experts in this field from around the globe. The task evaluated in the challenge is speech diarization itself: determining “who spoke when” in a multi-speaker environment based only on audio recordings. By leveraging the latest AI technologies for end-to-end speech diarization, TY ranked in the global Top 5 in the “Diarization from Scratch” track, an outstanding achievement that shows the world that Fano Labs has the expertise and capability to deliver world-class performance.
The challenge is intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain. Speaker diarization is evaluated under two segmentation conditions (diarization from a reference speech segmentation vs. diarization from scratch) and across 11 diverse domains. The domains span a range of recording conditions and interaction types, including read audiobooks, meeting speech, clinical interviews, web videos, and, for the first time, conversational telephone speech. Building on this excellent result in diarization, Fano Labs will apply the technology across its solutions to meet customers' needs in different scenarios.
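At the same time, speaker diarization can take speech analytics to a new level. More and more enterprises want to use Big Data analysis to understand their customers' behavior and thinking through their conversations with them, and from that derive business insights or identify areas where their business or services can be improved. In addition, regulators have set many compliance policies for enterprises (especially financial institutions) to follow: enterprises need to ensure that their staff comply with these policies to avoid penalties from the regulators. Accurately understanding the conversations between customers and staff therefore becomes even more important, and market demand for speaker diarization technology keeps growing.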
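Fano Labs' research engineer Leung Tsun Yat (“TY”), with the assistance of Lead Speech Scientist Dr. Lahiru Thilina Samarakoon, represented the company for the first time in the Third DIHARD Speech Diarization Challenge (DIHARD III). The challenge scores accurate speaker diarization, that is, determining “who spoke when” in a multi-speaker recording. TY used the latest Artificial Intelligence technologies to perform speaker diarization from scratch on the audio provided by the competition, and achieved a global Top 5 result! This is an outstanding achievement that demonstrates Fano Labs' internationally leading expertise and capability to provide customers with professional consulting and services.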