Milana Shkhanukova · Alignment in TTS (text to speech) · If in texts we always work in one modality, with speech, we have two sequences (audio aka spectrogram and text). There is no direct… · 5d ago
Muthukumar · Detecting Voiced, Unvoiced and Silent parts of a speech signal · A speech signal in essence is composed of 3 parts: Voiced, Unvoiced and Silent. It is a non-stationary wave that keeps varying with time… · Mar 19
Sciforce in Sciforce · Hot Topics at Interspeech 2024: The Latest in Technology of Spoken Language Processing · What’s Hot at Interspeech 2024 · Oct 3
Milana Shkhanukova · Is Wav2Vec an autoencoder? #paper_reading · Preamble: Essentially, self-supervised learning helps us understand the nature of a subject. For text, this includes semantics, syntax, and… · Sep 21
Shahrukh khan · A Deep Dive into Clara for Multimodal (Speech & Language) Modeling · Introduction · Aug 31
Jyoti Dabass, Ph.D. in GoPenAI · Understanding Speech Processing: A Beginner’s Guide to LPC, PLP, and MFCC · Welcome to the wonderful world of speech processing! In this beginner’s guide, we will explore the fundamental techniques of Linear… · Aug 8
Yi Kuan · Transformer Architectures for Multimodal Signal Processing and Decision Making | ICASSP 2022… · Note: Full video can be found here. It is the tutorial on ICASSP 2022 Tutorial on “Transformer Architectures for Multimodal Signal… · Jun 17, 2023