Speech Projects — Acoustical Work
Communication is extremely important! Speech is the easiest and quickest way for us to communicate with and understand each other. :)
1. Generate
2. Recognize
3. Analyze
1. How to apply machine learning and deep learning methods to audio analysis
Audio Analysis ->
Machine Learning for Audio: Digital Signal Processing, Filter Banks, Mel-Frequency Cepstral Coefficients
DCT for Speech Signal Compression
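As a rough illustration of the idea behind DCT-based compression, the sketch below transforms one short frame with a DCT-II, keeps only the largest coefficients, and reconstructs the frame from them. It uses only the Python standard library. The function names (`dct2`, `idct2`, `compress_frame`) and the toy frame are made up for illustration and are not taken from the linked article.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def compress_frame(frame, keep):
    """Zero all but the `keep` largest-magnitude DCT coefficients."""
    coeffs = dct2(frame)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:keep])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]

# Toy frame (assumption): two cosines that line up with DCT basis functions,
# so two coefficients are enough to describe all 64 samples.
n = 64
frame = [
    math.cos(math.pi * (i + 0.5) * 3 / n) + 0.5 * math.cos(math.pi * (i + 0.5) * 10 / n)
    for i in range(n)
]
compressed = compress_frame(frame, keep=2)
restored = idct2(compressed)
max_error = max(abs(a - b) for a, b in zip(frame, restored))
```

Real speech frames are not this sparse, so a practical codec trades the number of kept coefficients against reconstruction error rather than recovering the frame exactly.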
Mel-Frequency Cepstrum (MFCC)
Mel Frequency Cepstral Coefficients (MFCCs)
MFCCs are used for feature extraction: they give a more compact and less redundant representation of the input voice signal.
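The usual MFCC pipeline for a single frame (window, power spectrum, mel filter bank, log, DCT) can be sketched as follows. This is a minimal standard-library version meant to show the steps, not a tuned implementation; in practice an audio library would normally do this. The function names, the 20 filters, the 13 coefficients, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum of a Hamming-windowed frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, expressed as (fractional) FFT bin positions
    edges = [
        mel_to_hz(mel_max * j / (n_filters + 1)) * n_fft / sample_rate
        for j in range(n_filters + 2)
    ]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs for one frame: log mel energies followed by an unnormalized DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # the small floor avoids log(0) for empty or silent bands
    log_energies = [
        math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank
    ]
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / n_filters)
            for j, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]

# Illustrative input (assumption): one 256-sample frame of a 440 Hz tone at 8 kHz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

Here 256 samples are reduced to 13 numbers per frame, which is the compactness the description above refers to.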
Filter bank: a compressed spectrogram that imitates how our ear perceives sound
Speech recognition is still a growing field. … The Fast Fourier Transform (FFT) is the traditional technique for analyzing the frequency spectrum of a signal in speech recognition.
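To make the FFT step concrete, here is a small sketch that computes a spectrum with a recursive radix-2 FFT and reports the strongest frequency. It is written with the standard library only so the algorithm is visible; the function names and the 500 Hz test tone are assumptions for illustration, and real projects would typically rely on an optimized FFT library instead.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest-magnitude bin, ignoring the DC bin."""
    spectrum = fft(samples)
    half = len(samples) // 2
    mags = [abs(c) for c in spectrum[:half]]
    peak = max(range(1, half), key=lambda k: mags[k])
    return peak * sample_rate / len(samples)

# Illustrative signal (assumption): a pure 500 Hz tone sampled at 8 kHz.
# With 1024 samples the tone falls exactly on bin 64.
sample_rate = 8000
n = 1024
tone = [math.sin(2 * math.pi * 500 * i / sample_rate) for i in range(n)]
peak_hz = dominant_frequency(tone, sample_rate)  # 500.0 for this tone
```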
Wavenet
Conditional WaveGAN Explained
NGC
Real Time Cloning
Dog Voice Identification
Automatic Cry Recognition
Baby Voice Detection
Voice Synthesis
A Mean Opinion Score (MOS) is reported for each voice: test subjects rated each voice on a scale of 1–5 according to how much it sounded like natural speech.
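The MOS itself is just the arithmetic mean of the listeners' ratings for a voice. A minimal sketch, where the function name and the ratings are invented placeholders and not results from any study:

```python
from statistics import mean

def mean_opinion_scores(ratings):
    """Map each voice to the mean of its 1-5 listener ratings."""
    scores = {}
    for voice, votes in ratings.items():
        if not votes:
            raise ValueError(f"no ratings for {voice!r}")
        if any(v < 1 or v > 5 for v in votes):
            raise ValueError(f"ratings for {voice!r} must be between 1 and 5")
        scores[voice] = mean(votes)
    return scores

# Placeholder ratings (assumption): four listeners, two hypothetical voices
example_ratings = {
    "voice_a": [4, 5, 4, 3],
    "voice_b": [2, 3, 3, 2],
}
scores = mean_opinion_scores(example_ratings)
```

With these placeholder votes, `voice_a` averages 4 and `voice_b` averages 2.5.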
Conditional Voice Synthesis
Pixel Recurrent Neural Networks
Keywords from the Meeting
Low pass feature
Fourier Transform and then transform back
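One way to read the two keywords above is as a frequency-domain low-pass filter: take the Fourier transform, zero every bin above a cutoff, and transform back. The sketch below does that with a naive DFT from the standard library. This interpretation, the function names, and the test signal are assumptions, since the meeting notes give no further detail.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse of dft."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)) / n
        for i in range(n)
    ]

def low_pass(samples, sample_rate, cutoff_hz):
    """Zero all frequency bins above cutoff_hz, then transform back."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # bins above n/2 mirror the negative frequencies
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0j
    return [v.real for v in idft(spectrum)]

# Illustrative signal (assumption): a 50 Hz tone plus a weaker 300 Hz tone
sample_rate = 1000
n = 100
low = [math.sin(2 * math.pi * 50 * i / sample_rate) for i in range(n)]
high = [0.5 * math.sin(2 * math.pi * 300 * i / sample_rate) for i in range(n)]
mixed = [a + b for a, b in zip(low, high)]

filtered = low_pass(mixed, sample_rate, cutoff_hz=100)
max_error = max(abs(f - l) for f, l in zip(filtered, low))
```

Because both tones sit exactly on DFT bins here, the filtered signal matches the 50 Hz component almost exactly; on real audio a hard cutoff like this causes ringing, so smoother filters are usually preferred.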
THAI SER
IEMOCAP
Speech Emotion Recognition IEMOCAP
CSTR Voice Cloning Toolkit (VCTK)
44 hours of speech from 109 speakers