Diving Deep into Speech Accent Classification: A Case Study
Binary accent classification on the Speech Accent Archive with the facebook/wav2vec2-base-960h model
In this article, I describe how I built a simple speech accent classifier. It is based on this Kaggle notebook, with minor changes. The main one is that it uses the updated facebook/wav2vec2-base-960h model, which was pre-trained and fine-tuned on 960 hours of LibriSpeech audio sampled at 16 kHz. Full details of the analysis can be found in this public Kaggle notebook.
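Because facebook/wav2vec2-base-960h was trained on 16 kHz audio, recordings at other sample rates need to be resampled before they reach the model. The sketch below is a simplified, dependency-free illustration and not the notebook's code. `resample_linear` and `normalize` are hypothetical helpers: the first uses plain linear interpolation with no anti-aliasing filter, and the second applies a zero-mean, unit-variance normalization similar to what the Hugging Face `Wav2Vec2FeatureExtractor` does when `do_normalize=True`.

```python
import math
from typing import List, Sequence

TARGET_SR = 16_000  # wav2vec2-base-960h expects 16 kHz input


def resample_linear(samples: Sequence[float], orig_sr: int,
                    target_sr: int = TARGET_SR) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing low-pass filter)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def normalize(samples: Sequence[float], eps: float = 1e-7) -> List[float]:
    """Zero-mean, unit-variance normalization of one waveform."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in samples]


if __name__ == "__main__":
    sr = 44_100
    # One second of a synthetic 440 Hz tone at 44.1 kHz.
    one_second = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    audio_16k = normalize(resample_linear(one_second, sr))
    print(len(audio_16k))  # 16000
```

In practice, a resampler with proper filtering, such as `librosa.load(path, sr=16000)` or `torchaudio.functional.resample`, is the safer choice. The toy version only shows what the step does.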
After training on the full sample for 10 epochs (about 1 hour on the NVIDIA Tesla P100 GPU available to Kaggle users), accuracy increased from 26% to about 92.7%.
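Accuracy here is the share of clips whose highest-scoring class matches the true label. A minimal sketch of that computation in plain Python, assuming the model outputs two raw logits per clip; `accuracy_from_logits` is a hypothetical helper, not taken from the notebook, and the toy logits are made up.

```python
from typing import Sequence


def accuracy_from_logits(logits: Sequence[Sequence[float]],
                         labels: Sequence[int]) -> float:
    """Fraction of examples whose arg-max logit equals the true label."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and of equal length")
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda j: row[j]) for row in logits]
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


if __name__ == "__main__":
    # Toy logits for four clips (two classes each); values are made up.
    toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.0, 0.2]]
    toy_labels = [0, 1, 0, 0]
    print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
```

When training with the Hugging Face `Trainer`, a function like this is typically wrapped in a `compute_metrics` callback, which receives the evaluation predictions and label ids.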
As an example, the classification pipeline was applied to two audio samples, one foreign and one native.
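The last step of such a pipeline turns the model's raw scores for each clip into a label and a probability. A minimal sketch, assuming the model returns two logits per clip; the label order in `ID2LABEL`, the clip names, and the logit values are all hypothetical placeholders, not outputs of the actual model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical label mapping; the actual index order depends on how the
# dataset labels were encoded during training.
ID2LABEL: Dict[int, str] = {0: "native", 1: "foreign"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float],
             id2label: Dict[int, str] = ID2LABEL) -> Tuple[str, float]:
    """Return the predicted label and its probability for one clip."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda j: probs[j])
    return id2label[best], probs[best]


if __name__ == "__main__":
    # Made-up logits standing in for the model's output on two clips.
    samples = {"clip_a.wav": [2.1, -0.7], "clip_b.wav": [-1.2, 2.3]}
    for name, logits in samples.items():
        label, prob = classify(logits)
        print(f"{name}: {label} ({prob:.1%})")
```

If the classifier is built on `Wav2Vec2ForSequenceClassification` from the transformers library, the logits would come from the `.logits` attribute of the model's output, and the label mapping is usually stored in `model.config.id2label`.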